Appending data to an existing table in Snowflake
Data can be appended to Relational Database Management System tables (also known as SQL-based data targets). Alternative techniques are possible for appending data within Paxata and other data targets; however, the purpose of this document is to demonstrate SQL-based appending. This document will demonstrate:
- How to manually append Paxata data to an existing table in Snowflake, and
- How to append Paxata data to an existing table in Snowflake using Automated Project Flows (APF).
This walkthrough uses Paxata version 2020.2 (previous versions will also work, as this functionality has been supported for many releases) and the SaaS version of Snowflake as of December 9, 2020.
- Query the Snowflake table that you want to append to; in this case, we’re appending to the "DEMO_DB"."CALLUM"."2019_Q4_WebCampaigns" table. Currently, this table contains 337,799 rows:
- Within Paxata, add a new connection configuration with the Snowflake credentials. For our example, we’re calling this connector config “Snowflake - Append”:
- Importantly, select the checkbox in this configuration and deselect the option called “Automatically create table”.
- Go to the Library, then Data Sources, and add a new data source. Make sure to select the connector type of the previously created configuration, Snowflake - Append. We’re naming this new data source Snowflake (Append).
- Click Test Data Source, and confirm success with the below message:
- Create a new project:
- Select the data source for the data. For our example, we selected a SQL Server data source; although this is a different RDBMS from Snowflake, it could be the same or a non-relational data source like SFTP, S3, Salesforce, or a combination of multiple sources. Complete the project and create a Lens called Q3_data_to_append:
- Manually publish the Q3_data_to_append lens, which creates a new item in the library:
- Select Export when hovering over the library item:
- Select the Snowflake (Append) connector and then make sure to enter the name of the table you want to append to. For this example, we’re using the table we specified in step 1, so we select the schema "DEMO_DB"."CALLUM", enter the table name “2019_Q4_WebCampaigns”, and then click Export:
- If you view the Export Logs, you see the data has been exported successfully to the Export Destination Snowflake (Append):
- Review the Snowflake UI to check that the data has been appended. In this case, we see there are now 671,734 rows (337,799 rows originally + 333,935 rows from the manual append step):
- Now let’s look at Automated Project Flows (APF). I have added a step to the existing project so that it always outputs 100 rows, which makes it easier to see the updated row counts. The Snowflake table currently contains 671,734 rows; after the next two APF runs it will contain 671,834 rows and then 671,934 rows. To set up the project flow, click Create Project Flow:
- Set up the Project Flow. (In this demonstration we set it for every 10 minutes, but you can set it to whatever latency is appropriate.)
- Make sure to tell Paxata to re-import the dataset on each run. You do this by clicking on the Inputs tab and selecting Reimport dataset on run. (Optionally, you could click Configure reimport options but in this case it is not necessary as everything is the same.)
- Lastly, choose the existing Snowflake table to append to. (Note that this can be a new table, but for our purposes we’re demonstrating how to append to an existing table.) To choose an existing table, click Configure Lens and select the Library and Export option in the dropdown. Select the Snowflake (Append) data source and then enter the File Path. In this case it will be /DEMO_DB/CALLUM, and the table is the one we’ve been appending to: 2019_Q4_WebCampaigns.
- Now monitor Automated Project Flows (APF) by clicking the orange three-circle icon:
This will open this view of the APF:
- After waiting for two minutes, you can see that it has been run:
- In the Snowflake UI, the row count is now 671,834, as expected.
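Conceptually, appending to a SQL-based target adds rows to a table that already exists, rather than creating or replacing it. The sketch below illustrates that behavior only. It uses Python's built-in sqlite3 module as a stand-in for Snowflake, and the columns campaign_id and clicks and the sample rows are hypothetical. It is not a description of how Paxata performs the export internally.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for Snowflake.
conn = sqlite3.connect(":memory:")

# Hypothetical target table that already exists and already holds rows.
conn.execute(
    'CREATE TABLE "2019_Q4_WebCampaigns" (campaign_id INTEGER, clicks INTEGER)'
)
conn.executemany(
    'INSERT INTO "2019_Q4_WebCampaigns" VALUES (?, ?)',
    [(1, 10), (2, 20), (3, 30)],
)


def row_count(connection):
    """Return the current number of rows in the target table."""
    cursor = connection.execute('SELECT COUNT(*) FROM "2019_Q4_WebCampaigns"')
    return cursor.fetchone()[0]


before = row_count(conn)

# Appending inserts new rows into the existing table; it neither
# recreates the table nor replaces the rows that are already there.
new_rows = [(4, 40), (5, 50)]
conn.executemany('INSERT INTO "2019_Q4_WebCampaigns" VALUES (?, ?)', new_rows)
conn.commit()

after = row_count(conn)
print(before, len(new_rows), after)
```

The final row count equals the original count plus the number of appended rows, which is the same check we make against the Snowflake table after each export.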
Testing table updates
- Now we can manually insert new data into the SQL Server table called “Web_Campaign_Data_Daily_Extract” to emulate it being updated on some kind of schedule outside of Paxata. After inserting the data, wait 10 minutes for the job to run again:
- When we rerun the Snowflake query, we see that the table now has 671,934 rows.
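The row counts quoted in this walkthrough can be double-checked with a small script. This is a minimal sketch: the helper names quote_ident, count_query, and expected_rows are hypothetical, and the script only builds the query text and does the arithmetic, so it does not connect to Snowflake. The table name is double-quoted because it starts with a digit, which an unquoted Snowflake identifier cannot do.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def count_query(database: str, schema: str, table: str) -> str:
    """Build a COUNT(*) query against a fully qualified, quoted table name."""
    qualified = ".".join(quote_ident(part) for part in (database, schema, table))
    return f"SELECT COUNT(*) FROM {qualified}"


def expected_rows(initial: int, appended_batches: List[int]) -> List[int]:
    """Return the running row count after each appended batch."""
    totals = []
    running = initial
    for batch in appended_batches:
        running += batch
        totals.append(running)
    return totals


# Query text for the table used throughout this article.
query = count_query("DEMO_DB", "CALLUM", "2019_Q4_WebCampaigns")

# Starting rows, then the manual append, then two APF runs of 100 rows each.
totals = expected_rows(337_799, [333_935, 100, 100])

print(query)
print(totals)
```

The three totals correspond to the counts observed after the manual export, the first APF run, and the second APF run.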
If you have any questions, suggestions, or feedback about this article, you can send me a PM in the community, @calamari, or send me an email at email@example.com.