Incremental Refresh / Appending / Stacking Data in Paxata


Morning,

Quick question: does Paxata have the capability to perform an incremental refresh? I receive a set of Excel files every month and would like to stack them on a monthly basis, which I would consider an incremental refresh. Unfortunately, I haven't figured out a way to complete this task in Paxata.

It doesn't seem that Paxata supports this. I tried creating a standard Excel file and then adding a version, but that didn't work. I then tried creating a foundation dataset plus a second dataset set to automate (new data), and appending the second onto the foundation dataset within a project, hoping the project would keep all the appended data. No dice there either.

Any thoughts?
13 Replies

Hello,
I may need a bit of clarification in order to answer your question: are you importing the Excel files locally or from a shared area like Sharepoint? Paxata does not support the automation of local file imports into our Library (we don't want the server reaching out to individual desktops), but we do support automated import from enterprise sources (SFTP, Sharepoint, S3, WASB, databases, etc.). If you load the latest version of the Excel file into the Library (on demand for a local file, or on a schedule for an enterprise source), you can create a project which starts with the original file and Appends the updated file. When there is a new updated file in the Library, you will notice that the "refresh datasets" button at the bottom of the Steps panel in the project turns green. You can select this option and then choose to update to the latest version of any datasets in the project. You can also set the automation options to use "latest version". Does this help? Please let me know if I can provide additional clarification.

Thanks!
Martha

Morning Martha,

I am importing the Excel files locally; I receive them via email. Is there a way to do it with a local Excel file?

No, it is not possible to automate the import of local files from your desktop into the Paxata Library. If you click on a dataset in the Library, you do have the option to add a version. This helps simplify the Library, because you can view each version under a single dataset instead of adding each new version as a separate, distinct dataset.

Image: https://us.v-cdn.net/6030933/uploads/editor/w3/bvy8fay66ybt.png

Interesting. Would it work from a shared drive? I could load it to a network share.



Hello Ychamb,

Paxata does support the Network Share (SMB) connector, so automation can be done if the files are on a network share. I hope this helps! 

Thanks,
Akshay

Is it possible to have an incremental refresh from an Enterprise data source such as a Hive table or SQL Server database? 

Import can be based on either the entire table or a query.  If using a query then you could incorporate a condition in the where clause based on current day or some other flag that would capture updated rows.  You can then automate this to run.  I hope this helps!

Martha
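The query-based import Martha describes can be illustrated outside Paxata. The following is only a minimal sketch, assuming a hypothetical `sales` table with an `updated_at` column and using an in-memory SQLite database as a stand-in for the enterprise source; the actual query would be written in the dialect of your own database (Hive, SQL Server, etc.) and the cutoff would typically come from the current date or a flag column.

```python
import sqlite3


def fetch_incremental(conn, since):
    """Return only the rows updated on or after `since` (an ISO date string).

    The table and column names are hypothetical; the WHERE clause is the part
    that makes the import incremental.
    """
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM sales "
        "WHERE updated_at >= ? ORDER BY id",
        (since,),
    )
    return cur.fetchall()


# In-memory stand-in for the enterprise source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-15"),
        (2, 20.0, "2024-02-03"),
        (3, 30.0, "2024-02-20"),
    ],
)

# Only the February rows are picked up by the incremental query.
new_rows = fetch_incremental(conn, "2024-02-01")
print(new_rows)  # [(2, 20.0, '2024-02-03'), (3, 30.0, '2024-02-20')]
```

ISO-formatted date strings compare correctly as text, which is why the plain `>=` comparison works here; a source with a native date type would compare dates directly.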

Hey Martha,  we came up with a question related to the start of this thread. Is there any documentation that explains how to incrementally append data to a dataset?  This becomes circular at some point and we're having trouble figuring it out.  It seems like you would have to be able to modify a current dataset with a project without having the project create a new dataset?

Hi @bella21,
Here's one way to accomplish this in Paxata:
  1. Import version 1 of the dataset
  2. Create a project with the dataset imported in step 1
  3. Perform necessary actions like shaping, computed columns, etc. 
  4. Add a lens, and publish the output of the lens to Library 
  5. Add a new version of the dataset imported in Step 1
  6. Edit the steps to add an Append step just before the Lens you published
  7. In the Append step, select the output of the lens you published in Step 4, bringing it back into the project
  8. Use the refresh datasets button to refresh the dataset you started with to the latest version (the version imported in Step 5 will replace the version imported in Step 1)
  9. If the incremental data tends to contain duplicates, add a Deduplicate step to address them
  10. Automate the project
  11. Select "Use Latest Version" for both input datasets
Please let me know if you need additional help with this based on the specific use-case, Aaron.  

Regards,
Shyam Ayyar
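
The circular part of the steps above (the published lens output is appended back into the same project on the next run) may be easier to see as plain data flow. This is only a conceptual sketch in Python, not Paxata code: the `refresh` function, the `invoice_id` key, and the sample rows are all hypothetical, and the deduplication simply keeps the first row seen for each key.

```python
def refresh(published, new_version, key="invoice_id"):
    """Simulate one project run.

    Appends the newly loaded version to the previously published output,
    then drops rows whose key has already been seen (first occurrence wins).
    """
    seen = set()
    result = []
    for row in published + new_version:
        if row[key] in seen:
            continue  # duplicate of a row already kept
        seen.add(row[key])
        result.append(row)
    return result


# Hypothetical monthly files; February repeats one January row.
january = [{"invoice_id": 1, "month": "Jan"}, {"invoice_id": 2, "month": "Jan"}]
february = [{"invoice_id": 2, "month": "Jan"}, {"invoice_id": 3, "month": "Feb"}]

published = refresh([], january)          # first run: nothing published yet
published = refresh(published, february)  # second run: append the new version
print([r["invoice_id"] for r in published])  # [1, 2, 3]
```

Each run reads the previous published output and the latest version of the source, so the stacked history lives in the published dataset rather than in the project itself.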