
download intermediate data

valleyneo
Blue LED

How can I download the data in the middle of data processing?

For example, in the blueprint, data goes into categorical and numerical variable branches for pre-processing, and the branches are then combined before entering the final XGBoost model for prediction.

Can we download the processed data right before it enters the XGBoost model?

5 Replies
Tony
Data Scientist

Hi @valleyneo 

 

You can download datasets from Automated Feature Discovery projects, along with the Spark SQL code used to create them. However, we don't offer the ability to download data at specific points throughout the blueprint.

Are you trying to put the preprocessed dataset into another algorithm? 

valleyneo
Blue LED

I am trying to understand the data processing in detail. For example, when there are both one-hot encoding and ordinal encoding in the blueprint (two branches), we don't know which categorical variables go to which branch.

Tony
Data Scientist

Are your OHE and ordinal encoding steps similar to the image here? If so, all of the categoricals in your dataset will be one-hot encoded as well as ordinal encoded. Some data processing steps in the Blueprint do best when the data is formatted in certain ways. In this case the categorical data is being run through PCA as well as passed directly to XGBoost, which is why it has two different encodings in the Blueprint.

Encodings.png
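
To make the "same column, two encodings" idea concrete, here is a minimal, dependency-free sketch. It is not the platform's implementation: the `colors` column, the alphabetical category order, and the helper names `ordinal_encode` and `one_hot_encode` are all assumptions made up for illustration. It only shows that one categorical column can be sent through both encoders, giving two representations that different downstream steps can consume.

```python
# Hypothetical toy column; the name and values are illustrative only.
colors = ["red", "green", "blue", "green", "red"]

# Assumption: a fixed alphabetical category order shared by both encoders.
# A real blueprint may order categories differently (e.g. by frequency).
categories = sorted(set(colors))  # ['blue', 'green', 'red']


def ordinal_encode(values, categories):
    """Map each category to its integer position in `categories`."""
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values, categories):
    """Map each category to a 0/1 indicator vector, one slot per category."""
    return [[1 if v == cat else 0 for cat in categories] for v in values]


# The same column goes through BOTH encoders, one per branch.
ordinal_branch = ordinal_encode(colors, categories)
one_hot_branch = one_hot_encode(colors, categories)

for raw, ordinal, one_hot in zip(colors, ordinal_branch, one_hot_branch):
    print(raw, ordinal, one_hot)
```

Every row appears in both outputs, so there is no split of variables between the branches, only two views of the same data.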

valleyneo
Blue LED

Got it. My blueprint is very similar to your image.

There are two "missing value imputation" branches in this image. Do all numeric variables go to both branches for missing value imputation?

Also, can we at least access or download the final data right before it enters XGBoost?

Tony
Data Scientist

Correct, each numeric variable would go to each of the missing value imputation steps. 
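
As a rough illustration of one numeric column passing through two imputation steps, here is a small sketch in plain Python. The two strategies shown (median fill, and a constant fill paired with a missing-value flag) are assumptions chosen for the example, as are the `ages` column and the sentinel `-9999.0`. The thread does not say which strategies the blueprint actually uses.

```python
from statistics import median

# Hypothetical numeric column with missing values (None); illustrative only.
ages = [34.0, None, 51.0, 28.0, None]


def impute_median(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_constant_with_flag(values, fill=-9999.0):
    """Replace missing entries with a sentinel and add a 0/1 missing flag."""
    return [(fill, 1) if v is None else (v, 0) for v in values]


# The same column is fed to BOTH imputation steps, one per branch.
branch_a = impute_median(ages)
branch_b = impute_constant_with_flag(ages)

print(branch_a)
print(branch_b)
```

Both branches receive all five rows of the column, and each one fills the gaps in its own way before the results are combined downstream.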

 

We have the ability to download the feature engineered dataset, but not the ability to download data at given points in the Blueprint.