Solved: Re: download intermediate data - DataRobot Community

valleyneo · ‎07-20-2021

How can I download the data in the middle of data processing?

For example, in the blueprint, data goes into categorical-variable and numerical-variable branches for pre-processing, and the branches are then combined before entering the final XGBoost model for prediction.

Can we download the processed data right before it enters the XGBoost model?

Tony · ‎07-23-2021

Hi @valleyneo

You can download datasets from Automated Feature Discovery projects, along with the Spark SQL code used to create them. However, we don't offer the ability to download data at specific points throughout the blueprint.

Are you trying to put the preprocessed dataset into another algorithm?

valleyneo · ‎07-23-2021

I am trying to understand the data processing in detail. For example, when there are both one-hot encoding and ordinal encoding in the blueprint (two branches), we don't know which categorical variables go to which branch.

Tony · ‎07-23-2021

Are your OHE and Ordinal Encoding steps similar to the image here? If so, all of the categoricals in your dataset will be One Hot Encoded, as well as Ordinal Encoded. Some data processing steps in the Blueprint do best when the data is formatted in certain ways, and in this case the categorical data is being run through PCA, as well as directly to XGBoost, which is why it has two different encodings in the Blueprint.
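As a rough illustration of what "every categorical gets both encodings" means, here is a minimal plain-Python sketch. It is not DataRobot's implementation; the function names and the sample column are made up for the example, and it does not try to show which encoding feeds PCA and which feeds XGBoost.

```python
# Illustrative sketch only: NOT DataRobot's code. Function names and the
# sample data are hypothetical.

def ordinal_encode(values):
    """Map each category to an integer (categories sorted alphabetically)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector (one slot per category)."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values]


# The same categorical column is sent down both branches.
colors = ["red", "green", "blue", "green"]
ordinal_branch = ordinal_encode(colors)
one_hot_branch = one_hot_encode(colors)

print(ordinal_branch)
print(one_hot_branch)
```

The point is that the branches do not split the columns between them: each branch receives the full set of categorical columns and produces its own representation.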

valleyneo · ‎07-23-2021

Got it. My blueprint is very similar to your image.

There are two "missing value imputation" branches in this image. Do all numeric variables go to both branches for missing value imputation?

Also, can we at least access/download the final data right before it enters XGBoost?

Tony · ‎07-23-2021

Correct, each numeric variable would go to each of the missing value imputation steps.

We have the ability to download the feature engineered dataset, but not the ability to download data at given points in the Blueprint.
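To make the "each numeric variable goes to each imputation step" behavior concrete, here is a small plain-Python sketch. The two imputation strategies shown (median fill and a constant sentinel fill) are assumptions chosen for illustration; the thread does not say which strategies the blueprint's two imputation steps actually use.

```python
# Illustrative sketch only: NOT DataRobot's code. The two strategies and the
# sentinel value are assumptions made for this example.
from statistics import median


def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [x for x in column if x is not None]
    fill = median(observed)
    return [fill if x is None else x for x in column]


def impute_constant(column, fill=-9999):
    """Replace missing values (None) with a fixed sentinel value."""
    return [fill if x is None else x for x in column]


# The same numeric column is passed to both imputation branches.
age = [30, None, 50, 40]
branch_a = impute_median(age)
branch_b = impute_constant(age)

print(branch_a)
print(branch_b)
```

Both branches see every numeric column; they differ only in how the missing entries are filled before the results move on to later steps.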
