🗓 ASK THE EXPERT: Let's discuss Data Prep - Feb 24

Image Sensor

Krupa,

So how does the integration between data prep and DataRobot modeling actually work?

Thanks!

Image Sensor

Hi @knat,

Why is it important to work on your full dataset at prep time instead of a sample?

Thanks,

Dave

DataRobot Alumni

Hi @annapeters0n!

DataRobot Paxata has a tool named 'Predict tool' on the tool panel, alongside other tools such as Joins, Aggregates, and Remove Rows. You can add the Predict tool to your Project steps at any point in the Data Prep Project. You will be asked for your DataRobot API token, and with just that the tool fetches all Deployments from DataRobot along with a description of each one. Choose a Deployment from the list and tell the tool whether you want prediction explanations returned along with the scores. That's pretty much all you need to do. The prediction scores come back into the Data Prep Project as part of the rows of data, and from there you can add further Data Prep steps on data that now includes the prediction scores and explanations. You can also spin up Filtergrams (interactive histograms) to explore the prediction scores alongside other columns in the data.
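
To make that flow concrete, here is a rough Python sketch of the same sequence: supply a token, list deployments, pick one, score the rows, then keep prepping the scored data. It is not the Predict tool itself and it does not call the DataRobot API. `Deployment`, `list_deployments`, `add_predictions`, and the toy scoring rule are all hypothetical stand-ins, used so the example runs without an account.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Deployment:
    """Hypothetical stand-in for a deployed model and its description."""
    id: str
    description: str
    model: Callable[[Dict], float]


def list_deployments(api_token: str) -> List[Deployment]:
    """Hypothetical lookup: return the deployments visible to this token."""
    if not api_token:
        raise ValueError("an API token is required")
    # Toy rule in place of a real model: many late payments means high risk.
    toy_model = lambda row: 0.9 if row["late_payments"] > 2 else 0.1
    return [Deployment("dep-1", "Churn risk (toy rule)", toy_model)]


def add_predictions(rows: List[Dict], deployment: Deployment,
                    with_explanations: bool = False) -> List[Dict]:
    """Return copies of the rows with a prediction column appended."""
    scored = []
    for row in rows:
        out = dict(row)
        out["prediction"] = deployment.model(row)
        if with_explanations:
            # Placeholder text; real explanations would come from the model.
            out["explanation"] = "late_payments=%d" % row["late_payments"]
        scored.append(out)
    return scored


rows = [
    {"customer": "a", "late_payments": 0},
    {"customer": "b", "late_payments": 5},
]

deployment = list_deployments("my-token")[0]   # choose from the list
scored = add_predictions(rows, deployment, with_explanations=True)

# A further prep step that uses the new prediction column.
high_risk = [r for r in scored if r["prediction"] >= 0.5]
```

The point of the sketch is the ordering: scores land in the same rows as the original columns, so any later step (a filter here) can use them like any other column.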

 

DataRobot Alumni

Hi @DaveTheMaster!

Great question. My answer would be 'why not?' If you can explore your entire dataset and derive insights instantly, why would you want to be limited to samples? Working on the full dataset is especially helpful if your data has anomalies or characteristics that a sample might miss.
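
To put a rough number on how easily a sample can miss anomalies, here is a small Python sketch. The figures are illustrative assumptions, not from any real dataset: one million rows, five anomalous rows, and a 1% simple random sample drawn without replacement. The function name `prob_sample_misses_all` is made up for this example.

```python
def prob_sample_misses_all(total_rows: int, anomalous_rows: int,
                           sample_size: int) -> float:
    """Probability that a random sample (without replacement)
    contains none of the anomalous rows.

    Equals C(N - k, n) / C(N, n), computed as a product so we
    never build the huge binomial coefficients.
    """
    p = 1.0
    for i in range(anomalous_rows):
        remaining_unsampled = max(total_rows - sample_size - i, 0)
        p *= remaining_unsampled / (total_rows - i)
    return p


# Assumed numbers: 1,000,000 rows, 5 anomalies, 10,000-row sample.
p_miss = prob_sample_misses_all(1_000_000, 5, 10_000)
print(f"Chance the sample contains no anomalies: {p_miss:.3f}")  # 0.951
```

Under these assumptions the sample misses every anomalous row about 95% of the time, so any prep logic built from the sample alone would most likely never see them.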

Also, the key difference between workflow-driven and data-driven data preparation is that in the former the requirements or logic guide your work, while in the latter your data prep steps are guided by the actual data. If your prep is data-driven, it helps to be guided by the entire dataset rather than led by a sample.
