
🗓 ASK THE EXPERT: Let's discuss Data Prep - Feb 24

DataRobot Alumni

Welcome to this Ask the Expert discussion

Data scientists spend over 80% of their time collecting, cleansing, and preparing data for machine learning. You can significantly simplify this with DataRobot Paxata. Using "clicks instead of code" reduces your data prep time from months to minutes and gets you to reliable predictions faster.

In this Ask the Expert event, you will be able to chat with Krupa and ask your questions about data prep. On this interesting and important topic, Krupa is available to help clarify and answer your questions.



Krupa Natarajan is a Product Management leader at DataRobot. Krupa has spent over a decade leading multiple Data Management products and has deep expertise in the space. She has a passion for product innovations that deliver customer value and a proven track record of driving vision to execution.



Hi Everyone,

This Ask the Expert event is now closed. 

Thank you Krupa for being a terrific event host!

Let us know your feedback on this event, suggestions for future events, and look for our next Ask the Expert event coming soon.

Thanks everyone!

28 Replies
Image Sensor


So how does the integration between data prep and DataRobot modeling actually work?


Image Sensor

Hi @knat ,

Why is it important to work on your full dataset at prep time instead of a sample?




Hi @annapeters0n !

DataRobot Paxata has a Predict tool on the tool panel, alongside other tools such as Joins, Aggregates, and Remove Rows. At any point in the Data Prep project you can add the Predict tool to your project steps. You will be asked for your DataRobot API token, and with just that the tool fetches all Deployments from DataRobot, along with a description of each Deployment. You then choose a Deployment from the list and tell the tool whether you want prediction explanations returned along with the scores. And that's pretty much all you need to do.

The prediction scores come back into the Data Prep project as new values on the rows of data. From there you can add further data prep steps on the data that now includes the prediction scores and explanations. You can also spin up Filtergrams (interactive histograms) to explore the prediction scores alongside other columns in the data.
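For readers who think in code, the data flow described above can be sketched roughly as follows. This is only an illustration in plain Python, not the Paxata or DataRobot API: `fake_deployment_score`, `add_predictions`, and the sample rows are all hypothetical stand-ins, and the "model" is a made-up rule.

```python
from collections import Counter


def fake_deployment_score(row):
    """Hypothetical stand-in for a deployed model (NOT a DataRobot API)."""
    score = min(1.0, row["late_payments"] / 5)
    explanation = f"late_payments={row['late_payments']}"
    return score, explanation


def add_predictions(rows, with_explanations=True):
    """Return copies of the rows with a prediction (and explanation) appended."""
    scored = []
    for row in rows:
        score, explanation = fake_deployment_score(row)
        new_row = dict(row, prediction=score)
        if with_explanations:
            new_row["explanation"] = explanation
        scored.append(new_row)
    return scored


# Made-up example data standing in for the rows of a prep project.
rows = [
    {"customer": "a", "late_payments": 0},
    {"customer": "b", "late_payments": 2},
    {"customer": "c", "late_payments": 5},
    {"customer": "d", "late_payments": 4},
]

# Step 1: scores come back as extra columns on the existing rows.
scored = add_predictions(rows)

# Step 2: a further prep step that uses the new column (keep high scores).
high_risk = [r for r in scored if r["prediction"] >= 0.5]

# Step 3: a crude histogram of the scores, loosely analogous to exploring
# the new column visually.
histogram = Counter("high" if r["prediction"] >= 0.5 else "low" for r in scored)

print(high_risk)
print(histogram)
```

The point of the sketch is only that the scores behave like any other column once they are returned, so later steps can filter, aggregate, or explore them.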



Hi @DaveTheMaster !

Great question. My question back would be "why not?" If you can explore your entire dataset and derive insights instantly, why would you want to be limited to samples? Working on the full dataset is especially helpful if your data has anomalies or characteristics that could be missed in a sample.

Also, the key difference between workflow-driven and data-driven data preparation is that in the former, the requirements or logic guide your work, and in the latter, your data prep steps are guided by the actual data. If that's the case, then it is helpful to be guided by the entire dataset as opposed to being led by a sample.
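To put a rough number on how easily a sample can miss rare anomalies, here is a small sketch. The figures (100,000 rows, 5 anomalous rows, a 1% simple random sample) are assumptions chosen purely for illustration, and the helper name `prob_sample_misses_all` is hypothetical.

```python
import random


def prob_sample_misses_all(total_rows, anomalous_rows, sample_size):
    """Probability that a simple random sample (without replacement)
    contains none of the anomalous rows (hypergeometric, zero hits)."""
    p = 1.0
    for i in range(anomalous_rows):
        p *= (total_rows - sample_size - i) / (total_rows - i)
    return p


# Assumed, illustrative numbers.
TOTAL_ROWS = 100_000
ANOMALOUS_ROWS = 5
SAMPLE_SIZE = 1_000  # a 1% sample

p_miss = prob_sample_misses_all(TOTAL_ROWS, ANOMALOUS_ROWS, SAMPLE_SIZE)
print(f"Chance the sample contains no anomalous row: {p_miss:.1%}")

# One simulated draw: count anomalies visible in the sample vs. the full data.
rng = random.Random(0)
anomalous_ids = set(rng.sample(range(TOTAL_ROWS), ANOMALOUS_ROWS))
sample_ids = rng.sample(range(TOTAL_ROWS), SAMPLE_SIZE)

found_in_sample = sum(1 for i in sample_ids if i in anomalous_ids)
found_in_full = sum(1 for i in range(TOTAL_ROWS) if i in anomalous_ids)
print(f"Anomalies seen in sample: {found_in_sample}, in full data: {found_in_full}")
```

Under these assumptions the sample misses every anomalous row roughly 95% of the time, while a pass over the full data always sees all five.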
