Data scientists spend over 80% of their time collecting, cleansing, and preparing data for machine learning. You can significantly simplify this with DataRobot Paxata. Using "clicks instead of code" reduces your data prep time from months to minutes and gets you to reliable predictions faster.
In this Ask the Expert event, you will be able to chat with Krupa and ask your questions about data prep. On this interesting and important topic, Krupa is available to help clarify and answer your questions.
Krupa Natarajan is a Product Management leader at DataRobot. Krupa has spent over a decade leading multiple Data Management products and has deep expertise in the space. She has a passion for product innovations that deliver customer value and a proven track record of driving vision to execution.
DataRobot Paxata has a tool named 'Predict tool' on the tool panel along side other tools such as Joins, Aggregates, Remove Rows etc. At any point in the Data Prep Project you can add the Predict tool to your Project steps - you will be required to provide your DataRobot API token and with just that the tool will fetch all Deployments from DataRobot along with a desc of the Deployment. You can choose a Deployment from the list and tell the tool if you need prediction explanations returned along with the scores. And that's pretty much all you need to do. You will see the prediction scores come back into the DataPrep Project into the rows of data - at this point you can proceed to add additional Data Prep steps on the data that includes the prediction scores and explanations. You can also spin up Filtergrams (interactive histograms) to explore the prediction scores alongside other columns in the data.
Great question. But, my question will be 'why not?'... if you can explore your entire dataset and derive insights instantly, why would you want to be limited to samples?. This is especially helpful if your data has anomalies or characteristics that may potentially be missed in the sample.
Also the key difference between workflow driven data preparation and data driven data preparation exercises is that in the former, the requirements or logic guide your work and in the latter your data prep steps are guided by the actual data. If that's the case, then it is helpful to be guided by the entire data as opposed to being led by a sample.