Data scientists spend over 80% of their time collecting, cleansing, and preparing data for machine learning. You can significantly simplify this with DataRobot Paxata. Using "clicks instead of code" reduces your data prep time from months to minutes and gets you to reliable predictions faster.
In this Ask the Expert event, you will be able to chat with Krupa and ask your questions about data prep. On this interesting and important topic, Krupa is available to help clarify and answer your questions.
Krupa Natarajan is a Product Management leader at DataRobot. Krupa has spent over a decade leading multiple Data Management products and has deep expertise in the space. She has a passion for product innovations that deliver customer value and a proven track record of driving vision to execution.
=============================================================================
Hi Everyone,
This Ask the Expert event is now closed.
Thank you Krupa for being a terrific event host!
Let us know your feedback on this event, suggestions for future events, and look for our next Ask the Expert event coming soon.
Thanks everyone!
Hi @knat ,
Thank you for taking my question. What is the difference between data prep for business intelligence / data warehousing and data prep for machine learning / AI?
Thank you, Nicole
Hi Krupa - thanks for taking my question!
I'm very interested in the capabilities around data prep, as it's a critical step in the process for everyone. My question is, can I run a real-time prediction pipeline in Paxata?
Hi Nicole! A number of steps are common, but there are some key differences.
Both BI and ML/AI use cases require that the user has access to data from a variety of data sources and the ability to work with a variety of data formats, join datasets together, cleanse and standardize the data (this step is very important to ensure prediction quality), and perform transformations, aggregations, and similar operations.
In addition, data prep exercises for ML/AI can be split into two distinct life cycles: (a) Training Dataset preparation and (b) Inference/Prediction Dataset preparation.
For Training datasets, the Data Scientist/Business Analyst preparing data should address the following critical aspects based on the business value they are trying to achieve
For prediction-time data prep, you will need the data prep tool to operationalize, and potentially automate, as many of the data acquisition, data merging, cleansing, and transformation steps as possible before the data can be sent to deployed models for generating prediction scores. In many cases, more data prep steps may be applied after the scores are returned.
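To make the two life cycles concrete, here is a minimal Python sketch (not Paxata itself, and not a DataRobot API) of the idea that the same prep steps applied to training data get reused at prediction time, followed by a post-scoring prep step. The names `prepare`, `toy_model`, and `score_batch`, the column names, and the scoring rule are all hypothetical and chosen only for illustration.

```python
def prepare(rows):
    """Cleansing steps shared by training and prediction data (illustrative)."""
    cleaned = []
    for row in rows:
        cleaned.append({
            # Standardize text: trim whitespace and normalize capitalization.
            "city": str(row.get("city", "")).strip().title(),
            # Coerce to a number; treat missing values as 0.0.
            "income": float(row.get("income") or 0.0),
        })
    return cleaned


def toy_model(features):
    """Hypothetical stand-in for a deployed model; returns a score in [0, 1]."""
    return min(features["income"] / 100_000, 1.0)


def score_batch(raw_rows, model):
    """Prep the incoming data, score it, then apply post-scoring prep."""
    features = prepare(raw_rows)
    scored = [{**f, "score": model(f)} for f in features]
    # Post-scoring prep step: bucket the raw score into a segment.
    for row in scored:
        row["segment"] = "high" if row["score"] >= 0.5 else "low"
    return scored


incoming = [
    {"city": "  boston ", "income": "82000"},
    {"city": "AUSTIN", "income": None},
]
results = score_batch(incoming, toy_model)
```

Because `prepare` is a single function, a scheduled or on-demand job can call `score_batch` on each new batch and apply exactly the steps that were used when the training dataset was built.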
Thank you for your interest.
Paxata has a new 'Predict Tool' that allows DataRobot deployments to be invoked directly from Data Prep projects. The data acquisition + data prep steps + prediction scoring can all be operationalized using the Intelligent Automation capability that exists in Paxata and scheduled to run automatically or on-demand.
This is accessible through the REST API as well, enabling near-real-time predictions.
Thank you Krupa!
Krupa, How is DataRobot Paxata Data Prep different from the other data prep tools on the market with regards to data prep for AI?
Thanks!
Great question! Paxata has been the leader in the Data Prep market (according to major analyst reports such as the Gartner Magic Quadrant), and now, with the merger of DataRobot and Paxata, DataRobot combines best-in-class Data Prep with a best-in-class Enterprise AI platform.
DataRobot Paxata is the only Data Prep offering that enables Data Scientists and Business Analysts to interact with their full scale of data without being limited to small samples. This is a key differentiator when it comes to enabling users to identify data quality issues and cleanse the data for ML exercises.
DataRobot Paxata also has unique intelligence capabilities such as the patented Join detection. DataRobot Paxata automatically identifies how datasets join together for Feature Enrichment. Algorithmic Fuzzy Join is supported for scenarios where enrichment data coming from different systems and applications may be represented in different ways, making exact matches nearly impossible. In common scenarios such as this, DataRobot Paxata's fuzzy matching allows for Feature Enrichment regardless of the variation in the data.
Another important capability is DataRobot Paxata's algorithmic standardization. With a single click, Paxata will identify similar values (for example, misspellings in city names) in categorical variables and standardize them, leading to better training data and hence better prediction quality.
DataRobot Paxata is closely integrated with the DataRobot core. This allows you to explore the AI Catalog from within the Data Prep experience, invoke deployed models for prediction scoring from within the Data Prep project using the Predict tool, and explore prediction results, including Prediction Explanations, in the Data Prep project for better conversion of predictions to value.
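As a rough illustration of the general idea behind a fuzzy join (this is not Paxata's patented algorithm, whose details are not described here), the sketch below matches rows on approximate string similarity using Python's standard `difflib` module. The function names, the sample data, and the 0.8 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and drop punctuation so superficial differences are ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_join(left, right, key, threshold=0.8):
    """Left join: enrich each left row with its best-matching right row."""
    joined = []
    for lrow in left:
        best_row, best_score = None, 0.0
        for rrow in right:
            score = similarity(lrow[key], rrow[key])
            if score > best_score:
                best_row, best_score = rrow, score
        # Only accept the match if it is similar enough (assumed threshold).
        match = best_row if best_score >= threshold else {}
        extra = {k: v for k, v in match.items() if k != key}
        joined.append({**lrow, **extra})
    return joined


customers = [
    {"company": "Acme Corp.", "revenue": 120},
    {"company": "Globex Inc", "revenue": 75},
    {"company": "Initech", "revenue": 40},
]
industries = [
    {"company": "ACME Corp", "industry": "Manufacturing"},
    {"company": "Globex, Inc.", "industry": "Energy"},
]

enriched = fuzzy_join(customers, industries, key="company")
```

Here "Acme Corp." and "ACME Corp" are joined even though an exact-match join would miss them, while "Initech" has no sufficiently similar counterpart and keeps only its original columns. The pairwise comparison is quadratic, so a real tool working at full data scale would need a more scalable matching strategy.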
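A simplified way to picture this kind of standardization (again, an illustrative sketch and not Paxata's actual algorithm) is to group values that are nearly identical and replace each group with its most frequent spelling. The `standardize` function, the similarity threshold, and the sample city names below are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def standardize(values, threshold=0.8):
    """Map each value to the most frequent similar value seen in the column."""
    counts = Counter(values)
    canonical = []  # chosen representatives, most frequent first
    mapping = {}
    for value, _ in counts.most_common():
        for rep in canonical:
            ratio = SequenceMatcher(None, value.lower(), rep.lower()).ratio()
            if ratio >= threshold:
                mapping[value] = rep  # treat as a variant of an existing value
                break
        else:
            canonical.append(value)  # first time we see this cluster
            mapping[value] = value
    return [mapping[v] for v in values]


cities = [
    "San Francisco", "San Fransisco", "San Francisco",
    "Boston", "Bostn", "Boston", "san francisco",
]
cleaned = standardize(cities)
```

After this step the column contains only "San Francisco" and "Boston", so a model trained on it sees two categories instead of five.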
Thanks for the response @knat .
Could you please explain in a bit more detail how fuzzy join works?
Hi,
Regarding the data prep files, are there "any" file size limitations or file type limitations?