Hey all,

I'm very new to data science and have also just been introduced to DataRobot. I have a few very basic questions that I hope you can help me with.

1. What is the maximum number of targets DataRobot can handle? E.g. I have a file with 1000 possible targets which are all fairly distinguishable. Is there a limit on how many DR can process and predict?
2. The online promo video said the AI can keep learning by refreshing models with new data. How can I add data to an existing model?
3. Is it possible to expand an existing model when more targets are added to the existing dataset?

I'll leave it at that; I hope the questions make sense. I'm really impressed by what DR is capable of. It has successfully predicted 5 different targets in a big dataset with pretty good accuracy. I'm still really green on how to tune the predictions, so you'll probably hear more from me over time when I get into the details.

Cheers, Tobe
1. DataRobot can only predict one target feature at a time. This can be a continuous (regression), binary categorical (classification), or more diverse categorical (multiclass) feature.
If you had a dataset with 1000 possible targets and wanted to predict them all, then you would have to create 1000 different projects.
2. DataRobot does keep learning from new data. After you build your model, you will make predictions on new data. DataRobot will track the distribution of the training data and the new data for the top ten most impactful features. If the new scoring data looks very different from your training data, then DataRobot will flag this and notify you. This is called data drift.
So for example, say you are trying to predict loan default, and when you trained the model your firm only offered 36 and 72 month loans. If your firm then adds a new loan length option, say 60 months, this is going to change the distribution of that feature. If this is an important feature in your model, then you are going to want to detect that change and retrain your model. This allows you to be strategic about when you retrain your models.
3. When DataRobot creates a model it looks at key patterns in the data related to the target. If you want to add new training data to a model, then the platform will need to start from the beginning. You do this by creating a new project.
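To make the data drift idea from point 2 a bit more concrete, here is a minimal sketch using the loan length example. It compares the share of each loan term in the training data with its share in the new scoring data, using the population stability index (PSI), one common drift metric. The function names, the made-up loan counts and the `eps` floor are all assumptions for illustration; this is not a description of how DataRobot computes drift internally.

```python
import math
from collections import Counter


def category_shares(values, categories):
    """Return the fraction of `values` that falls into each category."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return {c: counts.get(c, 0) / len(values) for c in categories}


def population_stability_index(training, scoring, eps=1e-6):
    """PSI between two samples of a categorical feature.

    `eps` is a hypothetical floor so that a category that is missing
    from one sample (like a brand-new loan term) does not cause log(0).
    """
    categories = sorted(set(training) | set(scoring))
    train_share = category_shares(training, categories)
    score_share = category_shares(scoring, categories)
    psi = 0.0
    for c in categories:
        p = max(train_share[c], eps)  # share at training time
        q = max(score_share[c], eps)  # share in the new scoring data
        psi += (q - p) * math.log(q / p)
    return psi


# Illustrative data only: 36 and 72 month loans at training time,
# then a new 60 month option appears in the scoring data.
training_terms = [36] * 600 + [72] * 400
scoring_terms = [36] * 400 + [60] * 300 + [72] * 300

same = population_stability_index(training_terms, training_terms)
drifted = population_stability_index(training_terms, scoring_terms)
print(f"PSI, no change: {same:.3f}")
print(f"PSI, new 60 month term: {drifted:.3f}")
```

A PSI of zero means the two distributions match; the larger the value, the more the feature has shifted and the stronger the case for retraining in a new project.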
Hi Emily, thanks for the response.

I used the wrong wording for Q1. I wanted to know how many different solutions DataRobot can handle within one selected target. E.g. the target is which fruit the customer likes most. It can be apples, bananas, oranges, etc. Now let's say it's really easy for DR to distinguish between each solution because of location, price, customer profile etc. In the prediction file DR gives me an overview in % of which fruit the customer will most likely buy. One prediction file I ran had 5 columns. So my question is: how many fruits/columns can DR handle?

Q2. I might have missed the very basics. The models are stored in DR, so I guess I just need to work out how to extract that model into Alteryx etc. and run the predictions on new incoming data straight in Alteryx or a dashboard? Would that work, or do I have to go back to DR every time to run a prediction?

Q3. Understood. Thanks.

Appreciate your quick reply.

Cheers, Thomas
I think I understand now TobeT, sorry for the confusion.
If I am correct, then you are asking how many different types of fruit could one predict if that was the categorical target feature?
See example below: the goal here is to use location and age to predict type of fruit purchase:
This would be a multiclass problem, and with DataRobot you can have a target with up to 100 classes. However, the more classes you have, the harder the problem becomes.
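Before uploading, it can be worth counting how many distinct classes your target column actually has. The sketch below is only an illustration: the helper name `summarize_target` and the fruit data are made up, and the limit of 100 is simply the figure quoted above, so check it against your own DataRobot version.

```python
from collections import Counter

# Class limit as quoted in this thread; treat it as an assumption.
MAX_CLASSES = 100


def summarize_target(values, max_classes=MAX_CLASSES):
    """Count distinct target classes and report the rarest one."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return {
        "n_classes": len(counts),
        "within_limit": len(counts) <= max_classes,
        # Very rare classes are usually the hardest ones to predict.
        "rarest": min(counts.items(), key=lambda kv: kv[1]),
    }


# Illustrative target column: favourite fruit per customer.
fruit = ["apple"] * 50 + ["banana"] * 30 + ["orange"] * 20
summary = summarize_target(fruit)
print(summary)

# A target with 1000 distinct values would exceed the quoted limit.
too_many = summarize_target([f"class_{i}" for i in range(1000)])
print(too_many["within_limit"])
```

If the count is over the limit, one option is to group rare classes into an "other" bucket before modelling, at the cost of no longer predicting those classes individually.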
2. There are a few ways you can get predictions out of DataRobot.
The first is to use the GUI to simply upload scoring data and compute results. This is good for ad-hoc analyses, or scoring you don't have to do very often.
The second is to create a deployment with the API. You can then send data to the deployment using a script from the integrations tab. This also allows you to track data drift. This is the most common way that people deploy models with DataRobot. We have customer-facing data scientists and field engineers who could help you set something like this up.
You can also download standalone scoring code. This allows you to score your data off the network or at very low latency.