I have a trial version of DataRobot. I have the following questions.
1) After building a model in DataRobot, can I convert it into either PMML or pickle and score it externally?
2) How do I create the final features used by the final model if I need to score the model externally?
3) I read about DataRobot Prime, where you can download the model as a JAR file. How do I do that using the trial version?
Thanks,
PK
Hi PK,
I've responded to your comment, which touches on all of these topics, in your other thread here:
Thanks for the response!
I have a few further questions.
1) Can I prevent automatic feature creation while building models in DataRobot? i.e., use the features as they are, without creating extra new features as part of feature engineering.
2) How do I run a single model instead of AutoML? Say I want to run XGBoost and no other model. How do I do that? Is it possible?
3) Do you have examples of Java JAR files as part of model export? Can you please point me to any examples or documents regarding this?
4) Regarding scoring again, i.e., the "Standalone scoring engine" as in the DataRobot documentation wiki: can you please point me to or give me an example?
5) Regarding scoring again, i.e., "Spark scoring" as in the DataRobot documentation wiki: can you please point me to or give me an example?
What are the differences between 3, 4, and 5? We are trying to understand DataRobot, and this will help us understand it better.
Regards,
PK
No problem PK.
Thanks!!
I have a few follow-up questions.
1) Regarding downloading the model as a JAR file and scoring outside DataRobot: if a particular feature was created during model building by automatic feature creation, say feature1 = feature2*sqrt(feature3), would this new feature be created automatically when using the JAR file for scoring?
2) Can you please give an example of Python scoring too, or point to some documentation on this?
3) Were any performance issues noted when using Java JAR files? When we use blended models or ensembles, wouldn't the JAR file size increase? What's the maximum size of the JAR file that would be created?
We are evaluating DataRobot and trying to understand its nuances, hence these questions.
Regards,
PK
It might be worthwhile to set up a call with your team and one of our account teams to better understand your needs and how we can partner to solve them together.
Thanks for the replies.
I have one more question regarding the features created by DataRobot's automated feature engineering.
If I need to know how features are constructed, say for example feat1 = feat2*(feat3), how would I access this information? I might need to document this as part of regulatory and mandatory requirements; I will need to document the feature construction or the formula.
Thanks
PK
Hi PK -
DataRobot shows blueprints of what is going on within the data prep and training/scoring pipeline for a DataRobot model. If features are created using some of the more advanced feature discovery methods inside DataRobot, for example aggregates derived from joining multiple tables, the lineage of those features is available to see as well.
As for documents and regulatory requirements, you can create compliance documentation for each model inside DataRobot. This documentation can also be templated, so you can construct a specific template for DataRobot to populate per your reporting needs. There isn't a sample compliance document on the community, but you can see the blueprints, and the documentation is referenced in this article: https://community.datarobot.com/t5/resources/describing-and-evaluating-models/ta-p/1491
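If it helps, here is a minimal sketch of generating and downloading that compliance documentation with the datarobot Python client. The endpoint, token, and IDs below are placeholders you would fill in, and it assumes the client's ComplianceDocumentation API is available in your version:

    import datarobot as dr

    # Connect to DataRobot (placeholder endpoint/token -- use your own).
    dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

    PROJECT_ID = "YOUR_PROJECT_ID"  # hypothetical placeholder
    MODEL_ID = "YOUR_MODEL_ID"      # hypothetical placeholder

    # Kick off compliance-document generation for one model, wait for it
    # to finish server-side, then download the result as a DOCX file.
    doc = dr.ComplianceDocumentation(PROJECT_ID, MODEL_ID)
    job = doc.generate()
    job.wait_for_completion()
    doc.download("model_compliance.docx")

The client also supports custom templates (see ComplianceDocTemplate), which is how the templated reporting mentioned above is driven.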
Thanks once again.
But would the blueprint give me that level of detail? Say, if feat1 is constructed as feat1 = feat2*(feat3), would it display this formula as feat2*(feat3)?
Regards,
PK
The blueprint would tell you things like whether one-hot encoding was used or missing values were imputed, with links to documentation as to how. With automated feature discovery, a feature like sum(last 7 days of sales) is shown in a feature lineage graph: which dataset(s) were used and how the data was used to construct the feature.
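To make that concrete, here is a hedged sketch of inspecting a model's preprocessing steps and blueprint chart with the datarobot Python client; the IDs are placeholders, and BlueprintChart availability may vary by client version:

    import datarobot as dr

    # Connect to DataRobot (placeholder endpoint/token -- use your own).
    dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

    PROJECT_ID = "YOUR_PROJECT_ID"  # hypothetical placeholder
    MODEL_ID = "YOUR_MODEL_ID"      # hypothetical placeholder

    model = dr.Model.get(PROJECT_ID, MODEL_ID)

    # High-level preprocessing/modeling steps, e.g. "One-Hot Encoding",
    # "Missing Values Imputed", "eXtreme Gradient Boosted Trees Classifier".
    print(model.processes)

    # The blueprint chart shows the same pipeline as nodes and edges.
    chart = dr.models.BlueprintChart.get(PROJECT_ID, model.blueprint_id)
    for node in chart.nodes:
        print(node)

Note this covers the modeling pipeline itself; the lineage of features derived via automated feature discovery is shown in the feature lineage graph in the UI.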