Download model

Image Sensor

I have a trial version of DataRobot. I have the following questions.

1) After building a model in DataRobot, can I convert it to either PMML or pickle and score it externally?

2) How do I create the final features used by the final model if I need to score the model externally?

3) I read about DataRobot Prime, where you can download the model as a JAR file. How do I do that using the trial version?

Thanks,

PK

9 Replies
DataRobot Employee

Hi PK,

I've responded to your comment, touching on all of these topics, in your other thread here:

https://community.datarobot.com/t5/automated-machine-learning/download-features/m-p/8611/highlight/f...

Image Sensor

Thanks for the response!

I have a few further questions.

1) Can I prevent automatic feature creation while building models in DataRobot? That is, can it use features as they are and not create extra new features as part of feature engineering?

2) How do I run a single model instead of AutoML? Say I want to run XGBoost and not any other model. How do I do that? Is it possible?

3) Do you have examples of Java JAR files as part of model export? Can you point me to any examples or documents regarding this?

4) Regarding scoring again, i.e. the "Standalone Scoring Engine" described in the DataRobot documentation wiki: can you point me to, or give me, an example?

5) Regarding scoring again, i.e. "Spark scoring" as described in the DataRobot documentation wiki: can you point me to, or give me, an example?

What are the differences between 3, 4, and 5? We are trying to understand DataRobot, and this will help us understand it better.

Regards,

PK

DataRobot Employee

No problem, PK.

  1. You can see on the blueprint for an individual model what is done - both the data prep/engineering steps and the feed into the algorithm.  If you want to use your own imputed values rather than have DataRobot create them, for example, you should do that in your training dataset prior to project creation (and in future scoring data as well) - although the blueprint steps will still be run for fields where the logic is applicable; I don't believe there is a way to do something like turn off one-hot encoding for a text field, for example.
  2. There are several options before hitting the start button for a project - if you choose Manual mode, you can go into the model repository and pull out all the blueprints you are interested in trying.  You could limit this to only XGBoost blueprints.
  3. The Java export can be used in many ways.  There is an API to interact with it programmatically here.  It can also be used as a command-line executable to score CSV files (see the first sketch after this list).  You can find an overview in the article Codegen/Scoring Code Feature.  There is a webinar available as well.
  4. The Standalone Scoring Engine (SSE) is a dedicated prediction server for hosting models, separate from the DataRobot platform cluster; however, it is being retired in favor of the Portable Prediction Server (PPS), which offers greater flexibility and functionality.  PPS instances are Docker containers of models that can be brought up on demand.  We have some articles on how to host these in AWS, Azure, and GCP.
  5. It is mentioned in the webinar above, and we have some examples of scoring on Spark with the Java model as well (see the second sketch after this list).  How to Monitor Spark Models with DataRobot MLOps has an example on the Databricks platform, which also leverages agent code to send scoring data back to DataRobot so that it can monitor the Spark-deployed model for drift.  Scoring Snowflake Data via DataRobot Models on AWS EMR Spark provides an example on AWS with on-demand creation of a Spark environment to score data from a Snowflake database and write it back, although that example does not include the connection back for model monitoring.  Note that this is just an example and not required to score Snowflake data; typically the expectation with this approach is that there is a large amount of data undergoing a lot of prep within the Spark platform, after which we would use the Java code to score a Spark dataframe.
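
To make item 3 concrete, here is a rough sketch of using the scoring-code JAR programmatically. The class and method names (com.datarobot.prediction.Predictors, IClassificationPredictor.score) follow my reading of the Scoring Code docs and may differ by version - treat them as assumptions and check the Javadoc bundled with your JAR. The feature names are hypothetical.

```java
// Sketch: scoring a single record with a downloaded scoring-code JAR.
// Batch scoring from the command line is also possible, roughly:
//   java -jar model.jar csv --input=to_score.csv --output=scored.csv
// (flags recalled from the docs - verify against your JAR's help output)
import com.datarobot.prediction.IClassificationPredictor;
import com.datarobot.prediction.Predictors;

import java.util.HashMap;
import java.util.Map;

public class ScoreOneRecord {
    public static void main(String[] args) {
        // Assumed API: loads the model embedded in the scoring-code JAR on the classpath.
        IClassificationPredictor predictor = Predictors.getPredictor();

        Map<String, Object> row = new HashMap<>();
        row.put("feature2", 4.0);   // hypothetical feature names
        row.put("feature3", 9.0);

        // Engineered features from the blueprint (e.g. feature2 * sqrt(feature3))
        // are computed inside the JAR, so only the raw inputs are supplied here.
        Map<String, Double> classProbabilities = predictor.score(row);
        System.out.println(classProbabilities);
    }
}
```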
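For item 5, here is a similarly hedged sketch of scoring a Spark DataFrame with the same JAR. The com.datarobot.prediction.spark wrapper below is an assumption drawn from the Spark scoring articles mentioned above, not a verified signature, so confirm the exact class names there.

```java
// Sketch: scoring a Spark DataFrame with a DataRobot scoring-code JAR.
// The spark.Predictors wrapper and its transform() call are assumptions
// based on the Spark examples referenced above; paths are placeholders.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ScoreOnSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("datarobot-scoring").getOrCreate();

        // Data that has already been prepped in Spark - the typical use case described above.
        Dataset<Row> toScore = spark.read().option("header", "true").csv("s3://my-bucket/prepped/");

        // Hypothetical wrapper: load the model from the JAR and score the whole DataFrame.
        com.datarobot.prediction.spark.Model model =
                com.datarobot.prediction.spark.Predictors.getPredictor();
        Dataset<Row> scored = model.transform(toScore);

        scored.write().mode("overwrite").parquet("s3://my-bucket/scored/");
        spark.stop();
    }
}
```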
Image Sensor

Thanks!! 

I have few follow up questions

1) Regarding downloading the model as a JAR file and scoring outside DataRobot: if a particular feature was created during model building by automatic feature creation, say feature1 = feature2*sqrt(feature3), would this new feature be created automatically when using the JAR file for scoring?

2) Can you please give an example of Python scoring too, or point me to some documentation on this?

3) Were any performance issues noted when using Java JAR files? When we use blended models or ensembles, wouldn't the JAR file size increase? What's the maximum size of the JAR file that would be created?

We are evaluating DataRobot and trying to understand its nuances, hence these questions.

Regards,
PK

DataRobot Employee
  1. If the feature was created by DataRobot inside DataRobot, you will not need to provide it; DataRobot will perform the necessary steps to create it itself.  This is true regardless of the chosen deployment method for a model.
  2. I've constructed this link, which I think will get you to the in-app documentation on the AI Trial for DataRobot Prime.  You can also learn more about it in this community article - Exporting models with DataRobot Prime.
  3. For eligible algorithms, the Java JAR is available both for individual models in a project and for blenders/ensembles of models.  In my experience the JARs tend to be 15 MB and up, although I'm not sure how big a large one would be; I do not think size has much, if any, impact on performance.  The JAR is highly performant when used as a command-line CSV batch scorer, and it is also quick when imported into projects and used to score records (a rough way to check latency yourself is sketched below).  Is there a particular SLA you are trying to meet?  Is this a real-time use case, a batch one, or just general questions?
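
If you want to sanity-check single-record latency against an SLA, a rough harness like the following works. It reuses the same assumed com.datarobot.prediction API as the earlier sketch, so the same caveats apply - the class names and score() signature are from my reading of the docs, and the feature names are hypothetical.

```java
// Rough latency check for single-record scoring with the scoring-code JAR.
import com.datarobot.prediction.IRegressionPredictor;
import com.datarobot.prediction.Predictors;

import java.util.HashMap;
import java.util.Map;

public class LatencyCheck {
    public static void main(String[] args) {
        // Assumed API: loads the model from the scoring-code JAR on the classpath.
        IRegressionPredictor predictor = Predictors.getPredictor();

        Map<String, Object> row = new HashMap<>();
        row.put("feature2", 4.0);   // hypothetical feature names
        row.put("feature3", 9.0);

        // Warm up the JVM before timing.
        for (int i = 0; i < 1_000; i++) predictor.score(row);

        int n = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) predictor.score(row);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%d scores in %d ms (%.4f ms/score)%n",
                n, elapsedMs, (double) elapsedMs / n);
    }
}
```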

It might be worthwhile to set up a call between your team and one of our account teams to better understand your needs and how we can partner to solve them together.

Image Sensor

Thanks for the replies.

I have one more question regarding the features created by DataRobot automated feature engineering.
If I need to know how features are constructed, say for example feat1 = feat2*(feat3), how would I access this information? I might need to document the feature construction or formula as part of regulatory and mandatory requirements.

Thanks

PK

DataRobot Employee

Hi PK -

DataRobot shows blueprints of what goes on within the data prep and training/scoring pipeline for a DataRobot model.  If features are created using some of the more advanced feature discovery methods inside DataRobot, for example aggregates derived from joining multiple tables, the lineage of those features is available to see as well.  As for documents and regulatory requirements, for each model inside DataRobot you can create compliance documentation.  This documentation can be templated, so you can construct a specific template for DataRobot to populate per your reporting needs.  There isn't a sample compliance document on the community, although you can see the blueprints, and the doc is referenced in this article as well: https://community.datarobot.com/t5/resources/describing-and-evaluating-models/ta-p/1491

Image Sensor

Thanks once again.

But would the blueprint give me that level of detail? Say feat1 is constructed as feat1 = feat2*(feat3) - would it display the formula feat2*(feat3)?

Regards,

PK

DataRobot Employee

The blueprint would tell you things like whether one-hot encoding was used or missing values were imputed, with links to documentation as to how.  With automated feature discovery, a feature such as sum(last 7 days of sales) is shown in a feature lineage graph: which dataset(s) were used and how the data was used to construct the feature.