I created a K-Means clustering model using the default blueprint provided by DataRobot, which includes blocks for imputation, standardization, and so on. The training data had many missing values and was not scaled; the blueprint handled these transformations when the model was built.
I am now using the "Predict" tab to upload a test dataset that, like the training data, is unscaled and has missing values. Does DataRobot apply the same transformations (missing value imputation, standard scaler) to the data uploaded for predictions as it applied to the training data?
Yes, the same transformations are applied, because the same blueprint is used for predictions. You don't need to prepare your data any further; the preprocessing steps in the blueprint take care of it.
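To illustrate the general principle (this is a minimal sketch, not DataRobot's actual implementation): a fitted preprocessing step stores the statistics it learned from the training data and reuses them on any new data. The class `FittedPreprocessor` and the sample values are hypothetical and exist only for this example.

```python
from statistics import mean, pstdev


class FittedPreprocessor:
    """Hypothetical helper: learns imputation and scaling statistics once,
    then reuses them unchanged on new data."""

    def fit(self, rows):
        columns = list(zip(*rows))
        # Column means are computed from the observed (non-missing) training values.
        self.means = [mean(v for v in col if v is not None) for col in columns]
        # Standard deviations are computed after imputing with the training means.
        self.stds = []
        for col, m in zip(columns, self.means):
            filled = [m if v is None else v for v in col]
            # Fall back to 1.0 for a constant column to avoid dividing by zero.
            self.stds.append(pstdev(filled) or 1.0)
        return self

    def transform(self, rows):
        out = []
        for row in rows:
            # Missing values are filled with the means learned during training.
            filled = [m if v is None else v for v, m in zip(row, self.means)]
            # Scaling also uses the training statistics, never the new data's.
            out.append([(v - m) / s for v, m, s in zip(filled, self.means, self.stds)])
        return out


# Training data with missing values (None) and unscaled columns.
train = [[1.0, 10.0], [3.0, None], [None, 30.0], [5.0, 20.0]]
# New data with the same problems, passed in raw.
new_data = [[None, 40.0], [3.0, None]]

prep = FittedPreprocessor().fit(train)
scaled_new = prep.transform(new_data)
print(scaled_new)
```

The new rows go in raw and come out imputed and scaled with the training statistics, which is why the uploaded prediction data does not need to be prepared by hand.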
Hi @Bogdan Tsal-Tsalko,
Thank you! I have a couple more questions.
1. Since the output from Predictions is a .csv file with the default row id as the index, I wanted to know whether the order of the rows in the output .csv is the same as the order of the rows in the dataset passed for prediction, so that I can map it to the required entity after downloading it.
2. Is it possible to download the model as a .pkl file in case I want to load it in a different environment (for example, a Jupyter Notebook)? If not, what other methods can I use to export the DataRobot model so I can load it in another environment?
3. Is there any way to set a particular column as the index for clustering instead of using the default index?
Again, thank you for your helpful and timely replies to all of my questions! I really appreciate it. 🙂
1. Yes, DataRobot returns predictions in the same order as the rows you upload. If you make predictions from the predictions tab in the app (there are many other ways to make predictions), you can verify this by adding your ID column under Optional Features. Otherwise, if you use MLOps or the API, another feature is provided to ensure the order.
2. No, we don't provide models in .pkl format (mainly because it is very hard to ensure the same performance across the different platforms where the model would be used), but there are other portable options:
Scoring Code: get the model in the form of a Java file.
Rating Tables: some models have simple, interpretable results in the form of a rating table.
DataRobot Prime: retrains a rule-based model that can be exported to Python or Java.
Portable Prediction Server: a Docker image of the model that can be run as a prediction server outside DataRobot.
3. I'm not sure I understand the question, as clustering doesn't use any index. If you mean the default cluster names it produces, then you can rename them by clicking 'Rename clusters'.
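On point 1, here is a minimal sketch of mapping the downloaded predictions back to your own entity ID by row position. The column names (`customer_id`, `row_id`, `Cluster`), the inlined sample data, and the assumption that `row_id` is the zero-based position of the row in the uploaded file are all hypothetical; check them against your actual files.

```python
import csv
import io

# Hypothetical uploaded dataset and downloaded prediction output,
# inlined as strings so the example is self-contained.
input_csv = "customer_id,spend\nC-17,120\nC-04,\nC-42,310\n"
predictions_csv = "row_id,Cluster\n0,Cluster 2\n1,Cluster 1\n2,Cluster 2\n"

inputs = list(csv.DictReader(io.StringIO(input_csv)))
predictions = list(csv.DictReader(io.StringIO(predictions_csv)))

# Guard against a silent mismatch before joining by position.
assert len(inputs) == len(predictions)

cluster_by_customer = {}
for pred in predictions:
    # Assumption: row_id is the zero-based position in the uploaded file.
    source_row = inputs[int(pred["row_id"])]
    cluster_by_customer[source_row["customer_id"]] = pred["Cluster"]

print(cluster_by_customer)
```

If you pass your ID column through with the predictions instead, you can join on that column directly and skip the positional lookup.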
Feel free to ask if you have any further questions.