In DataRobot I see some blueprints use ‘Ridit Transformation’ for the numeric features. How does this transformation work?
I’m planning to deploy a coefficient-based model in a low-latency environment that is isolated from the DataRobot environment. To operationalize the model, I need to replicate the feature engineering steps and apply the coefficients estimated by DataRobot to get the predicted probability of the positive event. If I want to perform the Ridit transformation in my own data preparation pipeline, how would I do that?
thanks @RayMi.
(Also FYI all: the link Ray posted is accessible only to managed AI cloud users of DataRobot (i.e., app.datarobot.com). If you’re using an on-prem installation, just modify the URL to match your instance. For example, https://app.domain-name.com/model-docs/tasks/RDT5-Smooth-Ridit-Transform.html).
Smooth Ridit Transform in DataRobot platform documentation: https://app.datarobot.com/model-docs/tasks/RDT5-Smooth-Ridit-Transform.html
DataRobot has its own implementation of the Ridit transformation, so you can’t get exactly the same result if you transform features outside DataRobot. The good news is that you can use the scikit-learn modules below to get something very similar:
QuantileTransformer (new in sklearn version 0.19): https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html
or
quantile_transform (equivalent function without the estimator API): https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.quantile_transform.html
Below is an illustration of how to mimic DataRobot’s implementation of the Ridit transformation (100 quantiles, output in the range [-1, 1]) in a binary classification project, along with a test of the difference between predicted probabilities. In this example, applying the same coefficients to the manually Ridit-transformed feature in the holdout set gives predictions very similar to the DataRobot predictions.
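For anyone who wants to try this, here is a minimal sketch of the approach described above. It is an approximation, not DataRobot's actual implementation. It assumes the transform can be mimicked by a uniform QuantileTransformer with 100 quantiles whose [0, 1] output is rescaled to [-1, 1]. The data is synthetic, and the intercept and coefficient are hypothetical placeholders; in practice you would use the values exported from your own DataRobot model. The last part shows one way to reproduce the fitted transform with NumPy only, which may help in an environment where scikit-learn is not available.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in for one numeric feature (assumption: your real
# training and holdout data would be used here instead).
rng = np.random.default_rng(0)
X_train = rng.lognormal(size=(1000, 1))
X_holdout = rng.lognormal(size=(200, 1))

# Fit on training data only: 100 quantiles, uniform output in [0, 1].
qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform",
                         random_state=0)
qt.fit(X_train)

def ridit_like(X, transformer):
    """Map the uniform [0, 1] output onto [-1, 1]."""
    return 2.0 * transformer.transform(X) - 1.0

Z_holdout = ridit_like(X_holdout, qt)

# Hypothetical coefficients; replace with the ones from your model.
intercept = -0.5
coef = np.array([1.2])

def predict_proba(Z, intercept, coef):
    """Logistic link applied to the linear predictor."""
    return 1.0 / (1.0 + np.exp(-(intercept + Z @ coef)))

p_holdout = predict_proba(Z_holdout, intercept, coef)

# NumPy-only replication: export the fitted quantiles and interpolate.
# Assumes a continuous feature (strictly increasing quantiles); with
# tied quantiles scikit-learn averages a forward and a backward
# interpolation, so results can differ slightly.
quantiles = qt.quantiles_[:, 0]   # feature values at each quantile
references = qt.references_       # matching levels in [0, 1]

def ridit_like_numpy(x, quantiles, references):
    # np.interp clips values outside the training range to 0 and 1.
    return 2.0 * np.interp(x, quantiles, references) - 1.0

Z_numpy = ridit_like_numpy(X_holdout[:, 0], quantiles, references)
```

To check how close the replication is, compare `p_holdout` with the holdout predictions downloaded from DataRobot (for example, the maximum absolute difference). Small differences are expected because the two implementations are not identical.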