In DataRobot I see some blueprints use ‘Ridit Transformation’ for the numeric features. How does this transformation work?
I’m planning to implement a coefficient-based model to a low latency environment that is isolated from the DataRobot environment. In order to operationalize the model, I need to replicate the feature engineering steps and apply the DataRobot coefficient estimation to get the predicted probability of the positive event. If I want to perform Ridit transformation in my own data preparation pipeline, how would I do that?
Smooth Ridit Transform in DataRobot platform documentation: https://app.datarobot.com/model-docs/tasks/RDT5-Smooth-Ridit-Transform.html
DataRobot has its own implementation of the Ridit transformation, so you can’t get exactly the same result if you transform features outside DataRobot. The good news is that you can use the scikit-learn modules below to get something very similar:
QuantileTransformer (new in scikit-learn 0.19): https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html
quantile_transform (equivalent function without the estimator API): https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.quantile_transform.html
Below is an illustration of how to mimic DataRobot’s implementation of the Ridit transformation (100 quantiles mapped to [-1, 1]) in a binary classification project, along with a test of the difference between predicted probabilities. In this example, applying the same coefficients to the manually Ridit-transformed feature in the holdout set produces predictions very close to DataRobot’s.
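A minimal sketch of the idea, assuming DataRobot’s Smooth Ridit output is approximately a 100-quantile uniform mapping rescaled to [-1, 1] (the data here is synthetic, purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Synthetic skewed numeric feature standing in for your training data
rng = np.random.default_rng(0)
train = rng.lognormal(size=(1000, 1))
holdout = rng.lognormal(size=(200, 1))

# 100 quantiles, uniform output in [0, 1]; fit on training data only
qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform")
qt.fit(train)

# Rescale [0, 1] -> [-1, 1] to match DataRobot's Ridit output range.
# Holdout values outside the training range are clipped by the transformer.
ridit_train = 2 * qt.transform(train) - 1
ridit_holdout = 2 * qt.transform(holdout) - 1
```

You would then feed `ridit_holdout` into your coefficient-based scoring code and compare the resulting probabilities against DataRobot’s predictions on the same holdout set.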
(Also FYI all: the link Ray posted is accessible only to managed AI cloud users of DataRobot (i.e., app.datarobot.com). If you’re using an on-prem installation, just modify the URL to match your instance. For example, https://app.domain-name.com/model-docs/tasks/RDT5-Smooth-Ridit-Transform.html).
First off, big shoutout to sergeZ for bringing up this interesting question about Ridit transformation in DataRobot.
So, about the Ridit transformation – it's a way to rank and normalize your numeric features, making them more suitable for models. Essentially, it replaces the values with their relative ranks within each feature, which can help with reducing the impact of outliers and uneven distributions.
Now, sergeZ, I totally get your situation with needing to replicate this outside of DataRobot. If you're working on your own data preparation pipeline, you could consider using libraries like scipy.stats.rankdata in Python. This should help you achieve the Ridit transformation for your numeric features (I tried this on a project at Andersen and it worked well enough to hit our deadline). Then, for the coefficient-based model, you could use any standard machine learning library like scikit-learn.
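For what it's worth, here is a rough sketch of the rank-based approach using scipy.stats.rankdata. The (rank - 0.5) / n formula is the classic ridit score; the rescale to [-1, 1] is an assumption to resemble DataRobot's output range, not its exact implementation:

```python
import numpy as np
from scipy.stats import rankdata

def ridit(x):
    # Classic ridit score: (rank - 0.5) / n, which lands in (0, 1).
    # Ties get the average rank by default.
    r = (rankdata(x) - 0.5) / len(x)
    # Rescale to [-1, 1] (assumed to mirror DataRobot's Ridit range)
    return 2 * r - 1

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
scores = ridit(x)
# Tied values (the two 1.0s) receive the same score
```

Note this scores a single array in isolation; for a train/holdout split you'd want to interpolate new values against the training ranks (as QuantileTransformer does) rather than re-rank the holdout set on its own.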
In my experience, implementing such transformations manually might require a bit of trial and error to match DataRobot's results, but it's definitely doable. Just make sure to thoroughly test and validate your model's performance.