I have a prediction task where the average level of the outcome is likely to vary significantly (and, for the sake of argument, unpredictably) in future, but the relativities between outcomes may not (and it's this that is of interest). Is there any way to select an accuracy tracking metric, like Gini coefficient or RMSE adjusted for mean daily drift, that would be appropriate for this task? At the moment, any accuracy degradation figures I get for my model are just dominated by the drift in the mean value.
Hey @Tom B,
I think your question is quite similar to this one: 'Custom Metrics for Monitoring'. The default metrics available through the UI are listed in the Accuracy tab. To create your custom metric, you would use the DataRobot API to calculate the metric yourself and feed it back to the platform to trigger whatever action you're looking for.
Python API example:
    import datarobot as dr

    # Look up the project and its models
    project = dr.Project.get('5cc899abc191a20104ff446a')
    models = project.get_models()

    # Look up the deployment and inspect the model behind it
    deployment = dr.Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    deployment.model['id'], deployment.model['type']
    # ('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')

    # Calculate custom metric
My instinct would be to re-examine the objective of the modeling task if you already know the outcomes are likely to vary significantly. Perhaps a 2-stage or multi-stage modeling process would be appropriate for this?
I totally get where you're coming from, Tom B. Dealing with fluctuating outcome levels while maintaining the relative patterns is indeed a tricky challenge.
In my experience, tackling this issue might involve a two-pronged approach. First, you could consider normalization techniques to handle the mean drift. This could involve scaling your predictions based on the current mean, so your accuracy metric isn't skewed by those variations.
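To make the normalization idea concrete, here is a minimal sketch of an RMSE that removes each day's mean from both actuals and predictions before scoring. It assumes the drift is additive and that you score after the actuals for a day have arrived; the function names are made up for illustration and are not part of any platform API. If your drift is multiplicative, dividing by the daily mean instead of subtracting it may be the better fit.

```python
from collections import defaultdict
from math import sqrt


def plain_rmse(actuals, predictions):
    """Ordinary RMSE, sensitive to any shift in the overall level."""
    errors = [(y - p) ** 2 for y, p in zip(actuals, predictions)]
    return sqrt(sum(errors) / len(errors))


def drift_adjusted_rmse(days, actuals, predictions):
    """RMSE computed after subtracting each day's mean from actuals and predictions.

    Hypothetical helper: assumes additive drift in the daily mean.
    """
    by_day = defaultdict(list)
    for day, y, p in zip(days, actuals, predictions):
        by_day[day].append((y, p))

    squared_errors = []
    for pairs in by_day.values():
        mean_y = sum(y for y, _ in pairs) / len(pairs)
        mean_p = sum(p for _, p in pairs) / len(pairs)
        for y, p in pairs:
            # Compare deviations from the day's mean, not raw values
            squared_errors.append(((y - mean_y) - (p - mean_p)) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy data: on day 2 the level jumps by 100, but the relativities are unchanged
days = [1, 1, 1, 2, 2, 2]
actuals = [10, 20, 30, 110, 120, 130]
predictions = [10, 20, 30, 10, 20, 30]

print(plain_rmse(actuals, predictions))                 # about 70.7, all of it level drift
print(drift_adjusted_rmse(days, actuals, predictions))  # 0.0
```

On the toy data the plain RMSE is driven entirely by the level jump, while the adjusted version reports no error because the within-day pattern is still predicted exactly.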
But for your main concern, using a custom loss function could be a game-changer. Since you're more interested in the relative differences, crafting a loss function that gives more weight to preserving those relationships rather than just focusing on absolute values could do the trick. Something like a relative squared error might be worth exploring.
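For reference, a minimal sketch of relative squared error as it is usually defined: the model's squared error divided by the squared error of always predicting the mean of the actuals. The function name is just illustrative. One caveat worth checking on your own data: this ratio scales the error by the spread of the actuals, but a pure level shift still lands in the numerator unless the predictions are recentred first.

```python
def relative_squared_error(actuals, predictions):
    """Squared error of the model relative to a predict-the-mean baseline.

    0.0 is a perfect fit; 1.0 is no better than predicting the mean of the actuals.
    """
    mean_y = sum(actuals) / len(actuals)
    model_error = sum((y - p) ** 2 for y, p in zip(actuals, predictions))
    baseline_error = sum((y - mean_y) ** 2 for y in actuals)
    return model_error / baseline_error


actuals = [1, 2, 3, 4]
print(relative_squared_error(actuals, [1, 2, 3, 4]))          # 0.0
print(relative_squared_error(actuals, [2.5, 2.5, 2.5, 2.5]))  # 1.0

# Same relativities, but the level of the actuals has moved up by 100
shifted_actuals = [101, 102, 103, 104]
print(relative_squared_error(shifted_actuals, [1, 2, 3, 4]))  # 8000.0
```

The last line shows why this would probably need to be combined with some form of mean adjustment in your situation: the relativities are predicted perfectly, yet the score is dominated by the shift.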
Another thought is adopting a multi-stage modeling process. Training your model in stages, adjusting for mean drift in one stage and focusing on the relative patterns in another, might provide better insights and results.
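A rough sketch of what a two-stage setup could look like, assuming the drift is multiplicative: one stage estimates the current overall level from recent actuals, the other learns each segment's relativity to the overall mean, and the final prediction is their product. The segment labels, function names, and the "mean of recent actuals" level estimate are all assumptions for illustration, not a recommended design.

```python
def fit_relativities(segments, actuals):
    """Relativity stage: each segment's average outcome divided by the overall average."""
    overall_mean = sum(actuals) / len(actuals)
    totals, counts = {}, {}
    for segment, y in zip(segments, actuals):
        totals[segment] = totals.get(segment, 0.0) + y
        counts[segment] = counts.get(segment, 0) + 1
    return {s: (totals[s] / counts[s]) / overall_mean for s in totals}


def estimate_level(recent_actuals):
    """Level stage: current overall level, taken here as the mean of recent actuals."""
    return sum(recent_actuals) / len(recent_actuals)


def predict(segment, level, relativities):
    """Combine the stages: current level times the segment's relativity."""
    return level * relativities[segment]


# Training period: overall mean is 20, segment "a" runs at half of it, "b" at 1.5x
relativities = fit_relativities(["a", "a", "b", "b"], [10, 10, 30, 30])

# Later the overall level has drifted to about 200
level = estimate_level([180, 220])

print(predict("a", level, relativities))  # 100.0
print(predict("b", level, relativities))  # 300.0
```

With this split, accuracy of the relativity stage can be tracked on its own (for example with a rank-based or mean-adjusted metric), separately from how well the level estimate keeps up with the drift.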