The subject says it all. In the context of DataRobot, I find that training takes a day, which I can understand; predictions take about one coffee; and explanations take hours. It is the last one I do not understand. The model is a gradient boosted tree. Surely one gets the explanation from the process of making the prediction, more or less.
Hi @Bruce , thanks for the post! Let's make sure we get to this during our next scheduled call, and we can try to provide some more info. We can of course update this thread once we have an answer that explains why predictions with explanations are slower than predictions without them!
Hi @Andrew Nicholls ,
Thanks. However, I have already read it. My question above was prompted by what I see as a contradiction between that document and the answer from @j1z0
Hey @Bruce, it sounds like you may be interested in the white paper that discusses XEMP and LIME: https://www.datarobot.com/resources/xemp-prediction-explanations/
I suspect that this is the answer. But before I accept it, I do have a couple of follow-up questions.
I thought that the distinction between XEMP and LIME was supposed to be that XEMP uses only extant data while LIME is based around synthetic data. But what you say seems to remove the distinction.
Then the explanations are a form of correlation analysis? The 1st, 2nd, and 3rd explanations are distinguished only by strength in some sense, and not in any hierarchical manner?
Aside: is it possible to get the details of the tree structure?
Thanks for the question. The short answer is that it can take up to 100 times more predictions to generate explanations. The longer, more detailed answer is as follows. DataRobot wants all models to compete on a level playing field and be compared honestly, so the nature of the model should not matter. That is to say, we want all model types, such as XGBoost, linear regression, and neural network models, to be explainable. To achieve this we use XEMP. XEMP is a model-agnostic explanation method, and therefore can't rely on the internals of the model architecture. Instead, we run predictions not just on the original row, but also on synthetic data generated from samples of the project dataset: we sample up to 50 columns, with up to 11 sampled values per column. The exact number of columns and samples we use varies, but it may still require making roughly 100 predictions per row of the original dataset.
So this is why you may find explanations to be roughly 100 times slower than predictions. I hope this clarifies things. Let me know if you still have questions.
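To make the cost concrete, here is a minimal Python sketch of a generic perturbation-based explainer. It is not DataRobot's XEMP implementation, just an illustration of why any model-agnostic method needs many extra predictions per row. The names `CountingModel` and `explain_row`, the toy linear model, and the choice of 5 samples per column are all assumptions made for this example.

```python
import random


class CountingModel:
    """Toy stand-in for any black-box model; it counts how many rows it scores."""

    def __init__(self, weights):
        self.weights = weights
        self.rows_scored = 0

    def predict(self, rows):
        self.rows_scored += len(rows)
        return [sum(w * x for w, x in zip(self.weights, row)) for row in rows]


def explain_row(model, row, background, samples_per_column=5, seed=0):
    """Score each feature by how far the prediction moves when that feature
    is replaced with values sampled from the background dataset."""
    rng = random.Random(seed)
    base = model.predict([row])[0]  # one prediction for the original row
    strengths = []
    for col in range(len(row)):
        column_values = [r[col] for r in background]
        k = min(samples_per_column, len(column_values))
        substitutes = rng.sample(column_values, k)
        # Synthetic rows: the original row with one column swapped out.
        perturbed = [row[:col] + [v] + row[col + 1:] for v in substitutes]
        preds = model.predict(perturbed)  # k extra predictions for this column
        strengths.append(base - sum(preds) / len(preds))
    return base, strengths


# Hypothetical data: 20 background rows with 3 columns each.
background = [[float(i), float(i % 4), float(i * 2)] for i in range(20)]
model = CountingModel(weights=[2.0, 0.0, -1.0])
row = [1.0, 5.0, 3.0]

base, strengths = explain_row(model, row, background)
print("rows scored for one explanation:", model.rows_scored)  # 1 + 3 * 5 = 16
```

In this sketch a plain prediction scores 1 row, while one explanation scores 1 + (columns x samples per column) rows. The model is only ever called through `predict`, so the same loop works for a tree ensemble, a linear model, or a neural network, which is the trade-off described above: no access to the model's internals, paid for with extra predictions.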