Regarding partial dependency plots: the DataRobot documentation seems to define "actual" as the mean value of the target in the data for the given fold for the feature of interest, and "predicted" likewise as the mean value of the model's prediction.
What is the partial dependency line?
Seems to define it as the expected value varying the other features, which would be the actual or the predicted line depending on which function was involved. But the DataRobot documentation linked above indicates that the calculation of the partial dependency uses all values of the feature of interest, and then scales using only the values in the fold.
It is unclear to me what that means. Can someone clarify?
Note: I am not looking for someone to explain the language in the paragraph to me. Rather, I hope someone knows the actual calculation in clearer terms, such as a direct summation formula. Thanks.
Thank you for your clarification, Bruce. I think it would be easier to define the partial dependence calculation as an algorithm rather than a mathematical expression.
I think part of the reason for the confusion is its simplicity. Once you have it explained to you verbally or visually, it will probably become apparent why it is difficult to distill into the standard journal form.
Given this long discussion trail, a verbal conversation via one of our instructor-led courses, or a discussion with one of our Customer Facing Data Scientists, is probably the most efficient way of understanding this and the many other topics it will probably lead into.
Thank you once again for your interest.
Thanks for following up on this.
I have been unable to continue for now due to work priorities. But, since some of these relate to our decisions about which data science platform to go with, I am still interested. And I am personally interested, as I usually make a point to understand all these details in a precise mathematical sense.
Can you suggest a good proper mathematics article or book chapter? My background is that I have a doctorate in mathematics after doing degrees in engineering and then software and working on industrial stochastic control systems. As such, I am used to just getting down to the technical hard core: probability spaces, stochastic processes, and so on. I found that people on this forum either can't or won't talk about that, so it is not clear to me that talking to a DataRobot Data Scientist will be any better.
An exact description of the computational process would work - and I know that they were attempting that, but we kept talking at cross purposes. The discussion got entangled with details of sampling that are orthogonal to the matter of what partial dependency is.
My actual problem was that it was very unclear what Datarobot was doing in precise terms. And the terms seem to be used differently in different references and to not be very common. I looked up several references including, for example Practical Statistics for Data Scientists by Peter Bruce - and no such term "partial dependency" appeared in the index. I also asked a colleague who is a career data scientist with a mathematics degree - who was unfamiliar with the term as well.
What I was hoping for when I posted originally was literally just the mathematical definition of what was being calculated. I never got that. The right 10 page article could clear the whole thing up for me in one reading.
Much appreciated as I professionally and personally want to clear this up.
@Bruce I am so sorry you have been frustrated with our attempts at clarification on this.
Pardon the simplistic analogy, but it's like trying to describe a color verbally when it is much simpler to just see the color. Similarly, describing how to ride a bike is much harder than just sitting on one.
The reason I recommended one of our DataRobot University instructor-led courses is that the instructors provide a detailed explanation of partial dependence along with examples and applications that give the students a much better picture of partial dependence via that verbal and visual instructional mode.
Another alternative is to have one of our Customer Facing Data Scientists explain it to you via the platform with examples and perhaps even your own data for a better appreciation in your domain.
We do hope to be able to better serve you in your machine learning journey and appreciate your queries and consideration.
I have been answering all your questions on this chain. The reason Lukas and I were trying to explain the calculations behind the partial dependence charts, instead of just referring to the Christopher Molnar book you had provided, is that your interpretation of the formula in the book was incorrect (Seems to define it as the expected value varying the other features).
I understand that your technical background is different from mine, and hence I was trying to explain the maths behind the calculations. Partial dependence plots are a common tool in Machine Learning and there is a lot of literature on them. The reason I explained in detail, instead of just referring you to the link, was to give you more detail on how the calculations work in the background in DataRobot (sampling etc.).
I am sorry you feel that I am trying to play some game or waste your time here. My only intention was to help and I was never curt. I was trying to help you to the best of my ability based on my own data science experience and familiarity with DataRobot.
Is there some way to get direct technical answers without going through this kind of mess again?
I have been very frustrated by the answers I got here, especially since the eventual answer was simply that the literature I referred to is correct after all. In my opinion Vinay has been playing some silly game. He either is incapable or thinks that I am. Not good. He just belatedly looked in the reference I gave and found the formula I referred to and copied it into his answer. Because of this, I now have zero trust in his abilities and will consider the question unanswered.
Addendum: I am honestly and determinedly looking for an answer here. My background is very strong in technical systems identification in heavy industry from an engineering and mathematical point of view, which is relevant to my current role of data scientist - but, that together with my age means that I have very different cultural expectations. While I have no reason to suppose that you will grok that - I need to find someone who can translate. You said you hope that the answers from your colleagues were clear. Go back and look at the chain of posts - see just how far apart Vinay and myself are in the way in which we are talking and our focus and expectations. He kept answering the wrong question until I forced the issue and then he became curt. I do not want that. I want to come to this forum for a pleasant professional and informative interaction. If you can honestly assist - it would give me a better feeling about Datarobot than I currently have.
@Vinay So, you are saying that the formula I referred to in the reference that I gave up front to the literature is correct and is the way that Datarobot does it? Why did you not just say this in the first place?
Hi @Bruce ,
I hope the answers my colleagues provided are clear, and since the Feature Effects (Partial Dependence) is also explained, along with many other topics, in the DataRobot University instructor-led AutoML I course ( https://university.datarobot.com/automl-i ) you may wish to sign up for it to get a better understanding of how the platform works.
Hi @Bruce ,
I think, as you mentioned, our technical backgrounds differ. I reread what you were confirming regarding dependency plots:
The dependency plot is the expected value of the model fixing the value of the feature of interest?
This is incorrect. The correct explanation is provided in the Christopher Molnar book you referenced in your first question (link). An excerpt from the same is given below:
The concept of marginalization is important here. The average needs to be taken after marginalizing.
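For reference, the partial dependence function as it is usually written in the literature can be sketched as follows. Here $\hat{f}$ is the fitted model, $x_S$ the feature of interest, $X_C$ the remaining features, and $x_C^{(i)}$ their observed values in row $i$ of an $n$-row dataset. This notation is an assumption of the sketch, and whether DataRobot's sampling and scaling match it exactly is not confirmed in this thread.

```latex
\hat{f}_S(x_S)
  \;=\; \mathbb{E}_{X_C}\!\left[\hat{f}(x_S, X_C)\right]
  \;=\; \int \hat{f}(x_S, x_C)\, dP(x_C),
\qquad
\hat{f}_S(x_S) \;\approx\; \frac{1}{n}\sum_{i=1}^{n} \hat{f}\bigl(x_S, x_C^{(i)}\bigr)
```

The expectation is over the marginal distribution of $X_C$, not the conditional distribution of $X_C$ given $x_S$. That is the sense in which the average is taken after marginalizing.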
@Vinay Thanks again.
Is it correct to say
dependency(x) = expected(M(x, Y))
given my earlier clarification of the meaning of terms? (How to sample or fold the data or interpret the plot is not part of my question.)
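A minimal sketch of that calculation in Python, assuming dependency(x) means the average of the model output M over the observed values of the other features Y, with the feature of interest forced to x in every row. The function `partial_dependence`, the toy model, and the synthetic data are hypothetical illustrations, not DataRobot's implementation:

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Average model output over the data with one feature forced to each grid value."""
    X = np.asarray(X, dtype=float)
    values = []
    for x in grid:
        X_mod = X.copy()
        X_mod[:, feature] = x               # fix the feature of interest at x in every row
        values.append(model(X_mod).mean())  # average over the observed other features
    return np.array(values)

# Hypothetical model M(x, y) = 2*x + 3*y, standing in for a fitted model.
def toy_model(X):
    return 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Synthetic data (an assumption for illustration): the two features are strongly correlated.
rng = np.random.default_rng(0)
y_other = rng.normal(size=500)
x_interest = y_other + rng.normal(scale=0.1, size=500)
X = np.column_stack([x_interest, y_other])

grid = np.array([-1.0, 0.0, 1.0])
pd_values = partial_dependence(toy_model, X, feature=0, grid=grid)
```

For this linear toy model the result is 2x + 3*mean(Y). By contrast, the mean prediction among only those rows whose feature value is near x would follow roughly 5x here, because the features are correlated. The first is a marginal average, the second a conditional one.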