
R square value different from manual calculation

NiCd Battery

Hi,

The following validation set and its predictions give an R² of 0.69, whereas DataRobot reports 0.65. This is not specific to this one dataset: whichever model I use, when I take the validation set and calculate R² myself, the result is slightly different from what DataRobot shows. Am I missing anything?

Actual    Prediction
17.98     63.6761
35.61     40.0788
54.16     79.47874
58.69     56.8481
77.57     97.33846
141.14    157.1376
161.05    106.6461
178.8     127.3321

6 Replies
NiCd Battery

Hi Manoj

 

There are several types of R². In addition to the calculation you will have learnt in school, there are also:

Adjusted R², which accounts for the effect of adding more fields to the data (extra fields can "artificially" improve the fit).

Predicted R², which checks the predictions directly by refitting the model with data points left out and comparing its predictions against those points.

Both of these values will usually be lower than the "vanilla" R², but they give a more realistic picture of model quality. I am not sure (I am checking the documentation), but I imagine that DataRobot uses one of these metrics rather than the standard one.
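
For reference, the three variants can be compared side by side. The following is a minimal sketch, not DataRobot's implementation: it assumes a plain linear regression on R's built-in mtcars data (chosen only for illustration), and it computes predicted R² from the leave-one-out PRESS statistic, which is one common definition.

```r
# Illustrative model on a built-in dataset (an assumption; unrelated to the thread's data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

n  <- nrow(mtcars)   # number of observations
k  <- 2              # number of predictors (wt, hp)
r2 <- s$r.squared    # plain R²

# Adjusted R²: penalises R² for the number of predictors
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Predicted R²: based on PRESS, the sum of squared leave-one-out residuals
press   <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)
sst     <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
pred_r2 <- 1 - press / sst

print(c(r2 = r2, adjusted = adj_manual, predicted = pred_r2))
```

The manual adjusted value should agree with `s$adj.r.squared`, and for this model both the adjusted and predicted values come out below the plain R².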

 

Erica

NiCd Battery

Well I'm curious now :0)

@emily or anyone else from DataRobot...

Can you tell us what kind of R squared is used in the residuals tab? It doesn't specify in the documentation. Thank you!

https://app.eu.datarobot.com/docs/modeling/investigate/evaluate/residuals.html

Data Scientist

Hi Manojkumar and Erica, 

This was a good question. I had to do some digging to find the answer.

There are several methods for computing R², and their results don't always match. We use the most general definition of R², which you can read about in detail on Wikipedia: 1 - (residual sum of squares) / (total sum of squares):

emily_0-1590076704021.png

Here is some R code that explains the calculation more thoroughly: 

 

# Actual values (a) and model predictions (p) from the validation set
a <- c(17.98, 35.61, 54.16, 58.69, 77.57, 141.14, 161.05, 178.8)
p <- c(63.6761, 40.0788, 79.47874, 56.8481, 97.33846, 157.1376, 106.6461, 127.3321)

# Manual method
SSE <- sum((p - a)^2)        # residual sum of squares
SST <- sum((mean(a) - a)^2)  # total sum of squares
R2 <- 1 - SSE / SST
print(R2) # 0.6550015

# Package method
print(MetricsWeighted::r_squared(a, p)) # 0.6550015
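
The thread does not say how the manual 0.69 was calculated. One possible explanation (an assumption, not confirmed by the original poster) is that it is the squared Pearson correlation between actual and predicted values, which is what spreadsheet functions such as Excel's RSQ return. A minimal sketch comparing the two definitions on the same data:

```r
# Same validation data as in the thread
a <- c(17.98, 35.61, 54.16, 58.69, 77.57, 141.14, 161.05, 178.8)
p <- c(63.6761, 40.0788, 79.47874, 56.8481, 97.33846, 157.1376, 106.6461, 127.3321)

# General definition: 1 - SSE / SST (the value DataRobot reports)
r2_general <- 1 - sum((p - a)^2) / sum((a - mean(a))^2)

# Squared Pearson correlation: ignores any bias or scaling in the predictions
r2_corr <- cor(a, p)^2

print(r2_general) # 0.6550015
print(r2_corr)    # roughly 0.69
```

The squared correlation measures only how well the predictions line up with the actuals after an ideal linear rescaling, so it is never lower than the general definition when the latter uses the same data.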

 

 

The residual plot uses the same approach, but it downsamples the data when there are more than 1,000 data points, so you may see small differences there as well.
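
To see why downsampling can shift the value, here is a small sketch on synthetic data. Everything in it is an assumption for illustration: the data are simulated, and a uniform random sample of 1,000 rows stands in for whatever sampling DataRobot actually applies, which the thread does not describe.

```r
set.seed(42)

# Synthetic actuals and noisy predictions (illustrative only)
n         <- 5000
actual    <- rnorm(n, mean = 100, sd = 50)
predicted <- actual + rnorm(n, sd = 30)

# General R² definition: 1 - SSE / SST
r2 <- function(actual, predicted) {
  1 - sum((predicted - actual)^2) / sum((actual - mean(actual))^2)
}

r2_full <- r2(actual, predicted)

# R² on a random subsample of 1,000 points
idx       <- sample(n, 1000)
r2_sample <- r2(actual[idx], predicted[idx])

print(c(full = r2_full, sample = r2_sample))
```

The two values are close but not identical, which is the kind of small gap to expect between a metric computed on all rows and one computed on a sample.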

I hope this helps. Thanks for posting!

Emily

 

NiCd Battery

Thanks for clarifying, @emily!

Erica

 

NiCd Battery
Hi Erica, thanks for your response!
NiCd Battery

Hi Emily,

Thanks for the detailed clarification!