Eureqa modeling of key factors in Porsche performance data
Eureqa users are not just analysts in large corporations. Hobbyists find interesting, innovative ways to apply Eureqa to their daily lives and passions. Below is a guest post from one of our favorite adrenaline-seeking customers in San Diego. If you have a story that you’d like to share, don't hesitate to leave a comment!
Originally posted on 6/16/14
Porsche Club of America (PCA) Club Racing is a beloved sport among many Porsche enthusiasts. However, one issue with PCA Club Racing is that the cars, which span more than 50 years of production, are sorted into competitive groups subjectively. Below is an overview of how retired data scientist William Ripka, Ph.D., used Eureqa to test the validity of the PCA’s 16-group classification system by modeling the relative influence of different car features on performance.
Racing in time trials and autocrosses with the Porsche Club of America Zone 8 is governed by certain “rules” that classify cars into groups to ensure fair competition. These cars range from 1960s models all the way to present 2014 models. Over the years, vast improvements have been made in horsepower, performance, and handling. The problem is how to sort these cars into classes so that racing within each class is genuinely competitive and fair.
Classes are currently determined by calculating performance level points based on horsepower, weight, year of model introduction, and wheel size. Added to these base points are points assessed for tire size over a specified standard size. Finally, performance points are added for any suspension, power train, fuel injection, and other modifications made over the stock configuration that could improve performance and handling. This results in an overall point score for a particular car that determines its class (CC01 to CC16).
While there is general agreement with regard to the qualitative importance of these various factors, there has been no attempt to quantify them. Rather, the actual points assessed are based on the subjective judgment of a few experienced mechanics and drivers. This has led to some bizarre classifications. One class, CC05, contains a 1988 924S (160 hp), a 1986 911 Turbo (231 hp), a 1993 RS America (247 hp), and a 2010 987 Boxster (255 hp). This strongly suggests that the relative point assessment for the various factors needs to be adjusted.
Determining the relative importance of the various factors and assessing appropriate performance points requires a driver. The problem is that drivers vary widely in their skills; a poor driver in a powerful car may not be competitive with an excellent driver in a much less powerful one. The only way one could objectively determine, for example, the relative importance of a tire size change versus a torsion bar change on performance would be to have the same expert driver drive each car at its limit or have a robot drive the cars. Obviously, that isn’t possible.
In trying to address this problem, data was compiled for the top 15 drivers in the PCA San Diego region at various tracks over the last three years. The assumption is that these drivers are all excellent, have comparable skills, and drive their various cars at the limit. If one accepts this assumption, this may allow comparisons of different cars with different setups.
To do this, an “index” was calculated as a relative score of each driver's best time compared to the top time of day (TTOD) at a particular track on a particular day, which takes into account the length of the track, direction, conditions, etc. Someone with an index of 0.9 was therefore 90% as fast as the driver with the TTOD. The individual indexes for each driver at each track were then averaged over all the events he participated in during the three years to get an overall index.
The indexes for all drivers ranged from about 0.8 to 1.0 (TTOD). In fact, over the last two years, the average index for any one driver was essentially constant for every track, independent of the length of the track, the direction driven, and weather conditions, indicating this is a reasonable measure of car/driver performance. Against that 0.8 to 1.0 range, the average root mean square (RMS) deviation for any one driver was 0.02 index points or less, which represents about 2–3 seconds at the tracks. The 15 expert drivers had indexes ranging from 0.932 to 0.986. Their cars were in a range of classes (CC09 to CC16) according to the current classification scheme.
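The index calculation described above can be sketched in Python. This is only an illustration: it assumes the index is the TTOD lap time divided by the driver's best lap time (so that 0.9 means "90% as fast"), which the post does not spell out. The function names are hypothetical and no real PCA timing data is used.

```python
from statistics import mean


def event_index(best_time_s, ttod_s):
    """Index for one event: TTOD lap time divided by the driver's best time.

    Assumption: "X% as fast" is read as a ratio of lap times, so the TTOD
    driver scores 1.0 and slower drivers score below 1.0.
    """
    if best_time_s <= 0 or ttod_s <= 0:
        raise ValueError("lap times must be positive")
    return ttod_s / best_time_s


def overall_index(events):
    """Average the per-event indexes over every event a driver entered.

    `events` is a list of (best_time_s, ttod_s) pairs.
    """
    return mean(event_index(best, ttod) for best, ttod in events)


def rms_deviation(indexes):
    """Root mean square deviation of per-event indexes from their mean."""
    avg = mean(indexes)
    return mean((x - avg) ** 2 for x in indexes) ** 0.5
```

Under this reading, a 0.02 change in index corresponds to roughly 2 to 3 seconds when the fastest laps are on the order of 100 to 150 seconds, which is in line with the figure quoted above.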
Eureqa was used to determine if actual performance, as indicated by the index value, could be correlated with the factors used in the current subjective classification scheme, i.e., did it support the current system (based on a subjective analysis), or might it suggest a different one? The current formula for assessing points to determine class is:
points = (4000/(weight/hp)) + (stock front wheel size (in) + stock rear wheel size (in) - 12) + (year - 2010) + (sum of front and rear tire size - 2*(205)) + (Performance Equipment Points)
Recasting this with coefficients for each term,
points = A*(4000/(weight/hp)) + B*(stock front wheel size (in) + stock rear wheel size (in) - 12) + C*(year - 2010) + D*(sum of front and rear tire size - 2*(205)) + E*(Performance Equipment Points)
where A=B=C=D=E=1.0, and the relative contribution of each coefficient is 1/5 or 20%.
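As a rough sketch, the points formula with its five coefficients can be written as a small Python function. The parameter names and the units in them (pounds for weight, inches for wheels, millimeters for tire width) are assumptions made for illustration, and the example car in the usage note is hypothetical.

```python
def class_points(weight_lb, hp, front_wheel_in, rear_wheel_in, year,
                 front_tire_mm, rear_tire_mm, equipment_points,
                 coeffs=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Classification points as the weighted sum of the five terms.

    `coeffs` holds (A, B, C, D, E); the default of all 1.0 reproduces the
    current formula. Units in the parameter names are assumptions.
    """
    a, b, c, d, e = coeffs
    power_term = 4000 / (weight_lb / hp)           # 4000/(weight/hp)
    wheel_term = front_wheel_in + rear_wheel_in - 12
    year_term = year - 2010
    tire_term = front_tire_mm + rear_tire_mm - 2 * 205
    return (a * power_term + b * wheel_term + c * year_term
            + d * tire_term + e * equipment_points)
```

For a hypothetical 3000 lb, 300 hp car from 2010 on 18-inch wheels and 205 tires with no modifications, `class_points(3000, 300, 18, 18, 2010, 205, 205, 0)` evaluates to 400 + 24 + 0 + 0 + 0 = 424. Passing a different `coeffs` tuple shows how re-weighting the terms would change a car's total.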
Most of the terms above are self-explanatory. The first term, 4000 divided by the weight-to-power ratio (weight/hp), is meant to create a steepening curve that assigns progressively higher points for each incremental improvement in the power-to-weight (PW) ratio.
The points determine the class and are meant to be a measure of car performance. If this is true, the points should have a linear relationship to the index described above. A plot of the total class points for each of the 15 drivers/cars versus the performance index is shown in Table 1. The correlation is not particularly good.
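One way to put a number on how linear the relationship is would be a Pearson correlation coefficient between total points and index. The post does not say which measure, if any, was computed, so the sketch below is an assumption about how such a check could be done, with a hypothetical function name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A value near 1.0 would mean the class points track the performance index almost linearly. A value well below that would match the weak relationship described here.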
Table 1. Index vs. Total Points (current classification system)
The data used in the Eureqa analysis is shown in Table 2 and normalized in Table 3.
Table 2. Raw Data
Table 3. Normalized Data
The Eureqa analysis was run with the following search formula:
The resulting formula from the analysis is shown below:
Table 4 shows the Eureqa plot of observed vs. predicted index values.
Table 4. Observed vs. Predicted Index Values
Based on the formula found, the relative contribution of each coefficient for each term is:
(front + rear wheel size - 12): 42%
(front + rear tire size - 2*205): 15.8%
The old formula weights each coefficient equally at 20% (see above). Applying approximate correction factors of 0.5, 2, 0.5, 0.8, and 1.2 to the five terms, in the order they appear in the formula, gives:
new points = 0.5*(4000/(weight/hp)) + 2*(stock front wheel size (in) + stock rear wheel size (in) - 12) + 0.5*(year - 2010) + 0.8*(sum of front and rear tire size - 2*(205)) + 1.2*(Performance Equipment Points)
This suggests the hp/wt term (4000/(wt/hp)) was overestimated in importance and should be one half its current value, as should the model year term. On the other hand, the wheel term (front + rear - 12) should be increased by a factor of two. The tire term (front + rear - 2*205) should be about 0.8 of its current value, and the performance equipment term should be about 1.2 times its current value.
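The correction factors above appear to follow from dividing each term's modeled relative contribution by the 20% it carries in the current formula. The sketch below shows that arithmetic. It is an inference from the numbers in the post, not a documented step of the analysis, and the function name is hypothetical.

```python
def correction_factor(contribution_pct, n_terms=5):
    """Ratio of a term's modeled contribution to its equal-weight share.

    With five equally weighted terms the baseline share is 20%, so a term
    contributing 42% would be scaled by 42 / 20 = 2.1.
    """
    if n_terms <= 0:
        raise ValueError("n_terms must be positive")
    baseline_pct = 100.0 / n_terms
    return contribution_pct / baseline_pct


# Contributions quoted in the post for the wheel and tire terms.
wheel_factor = correction_factor(42.0)   # about 2.1, rounded to 2
tire_factor = correction_factor(15.8)    # about 0.79, rounded to 0.8
```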
After feeding race results and car metrics into Eureqa, it was determined that the current car classification system likely misjudged the relative importance of weight-to-horsepower ratio, wheel size, car model year, and equipment modifications, and that a new system with more precise weights for each input may be warranted.