This article provides a quick tour of Eureqa models within DataRobot and their origin and purpose, and explains how to build a project within DataRobot that utilizes these models.
If you look at the Leaderboard (Figure 1), you will see many open source models based on XGBoost, TensorFlow, scikit-learn, etc. However, some models on the Leaderboard are not open source.
Figure 1. Leaderboard showing a number of open source models
Eureqa models are denoted by a blue EQ symbol (Figure 2) and are not based on open source software.
Figure 2. Leaderboard showing a Eureqa model and its blueprint
Eureqa algorithms were developed by Nutonian, a company DataRobot acquired in 2017. Nutonian's idea was to develop a genetic algorithm that can fit different analytic expressions to training data and return a formula as a machine learning model. This is a fundamentally different approach compared to traditional supervised machine learning models such as tree-based models, regression, or deep learning. The approach has since been cited in over 800 peer-reviewed publications and used in applications ranging from finance to neuroscience.
In essence, Eureqa models are trained just like any other supervised machine learning algorithm. You provide the algorithm with labeled training data representing historic information and the algorithm will fit an analytic expression to that training data. Similar to other models on the Leaderboard, that expression is tested on both validation and holdout data.
Eureqa fits an analytic expression to your data in 3 steps:
Step 1: Eureqa takes in mathematical building blocks, from simple operations such as addition, subtraction, and multiplication to more complex ones such as natural logarithms or cosines.
Step 2: Eureqa conducts an evolutionary model search to find the combination of the given mathematical building blocks that best fits your data. Starting with a series of random expressions, the algorithm combines the best-fitting expressions with each other until it gradually finds a formula that fits your data.
Step 3: Eureqa applies a penalty in proportion to the complexity of the formula so as to prevent overfitting.
Figure 3. Eureqa builds models from training data in three steps
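The three steps above can be illustrated with a toy symbolic regression in Python. This is only a minimal sketch of the general idea, not Eureqa's actual (proprietary) algorithm: the expression encoding, the function names (random_expr, crossover, evolve), the penalty weight, and the population settings are all assumptions chosen for illustration.

```python
import math
import random

# Step 1: the mathematical building blocks the search may use.
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"cos": math.cos}

def random_expr(depth, rng):
    """Random expression tree over the variable "x" and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.6 else round(rng.uniform(-2, 2), 1)
    if rng.random() < 0.2:
        return (rng.choice(list(UNARY)), random_expr(depth - 1, rng))
    return (rng.choice(list(BINARY)),
            random_expr(depth - 1, rng), random_expr(depth - 1, rng))

def evaluate(expr, x):
    """Evaluate a tree: tuples are operators, "x" is the input, floats are constants."""
    if isinstance(expr, tuple):
        args = [evaluate(a, x) for a in expr[1:]]
        return UNARY[expr[0]](*args) if len(args) == 1 else BINARY[expr[0]](*args)
    return x if expr == "x" else expr

def complexity(expr):
    """Number of nodes in the tree (a simple stand-in for formula complexity)."""
    if isinstance(expr, tuple):
        return 1 + sum(complexity(a) for a in expr[1:])
    return 1

def mse(expr, data):
    try:
        err = sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)
    except (OverflowError, ValueError):
        return float("inf")
    return err if math.isfinite(err) else float("inf")

def fitness(expr, data, penalty=0.01):
    # Step 3: error plus a penalty proportional to complexity (lower is better).
    return mse(expr, data) + penalty * complexity(expr)

def subtrees(expr):
    yield expr
    if isinstance(expr, tuple):
        for a in expr[1:]:
            yield from subtrees(a)

def crossover(a, b, rng):
    """Replace a randomly chosen node of a with a random subtree of b."""
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return rng.choice(list(subtrees(b)))
    i = rng.randrange(1, len(a))
    return a[:i] + (crossover(a[i], b, rng),) + a[i + 1:]

def evolve(data, generations=40, pop_size=200, seed=0):
    # Step 2: start from random expressions and recombine the best-fitting ones.
    rng = random.Random(seed)
    pop = [random_expr(3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, data))
        parents = pop[: pop_size // 4]          # keep the best quarter
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if complexity(child) <= 15:         # hard cap on formula size
                children.append(child)
        pop = parents + children
    return min(pop, key=lambda e: fitness(e, data))

# Toy data generated from y = x*x + x; the search only sees the (x, y) pairs.
data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-6, 7)]
best = evolve(data)
print(best, "mse =", mse(best, data), "complexity =", complexity(best))
```

Because the best expressions are carried over unchanged each generation, the penalized score of the best formula never gets worse as the search runs. Whether it recovers the exact generating formula depends on the random seed and the settings.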
In order to demonstrate how to build a Eureqa model in DataRobot, we will predict the temperature of a motor given predictors like ambient temperature, speed and torque of the motor, etc. (Figure 4). Eureqa will fit an expression to the predictors and find the simplest analytical expression that predicts the target.
Figure 4. Data used to predict the temperature of a motor
As shown in Figure 5, DataRobot displays a number of Eureqa models on the Leaderboard after fitting the data described above.
Figure 5. DataRobot Eureqa models on the Leaderboard
On the Describe > Eureqa Models page, DataRobot also plots the formulas each model found in terms of their complexity (X-axis) and out-of-sample error (Y-axis). These formulas, shown in the Models by Error vs. Complexity graph, are the most accurate (lowest error) formulas at each level of complexity (a measure of the size and mathematical complexity of the analytical model) that a given Eureqa model found. You can click on any of the circles in that graph to see the corresponding analytic expression (shown in the Selected Model Detail graph).
Figure 6. Sample Eureqa solution for the motor temperature prediction problem
In the motor example, the first and simplest model predicted the average motor temperature (leftmost red circle in the Models by Error vs. Complexity graph). Eureqa gradually fitted more complex formulas until it arrived at the most complex model with the lowest error (rightmost green circle in the Models by Error vs. Complexity graph). If you click on any of the circles in this graph, the Selected Model Detail graph shows the corresponding analytical expression and how well it fits the data. You can see that each model generated a simple, human-readable, and human-interpretable analytical expression.
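One way to read the Models by Error vs. Complexity graph is as a Pareto front: a formula is kept only if no simpler (or equally simple) formula has an error at least as low. The sketch below shows that filtering in Python. The candidate names, complexity scores, and error values are made up for illustration, and treating the graph as a strict Pareto front is an assumption rather than a documented description of how DataRobot selects the plotted formulas.

```python
def pareto_front(models):
    """Keep models whose error beats every model of equal or lower complexity.

    models: list of (name, complexity, error) tuples.
    """
    front = []
    best_error = float("inf")
    # Walk from simplest to most complex; within a tie, lowest error first.
    for name, comp, err in sorted(models, key=lambda m: (m[1], m[2])):
        if err < best_error:
            front.append((name, comp, err))
            best_error = err
    return front

# Hypothetical candidates (name, complexity, out-of-sample error).
models = [
    ("mean only", 1, 9.0),
    ("linear in ambient", 3, 5.0),
    ("linear in torque", 4, 6.0),      # more complex AND worse: dropped
    ("ambient + speed*torque", 7, 2.0),
    ("ambient + cos(speed)", 7, 2.5),  # same complexity, higher error: dropped
]

for name, comp, err in pareto_front(models):
    print(f"complexity={comp:2d}  error={err:.1f}  {name}")
```

Reading the front from left to right gives the same story as the graph: each step up in complexity is only shown if it buys a lower error.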
Let’s look at another example. We will model the acceleration of the lower bar of a double pendulum, using the position, velocity, and acceleration of the ends of both the upper bar and the lower bar (as shown in Figure 7).
Figure 7. The motion of a double pendulum
To get an idea of just how complex this motion is, observe the video below showing the motion of the double pendulum.
As the pendulum swung back and forth, the camera was logging the location and movement of each point. We imported that data into DataRobot, selected the target variable (i.e., the acceleration of the lower bar), and fit a Eureqa model to the recorded data.
For this task, we also gave the DataRobot Eureqa models a series of mathematical building blocks to use. The models searched over different combinations of predictors and building blocks to fit the acceleration of the lower bar. Given enough time to train, the Eureqa models were able to find the real physical formula for the acceleration of the lower bar of a double pendulum (Figure 8).
Figure 8. Eureqa’s analytical expression for the acceleration of the lower bar of the double pendulum
There are a number of advantages to using Eureqa models:
They return human-readable and interpretable analytic expressions, which are easily reviewed by subject matter experts. They also tend to deploy easily.
They are very good at feature selection because they are forced to reduce complexity during the model building process. For example, if the data had 20 different columns used to predict the target variable, the search for a simple expression would result in an expression that only uses the strongest predictors.
They work really well on small datasets and so are very popular with scientific researchers who gather data from physical experiments that don’t produce massive amounts of data. (In such situations, traditional supervised machine learning models may not have enough data to learn from.)
They provide an easy way to incorporate domain knowledge. If you know the underlying relationship in the system that you're modeling, you can actually give Eureqa a hint, e.g., the formula for heat transfer or how house prices work in a particular neighborhood. You can give Eureqa that known relationship as a building block or a starting point to learn from. Eureqa will build machine learning corrections from there.
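The last advantage, starting from a known relationship and learning corrections, can be pictured as fitting a model to whatever the known formula leaves unexplained. The Python sketch below does this with a simple least-squares line on the residuals. It is only an illustration of the idea: known_relationship is a hypothetical expert formula, the data is synthetic, and fitting residuals is an assumed stand-in for how Eureqa actually uses a supplied expression.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def known_relationship(x):
    # Hypothetical domain formula supplied by a subject matter expert.
    return 2.0 * x

# Synthetic observations: the known part plus a drift the expert did not model.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 0.5 * x + 1.0 for x in xs]

# Fit the correction only to what the known formula leaves unexplained.
residuals = [y - known_relationship(x) for x, y in zip(xs, ys)]
a, b = fit_linear(xs, residuals)

def corrected_model(x):
    """Known relationship plus the learned correction."""
    return known_relationship(x) + a * x + b

print("correction slope:", a, "intercept:", b)
print("prediction at x = 5:", corrected_model(5.0))
```

The known formula stays visible in the final model, so a reviewer can see which part came from domain knowledge and which part was learned from data.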
You can find a Jupyter notebook and dataset for this example in the DataRobot Community GitHub.
If you’re a licensed DataRobot customer, search the in-app documentation for Eureqa, then locate “Eureqa advanced tuning” for more information.