Custom error metrics and special search relations in Eureqa
Originally posted on 6/28/13 by Michael Schmidt
Eureqa’s Search Relation setting provides quite a bit of flexibility when searching for different types of models. This post describes some advanced techniques of using the Search Relation setting to specify custom error metrics for the search to optimize; or more specifically, arbitrary custom loss functions for the fitness calculation.
Custom fitness using minimize difference
Eureqa has a built-in fitness metric named “minimize difference.” This fitness metric minimizes the signed difference between the left- and right-hand sides of the search relationship. For example, specifying:
y = f(x)
with the minimize difference fitness metric selected tells Eureqa to find an f(x) to minimize y – f(x). A trivial solution to this relationship would be f(x) = positive infinity, since that drives y – f(x) toward negative infinity. However, you can enter other relations that are more useful. Consider the following search relation:
(y – f(x))^4 = 0
Here, the minimize difference fitness would minimize the 4th-power error.
In fact, you can enter any such expression and the f(x) can appear multiple times. For example:
max( abs(y – f(x)), (y – f(x))^2 ) = 0
would minimize the maximum of the absolute error and squared error, at each data point in the dataset.
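To make the three relations in this section concrete, here is a minimal Python sketch of what each one asks the search to minimize. It is an illustration outside Eureqa, not Eureqa's implementation: the helper name minimize_difference, the choice of averaging over data points, and the sample values are all assumptions made for this example.

```python
def minimize_difference(lhs, rhs):
    """Mean of the signed difference lhs - rhs over all data points.

    Averaging is an assumption for this sketch; the post does not say
    how Eureqa aggregates the per-point differences.
    """
    return sum(l - r for l, r in zip(lhs, rhs)) / len(lhs)


y = [1.0, 2.0, 3.0]
fx = [1.5, 2.0, 1.0]  # hypothetical model outputs f(x)
zeros = [0.0] * len(y)

# Relation: y = f(x)
# The signed error has no lower bound: a very large f(x) makes it very negative.
signed = minimize_difference(y, fx)
runaway = minimize_difference(y, [1e6] * len(y))

# Relation: (y - f(x))^4 = 0
# The left side is never negative, so the best possible score is a perfect fit.
fourth = minimize_difference([(a - b) ** 4 for a, b in zip(y, fx)], zeros)

# Relation: max(abs(y - f(x)), (y - f(x))^2) = 0
# The max is taken per data point: absolute error wins below 1, squared above 1.
per_point = [max(abs(a - b), (a - b) ** 2) for a, b in zip(y, fx)]
max_loss = minimize_difference(per_point, zeros)

print(signed, runaway, fourth, per_point, max_loss)
```

Because the right-hand side is 0 and the left-hand side is non-negative in the last two relations, minimizing the signed difference is the same as minimizing the custom loss itself.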
There are many other ways to alter the fitness metric using the search relationship setting. For instance, you could use a normal fitness metric (e.g., absolute or squared error) but scale both sides of the relation by wrapping each side in a sigmoid function like tanh:
tanh(y) = tanh( f(x) )
Now, both the left and right sides get squashed by the tanh function (an s-shaped curve that ranges from -1 to 1) before being compared. This effectively caps large errors, reducing their impact on the fitness.
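The capping effect can be checked with a few lines of Python. This is only a sketch using absolute error on a single data point; the function names and sample values are made up for illustration.

```python
import math


def abs_error(y, fx):
    """Plain absolute error between target and model output."""
    return abs(y - fx)


def tanh_error(y, fx):
    """Absolute error after squashing both sides with tanh."""
    return abs(math.tanh(y) - math.tanh(fx))


# An outlier target far from the model output (hypothetical values).
raw = abs_error(100.0, 2.0)
squashed = tanh_error(100.0, 2.0)

# Small values pass through almost unchanged, since tanh(v) is close to v near 0.
small_raw = abs_error(0.1, 0.2)
small_squashed = tanh_error(0.1, 0.2)

print(raw, squashed, small_raw, small_squashed)
```

Since tanh stays between -1 and 1, the squashed error can never exceed 2, no matter how far apart the two sides are. The flip side is that any two large values of the same sign look nearly identical after squashing, so this relation suits data where errors among large values matter little.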
Even more tricks
You can also use the search relationship to forbid certain values by exploiting NaN values (NaN = Not a Number). For example, consider the following search relation, which forbids models that produce negative values:
y = f(x) + 0*log( f(x) )
Notice the unusual 0*log(f(x)) term. Whenever f(x) is positive, the log is real-valued, and the multiplication with zero reduces the expression to y = f(x). However, whenever f(x) is negative, log(f(x)) is undefined, and produces a NaN value. Whenever a NaN appears in the fitness calculation, Eureqa automatically assigns the solution infinite error. Therefore, this search relationship tells Eureqa to find an f(x) that models y, but f(x) must be positive on each point in the dataset.
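The mechanism can be mimicked in Python as a rough sketch. Python's math.log raises an exception for non-positive inputs instead of returning NaN, so the hypothetical helper ieee_log below emulates the IEEE floating-point results. The function names and the use of absolute error are assumptions for this example, not Eureqa internals.

```python
import math


def ieee_log(v):
    """Natural log with IEEE-style results instead of Python exceptions."""
    if v > 0:
        return math.log(v)
    if v == 0:
        return -math.inf
    return math.nan  # negative or NaN input


def point_error(y, fx):
    """Absolute error for the relation y = f(x) + 0*log(f(x)) at one point.

    A NaN or infinite result is mapped to infinite error, mirroring the
    behavior the post describes.
    """
    rhs = fx + 0.0 * ieee_log(fx)
    err = abs(y - rhs)
    if math.isnan(err) or math.isinf(err):
        return math.inf
    return err


print(point_error(3.0, 2.0))   # positive f(x): ordinary error
print(point_error(3.0, -2.0))  # negative f(x): rejected
print(point_error(3.0, 0.0))   # zero f(x): 0 * -inf is NaN, also rejected
```

Note that zero is rejected as well: log(0) is negative infinity, and zero times infinity is NaN under IEEE arithmetic, which is why the constraint ends up requiring strictly positive values.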
This behavior can be used in other ways as well. Any operation that would produce an IEEE floating point NaN, undefined, or infinity will trigger Eureqa to assign infinite error. You can also add multiple terms like this to place multiple different constraints on solutions.
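As one hypothetical way to combine constraints, a relation such as y = f(x) + 0*log(f(x)) + 0*log(1 - f(x)) would confine f(x) to the open interval between 0 and 1. This relation is an illustration and does not come from Eureqa's documentation. The Python sketch below shows the same idea with made-up helper names and bounds.

```python
import math


def zero_times_log(v):
    """Value of 0*log(v) under IEEE rules: 0.0 for finite v > 0, NaN otherwise."""
    return 0.0 if 0 < v < math.inf else math.nan


def constrained_rhs(fx, lower=0.0, upper=1.0):
    """Right-hand side f(x) plus two zero-weight guard terms.

    Each guard term turns the result into NaN when its bound is violated:
    the first requires fx > lower, the second requires fx < upper.
    """
    return fx + zero_times_log(fx - lower) + zero_times_log(upper - fx)


for fx in (0.25, -0.1, 1.5):
    value = constrained_rhs(fx)
    status = "rejected (NaN)" if math.isnan(value) else "allowed"
    print(fx, status)
```

Each guard term contributes nothing when its constraint holds, so the fit to y is unaffected inside the allowed range.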