*Originally posted on 6/28/13 by Michael Schmidt*

*----------*

Eureqa’s Search Relation setting provides considerable flexibility when searching for different types of models. This post describes some advanced techniques for using the Search Relation setting to specify custom error metrics for the search to optimize; more specifically, arbitrary custom loss functions for the fitness calculation.

### Custom fitness using minimize difference

Eureqa has a built-in fitness metric named “minimize difference.” This fitness metric minimizes the *signed* difference between the left- and right-hand sides of the search relation. For example, specifying:

y = f(x)

with the minimize difference fitness metric selected tells Eureqa to find an f(x) that minimizes y - f(x). Because the difference is signed, a trivial solution to this relation would be f(x) = positive infinity, which drives y - f(x) toward negative infinity. However, you can enter other relations that are more useful. Consider the following search relation:

(y - f(x))^4 = 0

Here, the minimize difference fitness would minimize the 4th-power error.
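To make the effect concrete, here is a minimal Python sketch of what a minimize-difference fitness would compute for this relation. It assumes the per-point differences are averaged over the dataset, which this post does not specify; `minimize_difference`, `fourth_power_fitness`, and the sample data are hypothetical names and values used only for illustration.

```python
def minimize_difference(lhs, rhs):
    """Mean signed difference (lhs - rhs) over all data points.

    Assumption: per-point differences are averaged; the aggregation
    Eureqa actually uses may differ.
    """
    return sum(l - r for l, r in zip(lhs, rhs)) / len(lhs)


def fourth_power_fitness(y, predictions):
    # Left-hand side of (y - f(x))^4 = 0, evaluated at each data point.
    lhs = [(y_i - p_i) ** 4 for y_i, p_i in zip(y, predictions)]
    # Right-hand side of the relation is the constant 0.
    rhs = [0.0] * len(y)
    return minimize_difference(lhs, rhs)


# Hypothetical data: both candidates have a mean absolute error of 0.5.
y = [1.0, 2.0, 3.0, 4.0]
one_big_miss = [1.0, 2.0, 3.0, 6.0]  # errors: 0, 0, 0, -2
even_misses = [1.5, 2.5, 3.5, 4.5]   # errors: -0.5 at every point

print(fourth_power_fitness(y, one_big_miss))  # 4.0
print(fourth_power_fitness(y, even_misses))   # 0.0625
```

Although both candidates have the same mean absolute error, the 4th-power metric penalizes the single large miss far more heavily than the evenly spread small misses.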

In fact, you can enter any such expression and the f(x) can appear multiple times. For example:

max( abs(y - f(x)), (y - f(x))^2 ) = 0

would minimize the maximum of the absolute error and squared error, at each data point in the dataset.
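The per-point behavior of this relation can be sketched in a few lines of Python. The function name `max_abs_or_squared` is hypothetical, and the sketch only evaluates the left-hand side at a single data point; how Eureqa combines the points is not covered here.

```python
def max_abs_or_squared(y_i, pred_i):
    # Left-hand side of max(abs(y - f(x)), (y - f(x))^2) = 0 at one point.
    err = y_i - pred_i
    return max(abs(err), err ** 2)


# For |error| below 1 the absolute error is the larger term;
# above 1 the squared error takes over.
for err in (0.1, 0.5, 1.0, 3.0):
    print(err, max_abs_or_squared(err, 0.0))
```

In other words, this loss grows linearly for small errors and quadratically for large ones, so small residuals are not shrunk the way they would be under squared error alone.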

### Other methods

There are many other ways to alter the fitness metric using the search relation setting. For example, you could use a normal fitness metric (e.g., absolute or squared error) but transform both sides of the relation, such as by wrapping each side with a sigmoid function like tanh:

tanh(y) = tanh( f(x) )

Now, both the left and right sides get squashed by the tanh function (an S-shaped curve that ranges from -1 to 1) before being compared. This effectively caps large errors, reducing their impact on the fitness.
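The capping effect can be seen in a small Python sketch. It assumes a squared-error metric is selected; `squared_error` and `tanh_squared_error` are hypothetical helper names, and the values are made up for illustration.

```python
import math


def squared_error(lhs, rhs):
    return (lhs - rhs) ** 2


def tanh_squared_error(y_i, pred_i):
    # Compare tanh(y) with tanh(f(x)), as in the relation tanh(y) = tanh(f(x)).
    return squared_error(math.tanh(y_i), math.tanh(pred_i))


y_i = 0.0
for pred_i in (0.5, 5.0, 500.0):
    # Plain squared error grows without bound; the tanh version cannot,
    # because both sides are confined to the range -1 to 1.
    print(pred_i, squared_error(y_i, pred_i), tanh_squared_error(y_i, pred_i))
```

Because both sides lie between -1 and 1, the squared difference can never exceed 4. One trade-off worth keeping in mind: values far from zero become nearly indistinguishable after squashing, so the search has little incentive to separate a prediction of 5 from one of 500.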

### Even more tricks

You can also use the search relation to forbid certain values by exploiting *NaN* values (NaN = Not a Number). For example, consider the following search relation, which forbids models that produce negative values:

y = f(x) + 0*log( f(x) )

Notice the unusual 0*log(f(x)) term. Whenever f(x) is positive, the log is real-valued, and multiplying it by zero reduces the relation to y = f(x). However, whenever f(x) is negative, log(f(x)) is undefined and produces a *NaN* value; at f(x) = 0 the log is negative infinity, and zero times infinity is also *NaN*. Whenever a *NaN* appears in the fitness calculation, Eureqa automatically assigns the solution infinite error. Therefore, this search relation tells Eureqa to find an f(x) that models y, but f(x) must be strictly positive at every point in the dataset.
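The mechanism can be imitated in Python as a rough sketch. Two assumptions are made here: the fitness is a mean absolute error, and `ieee_log` is a hypothetical helper that returns NaN or infinity the way IEEE-style math libraries do (Python's own `math.log` raises an exception instead). None of these names come from Eureqa.

```python
import math


def ieee_log(v):
    """Logarithm with IEEE-style results instead of Python exceptions."""
    if math.isnan(v) or v < 0:
        return math.nan
    if v == 0:
        return -math.inf
    return math.log(v)


def constrained_rhs(f_x):
    # Right-hand side of y = f(x) + 0*log(f(x)).
    return f_x + 0 * ieee_log(f_x)


def fitness(y, predictions):
    """Mean absolute error; infinite if any point yields NaN or infinity."""
    total = 0.0
    for y_i, f_x in zip(y, predictions):
        rhs = constrained_rhs(f_x)
        if not math.isfinite(rhs):
            return math.inf  # mirrors the infinite-error rule described above
        total += abs(y_i - rhs)
    return total / len(y)


print(fitness([1.0, 2.0], [1.5, 2.5]))   # all predictions positive
print(fitness([1.0, 2.0], [1.5, -2.5]))  # one negative prediction
```

A single non-positive prediction anywhere in the dataset is enough to disqualify the candidate, no matter how well it fits the remaining points.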

This behavior can be used in other ways as well. Any operation that would produce an IEEE floating-point *NaN*, undefined value, or infinity will trigger Eureqa to assign infinite error. You can also add several such terms to place multiple constraints on solutions at once.
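As an illustration of stacking constraints, the sketch below evaluates the right-hand side of a hypothetical relation, y = f(x) + 0*log(f(x)) + 0*log(10 - f(x)), which would confine f(x) to the open interval between 0 and 10. The relation, the upper bound of 10, and the helper names are all assumptions made for this example rather than anything taken from Eureqa's documentation.

```python
import math


def ieee_log(v):
    # NaN for negative input, -inf at zero, as IEEE-style math libraries do.
    if math.isnan(v) or v < 0:
        return math.nan
    return -math.inf if v == 0 else math.log(v)


def bounded_rhs(f_x, upper=10.0):
    # Hypothetical relation: y = f(x) + 0*log(f(x)) + 0*log(upper - f(x)).
    # The first log term rejects f(x) <= 0, the second rejects f(x) >= upper.
    return f_x + 0 * ieee_log(f_x) + 0 * ieee_log(upper - f_x)


for f_x in (-1.0, 5.0, 12.0):
    print(f_x, bounded_rhs(f_x))
```

Only the value inside the interval passes through unchanged; the other two evaluate to NaN, which would trigger the infinite-error rule.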