You can paste, view, and edit your data in the far-left tab. Entering data is very similar to using an ordinary spreadsheet application.
The Eureqa application uses a special layout:
You can paste data into cells from many spreadsheet-style applications, such as Microsoft Excel or MATLAB's array editor, or from any tab-separated-value text file.
Double-click on a cell to edit its value. You may also enter simple expressions into a cell to generate additional variables. For example, = x + sin(y) will fill the entire column with the numerical result of that expression on each row using the current variable symbols and values.
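The effect of a cell expression such as = x + sin(y) can be sketched outside Eureqa as follows. This is only an illustration in Python: the `columns` table and the `add_derived_column` helper are hypothetical names, not part of Eureqa.

```python
import math

# Hypothetical in-memory table: one list per variable, all of equal length.
columns = {
    "x": [0.0, 1.0, 2.0],
    "y": [0.0, math.pi / 2, math.pi],
}

def add_derived_column(table, name, formula):
    """Evaluate formula(row) on every row and store the results as a new column."""
    n_rows = len(next(iter(table.values())))
    table[name] = [
        formula({key: values[i] for key, values in table.items()})
        for i in range(n_rows)
    ]
    return table

# Equivalent of typing "= x + sin(y)" into a cell of a new column.
add_derived_column(columns, "w", lambda row: row["x"] + math.sin(row["y"]))
```

Each row of the new column is computed only from the values of the other variables on that same row.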
If your data is partitioned into discontinuous parts (e.g., two or more independent time series or experiments), separate the parts with a blank row. This tells the program not to smooth or differentiate across the boundary between them.
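To see why the blank row matters, consider the minimal Python sketch below. It is not Eureqa's implementation: the helper names are hypothetical, and a blank row is represented by `None`. Slopes are estimated only between neighbouring rows of the same segment, so no derivative is computed across the gap.

```python
def split_on_blank_rows(rows):
    """Group rows into segments; a blank row (None or empty) ends a segment."""
    segments, current = [], []
    for row in rows:
        if not row:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(row)
    if current:
        segments.append(current)
    return segments

def forward_differences(segment):
    """Estimate dy/dt between consecutive (t, y) rows of ONE segment."""
    return [
        (y1 - y0) / (t1 - t0)
        for (t0, y0), (t1, y1) in zip(segment, segment[1:])
    ]

# Two independent experiments, separated by a blank row (None).
rows = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), None, (0.0, 10.0), (1.0, 9.0)]
segments = split_on_blank_rows(rows)
slopes = [forward_differences(segment) for segment in segments]
```

Without the separator, the jump from (2.0, 5.0) to (0.0, 10.0) would be treated as a real change in the signal and would produce a meaningless slope.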
This is an optional step where the Eureqa application can automatically smooth your data.
Smoothing data can greatly improve both the speed and the likelihood of finding accurate solutions with the Eureqa formula search. To decide whether to smooth, you need to know enough about your data to judge whether each variable is, in fact, a smooth signal combined with noise.
To smooth a variable in Eureqa:
Eureqa picks the best smoothing using generalized cross-validation among cubic B-splines. If your data would benefit from more sophisticated pre-processing, you are encouraged to pre-process it in another application of your choice and copy the result into Eureqa by hand.
Here you control what type of formula to search for, and how to search for it.
Search for a formula f( ) such that
Edit this expression to specify the type of relationship you want to model. For example, if you want to model the variable z as a function of x and y, you would enter:
z = f(x, y)
To search for a differential equation, you can use the D(x,y) command, which denotes the derivative of x with respect to y. For example, to find an ordinary differential equation that gives the time derivative of y as a function of y, you could enter:
D(y,t) = f(y)
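As a rough illustration of what a target such as D(y,t) = f(y) asks for, the Python sketch below estimates dy/dt numerically from sampled data and checks how well one candidate f matches it. The central-difference scheme, the synthetic data, and the names `central_differences` and `candidate_f` are assumptions made for this example; Eureqa's own derivative estimation may differ.

```python
import math

def central_differences(ts, ys):
    """Approximate dy/dt at interior points with central differences."""
    return [
        (ys[i + 1] - ys[i - 1]) / (ts[i + 1] - ts[i - 1])
        for i in range(1, len(ts) - 1)
    ]

# Synthetic data: y(t) = exp(-t), for which dy/dt = -y exactly.
ts = [0.01 * i for i in range(101)]
ys = [math.exp(-t) for t in ts]

dydt = central_differences(ts, ys)

def candidate_f(y):
    """Candidate right-hand side for D(y,t) = f(y)."""
    return -y

# Largest mismatch between the estimated derivative and the candidate.
max_residual = max(
    abs(d - candidate_f(y)) for d, y in zip(dydt, ys[1:-1])
)
```

A small residual indicates that the candidate explains the estimated derivative; the remaining mismatch here comes only from the finite-difference approximation.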
You may also enter more complex target forms. For example, entering z = f(x) + f(y) indicates that you would like to find a function f that is evaluated on both x and y, then added together to model z.
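The form z = f(x) + f(y) constrains the search to a single function f that is applied to both inputs. A minimal Python sketch of scoring candidates under that constraint is shown below; the synthetic rows and the helper names are hypothetical.

```python
import math

def shared_form(f, x, y):
    """Evaluate the target form z = f(x) + f(y) with one shared function f."""
    return f(x) + f(y)

def mean_squared_error(f, rows):
    """Average squared mismatch between z and f(x) + f(y) over all rows."""
    return sum((z - shared_form(f, x, y)) ** 2 for x, y, z in rows) / len(rows)

# Synthetic rows generated with f = sin, so sin should fit them exactly.
rows = [
    (0.1 * i, 0.2 * i, math.sin(0.1 * i) + math.sin(0.2 * i))
    for i in range(20)
]

error_sin = mean_squared_error(math.sin, rows)
error_identity = mean_squared_error(lambda v: v, rows)
```

Because the same f is used for both terms, a candidate cannot fit x and y with two different shapes.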
Fitness metric
This specifies what type of error to measure when comparing and optimizing solutions. For example, you may wish to minimize squared error if your data has normally distributed noise, or logarithmic error if it contains many outliers.
The list below describes some of the fitness metrics available in Eureqa. All fitness metrics are normalized based on the target values in the dataset.
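The choice of metric changes how much a single outlier influences the score. The Python sketch below compares three per-point losses on made-up data. The formula used here for "logarithmic error", log(1 + |residual|), is an assumption for illustration, and the normalization by target values mentioned above is not reproduced.

```python
import math

# Per-point loss functions applied to the residual r = target - prediction.
# The logarithmic form is an assumed definition, not taken from Eureqa.
losses = {
    "absolute": lambda r: abs(r),
    "squared": lambda r: r * r,
    "logarithmic": lambda r: math.log(1.0 + abs(r)),
}

def per_point_errors(targets, predictions, loss):
    """Apply one loss function to every (target, prediction) pair."""
    return [loss(t - p) for t, p in zip(targets, predictions)]

# Made-up data: the last target is an outlier the model does not follow.
targets = [1.0, 2.0, 3.0, 4.0, 100.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.0]

mean_error = {}
outlier_share = {}
for name, loss in losses.items():
    errors = per_point_errors(targets, predictions, loss)
    mean_error[name] = sum(errors) / len(errors)
    # Fraction of the total error contributed by the outlier alone.
    outlier_share[name] = errors[-1] / sum(errors)
```

With these numbers the outlier accounts for nearly all of the squared error but a smaller share of the logarithmic error, which is why a logarithmic metric is less sensitive to outliers.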
Order data points by
This is the variable that Eureqa plots your data against by default. It also sets the order of the data points used when calculating derivatives, if any are used.
Weight errors by
This is the variable each data point is weighted by in the fitness metric.
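One common way to apply per-point weights is a weighted mean of the per-point errors, sketched below in Python for absolute error. Whether Eureqa combines weights in exactly this way is an assumption; the function name is hypothetical.

```python
def weighted_mean_absolute_error(targets, predictions, weights):
    """Weighted mean of |target - prediction|, one weight per data point."""
    total_weight = sum(weights)
    weighted_sum = sum(
        w * abs(t - p) for t, p, w in zip(targets, predictions, weights)
    )
    return weighted_sum / total_weight

targets = [1.0, 2.0, 3.0]
predictions = [1.0, 2.0, 5.0]

# A weight of 0 removes the third point from the score entirely.
ignored = weighted_mean_absolute_error(targets, predictions, [1.0, 1.0, 0.0])

# A larger weight makes the third point dominate the score.
emphasized = weighted_mean_absolute_error(targets, predictions, [1.0, 1.0, 2.0])
```

Points with larger weights pull the search toward solutions that fit them well.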
Using building blocks
Eureqa searches for formulas by combining mathematical building blocks (e.g., add, subtract, multiply, divide). You can limit the set of building blocks that the algorithm uses by checking and unchecking the various built-in operations.
The table below describes some of the Eureqa formula building blocks:
Name | Usage | Comments
constant | 1.234 | Allows solutions to use numeric constants.
add | x + y or add(x,y) |
subtract | x - y or sub(x,y) |
multiply | x * y or mul(x,y) |
divide | x / y or div(x,y) | y must be non-zero.
square root | sqrt(x) | Returns x^0.5, where x must be positive.
exponential | exp(x) | Returns e^x.
logarithm | log(x) | This is the natural logarithm (base e).
sine | sin(x) | The angle is in radians.
cosine | cos(x) | The angle is in radians.
tangent | tan(x) | The angle is in radians.
absolute value | abs(x) | Returns the magnitude of x, which is never negative.
power | x ^ y or pow(x,y) | x and y can be any expression.
power to constant | powc(x,c) | Provides a restricted form of the power building block. x can be any expression, c must be a constant.
time delay | delay(x,c) | Returns the value of expression x at c time units in the past. x can be any expression, c must be a positive constant.
time delay of variable | delay_var(v,c) | Provides a restricted form of the delay building block. Returns the value of variable v at c time units in the past. v must be a variable, c must be a positive constant.
simple moving average | sma(v,c) or sma_var(v,c) | Returns the average of the data points within the past c time units. v must be a variable, c must be a positive constant.
time integral | integral(x) | Returns the trapezoidal sum of the expression x, starting at 0 up to the current data point.
step function | step(x) | Returns 1 if x is positive, 0 otherwise.
sign function | sign(x) | Returns -1 if x is negative, +1 if x is positive, and 0 if x is zero.
logistic function | logistic(x) | This is a common sigmoid squashing function. Returns 1/(1 + exp(-x)).
hill function | hill2(x) | This is a common saturation function. Returns x^2/(1 + x^2). x must be non-zero.
gamma function | gamma(x) | This is a continuous version of the factorial. It returns the fast approximation pow((x/e)*sqrt(x*sinh(1/x)),x)*sqrt(2*pi/x). x must be non-zero.
gaussian function | gauss(x) | This is a bell-shaped squashing function. Returns exp(-x^2).
minimum | min(x,y) | Returns the minimum (signed) result of x and y for the data point.
maximum | max(x,y) | Returns the maximum (signed) result of x and y for the data point.
modulo | mod(x,y) | Returns the remainder of x/y.
floor | floor(x) | Returns x rounded down to an integer (toward -infinity).
ceiling | ceil(x) | Returns x rounded up to an integer (toward +infinity).
less than | less(x,y) | Returns 1 if x < y; returns 0 otherwise.
equal to | equal(x,y) | Returns 1 if x equals y numerically; returns 0 otherwise.
boolean and | and(x,y) | Returns 1 if both x and y are greater than 0; returns 0 otherwise.
boolean or | or(x,y) | Returns 1 if either x or y is greater than 0; returns 0 otherwise.
boolean xor | xor(x,y) | Returns 1 if (x <= 0 and y > 0) or (x > 0 and y <= 0); returns 0 otherwise.
boolean not | not(x) | Returns 0 if x is greater than 0; returns 1 otherwise.
inverse sine | asin(x) | x must be between -1 and +1.
inverse cosine | acos(x) | x must be between -1 and +1.
inverse tangent | atan(x) |
inverse tangent (2-argument) | atan2(y,x) | Returns atan(y/x), respecting the quadrant and sign of the vector. x and y cannot both be zero.
hyperbolic sine | sinh(x) |
hyperbolic cosine | cosh(x) |
hyperbolic tangent | tanh(x) | This is a common squashing function. Returns a value between -1 and +1.
inverse hyperbolic sine | asinh(x) |
inverse hyperbolic cosine | acosh(x) |
inverse hyperbolic tangent | atanh(x) |
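Several of the building blocks in the table are defined by short formulas. The Python sketch below transcribes a few of them directly from the Comments column so that their behavior can be checked by hand. These are plain re-implementations for illustration, not Eureqa code; the names `bool_and`, `bool_xor`, `gamma_approx`, and `running_integral` are hypothetical, and the time-delay and moving-average blocks are omitted because their handling of time units is not specified here.

```python
import math

def step(x):
    """1 if x is positive, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def sign(x):
    """-1 if x is negative, +1 if x is positive, 0 if x is zero."""
    return (x > 0) - (x < 0)

def logistic(x):
    """Sigmoid squashing function 1/(1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def hill2(x):
    """Saturation function x^2/(1 + x^2)."""
    return x * x / (1.0 + x * x)

def gauss(x):
    """Bell-shaped function exp(-x^2)."""
    return math.exp(-x * x)

def bool_and(x, y):
    """1 if both x and y are greater than 0, 0 otherwise."""
    return 1.0 if (x > 0 and y > 0) else 0.0

def bool_xor(x, y):
    """1 if exactly one of x and y is greater than 0, 0 otherwise."""
    return 1.0 if (x > 0) != (y > 0) else 0.0

def gamma_approx(x):
    """Fast gamma approximation quoted in the table (x must be non-zero)."""
    base = (x / math.e) * math.sqrt(x * math.sinh(1.0 / x))
    return math.pow(base, x) * math.sqrt(2.0 * math.pi / x)

def running_integral(ts, xs):
    """Cumulative trapezoidal sum of xs over ts, starting at 0."""
    totals = [0.0]
    for i in range(1, len(ts)):
        area = 0.5 * (xs[i] + xs[i - 1]) * (ts[i] - ts[i - 1])
        totals.append(totals[-1] + area)
    return totals

# Compare the approximation with the exact gamma function at x = 5.
approx_gamma_5 = gamma_approx(5.0)
exact_gamma_5 = math.gamma(5.0)

# Integrating x(t) = 2t with the trapezoid rule is exact for a linear signal.
integral_of_2t = running_integral([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])
```

Note that boolean blocks treat any value greater than 0 as true, so they can be combined freely with real-valued expressions.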
Limiting the building blocks requires some expert knowledge of the problem. For example, you may know that a chemical reaction is unlikely to involve trigonometric terms.
Limiting the set of building blocks can greatly improve the speed and likelihood that Eureqa finds an exact solution. However, if you disable an operation that the exact solution needs, the search cannot find that solution.
Using servers
Eureqa is designed to use multiple computers when searching for solutions. This option allows you to specify additional computers that are running the Eureqa Stand-alone Server.
Eureqa servers running on the local network will be listed automatically. You may need to enter other servers manually by right-clicking on the server list and clicking the Add hostname... menu item.
This view allows you to start, pause, and stop the formula search, as well as monitor various performance and progress statistics of the search.
Click on the Start button to begin the search.
After starting, Eureqa will attempt to connect to and initialize the computers selected in the settings view. Important messages will be displayed in the Log messages and events text window.
The Progress and performance statistics window shows several important statistics, such as the elapsed time of the search, the number of servers connected, and the speed of the search.
The main plot to the right shows the fitness metric of the best solution Eureqa has found since the search began. Long plateaus in progress may indicate that the search for simple explanations of the data has been exhausted.
This view shows the best solutions Eureqa finds in real time.
The best solutions are determined by two factors: their complexity (Size) and their accuracy (Error) on the validation data. Those listed in the List of current best solutions window have the highest accuracy for various complexities/sizes of solutions.
The Training Data is a subset of your data that is being used by the Eureqa algorithm to search for solutions. The Validation Data is a second subset that is only used by this window in order to select the best solutions to display/report to the user.
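Selecting "the highest accuracy for various complexities" amounts to keeping the solutions that are not beaten on both size and error at once. The Python sketch below shows one way to compute such a list. The candidate formulas, sizes, and validation errors are made-up values, and the `pareto_front` helper is hypothetical rather than Eureqa's actual selection code.

```python
def pareto_front(solutions):
    """Keep solutions that lower the best error seen at any smaller or equal size.

    Each solution is a (size, validation_error, formula) tuple.
    """
    front = []
    best_error = float("inf")
    # Visit solutions from simplest to most complex, best error first on ties.
    for size, error, formula in sorted(solutions, key=lambda s: (s[0], s[1])):
        if error < best_error:
            front.append((size, error, formula))
            best_error = error
    return front

# Made-up candidates: (size, error on validation data, formula text).
candidates = [
    (1, 0.90, "c"),
    (3, 0.60, "x + c"),
    (3, 0.40, "a*x"),
    (5, 0.45, "a*x + b"),
    (7, 0.10, "a*x + sin(y)"),
]

best_solutions = pareto_front(candidates)
```

A more complex solution appears in the list only if it is more accurate than every simpler one, which is why the errors are measured on the validation data rather than on the data used for the search.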
Several tools are also available for further analysis of the selected solution. Capabilities include finding the global maximum, generating a report of solutions, and suggesting new experiments and data to collect.