ELI5: What is GridSearch?

matt_brems · ‎07-05-2022

In machine learning, GridSearch is a tool that is used to find a model that performs well by finding a good set of hyperparameter values. But what does this mean in an intuitive, easy-to-understand way?

An analogy: baking cookies!

Let’s start with an analogy: Let’s say that you’re baking cookies and you want them to taste as good as they possibly can. To keep it simple, let’s say you have exactly two ingredients: flour and sugar. (Realistically, you need more ingredients but just go with me for now.)

How much flour do you add? How much sugar do you add? Maybe you look up recipes online, but they’re all telling you different things. There’s not some magical, perfect amount of flour and sugar that you can just look up online.

So, what do you decide to do? Your strategy is to bake many batches of cookies where each batch has a different amount of flour and sugar in it. Then, you can “taste test” each batch to see what tastes best.

To make this explicit, let’s say that you try having:

1 cup, 2 cups, and 3 cups of sugar.
3 cups, 4 cups, and 5 cups of flour.

In order to see which recipe makes the best cookies, you have to test each possible combination of sugar and flour. You need to try making cookies with 1 cup of sugar and 3 cups of flour, 1 cup of sugar and 4 cups of flour, and 1 cup of sugar and 5 cups of flour, 2 cups of sugar and 3 cups of flour, and so on.
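If you'd rather let a computer write out the list of batches, here is a minimal Python sketch of the same idea. The variable names (sugar_cups, flour_cups, batches) are made up for this illustration; itertools.product is a standard-library function that pairs every value in one list with every value in the other.

```python
from itertools import product

sugar_cups = [1, 2, 3]
flour_cups = [3, 4, 5]

# Every (sugar, flour) pairing: one batch of cookies per pair.
batches = list(product(sugar_cups, flour_cups))

for sugar, flour in batches:
    print(f"{sugar} cup(s) of sugar & {flour} cups of flour")

print(len(batches))  # 9
```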

Visualizing GridSearch

A really helpful way to organize this is to draw a grid. Draw a 3x3 grid, kind of like you’re playing the game tic-tac-toe:

Above the first column, put 1 cup of sugar. Above the second and third columns, put 2 and 3 cups of sugar, respectively. (Depending on how you draw this, it might look like you added a fourth row here.)

To the left of the first row, put 3 cups of flour. Put 4 and 5 cups of flour to the left of the second and third rows, respectively.

	1 cup of sugar	2 cups of sugar	3 cups of sugar
3 cups of flour
4 cups of flour
5 cups of flour

Then, fill in each of the squares of the grid with the amounts of sugar and flour corresponding to that row and column.

	1 cup of sugar	2 cups of sugar	3 cups of sugar
3 cups of flour	1 cup of sugar & 3 cups of flour	2 cups of sugar & 3 cups of flour	3 cups of sugar & 3 cups of flour
4 cups of flour	1 cup of sugar & 4 cups of flour	2 cups of sugar & 4 cups of flour	3 cups of sugar & 4 cups of flour
5 cups of flour	1 cup of sugar & 5 cups of flour	2 cups of sugar & 5 cups of flour	3 cups of sugar & 5 cups of flour

Notice how this looks like a grid. You are searching this grid for the best combination of sugar and flour. The only way to be sure you find the best-tasting cookies on this grid is to bake cookies with all of these combinations, “taste test” each batch, and decide which batch is best! If you skip some of the combinations, you might miss the best-tasting cookies.

As you can see, you have 2 sets of ingredients (sugar and flour), each with 3 levels:

[3 levels of sugar] * [3 levels of flour] = 9 batches of cookies

Now, what happens when you’re in the real world and you have more than two ingredients? For example, you also have to decide how many eggs to include. Well, your “grid” now becomes a 3-dimensional grid. If you decide between 2 eggs and 3 eggs, then you need to try all nine combinations of sugar & flour for 2 eggs, and you need to try all nine combinations of sugar & flour for 3 eggs:

[3 levels of sugar] * [3 levels of flour] * [2 levels of eggs] = 18 batches of cookies
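A 3-dimensional grid is hard to draw, but it's easy to write down. The sketch below is one possible way to do it in Python: expand_grid is a hypothetical helper written just for this example (it is not from any library), and it turns a dictionary of ingredients and their levels into one recipe per combination.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the levels listed in `grid`."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*grid.values())]


# Ingredient names are illustrative; the levels are the ones from the article.
recipe_grid = {
    "sugar_cups": [1, 2, 3],
    "flour_cups": [3, 4, 5],
    "eggs": [2, 3],
}

recipes = expand_grid(recipe_grid)
print(len(recipes))  # 18
print(recipes[0])    # {'sugar_cups': 1, 'flour_cups': 3, 'eggs': 2}
```

Adding a fourth ingredient is just one more entry in the dictionary, which is why the same approach keeps working when the grid has more dimensions than you can draw.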

Obviously, the more types of ingredients you include (e.g. sugar, flour, eggs), the more combinations you have to try. Also, the more levels of each ingredient you include (e.g. 3 levels for sugar, 3 levels for flour, 2 levels for eggs), the more combinations you have to try.

So, how does this apply to machine learning?

When you build models, you have lots of choices to make. Some of these choices are called hyperparameters. For example, if you build a random forest, you need to choose things like:

How many decision trees do you want to include in your random forest?
How deep can each individual decision tree grow?
What is the minimum number of samples allowed in the final “node” (the leaf) of each decision tree?

We call the process of finding the best values of hyperparameters “model tuning.”

The way you test this is just like how you taste-tested all of those different batches of cookies.

You pick which hyperparameters you want to search over. (I listed three above.)
You pick what levels of each hyperparameter you want to search.
You then fit a model separately for each combination of hyperparameter values.
Now it’s time for the “taste test!” You measure each model’s performance using some metric like accuracy or root mean squared error.
Just as you would keep the recipe that gave you the best-tasting cookies, you pick the set of hyperparameters that gave you the best-performing model.
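The steps above can be sketched in a few lines of Python. Treat this as an illustration under some assumptions rather than how any particular tool does it: grid_search and toy_score are hypothetical names written for this example, the hyperparameter names are borrowed from scikit-learn's random forest (the article doesn't name a library), and toy_score is a made-up stand-in for "fit a model and measure its performance."

```python
from itertools import product


def grid_search(grid, evaluate):
    """Score every combination in `grid`; return (best_params, best_score, n_tried).

    `evaluate` takes a dict of hyperparameter values and returns a score
    where higher is better (accuracy, for example).
    """
    names = list(grid)
    best_params, best_score, n_tried = None, float("-inf"), 0
    for combo in product(*grid.values()):
        params = dict(zip(names, combo))
        score = evaluate(params)  # the "taste test"
        n_tried += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_tried


# Names borrowed from scikit-learn's random forest (an assumption);
# the levels are arbitrary values chosen for illustration.
forest_grid = {
    "n_estimators": [50, 100, 200],  # how many trees
    "max_depth": [3, 5, 10],         # how deep each tree can grow
    "min_samples_leaf": [1, 5],      # minimum samples in a final node
}


def toy_score(params):
    # Hypothetical stand-in, NOT a real model: it simply pretends that
    # 100 trees of depth 5 with small leaves are ideal. In practice, this
    # is where you would fit a model and score it on held-out data.
    depth_penalty = abs(params["max_depth"] - 5)
    trees_penalty = abs(params["n_estimators"] - 100) / 100
    leaf_penalty = params["min_samples_leaf"] / 10
    return -(depth_penalty + trees_penalty + leaf_penalty)


best_params, best_score, n_tried = grid_search(forest_grid, toy_score)
print(n_tried)      # 18
print(best_params)  # {'n_estimators': 100, 'max_depth': 5, 'min_samples_leaf': 1}
```

If your metric is one where lower is better, like root mean squared error, one simple option is to return its negative from the scoring function so that "highest score wins" still holds.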

Just like with ingredients, the number of hyperparameters and number of levels you search are important.

From earlier, you saw that we took the number of levels of each hyperparameter that we wanted to test and multiplied those numbers together. That’s the formula for finding how many models you’re fitting via GridSearch. For example, if you want to search 5 hyperparameters, each with 4 different levels, then you’re building:

4 * 4 * 4 * 4 * 4 = 4^5 = 1,024 models
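You can check this arithmetic before you start a search. Here is a small sketch; count_models is a hypothetical helper name, and math.prod is a standard-library function (Python 3.8 and later) that multiplies a list of numbers together.

```python
from math import prod


def count_models(levels_per_hyperparameter):
    """Number of models a full grid search fits: the product of the level counts."""
    return prod(levels_per_hyperparameter)


print(count_models([3, 3]))           # 9   (sugar, flour)
print(count_models([3, 3, 2]))        # 18  (sugar, flour, eggs)
print(count_models([4, 4, 4, 4, 4]))  # 1024
```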

Building models can be time-consuming! If you try too many hyperparameters and too many levels of each hyperparameter, you might get a really high-performing model but it might take a really, really, really long time to get! (You can also run the risk of something called overfitting a model to the training data.)

So, what is GridSearch?

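GridSearch is a commonly used machine learning technique for finding the best set of hyperparameters for a model: you choose which values of each hyperparameter to test, try every combination of them, and keep the combination that performs best.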

How does DataRobot do GridSearch?

If you were building these models on your own, you would likely have to do this process manually. However, DataRobot will do this for you!

For each of the models on the Leaderboard that contain tunable hyperparameters, DataRobot will automatically search over pre-determined levels of hyperparameters for you. DataRobot is leveraging what we’ve learned by fitting more than 1 billion models — yes, that’s billion with a B! — to help you get the best-performing model as quickly as possible.

DataRobot does this via a smart pattern search. Rather than exhaustively searching every combination of hyperparameter values, DataRobot intelligently emphasizes areas where the model is likely to do well and skips hyperparamet....

As an advanced option, DataRobot also permits you to experiment with your own hyperparameter settings with manual hyperparameter tuning. You can learn how to manually tune your hyperparameters in DataRobot here.

Is GridSearch the same thing as RandomizedSearch?

No! RandomizedSearch is an alternative to GridSearch, with the same goal: RandomizedSearch searches multiple values of hyperparameters to identify a high-performing model.

The difference between GridSearch and RandomizedSearch is the grid. With GridSearch, you specify which levels of each hyperparameter you want to check, then you check every possible combination. A grid is a helpful way of visualizing every combination of hyperparameter values that you are going to test.

RandomizedSearch is different. With RandomizedSearch, rather than setting up a grid to check, you might simply specify a range for each hyperparameter: for example, somewhere between 1 and 3 cups of sugar and somewhere between 3 and 5 cups of flour. Then, a computer will randomly generate, say, 5 combinations of sugar and flour to test. One example is:

Batch A: 1.2 cups of sugar & 3.5 cups of flour.
Batch B: 1.7 cups of sugar & 3.1 cups of flour.
Batch C: 2.4 cups of sugar & 4.1 cups of flour.
Batch D: 2.9 cups of sugar & 3.9 cups of flour.
Batch E: 2.6 cups of sugar & 4.8 cups of flour.

RandomizedSearch would then “taste test” each batch and return the cookies that taste the best — or the model that performs the best! There are advantages and disadvantages to each approach. You can learn more about the differences between RandomizedSearch and GridSearch here.
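To make the contrast concrete, here is a rough Python sketch of the randomized version of the cookie experiment. It is an illustration under stated assumptions, not how any specific library implements it: random_search and toy_taste are hypothetical names, each amount is drawn uniformly from its range, and toy_taste is a made-up scoring rule standing in for a real taste test.

```python
import random


def random_search(ranges, evaluate, n_iter, seed=None):
    """Try `n_iter` random combinations; return (best_params, best_score, tried)."""
    rng = random.Random(seed)  # a seed makes the random batches repeatable
    best_params, best_score, tried = None, float("-inf"), []
    for _ in range(n_iter):
        # Draw each ingredient amount uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        score = evaluate(params)  # the "taste test"
        tried.append(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, tried


cookie_ranges = {"sugar_cups": (1.0, 3.0), "flour_cups": (3.0, 5.0)}


def toy_taste(params):
    # Hypothetical stand-in for tasting: pretends 2 cups of sugar and
    # 4 cups of flour is the ideal recipe.
    return -abs(params["sugar_cups"] - 2.0) - abs(params["flour_cups"] - 4.0)


best, score, tried = random_search(cookie_ranges, toy_taste, n_iter=5, seed=0)
for batch in tried:
    print(f"{batch['sugar_cups']:.1f} cups of sugar & {batch['flour_cups']:.1f} cups of flour")
print("best batch:", best)
```

Notice that you choose the number of batches (5 here) directly, instead of it being determined by multiplying the number of levels together.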