Popular Algorithms for Automatic Model Tuning
Hyperparameter tuning, sometimes called hyperparameter optimization, is an important part of the machine learning model development process. It entails searching for the configuration of hyperparameters that enables optimal performance of an ML model. Machine learning algorithms require these user-defined inputs to achieve a balance between accuracy and generalizability, and there are various tools and approaches available for tuning them.
Content Overview
“If you can’t measure it, you can’t improve it.”
–Peter Drucker
This certainly applies to machine learning as well. An important step in model development is to evaluate the final model on a holdout dataset, called the test dataset, that your model has never seen before. These final model metrics can be used to compare and contrast competing models. Typically, the higher this final score, the better the model's ability to generalize.
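As a minimal sketch of this evaluation step with scikit-learn (the dataset and estimator here are illustrative choices, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set that the model never sees during training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The final metric on the unseen test set indicates how well the model
# generalizes, and can be used to compare competing models.
print("test accuracy:", model.score(X_test, y_test))
```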
Manual vs Automatic Model Tuning
When you start working on a new ML model, you are most likely to begin by manually selecting hyperparameter values based on the algorithm you choose for your use case. For popular algorithms and use cases, you can generally find great guidance on which hyperparameter values to use from the data science and research communities. This is a great starting point and helps you build your own intuition over time. Once you have validated your choice of algorithm, code, and dataset for your machine learning use case, you can leverage automatic model tuning to fine-tune your hyperparameters and find the best-performing values.
Below are four popular algorithms for automatic model tuning.
- Grid search
- Random search
- Bayesian optimization
- Hyperband
Grid Search
The first approach is grid search. To tune your model, you start by defining the available hyperparameter sets, which include both the name of each hyperparameter and the range of values you want to explore for it. The grid search algorithm then tests every combination, training a model on each one and selecting the best-performing combination of values. The advantage of grid search is that it explores all possible combinations.
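As a sketch, here is how this might look with scikit-learn's GridSearchCV; the estimator, hyperparameter names, and value ranges are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# The hyperparameter sets: each name paired with the values to explore.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10],
}

# Grid search trains one model per combination (3 x 3 = 9 here) and keeps
# the best-scoring combination, using 3-fold cross-validation.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```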
This idea works really well when you have a small number of hyperparameters and a small range of values to explore for each. However, as the number of hyperparameters or the range of values grows, grid search can become very time consuming: the number of combinations grows multiplicatively, so five hyperparameters with ten candidate values each already require 100,000 training runs. Grid search does not scale well to a large number of hyperparameters. To address this issue, you can use random search.
Random Search
In random search, once again, you start by defining the available hyperparameter sets, consisting of the names of the hyperparameters and the values you want to explore. Here, instead of searching every single combination, the algorithm picks random hyperparameter values to explore in the defined search space. Additionally, you can define stop criteria, such as the time elapsed or the maximum number of training runs to complete. Once a stop criterion is met, you select the best-performing set of hyperparameters from the models trained so far. An advantage of random search is that it is much faster than grid search.
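A minimal sketch using scikit-learn's RandomizedSearchCV, where n_iter serves as the stop criterion; the distributions shown are illustrative:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions to sample from, instead of an exhaustive grid.
param_distributions = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(2, 12),
}

# n_iter acts as the stop criterion: only 10 random combinations are tried.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```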
However, due to the randomness involved in the search process, this algorithm might miss better-performing hyperparameters. When you apply the concept of hyperparameter tuning to classification and regression models, it is very similar to finding the best possible model parameters by minimizing a loss function. You might ask: why can't we apply the same process to the hyperparameters as well? That is the idea behind Bayesian optimization, our next algorithm.
Bayesian Optimization
In Bayesian optimization, hyperparameter tuning is treated as a regression problem: the hyperparameter values are learned by minimizing the loss function of a surrogate model. The algorithm starts with random values for the hyperparameters and continuously narrows down the search space using the results of previous searches. The strength of Bayesian optimization is that it is much more efficient in finding the best possible hyperparameters, because it keeps improving on the results of previous searches. However, this also means that the algorithm requires sequential execution. There is also a possibility that Bayesian optimization gets stuck in a local minimum, a well-known problem with techniques like gradient descent for minimizing a loss function.
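Below is a minimal from-scratch sketch of this loop, using a Gaussian process from scikit-learn as the surrogate model and expected improvement as the acquisition function; the objective function is a hypothetical stand-in for a real validation loss measured after training:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(lr):
    # Hypothetical stand-in for the validation loss observed after
    # training a model with learning rate `lr`.
    return (np.log10(lr) + 2.0) ** 2 + 0.1 * np.random.rand()

rng = np.random.default_rng(0)
candidates = np.logspace(-5, 0, 200)  # search space: learning rates
log_cands = np.log10(candidates).reshape(-1, 1)

# Seed the surrogate with a few random evaluations.
tried = list(rng.choice(candidates, size=3))
losses = [objective(lr) for lr in tried]

for _ in range(10):
    # Fit the surrogate model to all (hyperparameter, loss) pairs so far.
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-3, normalize_y=True)
    gp.fit(np.log10(np.array(tried)).reshape(-1, 1), np.array(losses))
    mu, sigma = gp.predict(log_cands, return_std=True)

    # Expected improvement balances exploiting low predicted loss with
    # exploring regions where the surrogate is uncertain.
    best = min(losses)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    # The next candidate depends on all previous results, which is why
    # the search must run sequentially.
    nxt = candidates[np.argmax(ei)]
    tried.append(nxt)
    losses.append(objective(nxt))

print("best learning rate:", tried[int(np.argmin(losses))])
```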
Hyperband
The final algorithm that I will discuss is Hyperband, a relatively new approach to hyperparameter tuning. Hyperband is based on a bandit approach. Bandit approaches typically use a combination of exploitation and exploration to find the best possible hyperparameters, and their strength is that dynamic pull between exploitation and exploration. When applied to the hyperparameter tuning problem space, the bandit-based Hyperband algorithm works as follows: you start with a large space of random hyperparameter sets and then explore a random subset of them for a few iterations.
After the first few iterations, you discard the worst-performing half of the hyperparameter sets. In the subsequent iterations, you continue to explore the best-performing hyperparameters from the previous round. You repeat this process until the set time has elapsed or only one candidate remains. Hyperband stands out by spending time much more efficiently than the other approaches we discussed, exploring hyperparameter values through the combination of exploitation and exploration. On the downside, it might discard good candidates very early on, and those could be the candidates that converge slowly; the sketch below illustrates the halving loop.
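To make this concrete, here is a minimal sketch of a single successive-halving bracket, the core subroutine of Hyperband (the full algorithm runs several such brackets with different trade-offs between the number of candidates and the budget per candidate). The train function is a hypothetical stand-in for real training:

```python
import random

def train(config, budget):
    # Hypothetical stand-in: pretend training `config` for `budget`
    # iterations yields a (noisy) validation score.
    return config["lr"] * budget * random.random()

random.seed(0)

# Start with a large set of random hyperparameter configurations.
configs = [{"lr": random.uniform(1e-4, 1e-1)} for _ in range(16)]
budget = 1  # iterations to train each candidate in the current round

# Successive halving: train every surviving configuration a little
# longer, then discard the worst-performing half.
while len(configs) > 1:
    scored = sorted(
        ((train(cfg, budget), cfg) for cfg in configs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    configs = [cfg for _, cfg in scored[: len(configs) // 2]]  # keep best half
    budget *= 2  # survivors earn a larger training budget

print("winning config:", configs[0])
```

That is a wrap on our discussion of the four popular hyperparameter tuning algorithms that can help you automate the hyperparameter tuning process.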