
Hyperparameter optimization


To understand how a parameter can be automatically tuned via Bayesian optimization, let's look at the following example configuration:

...
# Optimizer params
optimizer/hyperparameter.class_to_tune = @Adam
optimizer/hyperparameter.weight_decay = 1e-6
optimizer/hyperparameter.lr = (1e-5, 3e-4)

# Encoder params
model/hyperparameter.class_to_tune = @LSTMNet
model/hyperparameter.num_classes = %NUM_CLASSES
model/hyperparameter.hidden_dim = (32, 256)
model/hyperparameter.layer_dim = (1, 3)

tune_hyperparameters.scopes = ["model", "optimizer"]  # defines the scopes that the hyperparameter search runs in
tune_hyperparameters.n_initial_points = 5  # number of random points to initialize the Gaussian process
tune_hyperparameters.n_calls = 30  # number of iterations to find the best set of hyperparameters
tune_hyperparameters.folds_to_tune_on = 2  # number of folds used to evaluate a set of hyperparameters
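
To give an intuition for what the tune_hyperparameters settings control, here is a minimal sketch of a Gaussian-process search loop using scikit-optimize. This is an illustration only, not YAIB's actual implementation; the train_and_validate stub and the mapping from the gin bindings to search dimensions are assumptions made for the example.

from skopt import gp_minimize
from skopt.space import Integer, Real

# Search dimensions corresponding to the tuples in the config above (assumed mapping).
dimensions = [
    Real(1e-5, 3e-4, prior="log-uniform", name="lr"),
    Integer(32, 256, name="hidden_dim"),
    Integer(1, 3, name="layer_dim"),
]

def train_and_validate(fold, lr, hidden_dim, layer_dim):
    # Placeholder for "train on this fold and return the validation loss";
    # a synthetic score is returned so the sketch runs end to end.
    return (lr - 1e-4) ** 2 + abs(hidden_dim - 128) / 1000 + 0.01 * fold

def objective(params):
    # Average the validation loss over folds_to_tune_on folds (here 2).
    lr, hidden_dim, layer_dim = params
    losses = [train_and_validate(fold, lr, hidden_dim, layer_dim) for fold in range(2)]
    return sum(losses) / len(losses)

result = gp_minimize(
    objective,
    dimensions,
    n_initial_points=5,  # random evaluations before the Gaussian process takes over
    n_calls=30,          # total evaluations of the objective
)
print(result.x)          # best set of hyperparameters found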

In this example, we have two scopes, model and optimizer; the scopes take care of adding the parameters only to the pertinent classes. For each scope, a class_to_tune needs to be set to the class it represents, in this case LSTMNet and Adam respectively. We can add whichever parameters we want to the classes following this syntax:

tune_hyperparameters.scopes = ["<scope>", ...]
<scope>/hyperparameter.class_to_tune = @<SomeClass>
<scope>/hyperparameter.<param> = ['list', 'of', 'possible', 'values']
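
A list binds the parameter to a discrete set of choices, while a (low, high) tuple defines a range to sample from, as in the example configuration above. The following sketch shows how such bindings could be translated into scikit-optimize dimensions; the mapping, in particular the log-uniform prior for float ranges, is an assumption for illustration rather than YAIB's exact rule.

from skopt.space import Categorical, Integer, Real

def to_dimension(bound):
    # List of possible values -> categorical choice.
    if isinstance(bound, list):
        return Categorical(bound)
    low, high = bound
    # Integer tuple -> integer range, e.g. layer_dim = (1, 3).
    if isinstance(low, int) and isinstance(high, int):
        return Integer(low, high)
    # Float tuple -> continuous range, e.g. lr = (1e-5, 3e-4);
    # the log-uniform prior is assumed here (see the distributions below).
    return Real(low, high, prior="log-uniform")

print(to_dimension([100, 200]))    # categorical choice
print(to_dimension((1, 3)))        # integer range
print(to_dimension((1e-5, 3e-4)))  # continuous, log-uniform range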

If we run experiments and want to overwrite the model configuration, this can be done easily:

include "configs/tasks/Mortality_At24Hours.gin"
include "configs/models/LSTM.gin"

optimizer/hyperparameter.lr = 1e-4

model/hyperparameter.hidden_dim = [100, 200]

This configuration, for example, overwrites the lr parameter of Adam with a concrete value, while it only specifies a different search space for the hidden_dim of LSTMNet to run the hyperparameter search on.

The same holds true for the command line. Setting the following flag would achieve the same result (make sure that spaces appear only between parameters, not within them):

-hp optimizer/hyperparameter.lr=1e-4 model/hyperparameter.hidden_dim='[100,200]'

There is an implicit hierarchy, independent of where the parameters are added (model.gin, experiment.gin or CLI -hp):

LSTM.hidden_dim = 8                         # always takes precedence
model/hyperparameter.hidden_dim = 6         # second most important
model/hyperparameter.hidden_dim = (4, 6)    # only evaluated if the others aren't found in gin configs and CLI

The hierarchy CLI -hp > experiment.gin > model.gin is only relevant for bindings on the same "level" of the hierarchy above.
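
To make the lookup order concrete, here is a simplified sketch in plain Python over a dictionary of bindings. The helper and its return convention are purely illustrative and not YAIB's actual resolution code.

def resolve_hidden_dim(bindings):
    # 1. A concrete binding on the class itself always takes precedence.
    if "LSTM.hidden_dim" in bindings:
        return bindings["LSTM.hidden_dim"], "fixed"
    value = bindings.get("model/hyperparameter.hidden_dim")
    # 2. A concrete scoped value is used as-is ...
    if isinstance(value, (int, float)):
        return value, "fixed"
    # 3. ... while a tuple or list is treated as a search space to tune over.
    if value is not None:
        return value, "search space"
    raise KeyError("hidden_dim is not bound anywhere")

print(resolve_hidden_dim({"LSTM.hidden_dim": 8,
                          "model/hyperparameter.hidden_dim": (4, 6)}))  # (8, 'fixed')
print(resolve_hidden_dim({"model/hyperparameter.hidden_dim": (4, 6)}))  # ((4, 6), 'search space')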

Hyperparameters were chosen to be mostly identical to the HiRID benchmark, to improve comparability and reproducibility. However, we have chosen to allow for continuous ranges of hyperparameters in some cases, to improve the performance and functionality of YAIB. Log-uniform means that the parameters are sampled according to the reciprocal distribution: $$f(x;a,b)={\frac {1}{x[\log _{e}(b)-\log _{e}(a)]}}\quad {\text{ for }}a\leq x\leq b{\text{ and }}a>0.$$ Uniform means that the parameters are sampled according to the uniform distribution:

$$f(x)={\begin{cases}{\frac {1}{b-a}}&{\text{for }}a\leq x\leq b,\\0&{\text{for }}x<a{\text{ or }}x>b.\end{cases}}$$
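
For example, sampling a learning rate from the range (1e-5, 3e-4) log-uniformly versus uniformly can be reproduced with plain NumPy; this is a sketch of the two sampling schemes, not code taken from YAIB.

import numpy as np

rng = np.random.default_rng(42)
a, b = 1e-5, 3e-4  # bounds of the lr search space from the example config

# Uniform: every value in [a, b] is equally likely.
uniform_sample = rng.uniform(a, b)

# Log-uniform (reciprocal distribution): uniform in log space, so values
# near 1e-5 are sampled as often as values near 1e-4.
log_uniform_sample = np.exp(rng.uniform(np.log(a), np.log(b)))

print(uniform_sample, log_uniform_sample)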
