# Suggested Protocol

Here we provide a suggested protocol for more rigorously benchmarking deep learning optimizers. It follows the same steps as the baseline results presented in the DeepOBS paper.

## Create new Run Script

In order to benchmark a new optimization method, a new run script has to be written. A more detailed description can be found in the Simple Example and in the API section for the Standard Runner, but all that is needed is the optimizer itself and a list of its hyperparameters. For example, for the Momentum optimizer this looks like:

```python
import tensorflow as tf
import deepobs.tensorflow as tfobs

optimizer_class = tf.train.MomentumOptimizer
hyperparams = [{"name": "momentum", "type": float},
               {"name": "use_nesterov", "type": bool, "default": False}]

runner = tfobs.runners.StandardRunner(optimizer_class, hyperparams)
runner.run(train_log_interval=10)
```
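
Assuming the script above is saved as `momentum_runner.py` (the file name is arbitrary), it can then be executed from the command line, passing the test problem and the hyperparameter values as arguments. The values below are placeholders:

```
python momentum_runner.py quadratic_deep --bs 128 --lr 1e-2 --momentum 0.99 --num_epochs 10
```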

## Repeated Runs with best Setting

In order to get a sense of the optimizer's consistency, we suggest repeating runs with the best hyperparameter setting multiple times. This allows an assessment of the variance of the optimizer's performance.

For the baselines we determined the best learning rate by looking at the final performance of each run, which can be done using

```
deepobs_plot_results results/ --get_best_run
```


and then running the best-performing setting again using ten different random seeds, as sketched below.
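
A minimal sketch of this step, assuming the run script from above and that the runner exposes the random seed as a `--random_seed` argument (check the Standard Runner's command-line help for the exact flag name):

```
# Re-run the best setting with ten different seeds.
# The hyperparameter values are placeholders; substitute the best setting found above.
for seed in 0 1 2 3 4 5 6 7 8 9; do
    python momentum_runner.py quadratic_deep --bs 128 --lr 1e-2 --momentum 0.99 \
        --num_epochs 10 --random_seed $seed
done
```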

## Plot Results

To visualize the final results, it is sufficient to run

```
deepobs_plot_results results/ --full
```


This will show the performance plots for the small and large benchmark sets, as well as the learning rate sensitivity plot and the overall performance table.

If the path to the baseline folder is given, DeepOBS will automatically compare the results with the baselines for SGD, Momentum, and Adam.

For all plots, `.tex` files with pgfplots code will be generated for direct inclusion in academic publications.
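
As a sketch of that last step, such a file can be pulled into a LaTeX document with `\input`. The file name below is a placeholder; the actual names depend on what `deepobs_plot_results` writes:

```latex
% Minimal wrapper for a generated pgfplots figure.
\documentclass{article}
\usepackage{pgfplots}  % the generated .tex files contain pgfplots code
\begin{document}
% Placeholder name: use an actual .tex file produced by deepobs_plot_results.
\input{results/performance_small_benchmark.tex}
\end{document}
```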