Hyperparameter Optimization

We look at how you can use Dotscience to explore relationships between hyperparameters and metrics.

Create a new project in Dotscience and open JupyterLab. For more information, see the tutorial on creating projects and working with the Dotscience hosted JupyterLab.

For this tutorial we’re going to tune hyperparameters to optimise the precision and recall of a classifier on an sklearn dataset. The notebook for this tutorial can be found at https://github.com/dotmesh-io/demos/blob/master/sklearn-gridsearch/grid-search.ipynb.

Download the demos repository with:

git clone https://github.com/dotmesh-io/demos.git

Navigate to your project on Dotscience, open a JupyterLab session, and upload the notebook file grid-search.ipynb from the repository cloned above. It can be found at demos/sklearn-gridsearch/grid-search.ipynb.

At the start of the notebook, we import the dotscience Python library and instrument our training with it. If you look closely at the notebook, you will notice that as we iterate through the collection of scores to optimise, we call ds.add_summary("name", value) to record the summary statistics: the mean and standard deviation of each score for every parameter combination.

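# Record the cross-validated mean and standard deviation of the current score
# for every parameter combination tried by the grid search.
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    ds.add_summary("%s-stdev" % (score,), std)
    ds.add_summary("%s-mean" % (score,), mean)
    # Print the mean with a two-standard-deviation band for this combination.
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))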
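
To try the same pattern outside Dotscience, here is a minimal sketch, not the tutorial notebook itself. It assumes scikit-learn is installed, uses the digits dataset and a small SVC grid chosen purely for illustration, and replaces ds with a hypothetical SummaryRecorder class that only mimics the add_summary(name, value) call shown above.

```python
from collections import defaultdict

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


class SummaryRecorder:
    """Hypothetical stand-in for ds: stores every value passed to add_summary."""

    def __init__(self):
        self.summaries = defaultdict(list)

    def add_summary(self, name, value):
        self.summaries[name].append(float(value))


recorder = SummaryRecorder()

# Assumed dataset and grid, picked only to keep the example small.
X, y = load_digits(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.5, random_state=0)
param_grid = {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10]}

for score in ["precision", "recall"]:
    # Macro-averaged scoring, because digits is a multi-class problem.
    clf = GridSearchCV(SVC(), param_grid, cv=3, scoring="%s_macro" % score)
    clf.fit(X_train, y_train)

    means = clf.cv_results_["mean_test_score"]
    stds = clf.cv_results_["std_test_score"]
    for mean, std, params in zip(means, stds, clf.cv_results_["params"]):
        recorder.add_summary("%s-stdev" % (score,), std)
        recorder.add_summary("%s-mean" % (score,), mean)
        print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))
```

Each key in recorder.summaries ends up with one value per parameter combination, which is the series you would expect to see plotted for a summary statistic.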

Run the notebook by clicking Run -> Run All Cells.

When the run completes, navigate to the Runs tab to see a summary of all the runs. Click on a run to see its provenance.

Now go to the Explore tab, where you can see a graphical representation of the optimisation we did earlier.

The screen capture above shows the behaviour of the summary statistic precision-mean. From this we can draw conclusions about how each change to the hyperparameters affected the summary statistic. Clicking on an individual data point takes us to the run that was associated with that change.

You can also switch the view between the different optimisations by selecting one from the Summary statistic field.

We have demonstrated hyperparameter tuning on a simple machine learning model using an sklearn grid search. You can visualise the effect of tuning the params on the graph and zoom in on the runs where the summary statistics stand out from the rest.

Note that the problem is, in a sense, too easy: the scores are almost flat across the hyperparameter grid, and the selected model is the same for precision and recall, with ties in quality. Nevertheless, it demonstrates the principle: ordinary Python code for hyperparameter optimization, augmented by the ds functions of the Dotscience Python library, automatically records versioning, provenance, parameters, and metrics within the system.
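
One way to check for such ties yourself is to list every parameter combination whose mean score is within a small tolerance of the best one. The sketch below is an illustration only: tied_with_best is a hypothetical helper, the tolerance is an arbitrary choice, and the scores are made-up numbers, not results from the tutorial notebook.

```python
def tied_with_best(results, tol=1e-3):
    """Return the params of every candidate whose mean is within tol of the best."""
    best = max(mean for _, mean in results)
    return [params for params, mean in results if best - mean <= tol]


# Made-up (params, mean score) pairs for illustration only.
example_results = [
    ({"C": 1, "gamma": 0.001}, 0.986),
    ({"C": 10, "gamma": 0.001}, 0.988),
    ({"C": 100, "gamma": 0.001}, 0.988),
    ({"C": 1, "gamma": 0.0001}, 0.959),
]

tied = tied_with_best(example_results)
print(tied)
```

With real results, the pairs could be built from clf.cv_results_['params'] and the mean test scores of the grid search. More than one entry in the returned list means the grid search had to break a tie when picking the best model.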