This paper provides a useful overview of Bayesian optimization methods currently in use, as well as more speculative methods. While this is interesting, from the perspective of using Bayesian optimization for hyperparameter search in machine learning algorithms, the main value of this paper is its list of software packages.
A variety of software packages implement Bayesian optimization and Gaussian process regression. Several of these packages are developed together, with the Bayesian optimization package building on the Gaussian process regression package. Others are standalone, providing only Gaussian process regression or only Bayesian optimization. We list here several of the most prominent packages, along with URLs that are current as of June 2018.
- DiceKriging and DiceOptim are packages for Gaussian process regression and Bayesian optimization respectively, written in R. They are described in detail in Roustant et al. (2012) and are available from CRAN via https://cran.r-project.org/web/packages/DiceOptim/index.html.
- GPyOpt (https://github.com/SheffieldML/GPyOpt) is a Python Bayesian optimization library built on top of the Gaussian process regression library GPy (https://sheffieldml.github.io/GPy/); both are written and maintained by the machine learning group at the University of Sheffield.
- The Metric Optimization Engine (MOE, https://github.com/Yelp/MOE) is a Bayesian optimization library in C++ with a Python wrapper that supports GPU-based computations for improved speed. It was developed at Yelp by the founders of the Bayesian optimization startup SigOpt (http://sigopt.com). Cornell MOE (https://github.com/wujian16/Cornell-MOE) is built on MOE, with changes that make it easier to install and with support for the parallel and derivative-enabled knowledge-gradient algorithms.
- Spearmint (https://github.com/HIPS/Spearmint), with an older version under a different license available at https://github.com/JasperSnoek/spearmint, is a Python Bayesian optimization library. Spearmint was written by the founders of the Bayesian optimization startup Whetlab, which was acquired by Twitter in 2015 (Perez, 2015).
- DACE (Design and Analysis of Computer Experiments) is a Gaussian process regression library written in MATLAB, available at http://www2.imm.dtu.dk/projects/dace/. Although it was last updated in 2002, it remains widely used.
- GPflow (https://github.com/GPflow/GPflow) and GPyTorch (https://github.com/cornellius-gp/gpytorch) are Python Gaussian process regression libraries built on top of TensorFlow (https://www.tensorflow.org/) and PyTorch (https://pytorch.org/), respectively.
- laGP (https://cran.r-project.org/web/packages/laGP/index.html) is an R package for Gaussian process regression and Bayesian optimization with support for inequality constraints.
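To make concrete what the Gaussian process regression half of these packages computes, here is a minimal NumPy-only sketch of GP posterior inference with a squared-exponential (RBF) kernel. The kernel hyperparameters and the toy data are illustrative choices, not tuned values, and this is a sketch of the underlying math rather than the API of any package above:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    # Squared-exponential kernel; hyperparameters here are illustrative.
    return variance * np.exp(-0.5 * (np.subtract.outer(a, b) / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_test)
    mean = k_star.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, k_star)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(k_star * v, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy 1-D example: condition on three observations, predict on a grid.
x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
mean, std = gp_posterior(x_train, y_train, np.linspace(0.0, 2.0, 50))
```

The production libraries listed above add the pieces this sketch omits: kernel hyperparameter estimation, numerically stable Cholesky-based solves, and richer kernel and mean-function choices.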
Below is the abstract of *A Tutorial on Bayesian Optimization*.
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
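Of the three acquisition functions the abstract names, expected improvement has the simplest closed form. For minimization with best observed value f*, it is EI(x) = (f* − μ(x))Φ(z) + σ(x)φ(z) with z = (f* − μ(x))/σ(x), where μ and σ come from the GP posterior. A stdlib-only sketch; the (μ, σ) values for the candidate points below are made up for illustration, not drawn from a real surrogate:

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization: E[max(best - f(x), 0)]
    when f(x) ~ Normal(mu, sigma**2) under the surrogate posterior."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

# Illustrative posterior (mu, sigma) at three candidate points:
candidates = [(-0.2, 0.05), (0.1, 0.40), (0.3, 0.90)]
best_seen = 0.0
ei = [expected_improvement(mu, s, best_seen) for mu, s in candidates]
x_next = max(range(len(candidates)), key=lambda i: ei[i])  # sample here next
```

Note how the acquisition trades off exploitation and exploration: the third candidate has the worst posterior mean but the largest uncertainty, and here it attains the highest EI. This closed form assumes noise-free evaluations; the generalization to noisy evaluations discussed in the paper requires the decision-theoretic treatment the abstract mentions.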