An effective algorithm for hyperparameter optimization of neural networks
This post is a synopsis of my understanding of the RBFOpt-based approach described by Diaz et al.
In this paper, the authors formulate the hyperparameter optimization problem for neural networks as a box-constrained optimization problem (the link points to a paper on box-constrained optimization, though I'm not sure it's the most apt citation), and claim that this is empirically a better representation than the standard formulation. They describe an extension to the derivative-free optimization algorithms implemented in RBFOpt: a radial basis function surrogate model with thin plate splines, combined with a polynomial tail of degree 1.
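To make the surrogate concrete, here is a minimal sketch using SciPy's RBFInterpolator, which supports exactly this kernel/tail combination. This is my own illustration, not the authors' implementation (RBFOpt builds its own model), and the data is a synthetic stand-in for real hyperparameter evaluations:

```python
# Sketch of the surrogate: thin plate spline RBF with a degree-1
# polynomial tail, via SciPy. Not the authors' code.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Pretend these are 20 hyperparameter settings (2-D for simplicity)
# and their observed validation losses (synthetic placeholder values).
X = rng.uniform(0.0, 1.0, size=(20, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# Thin plate spline kernel phi(r) = r^2 log(r), plus a linear tail.
surrogate = RBFInterpolator(X, y, kernel="thin_plate_spline", degree=1)

# Cheap predictions at hyperparameter settings we haven't evaluated yet.
X_new = rng.uniform(0.0, 1.0, size=(5, 2))
print(surrogate(X_new))
```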
For their objective, they consider two different metrics to maximize, taking a weighted sum of the two:
- Distance of a candidate point from previously evaluated points (exploration).
- Predicted performance of the point according to the surrogate function, i.e. the RBF model with thin plate splines (exploitation).
The weight indicates how much emphasis is placed on one criterion over the other. It follows a cyclic strategy that alternates between exploration and exploitation over the iterations of the hyperparameter search; a sketch of such an acquisition score follows below.
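Here is a minimal sketch of how such a weighted score could look. The weight cycle values, the nearest-neighbor distance as the exploration term, and the rescaling are my own illustrative choices, not the paper's exact formulation:

```python
# Illustrative weighted acquisition score; `surrogate` is a fitted model
# as in the previous snippet. The weight schedule is a made-up example,
# not RBFOpt's actual cycle.
import numpy as np

WEIGHT_CYCLE = [0.9, 0.75, 0.5, 0.25, 0.0]  # exploitation -> exploration

def acquisition(x_cand, X_seen, surrogate, iteration):
    """Score candidate points: higher = more worth evaluating next."""
    w = WEIGHT_CYCLE[iteration % len(WEIGHT_CYCLE)]

    # Exploitation: predicted objective. Assuming we minimize a loss,
    # negate predictions so larger is better.
    pred = -surrogate(x_cand)

    # Exploration: distance to the nearest previously evaluated point.
    dists = np.linalg.norm(x_cand[:, None, :] - X_seen[None, :, :], axis=-1)
    min_dist = dists.min(axis=1)

    # Rescale both terms to [0, 1] so the weight is meaningful.
    def rescale(v):
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    return w * rescale(pred) + (1 - w) * rescale(min_dist)
```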
To determine the best point according to this objective, they employ a round of a genetic algorithm (GA), evaluating the surrogate function at different candidate points.
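A generic GA for maximizing that score over the box constraints might look like the sketch below; the population size, selection, and mutation scheme are illustrative assumptions, not the paper's settings:

```python
# Generic GA sketch for maximizing a score over box constraints.
import numpy as np

def ga_maximize(score_fn, lower, upper, pop_size=50, generations=30, seed=0):
    """Return the box-constrained point with the best score found."""
    rng = np.random.default_rng(seed)
    dim = len(lower)
    pop = rng.uniform(lower, upper, size=(pop_size, dim))
    for _ in range(generations):
        scores = score_fn(pop)
        # Keep the top half of the population as parents.
        parents = pop[np.argsort(scores)[-pop_size // 2:]]
        # Children: blend two random parents, then add clipped noise.
        idx = rng.integers(len(parents), size=(pop_size - len(parents), 2))
        alpha = rng.uniform(size=(len(idx), 1))
        children = alpha * parents[idx[:, 0]] + (1 - alpha) * parents[idx[:, 1]]
        children += rng.normal(scale=0.05 * (upper - lower), size=children.shape)
        children = np.clip(children, lower, upper)
        pop = np.vstack([parents, children])
    scores = score_fn(pop)
    return pop[np.argmax(scores)]
```

Hooked together, something like `ga_maximize(lambda P: acquisition(P, X, surrogate, it), lower, upper)` would propose the next hyperparameter setting to evaluate.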
I am as yet unclear about exactly how they use RBFOpt here; I expect more detail can be found in the paper itself, and I will need to read and understand it before I can make any final comments.