Previous articles in this series:
Hyperparameter Search With GPyOpt: Part 1 – Scikit-learn Classification and Ensembling
Hyperparameter Search With GPyOpt: Part 2 – XGBoost Classification and Ensembling
1. GPyOpt Python Package
GPyOpt is an open-source Python library for Bayesian optimization, developed by the Machine Learning group at the University of Sheffield. It is built on top of GPy, a Python framework for Gaussian process modelling.
In this article, we demonstrate how to use this package to perform hyperparameter search for a classification problem with Keras.
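If you want to follow along, the package can be installed from PyPI; to our understanding, installing GPyOpt also pulls in its GPy dependency:

# install from PyPI (also installs the GPy dependency)
# pip install gpyopt
import GPyOpt
import GPy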
2. Using GPyOpt
Below are code fragments showing how to integrate the package with Keras.
We begin by specifying whether the problem is one of minimization or maximization. Here, we are again using the MNIST handwritten digits data set for our classification problem, so we wish to minimize the validation categorical cross-entropy loss; we therefore set maximize_bool = False.
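For completeness, here is a minimal setup sketch. The constant values below (epoch limits, number of random points, directories) are illustrative assumptions, not taken from our runs; only maximize_bool = False and the use of half the training data come from the text.

import os
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical

# we are minimizing the validation loss
maximize_bool = False

# data set constants
number_of_classes = 10
input_shape = (28, 28, 1)

# assumed run-control constants (illustrative values)
ml_algo_name = 'cnn'
maximum_epochs = 100
early_stop_epochs = 10
learning_rate_epochs = 5
num_random_points = 8      # random points before the surrogate takes over
num_iterations = 50        # Bayesian optimization iterations
results_dir = './results/'

# use half of the MNIST training data as calibration data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
xcalib = x_train[:30000].reshape(-1, 28, 28, 1).astype('float32')/255.0
ycalib = to_categorical(y_train[:30000], number_of_classes)

# bookkeeping lists filled in by the objective function below
list_early_stop_epochs = []
list_validation_loss = []
list_saved_model_name = []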
Then we define the search space as a list of dictionaries, one per hyperparameter, in which we name each variable, set its type to continuous or discrete, and specify its domain of values.
search_space = [
    {'name':'num_cnn_blocks', 'type':'discrete', 'domain':(2, 3, 4)},
    {'name':'num_filters', 'type':'discrete', 'domain':(16, 32, 48, 64)},
    {'name':'kernel_size', 'type':'discrete', 'domain':(2, 3, 4)},
    {'name':'num_dense_nodes', 'type':'discrete',
     'domain':(64, 96, 128, 160, 192, 224, 256)},
    {'name':'dense_nodes_divisor', 'type':'discrete', 'domain':(2, 4, 8)},
    {'name':'batch_size', 'type':'discrete', 'domain':(32, 64, 96, 128)},
    {'name':'drop_out', 'type':'continuous', 'domain':(0.05, 0.5)}]
Now we create our CNN:
def cnn_function(params):
    model_name = ml_algo_name + '_' + str(np.random.uniform(0, 1,))[2:9]

    # map columns of params to named hyperparameters
    dict_params = {'num_cnn_blocks':int(params[0][0]),
                   'num_filters':int(params[0][1]),
                   'kernel_size':int(params[0][2]),
                   'num_dense_nodes':int(params[0][3]),
                   'dense_nodes_divisor':int(params[0][4]),
                   'batch_size':int(params[0][5]),
                   'drop_out':params[0][6]}

    # first cnn block
    input_tensor = Input(shape=input_shape)
    x = BatchNormalization()(input_tensor)
    x = Activation('relu')(x)
    x = Conv2D(filters=dict_params['num_filters'],
               kernel_size=dict_params['kernel_size'],
               strides=1, padding='same')(x)
    x = Dropout(dict_params['drop_out'])(x)

    # additional cnn blocks
    for iblock in range(dict_params['num_cnn_blocks'] - 1):
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        x = Conv2D(filters=dict_params['num_filters'],
                   kernel_size=dict_params['kernel_size'],
                   strides=1, padding='same')(x)
        x = MaxPooling2D()(x)
        x = Dropout(dict_params['drop_out'])(x)

    # dense layers
    x = Flatten()(x)
    x = Dense(dict_params['num_dense_nodes'], activation='relu')(x)
    x = Dropout(dict_params['drop_out'])(x)
    x = Dense(dict_params['num_dense_nodes']//dict_params['dense_nodes_divisor'],
              activation='relu')(x)
    output_tensor = Dense(number_of_classes, activation='softmax')(x)

    # instantiate and compile the model
    cnn_model = Model(inputs=input_tensor, outputs=output_tensor)
    opt = Adam(lr=0.00025)
    cnn_model.compile(loss='categorical_crossentropy',
                      optimizer=opt, metrics=['accuracy'])

    # early stopping, learning rate schedule, and best-model checkpointing
    callbacks_list = [EarlyStopping(monitor='val_loss',
                                    patience=early_stop_epochs),
                      ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                                        patience=learning_rate_epochs,
                                        verbose=0, mode='auto', min_lr=1.0e-6),
                      ModelCheckpoint(filepath=results_dir + model_name + '.h5',
                                      monitor='val_loss', save_best_only=True)]

    h = cnn_model.fit(x=xcalib, y=ycalib,
                      batch_size=dict_params['batch_size'],
                      epochs=maximum_epochs,
                      validation_split=0.25,
                      shuffle=True, verbose=0,
                      callbacks=callbacks_list)

    # record actual best epochs and valid loss here, added to bayes opt parameter df below
    list_early_stop_epochs.append(len(h.history['val_loss']) - early_stop_epochs)
    validation_loss = np.min(h.history['val_loss'])
    list_validation_loss.append(validation_loss)
    list_saved_model_name.append(model_name)

    return validation_loss
GPyOpt supplies params to the objective function as a 2-D NumPy array, with one row per point to be evaluated and one column per variable. Note that the second index (the column) must follow the order in which the variables were specified in search_space.
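To make the indexing concrete, here is a hypothetical row as GPyOpt would pass it; the values are chosen arbitrarily for illustration:

import numpy as np

# GPyOpt calls the objective with a 2-D array: one row per point to evaluate
params = np.array([[3, 32, 3, 128, 4, 64, 0.25]])

num_cnn_blocks = int(params[0][0])   # column 0 <-> 'num_cnn_blocks'
num_filters = int(params[0][1])      # column 1 <-> 'num_filters'
drop_out = params[0][6]              # column 6 <-> 'drop_out' (continuous, left as float)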
We create our Bayesian optimization model:
gpyopt_bo = GPyOpt.methods.BayesianOptimization(f=cnn_function,
    domain=search_space,
    model_type='GP',
    initial_design_numdata=num_random_points,
    initial_design_type='random',
    acquisition_type='EI',
    normalize_Y=True,
    exact_feval=False,
    acquisition_optimizer_type='lbfgs',
    model_update_interval=1,
    evaluator_type='sequential',
    batch_size=1,
    num_cores=os.cpu_count(),
    verbosity=True,
    verbosity_model=False,
    maximize=maximize_bool,
    de_duplication=True)
We are using a Gaussian process surrogate with the expected improvement (EI) acquisition function, which is optimized with the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm ('lbfgs'). See https://gpyopt.readthedocs.io/en/latest/GPyOpt.methods.html for information about all of the parameters, as well as how to use surrogate models other than Gaussian processes.
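For example, per the GPyOpt documentation, swapping in a different surrogate or acquisition function only requires changing the corresponding arguments. The variant below (a Gaussian process whose kernel hyperparameters are integrated out by MCMC, paired with the MCMC version of the lower confidence bound acquisition) is a sketch we did not run:

gpyopt_bo_mcmc = GPyOpt.methods.BayesianOptimization(f=cnn_function,
    domain=search_space,
    model_type='GP_MCMC',          # GP with MCMC over kernel hyperparameters
    acquisition_type='LCB_MCMC',   # lower confidence bound, MCMC variant
    initial_design_numdata=num_random_points,
    maximize=maximize_bool)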
In the next step, we set up log files and run the optimization by specifying the number of trials.
# text file with parameter info, gp info, best results
rf = results_dir + 'report'
# text file with header: Iteration Y var_1 var_2 etc.
ef = results_dir + 'evaluation'
# text file with gp model info
mf = results_dir + 'models'

gpyopt_bo.run_optimization(max_iter=num_iterations,
                           report_file=rf,
                           evaluations_file=ef,
                           models_file=mf)
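After the run, the best point found and its objective value are available as attributes of the optimizer, and GPyOpt can plot the convergence history:

# best hyperparameter vector, in the same column order as search_space
print('best parameters:', gpyopt_bo.x_opt)

# corresponding minimum validation loss
print('best validation loss:', gpyopt_bo.fx_opt)

# plots distance between consecutive evaluations and best objective value per iteration
gpyopt_bo.plot_convergence()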
Finally, we create a Pandas DataFrame to store the hyperparameter values, loss, model name, etc. for each trial. Note that GPyOpt renders integer values as reals (1 → 1.0).
header_params = []
for param in search_space:
    header_params.append(param['name'])

# gpyopt renders integer values as real (1 -> 1.0)
df_results = pd.DataFrame(data=gpyopt_bo.X, columns=header_params)
df_results['validation_loss'] = gpyopt_bo.Y
df_results['epochs'] = list_early_stop_epochs
df_results['validation loss check'] = list_validation_loss
df_results['model_name'] = list_saved_model_name

df_results.to_pickle(results_dir + 'df_results.pkl')
df_results.to_csv(results_dir + 'df_results.csv')
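Because every column of gpyopt_bo.X is stored as a float, the discrete hyperparameters can be cast back to integers before further use; a small sketch:

discrete_params = ['num_cnn_blocks', 'num_filters', 'kernel_size',
                   'num_dense_nodes', 'dense_nodes_divisor', 'batch_size']
for col in discrete_params:
    df_results[col] = df_results[col].astype(int)   # 1.0 -> 1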
3. Best Result and Ensembling
To obtain an ensemble, we load the saved models, sort by 'validation_loss', and apply a threshold so that only the top 10 models are used. Then we input the production data (unseen data) to obtain class probabilities, NOT the classes themselves. These probabilities are averaged across models, and the final predicted class is the one with the highest mean probability.
To obtain the single best result, we take the first row of the sorted DataFrame from the previous section, i.e., the model with the lowest validation loss (handled by the i == 0 branch below).
# using validation loss thus minimization
threshold_multiplier = 1.0
df_params.sort_values(by='validation_loss', ascending=True, inplace=True)

# apply threshold
accepted_models = 0
list_predicted_prob = []
num_models = df_params.shape[0]
for i in range(num_models):
    bool1 = df_params.loc[df_params.index[i],'validation_loss'] < threshold_multiplier*threshold
    bool2 = accepted_models < accepted_models_num
    if bool1 and bool2:
        model_name = df_params.loc[df_params.index[i],'model_name']
        model_final = load_model(models_directory + model_name + '.h5')

        # these are class probabilities
        list_predicted_prob.append(model_final.predict(xprod))

        # best results
        if i == 0:
            y_predicted_class_best = np.argmax(model_final.predict(xprod), axis=1)
            name_result = ml_name + '_best'
            classification_analysis.classification_analysis(
                yprod, y_predicted_class_best, list_class_names,
                save_directory, name_result)

        accepted_models = accepted_models + 1

# compute mean probabilities
mean_probabilities = np.mean(list_predicted_prob, axis=0)

# compute predicted class
y_predicted_class_ensemble = np.argmax(mean_probabilities, axis=1)
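The scores reported below can then be computed from the ensemble predictions with scikit-learn, assuming yprod holds the true integer class labels:

from sklearn.metrics import accuracy_score, balanced_accuracy_score

print('balanced accuracy score =',
      balanced_accuracy_score(yprod, y_predicted_class_ensemble))
print('accuracy score =',
      accuracy_score(yprod, y_predicted_class_ensemble))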
4. Results
We used the MNIST handwritten digits data set. To reduce run time, we used half of the training data as our calibration data. The ten best trials, sorted by validation loss, are shown below.
num_cnn_blocks | num_filters | kernel_size | num_dense_nodes | dense_nodes_divisor | batch_size | drop_out | validation_loss |
---|---|---|---|---|---|---|---|
4 | 64 | 4 | 192 | 2 | 64 | 0.0500 | 0.0293 |
4 | 64 | 4 | 192 | 8 | 128 | 0.0500 | 0.0299 |
4 | 64 | 4 | 160 | 8 | 128 | 0.0500 | 0.0328 |
4 | 32 | 4 | 192 | 8 | 128 | 0.0500 | 0.0342 |
4 | 16 | 4 | 256 | 2 | 96 | 0.0500 | 0.0344 |
4 | 32 | 4 | 256 | 4 | 128 | 0.0500 | 0.0348 |
4 | 64 | 4 | 128 | 8 | 96 | 0.0500 | 0.0348 |
3 | 64 | 4 | 192 | 2 | 32 | 0.4195 | 0.0349 |
4 | 64 | 4 | 160 | 8 | 96 | 0.1196 | 0.0354 |
4 | 48 | 4 | 256 | 8 | 128 | 0.0500 | 0.0359 |
Best results:
balanced accuracy score = 0.9941
accuracy score = 0.9942
Ensemble results:
balanced accuracy score = 0.9945
accuracy score = 0.9946