
Cannot reproduce optimization results for XGBClassifier #7057

Open

RonaldGalea opened this issue Jun 24, 2021 · 4 comments

Comments

RonaldGalea commented Jun 24, 2021

Hello,
I'm having an issue getting reproducible results when optimizing an XGBClassifier. I'm using BayesSearchCV to optimize some hyperparameters defined in a search grid. When running locally, everything works as expected and I get the same results on each run. However, running on a (local) dask cluster gives different results each time; please see the code snippet below.

Code to reproduce issue:

from copy import deepcopy

import numpy as np
import pandas as pd
import skopt.space as skspace
import xgboost
from dask.distributed import Client
from distributed import LocalCluster
from IPython.display import display
from joblib import parallel_backend
from skopt import BayesSearchCV

# generate some data
np.random.seed(0)
train_data = pd.DataFrame(np.random.rand(1000, 10))
labels = pd.Series(np.random.randint(2, size=1000))

# define xgb model, bayesian search grid and optimizer
model = xgboost.XGBClassifier(random_state=0)

xgboost_search_grid = {
    'n_estimators': skspace.Integer(45, 100),
    'max_depth': skspace.Integer(5, 15),
    'colsample_bytree': skspace.Real(0.08, 0.3),
    'subsample': skspace.Real(0.2, 0.8),
    'learning_rate': skspace.Real(0.01, 0.15)
}

opt = BayesSearchCV(model, xgboost_search_grid, random_state=0, n_jobs=-1, n_iter=4, n_points=2, cv=10, refit=False)

# run optimization locally twice; the results are the same each time
local_res1 = deepcopy(opt.fit(train_data, labels))
print("Local run 1:", local_res1.cv_results_['mean_test_score'], "\n")

local_res2 = deepcopy(opt.fit(train_data, labels))
print("Local run 2:", local_res2.cv_results_['mean_test_score'], "\n")

# set up local dask cluster
cluster = LocalCluster()
client = Client(cluster)

# run optimization on a local dask cluster -> The results are different each time
with parallel_backend('dask'):
    dask_res1 = deepcopy(opt.fit(train_data, labels))
print("Dask run 1:", dask_res1.cv_results_['mean_test_score'], "\n")

with parallel_backend('dask'):
    dask_res2 = deepcopy(opt.fit(train_data, labels))
print("Dask run 2:", dask_res2.cv_results_['mean_test_score'], "\n")

# inspecting the full results, we can see that the evaluated sets of hyperparameters are identical for both runs,
# so the difference should come from training the xgboost models
display("Evaluated hyperparameters dask run 1:", dask_res1.cv_results_["params"])
print("\n\n----------------------\n\n")
display("Evaluated hyperparameters dask run 2:", dask_res2.cv_results_["params"])

# just to confirm: identical hyperparameters, but different scores
assert dask_res1.cv_results_["params"] == dask_res2.cv_results_["params"]
assert not np.array_equal(dask_res1.cv_results_["mean_test_score"], dask_res2.cv_results_["mean_test_score"])

Output:

Local run 1: [0.529 0.491 0.511 0.512] 

Local run 2: [0.529 0.491 0.511 0.512] 

Dask run 1: [0.506 0.522 0.516 0.516] 

Dask run 2: [0.527 0.518 0.518 0.514] 

'Evaluated hyperparameters dask run 1:'
[OrderedDict([('colsample_bytree', 0.19681211628947243),
              ('learning_rate', 0.10465113124276788),
              ('max_depth', 11),
              ('n_estimators', 81),
              ('subsample', 0.7152462989930835)]),
 OrderedDict([('colsample_bytree', 0.29572237778914645),
              ('learning_rate', 0.02792961188398764),
              ('max_depth', 5),
              ('n_estimators', 77),
              ('subsample', 0.3453743955677361)]),
 OrderedDict([('colsample_bytree', 0.13684176563400458),
              ('learning_rate', 0.1470465121796906),
              ('max_depth', 14),
              ('n_estimators', 78),
              ('subsample', 0.35751455362325446)]),
 OrderedDict([('colsample_bytree', 0.11581214899157613),
              ('learning_rate', 0.11818146284860102),
              ('max_depth', 11),
              ('n_estimators', 51),
              ('subsample', 0.31304057362794935)])]


----------------------


'Evaluated hyperparameters dask run 2:'
[OrderedDict([('colsample_bytree', 0.19681211628947243),
              ('learning_rate', 0.10465113124276788),
              ('max_depth', 11),
              ('n_estimators', 81),
              ('subsample', 0.7152462989930835)]),
 OrderedDict([('colsample_bytree', 0.29572237778914645),
              ('learning_rate', 0.02792961188398764),
              ('max_depth', 5),
              ('n_estimators', 77),
              ('subsample', 0.3453743955677361)]),
 OrderedDict([('colsample_bytree', 0.13684176563400458),
              ('learning_rate', 0.1470465121796906),
              ('max_depth', 14),
              ('n_estimators', 78),
              ('subsample', 0.35751455362325446)]),
 OrderedDict([('colsample_bytree', 0.11581214899157613),
              ('learning_rate', 0.11818146284860102),
              ('max_depth', 11),
              ('n_estimators', 51),
              ('subsample', 0.31304057362794935)])]

It appears the differences come from the training of the XGBClassifiers, because the sets of evaluated hyperparameters are the same across runs. I also noticed there might be something amiss with this particular search grid, because if any of the entries are removed, the results are reproducible again.

Library versions:

XGBoost version: 1.3.3
Scikit-optimize version: 0.9.dev0
Joblib version: 1.0.1
Dask version: 2021.04.0

RonaldGalea changed the title from "XGBClassifier random_state not working" to "Cannot reproduce optimization results for XGBClassifier" on Jun 24, 2021
Collaborator

hcho3 commented Jun 24, 2021

Have you tried removing all sampling hyperparameters from the search grid?
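
(For context, a minimal sketch of what such a reduced grid might look like, reusing the bounds from the snippet above but dropping the sampling-related entries subsample and colsample_bytree; the variable name xgboost_search_grid_no_sampling is made up for illustration.)

import skopt.space as skspace

# same bounds as in the original report, with the sampling hyperparameters removed
xgboost_search_grid_no_sampling = {
    'n_estimators': skspace.Integer(45, 100),
    'max_depth': skspace.Integer(5, 15),
    'learning_rate': skspace.Real(0.01, 0.15)
}

# opt = BayesSearchCV(model, xgboost_search_grid_no_sampling, random_state=0,
#                     n_jobs=-1, n_iter=4, n_points=2, cv=10, refit=False)

Re-running the dask portion of the snippet with this grid would indicate whether the nondeterminism is tied to row/column sampling.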

@trivialfis
Member

This is likely the same issue with the global RNG. I have a WIP branch removing it and will prioritize it.
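
(As a toy illustration of why a process-global RNG can break reproducibility under parallel execution, not XGBoost's actual code: when several fits in the same process draw from one shared stream, the values each fit receives depend on the order in which the parallel tasks happen to run, whereas per-fit seeding does not. The helper names fit_with_shared_rng and fit_with_own_rng are invented for the sketch.)

import random
from concurrent.futures import ThreadPoolExecutor

shared_rng = random.Random(0)  # stands in for a process-global RNG

def fit_with_shared_rng(params):
    # all "fits" consume from the same stream, so the value a given fit sees
    # depends on how the parallel tasks are interleaved
    return params, shared_rng.random()

def fit_with_own_rng(params):
    # each "fit" seeds its own stream, so the value depends only on the seed
    return params, random.Random(params).random()

with ThreadPoolExecutor(max_workers=4) as pool:
    shared = dict(pool.map(fit_with_shared_rng, range(8)))
    seeded = dict(pool.map(fit_with_own_rng, range(8)))

# `shared` can map the same params to different values from run to run;
# `seeded` is identical on every run.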

@RonaldGalea
Author

@hcho3
Removing any of the 5 hyperparameters from the grid makes it work correctly. It is somehow these exact 5 that cause the issue.

@apatange-source

Is this issue closed or is it still WIP?
