
Cannot reproduce optimization results for XGBClassifier #7057

Open

RonaldGalea opened this issue Jun 24, 2021 · 4 comments

Comments

RonaldGalea commented Jun 24, 2021

Hello,
I'm having an issue getting reproducible results when optimizing an XGBClassifier. I'm using BayesSearchCV to optimize some hyperparameters defined in a search grid. When running locally, everything works as expected and I get the same results on each run. However, running on a (local) dask cluster gives different results each time; please see the code snippet below.

Code to reproduce issue:

from copy import deepcopy

import numpy as np
import pandas as pd
import skopt.space as skspace
import xgboost
from dask.distributed import Client
from distributed import LocalCluster
from IPython.display import display
from joblib import parallel_backend
from skopt import BayesSearchCV

# generate some data
np.random.seed(0)
train_data = pd.DataFrame(np.random.rand(1000, 10))
labels = pd.Series(np.random.randint(2, size=1000))

# define xgb model, bayesian search grid and optimizer
model = xgboost.XGBClassifier(random_state=0)

xgboost_search_grid = {
    'n_estimators': skspace.Integer(45, 100),
    'max_depth': skspace.Integer(5, 15),
    'colsample_bytree': skspace.Real(0.08, 0.3),
    'subsample': skspace.Real(0.2, 0.8),
    'learning_rate': skspace.Real(0.01, 0.15)
}

opt = BayesSearchCV(model, xgboost_search_grid, random_state=0, n_jobs=-1, n_iter=4, n_points=2, cv=10, refit=False)

# run optimization locally twice; the results are the same each time
local_res1 = deepcopy(opt.fit(train_data, labels))
print("Local run 1:", local_res1.cv_results_['mean_test_score'], "\n")

local_res2 = deepcopy(opt.fit(train_data, labels))
print("Local run 2:", local_res2.cv_results_['mean_test_score'], "\n")

# set up local dask cluster
cluster = LocalCluster()
client = Client(cluster)

# run optimization on a local dask cluster -> The results are different each time
with parallel_backend('dask'):
    dask_res1 = deepcopy(opt.fit(train_data, labels))
print("Dask run 1:", dask_res1.cv_results_['mean_test_score'], "\n")

with parallel_backend('dask'):
    dask_res2 = deepcopy(opt.fit(train_data, labels))
print("Dask run 2:", dask_res2.cv_results_['mean_test_score'], "\n")

# inspecting the full results, we can see that the evaluated sets of hyperparameters are identical for both runs,
# so the difference should come from training the xgboost models
display("Evaluated hyperparameters dask run 1:", dask_res1.cv_results_["params"])
print("\n\n----------------------\n\n")
display("Evaluated hyperparameters dask run 2:", dask_res2.cv_results_["params"])

# just to confirm: identical hyperparameters, but different scores
assert dask_res1.cv_results_["params"] == dask_res2.cv_results_["params"]
assert not np.array_equal(dask_res1.cv_results_["mean_test_score"], dask_res2.cv_results_["mean_test_score"])

Output:

Local run 1: [0.529 0.491 0.511 0.512] 

Local run 2: [0.529 0.491 0.511 0.512] 

Dask run 1: [0.506 0.522 0.516 0.516] 

Dask run 2: [0.527 0.518 0.518 0.514] 

'Evaluated hyperparameters dask run 1:'
[OrderedDict([('colsample_bytree', 0.19681211628947243),
              ('learning_rate', 0.10465113124276788),
              ('max_depth', 11),
              ('n_estimators', 81),
              ('subsample', 0.7152462989930835)]),
 OrderedDict([('colsample_bytree', 0.29572237778914645),
              ('learning_rate', 0.02792961188398764),
              ('max_depth', 5),
              ('n_estimators', 77),
              ('subsample', 0.3453743955677361)]),
 OrderedDict([('colsample_bytree', 0.13684176563400458),
              ('learning_rate', 0.1470465121796906),
              ('max_depth', 14),
              ('n_estimators', 78),
              ('subsample', 0.35751455362325446)]),
 OrderedDict([('colsample_bytree', 0.11581214899157613),
              ('learning_rate', 0.11818146284860102),
              ('max_depth', 11),
              ('n_estimators', 51),
              ('subsample', 0.31304057362794935)])]


----------------------


'Evaluated hyperparameters dask run 2:'
[OrderedDict([('colsample_bytree', 0.19681211628947243),
              ('learning_rate', 0.10465113124276788),
              ('max_depth', 11),
              ('n_estimators', 81),
              ('subsample', 0.7152462989930835)]),
 OrderedDict([('colsample_bytree', 0.29572237778914645),
              ('learning_rate', 0.02792961188398764),
              ('max_depth', 5),
              ('n_estimators', 77),
              ('subsample', 0.3453743955677361)]),
 OrderedDict([('colsample_bytree', 0.13684176563400458),
              ('learning_rate', 0.1470465121796906),
              ('max_depth', 14),
              ('n_estimators', 78),
              ('subsample', 0.35751455362325446)]),
 OrderedDict([('colsample_bytree', 0.11581214899157613),
              ('learning_rate', 0.11818146284860102),
              ('max_depth', 11),
              ('n_estimators', 51),
              ('subsample', 0.31304057362794935)])]

It appears the differences come from the training of the XGBClassifiers, because the sets of evaluated hyperparameters are the same across runs. I also noticed there might be something amiss with this particular search grid, because if any of the entries are removed, the results are reproducible again.

Library versions:

XGBoost version: 1.3.3
Scikit-optimize version: 0.9.dev0
Joblib version: 1.0.1
Dask version: 2021.04.0

RonaldGalea changed the title from "XGBClassifier random_state not working" to "Cannot reproduce optimization results for XGBClassifier" on Jun 24, 2021
Collaborator

hcho3 commented Jun 24, 2021

Have you tried removing all sampling hyperparameters from the search grid?
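
(For context, a minimal sketch of what such a reduced grid might look like, reusing the bounds from the snippet above but dropping the sampling-related entries subsample and colsample_bytree; the variable name xgboost_search_grid_no_sampling is made up for illustration.)

import skopt.space as skspace

# same bounds as in the original report, with the sampling hyperparameters removed
xgboost_search_grid_no_sampling = {
    'n_estimators': skspace.Integer(45, 100),
    'max_depth': skspace.Integer(5, 15),
    'learning_rate': skspace.Real(0.01, 0.15)
}

# opt = BayesSearchCV(model, xgboost_search_grid_no_sampling, random_state=0,
#                     n_jobs=-1, n_iter=4, n_points=2, cv=10, refit=False)

Re-running the dask portion of the snippet with this grid would indicate whether the nondeterminism is tied to row/column sampling.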

@trivialfis
Member

This is likely the same issue with the global RNG. I have a WIP branch removing it and will prioritize it.
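
(As a toy illustration of why a process-global RNG can break reproducibility under parallel execution, not XGBoost's actual code: when several fits in the same process draw from one shared stream, the values each fit receives depend on the order in which the parallel tasks happen to run, whereas per-fit seeding does not. The helper names fit_with_shared_rng and fit_with_own_rng are invented for the sketch.)

import random
from concurrent.futures import ThreadPoolExecutor

shared_rng = random.Random(0)  # stands in for a process-global RNG

def fit_with_shared_rng(params):
    # all "fits" consume from the same stream, so the value a given fit sees
    # depends on how the parallel tasks are interleaved
    return params, shared_rng.random()

def fit_with_own_rng(params):
    # each "fit" seeds its own stream, so the value depends only on the seed
    return params, random.Random(params).random()

with ThreadPoolExecutor(max_workers=4) as pool:
    shared = dict(pool.map(fit_with_shared_rng, range(8)))
    seeded = dict(pool.map(fit_with_own_rng, range(8)))

# `shared` can map the same params to different values from run to run;
# `seeded` is identical on every run.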

@RonaldGalea
Author

@hcho3
Removing any of the 5 hyperparameters from the grid makes it work correctly. It is somehow these exact 5 that cause the issue.

@apatange-source

Is this issue closed or is it still WIP?
