Hello,
I'm having an issue getting reproducible results when optimizing an XGBClassifier. I'm using BayesSearchCV to optimize some hyperparameters defined in a search grid. When running locally, everything works as expected and I get the same results on every run. However, running on a (local) dask cluster gives different results each time; see the code snippet below.
Code to reproduce issue:
from copy import deepcopy

import numpy as np
import pandas as pd
import skopt.space as skspace
import xgboost
from dask.distributed import Client
from distributed import LocalCluster
from IPython.display import display
from joblib import parallel_backend
from skopt import BayesSearchCV

# generate some data
np.random.seed(0)
train_data = pd.DataFrame(np.random.rand(1000, 10))
labels = pd.Series(np.random.randint(2, size=1000))

# define xgb model, bayesian search grid and optimizer
model = xgboost.XGBClassifier(random_state=0)
xgboost_search_grid = {
    'n_estimators': skspace.Integer(45, 100),
    'max_depth': skspace.Integer(5, 15),
    'colsample_bytree': skspace.Real(0.08, 0.3),
    'subsample': skspace.Real(0.2, 0.8),
    'learning_rate': skspace.Real(0.01, 0.15)
}
opt = BayesSearchCV(model, xgboost_search_grid, random_state=0, n_jobs=-1, n_iter=4, n_points=2, cv=10, refit=False)

# run optimization locally twice, we can see the results are the same
local_res1 = deepcopy(opt.fit(train_data, labels))
print("Local run 1:", local_res1.cv_results_['mean_test_score'], "\n")
local_res2 = deepcopy(opt.fit(train_data, labels))
print("Local run 2:", local_res2.cv_results_['mean_test_score'], "\n")

# set up local dask cluster
cluster = LocalCluster()
client = Client(cluster)

# run optimization on a local dask cluster -> the results are different each time
with parallel_backend('dask'):
    dask_res1 = deepcopy(opt.fit(train_data, labels))
    print("Dask run 1:", dask_res1.cv_results_['mean_test_score'], "\n")
with parallel_backend('dask'):
    dask_res2 = deepcopy(opt.fit(train_data, labels))
    print("Dask run 2:", dask_res2.cv_results_['mean_test_score'], "\n")

# inspecting the full results, we can see that the sets of hyperparameters
# evaluated are identical for both runs; this should mean that the difference
# comes from training the xgboost models
display("Evaluated hyperparameters dask run 1:", dask_res1.cv_results_["params"])
print("\n\n----------------------\n\n")
display("Evaluated hyperparameters dask run 2:", dask_res2.cv_results_["params"])

# just to confirm
assert dask_res1.cv_results_["params"] == dask_res2.cv_results_["params"]
assert not np.array_equal(dask_res1.cv_results_["mean_test_score"], dask_res2.cv_results_["mean_test_score"])
It appears the differences come from the training of the XGBClassifiers, because the sets of evaluated hyperparameters are identical across runs. I also noticed there might be something amiss with this particular search grid: if any one of its entries is removed, the results become reproducible again.
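To narrow this down further, a simple check (sketched below; the fixed hyperparameter values are arbitrary picks from inside the grid, not values from my actual runs) would be to cross-validate a single fixed XGBClassifier twice on the dask backend, taking BayesSearchCV out of the picture entirely. If these scores also differ between runs, the nondeterminism is in the model training itself:

# narrowing-down sketch: the hyperparameter values here are arbitrary picks
# from within the search grid above, not values taken from my actual runs
from sklearn.model_selection import cross_val_score

fixed_model = xgboost.XGBClassifier(
    random_state=0, n_estimators=50, max_depth=10,
    colsample_bytree=0.1, subsample=0.5, learning_rate=0.05,
)
with parallel_backend('dask'):
    scores1 = cross_val_score(fixed_model, train_data, labels, cv=10, n_jobs=-1)
with parallel_backend('dask'):
    scores2 = cross_val_score(fixed_model, train_data, labels, cv=10, n_jobs=-1)
# False would mean the nondeterminism is in XGBoost training, not in the search
print(np.array_equal(scores1, scores2))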
Library versions:
XGBoost version: 1.3.3
Scikit-optimize version: 0.9.dev0
Joblib version: 1.0.1
Dask version: 2021.04.0
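One variable I have not yet eliminated is thread-level parallelism inside XGBoost itself. I don't know whether it plays a role here, but pinning the classifier to a single thread would rule it out; an untested sketch, where n_jobs=1 on the XGBClassifier is the only change from the reproduction code above:

# untested sketch: pin XGBoost to one thread per fit, to rule out any
# thread-count-dependent behaviour during training on the dask workers
single_thread_model = xgboost.XGBClassifier(random_state=0, n_jobs=1)
opt_single = BayesSearchCV(
    single_thread_model, xgboost_search_grid, random_state=0,
    n_jobs=-1, n_iter=4, n_points=2, cv=10, refit=False,
)
with parallel_backend('dask'):
    dask_res_single = deepcopy(opt_single.fit(train_data, labels))
    print(dask_res_single.cv_results_['mean_test_score'])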