Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving monotone constraints ("Fast" method; linked to #2305, #2717) #2770

Merged

Conversation

CharlesAuguste
Copy link
Contributor

@CharlesAuguste CharlesAuguste commented Feb 18, 2020

@aldanor @redditur

@guolinke @chivee @jameslamb @huanzhang12 @Laurae2 @StrikerRUS
You were all reviewers of the PRs #2305 and #2717. #2305 was judged to be not merge-able because it is too big. Therefore, in my ultimate comment (#2305 (comment)) I said I would split into smaller PRs easier to merge. This is the second PR in relation with #2305; the first one being #2717 and having been merged.

The goal of this PR is to introduce a new constraining method. This is the "Fast" method from #2305. I introduce a new parameter in the config called "monotone_constraints_method". When this is set to "basic", then the original constraining method is used. When it is set to "intermediate", the new constraining method of this PR is used. (And later, we would like to add an even better constraining method, called "Slow" method, for which this parameter will be set to "advanced").

Here are a few details about the changes I made:

  • test_engine.py: improved test to make sure to catch any error and to test both methods;
  • Parameters.rst, config.h, config_auto.cpp: added the new config parameter "monotone_constraints_method";
  • tree.cpp, tree.h, leaf_splits.hpp: added new utility functions which should be fairly straightforward;
  • monotone_constraints.hpp: added the methods to compute the new constraints;
  • serial_tree_learner.h, serial_tree_learner.cpp: added code to switch between the new method + a new function to compute best splits, used only with the new fast method.

As I detailed in #2305, this new method yields significantly better results than the existing one, because it is not as constraining. Feel free to ask more details about that. The code in monotone_constraints.hpp may be complicated to understand because the new constraining method is an algorithm that goes through the tree from the split to the root and then from the root to all the leaves, and that updates constraints while trying ignore branches it does need to update to save as much time as possible. I am very happy to give more details or to explain the method further if needed.

I guess this PR is still pretty big, but I hope if is still manageable for you. Please @guolinke let me know if that is okay for you. Thank you very much.

EDIT: Here is the link to the report we made for the original PR. The method we implement in this PR is described in it. https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf

@CharlesAuguste
Copy link
Contributor Author

Also there is a single build failing about the docs I added. I am not sure why. Let me know if you have an idea. Thanks

@StrikerRUS
Copy link
Collaborator

@CharlesAuguste

Also there is a single build failing about the docs I added. I am not sure why. Let me know if you have an idea. Thanks

Please run python helpers/parameter_generator.py to fix the error. I guess it's due to excess one blank line in Parameters.rst.

@CharlesAuguste
Copy link
Contributor Author

I just did as you said and pushed it to see if it would build. I think there was a build error on your side @StrikerRUS (I see errors like "Exception: Please install CMake first"). Thanks

@StrikerRUS
Copy link
Collaborator

@CharlesAuguste Now docs test is OK. All other builds are failing with the following compilation error:

...
[ 84%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:9:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/parallel_tree_learner.h:15:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/gpu_tree_learner.h:22:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/feature_histogram.hpp:19:
/home/travis/build/microsoft/LightGBM/src/treelearner/monotone_constraints.hpp:119:38: error: 
      no member named 'FeatureMonotone' in 'LightGBM::Dataset'
          learner_state.train_data_->FeatureMonotone(inner_feature);
          ~~~~~~~~~~~~~~~~~~~~~~~~~  ^
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:9:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/parallel_tree_learner.h:15:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/gpu_tree_learner.h:24:
/home/travis/build/microsoft/LightGBM/src/treelearner/serial_tree_learner.h:37:1: warning: 
      class 'CostEfficientGradientBoosting' was previously declared as a struct
      [-Wmismatched-tags]
class CostEfficientGradientBoosting;
^
/home/travis/build/microsoft/LightGBM/src/treelearner/monotone_constraints.hpp:16:8: note: 
      previous use is here
struct CostEfficientGradientBoosting;
       ^
1 warning and 1 error generated.
CMakeFiles/_lightgbm.dir/build.make:387: recipe for target 'CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o' failed
make[3]: *** [CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/feature_parallel_tree_learner.cpp:8:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/parallel_tree_learner.h:15:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/gpu_tree_learner.h:22:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/feature_histogram.hpp:19:
/home/travis/build/microsoft/LightGBM/src/treelearner/monotone_constraints.hpp:119:38: error: 
      no member named 'FeatureMonotone' in 'LightGBM::Dataset'
          learner_state.train_data_->FeatureMonotone(inner_feature);
          ~~~~~~~~~~~~~~~~~~~~~~~~~  ^
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/feature_parallel_tree_learner.cpp:8:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/parallel_tree_learner.h:15:
In file included from /home/travis/build/microsoft/LightGBM/src/treelearner/gpu_tree_learner.h:24:
/home/travis/build/microsoft/LightGBM/src/treelearner/serial_tree_learner.h:37:1: warning: 
      class 'CostEfficientGradientBoosting' was previously declared as a struct
      [-Wmismatched-tags]
class CostEfficientGradientBoosting;
^
/home/travis/build/microsoft/LightGBM/src/treelearner/monotone_constraints.hpp:16:8: note: 
      previous use is here
struct CostEfficientGradientBoosting;
       ^
1 warning and 1 error generated.
CMakeFiles/_lightgbm.dir/build.make:400: recipe for target 'CMakeFiles/_lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o' failed
make[3]: *** [CMakeFiles/_lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/_lightgbm.dir/all' failed
make[2]: *** [CMakeFiles/_lightgbm.dir/all] Error 2
CMakeFiles/Makefile2:84: recipe for target 'CMakeFiles/_lightgbm.dir/rule' failed
make[1]: *** [CMakeFiles/_lightgbm.dir/rule] Error 2
Makefile:164: recipe for target '_lightgbm' failed
make: *** [_lightgbm] Error 2

@CharlesAuguste
Copy link
Contributor Author

Oh okay I see. Sorry I was confused because all the other checks were passing when there was an issue with the docs, and the library is building locally for me, so I thought my code was fine. I will try to fix all that. Thanks

@StrikerRUS
Copy link
Collaborator

@CharlesAuguste Sorry about the inconvenience! I guess it have started to fail due to recently merged commits in master.

@guolinke
Copy link
Collaborator

FeatureMonotone is removed in master. You can rebase to it.

@CharlesAuguste CharlesAuguste force-pushed the PR2-monotone-constraints-fast-method branch from 98467de to c4a9ce8 Compare February 20, 2020 08:57
@CharlesAuguste
Copy link
Contributor Author

All checks are passing now @guolinke @StrikerRUS! Thanks for your advice. Let me know what you think of the PR.

@guolinke
Copy link
Collaborator

@CharlesAuguste could I make some changes based on your code, and propose PR in your repo?

@CharlesAuguste
Copy link
Contributor Author

Of course yes, I would be very happy with that. Do I need to grant you any access to my repo for that, or is it good? Thanks

@guolinke
Copy link
Collaborator

@CharlesAuguste I can fork your repo and make changes : )

@guolinke guolinke self-assigned this Feb 24, 2020
@guolinke
Copy link
Collaborator

I am busy recently, will work on it this weekend.

@CharlesAuguste
Copy link
Contributor Author

No problem, thank you very much for spending time on this! :)

include/LightGBM/config.h Outdated Show resolved Hide resolved
include/LightGBM/config.h Outdated Show resolved Hide resolved

// this function updates the best split for a leaf
// it is called only when monotone constraints exist
void SerialTreeLearner::UpdateBestSplitsFromHistograms(
Copy link
Collaborator

@guolinke guolinke Feb 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this function. What do you want to get in this function?
Do you want to update the leaf splits, by different constraints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, the only goal of this function is to update the best split (called split here), with the new constraints it just got.

To summarize, first a split is made somewhere. Then the function GoUpToFindLeavesToUpdate updates the constraints of all other splits (if needed). Then the function UpdateBestSplitsFromHistograms updates the best splits in case the constraints associated with the previous best splits were modified. Indeed, if they were, the previous best split may not be the best split anymore.

@guolinke
Copy link
Collaborator

@CharlesAuguste please check my PR in your repo.
I decouple the MC class with tree and tree_learner.
I feel like the code of GoUpToFindLeavesToUpdate is a little bit messy, it would be better if you can make it cleaner.

@CharlesAuguste
Copy link
Contributor Author

@guolinke Thank you very much, I will look at your PR. I understand that you find GoUpToFindLeavesToUpdate messy. I will try to think of a way to make it simpler. Thanks a lot!

@CharlesAuguste
Copy link
Contributor Author

CharlesAuguste commented Mar 3, 2020

  • I rebased on @guolinke PR on my repo and fixed a few issues;
  • I rebased on the current master;
  • The build is failing for the R package, but I don't think it it is my fault, it looks like it is linked to [R-package] Tests failing on Azure devops #2855;
  • @StrikerRUS I added mc_constraints as an alias;
  • I still need to think about improving the readability of GoUpToFindLeavesToUpdate. I will try to do that shortly.

Let me know what you think and if you would like me to make other changes. Thanks!

@CharlesAuguste
Copy link
Contributor Author

CharlesAuguste commented Mar 3, 2020

@guolinke I tried to refactor GoUpToFindLeavesToUpdate to make it less messy (I created a new function, and changed generic variable names to much more explicit ones, and added some comments). Let me know what you think. Hopefully, the function should much easier to understand now!

Also I formatted the code to standard clang-format because the variables names I used were too long for the lines, but feel free to change the format to anything else you like. Thanks

@StrikerRUS
Copy link
Collaborator

@CharlesAuguste

The build is failing for the R package, but I don't think it it is my fault, it looks like it is linked to #2855;

I'm so sorry about that! The fix (#2856) was merged into master several minutes ago.

@CharlesAuguste
Copy link
Contributor Author

@CharlesAuguste

The build is failing for the R package, but I don't think it it is my fault, it looks like it is linked to #2855;

I'm so sorry about that! The fix (#2856) was merged into master several minutes ago.

Thanks for letting me know! All checks seem good now :)

@CharlesAuguste CharlesAuguste force-pushed the PR2-monotone-constraints-fast-method branch from 2077fd9 to fd34998 Compare March 17, 2020 08:48
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CharlesAuguste
Please check some minor comments below.

include/LightGBM/config.h Outdated Show resolved Hide resolved
include/LightGBM/config.h Outdated Show resolved Hide resolved
src/io/config.cpp Outdated Show resolved Hide resolved
src/io/config.cpp Outdated Show resolved Hide resolved
src/treelearner/leaf_splits.hpp Outdated Show resolved Hide resolved
src/treelearner/monotone_constraints.hpp Outdated Show resolved Hide resolved
src/treelearner/monotone_constraints.hpp Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
@CharlesAuguste
Copy link
Contributor Author

@StrikerRUS I think I answered all your comments. Let me know if you are happy with the changes. There is 1 check failing, but I am not sure if it is my fault, it looks like it timed out. Can you take a look please? Thanks

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CharlesAuguste Thank you so much! Awesome contribution!
Can you please check one comment below before we merge?

if (threshold >=
thresholds_of_splits_going_up_from_original_leaf[i] &&
!was_original_leaf_right_child_of_split[i]) {
keep_going_right = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we speedup here (and in the next if) by not running the rest of the loop when the opposite direction is already false by something like

keep_going_right = false;
if (!keep_going_left) {
  break;
}

and will it make any sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that make sense, that should speed up the function. I just pushed the change, @StrikerRUS let me know if there is anything else.

@StrikerRUS
Copy link
Collaborator

@CharlesAuguste Thank you for revisiting those loops!

Appveyor failed with rare strange MinGW error (#1818) and I don't have access to Appveyor builds, so I'm going to close-reopen this PR to re-trigger all CI builds.

@StrikerRUS StrikerRUS closed this Mar 23, 2020
@StrikerRUS StrikerRUS reopened this Mar 23, 2020
@StrikerRUS StrikerRUS merged commit a8c1e0a into microsoft:master Mar 23, 2020
@jameslamb
Copy link
Collaborator

@CharlesAuguste seconding what others have said. AWESOME contribution! Thanks so much for working through the review process with and for taking the time to contribute.

@CharlesAuguste
Copy link
Contributor Author

@guolinke @StrikerRUS @jameslamb Thank you very much for working with me on this!! If you are still okay to keep going, I still have more have more as indicated in my original PR. So if that's okay with you I will open the next PR in the coming days. Next one will be about monotone splits penalization as a way to improve the greediness of the way the trees are built, which does not mix too well with monotone constraints. It should be fairly simple code-wise compared to this PR. Thanks again

@StrikerRUS
Copy link
Collaborator

@CharlesAuguste Sure! Please continue with your plan.

jameslamb pushed a commit to jameslamb/LightGBM that referenced this pull request Mar 24, 2020
…, microsoft#2717)  (microsoft#2770)

* Add util functions.

* Added monotone_constraints_method as a parameter.

* Add the intermediate constraining method.

* Updated tests.

* Minor fixes.

* Typo.

* Linting.

* Ran the parameter generator for the doc.

* Removed usage of the FeatureMonotone function.

* more fixes

* Fix.

* Remove duplicated code.

* Add debug checks.

* Typo.

* Bug fix.

* Disable the use of intermediate monotone constraints and feature sampling at the same time.

* Added an alias for monotone constraining method.

* Use the right variable to get the number of threads.

* Fix DEBUG checks.

* Add back check to determine if histogram is splittable.

* Added forgotten override keywords.

* Perform monotone constraint update only when necessary.

* Small refactor of FastLeafConstraints.

* Post rebase commit.

* Small refactor.

* Typo.

* Added comment and slightly improved logic of monotone constraints.

* Forgot a const.

* Vectors that are to be modified need to be pointers.

* Rename FastLeafConstraints to IntermediateLeafConstraints to match documentation.

* Remove overload of GoUpToFindLeavesToUpdate.

* Stop memory leaking.

* Fix cpplint issues.

* Fix checks.

* Fix more cpplint issues.

* Refactor config monotone constraints method.

* Typos.

* Remove useless empty lines.

* Add new line to separate includes.

* Replace unsigned ind by size_t.

* Reduce number of trials in tests to decrease CI time.

* Specify monotone constraints better in tests.

* Removed outer loop in test of monotone constraints.

* Added categorical features to the monotone constraints tests.

* Add blank line.

* Regenerate parameters automatically.

* Speed up ShouldKeepGoingLeftRight.

Co-authored-by: Charles Auguste <[email protected]>
Co-authored-by: guolinke <[email protected]>
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants