-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving monotone constraints ("Fast" method; linked to #2305, #2717) #2770
Improving monotone constraints ("Fast" method; linked to #2305, #2717) #2770
Conversation
Also there is a single build failing about the docs I added. I am not sure why. Let me know if you have an idea. Thanks |
Please run |
I just did as you said and pushed it to see if it would build. I think there was a build error on your side @StrikerRUS (I see errors like "Exception: Please install CMake first"). Thanks |
@CharlesAuguste Now docs test is OK. All other builds are failing with the following compilation error:
|
Oh okay I see. Sorry I was confused because all the other checks were passing when there was an issue with the docs, and the library is building locally for me, so I thought my code was fine. I will try to fix all that. Thanks |
@CharlesAuguste Sorry about the inconvenience! I guess it have started to fail due to recently merged commits in |
FeatureMonotone is removed in master. You can rebase to it. |
98467de
to
c4a9ce8
Compare
All checks are passing now @guolinke @StrikerRUS! Thanks for your advice. Let me know what you think of the PR. |
@CharlesAuguste could I make some changes based on your code, and propose PR in your repo? |
Of course yes, I would be very happy with that. Do I need to grant you any access to my repo for that, or is it good? Thanks |
@CharlesAuguste I can fork your repo and make changes : ) |
I am busy recently, will work on it this weekend. |
No problem, thank you very much for spending time on this! :) |
|
||
// this function updates the best split for a leaf | ||
// it is called only when monotone constraints exist | ||
void SerialTreeLearner::UpdateBestSplitsFromHistograms( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about this function. What do you want to get in this function?
Do you want to update the leaf splits, by different constraints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly, the only goal of this function is to update the best split (called split here), with the new constraints it just got.
To summarize, first a split is made somewhere. Then the function GoUpToFindLeavesToUpdate updates the constraints of all other splits (if needed). Then the function UpdateBestSplitsFromHistograms updates the best splits in case the constraints associated with the previous best splits were modified. Indeed, if they were, the previous best split may not be the best split anymore.
@CharlesAuguste please check my PR in your repo. |
@guolinke Thank you very much, I will look at your PR. I understand that you find GoUpToFindLeavesToUpdate messy. I will try to think of a way to make it simpler. Thanks a lot! |
Let me know what you think and if you would like me to make other changes. Thanks! |
@guolinke I tried to refactor GoUpToFindLeavesToUpdate to make it less messy (I created a new function, and changed generic variable names to much more explicit ones, and added some comments). Let me know what you think. Hopefully, the function should much easier to understand now! Also I formatted the code to standard clang-format because the variables names I used were too long for the lines, but feel free to change the format to anything else you like. Thanks |
2077fd9
to
fd34998
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CharlesAuguste
Please check some minor comments below.
@StrikerRUS I think I answered all your comments. Let me know if you are happy with the changes. There is 1 check failing, but I am not sure if it is my fault, it looks like it timed out. Can you take a look please? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CharlesAuguste Thank you so much! Awesome contribution!
Can you please check one comment below before we merge?
if (threshold >= | ||
thresholds_of_splits_going_up_from_original_leaf[i] && | ||
!was_original_leaf_right_child_of_split[i]) { | ||
keep_going_right = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we speedup here (and in the next if
) by not running the rest of the loop when the opposite direction is already false
by something like
keep_going_right = false;
if (!keep_going_left) {
break;
}
and will it make any sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that make sense, that should speed up the function. I just pushed the change, @StrikerRUS let me know if there is anything else.
@CharlesAuguste Thank you for revisiting those loops! Appveyor failed with rare strange MinGW error (#1818) and I don't have access to Appveyor builds, so I'm going to close-reopen this PR to re-trigger all CI builds. |
@CharlesAuguste seconding what others have said. AWESOME contribution! Thanks so much for working through the review process with and for taking the time to contribute. |
@guolinke @StrikerRUS @jameslamb Thank you very much for working with me on this!! If you are still okay to keep going, I still have more have more as indicated in my original PR. So if that's okay with you I will open the next PR in the coming days. Next one will be about monotone splits penalization as a way to improve the greediness of the way the trees are built, which does not mix too well with monotone constraints. It should be fairly simple code-wise compared to this PR. Thanks again |
@CharlesAuguste Sure! Please continue with your plan. |
…, microsoft#2717) (microsoft#2770) * Add util functions. * Added monotone_constraints_method as a parameter. * Add the intermediate constraining method. * Updated tests. * Minor fixes. * Typo. * Linting. * Ran the parameter generator for the doc. * Removed usage of the FeatureMonotone function. * more fixes * Fix. * Remove duplicated code. * Add debug checks. * Typo. * Bug fix. * Disable the use of intermediate monotone constraints and feature sampling at the same time. * Added an alias for monotone constraining method. * Use the right variable to get the number of threads. * Fix DEBUG checks. * Add back check to determine if histogram is splittable. * Added forgotten override keywords. * Perform monotone constraint update only when necessary. * Small refactor of FastLeafConstraints. * Post rebase commit. * Small refactor. * Typo. * Added comment and slightly improved logic of monotone constraints. * Forgot a const. * Vectors that are to be modified need to be pointers. * Rename FastLeafConstraints to IntermediateLeafConstraints to match documentation. * Remove overload of GoUpToFindLeavesToUpdate. * Stop memory leaking. * Fix cpplint issues. * Fix checks. * Fix more cpplint issues. * Refactor config monotone constraints method. * Typos. * Remove useless empty lines. * Add new line to separate includes. * Replace unsigned ind by size_t. * Reduce number of trials in tests to decrease CI time. * Specify monotone constraints better in tests. * Removed outer loop in test of monotone constraints. * Added categorical features to the monotone constraints tests. * Add blank line. * Regenerate parameters automatically. * Speed up ShouldKeepGoingLeftRight. Co-authored-by: Charles Auguste <[email protected]> Co-authored-by: guolinke <[email protected]>
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
@aldanor @redditur
@guolinke @chivee @jameslamb @huanzhang12 @Laurae2 @StrikerRUS
You were all reviewers of the PRs #2305 and #2717. #2305 was judged to be not merge-able because it is too big. Therefore, in my ultimate comment (#2305 (comment)) I said I would split into smaller PRs easier to merge. This is the second PR in relation with #2305; the first one being #2717 and having been merged.
The goal of this PR is to introduce a new constraining method. This is the "Fast" method from #2305. I introduce a new parameter in the config called "monotone_constraints_method". When this is set to "basic", then the original constraining method is used. When it is set to "intermediate", the new constraining method of this PR is used. (And later, we would like to add an even better constraining method, called "Slow" method, for which this parameter will be set to "advanced").
Here are a few details about the changes I made:
As I detailed in #2305, this new method yields significantly better results than the existing one, because it is not as constraining. Feel free to ask more details about that. The code in monotone_constraints.hpp may be complicated to understand because the new constraining method is an algorithm that goes through the tree from the split to the root and then from the root to all the leaves, and that updates constraints while trying ignore branches it does need to update to save as much time as possible. I am very happy to give more details or to explain the method further if needed.
I guess this PR is still pretty big, but I hope if is still manageable for you. Please @guolinke let me know if that is okay for you. Thank you very much.
EDIT: Here is the link to the report we made for the original PR. The method we implement in this PR is described in it. https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf