[R-package] Fixed R implementation of upper_bound() and lower_bound() for lgb.Booster #2785
OK, I realized one thing that was causing surprising results: the return type needs to be double, not int. Fixed in a0dc638. Now that same example returns values like this:
Is there something fundamental I'm missing? If the target in the training data is bounded between 0 and 1, how is it possible for something tree-based to predict a value lower than 0 or greater than 1? When I re-predict on all of the training data, I get the results I'd expect.
Hello @jameslamb, there might be an error, but there is no direct relationship between the minimum you see in your data and the bound values; the bounds may be totally unreachable. They are computed by taking the minimum and maximum leaf value of every tree and adding them up, so they are most likely not values the model can actually output. They serve as a lower or upper bound that one can be 100% sure the model will not surpass, without looking at any data. I hope this helps.
Thanks, Joan. Maybe there's something I've misunderstood. I still don't get how a tree-based model that has only ever seen data between 0 and 1 could ever predict a value outside of that range, even theoretically, since the leaf values are taken by voting or averaging (depending on task). Right?
Hey James, it does not need a theoretical value. It is equivalent to walking every tree, taking the min or max of its leaf_values each time, and adding them up. Imagine: tree 1 The lower and upper bounds would be -1.5 and 1.5. But maybe these values are not reachable; they are just a conservative bound.
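For readers following along, here is a minimal sketch of the computation described in this comment. It is not LightGBM's implementation: the two stumps, their threshold, and their leaf values are hypothetical, chosen only so that the bounds come out to -1.5 and 1.5 while no input can actually reach them.

```python
# Hypothetical two-tree ensemble (not taken from the PR).
# Each tree is a stump on one feature x: (threshold, left_leaf, right_leaf).
TREES = [
    (0.0, -1.0, 0.5),
    (0.0, 1.0, -0.5),
]


def lower_bound(trees):
    """Sum of the smallest leaf value of every tree."""
    return sum(min(left, right) for _, left, right in trees)


def upper_bound(trees):
    """Sum of the largest leaf value of every tree."""
    return sum(max(left, right) for _, left, right in trees)


def predict(trees, x):
    """Raw score: add up the leaf each tree routes x to."""
    return sum(left if x < thr else right for thr, left, right in trees)


print(lower_bound(TREES), upper_bound(TREES))       # -1.5 1.5
print(predict(TREES, -3.0), predict(TREES, 3.0))    # 0.0 0.0
```

Both stumps split at the same threshold, so an input that lands in the smallest leaf of one tree lands in the largest leaf of the other. Every prediction is 0.0, yet the bounds are still -1.5 and 1.5: they are conservative and computed without looking at any data.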
Thanks @JoanFM. I was thinking about this the wrong way. Fixed the tests in ec9a941. I think we are good!
You are welcome, thank you, and sorry for the screw-up with the R-package in the previous PR.
No problem! My fault for not giving you a review sooner. Thanks again for the contributions!
@jameslamb Thank you very much for the prompt fixes!
847bd1b to 1825ad7
@JoanFM thanks for contributing #2737! Unfortunately, there were some issues on the R side. This PR attempts to address them:
- removed `_` from method names
- `lower_bound()` method was storing its return value in variable `upper_bound`
- `lightgbm_R.h` (without that, `LGBM_BoosterGetUpperBoundValue_R` and `LGBM_BoosterGetLowerBoundValue_R` are not callable from R)

I still need some help though @guolinke @JoanFM @StrikerRUS. I am not getting the answers I'd expect. For example, I would expect the lower bound to be 0 and the upper bound to be 1 for binary classification, but running this:
I'm confused by the results:
I'll look back through #2737, but maybe there is just something fundamental that I've misunderstood?