[R-package] fixed sorting issues in lgb.cv() when using a model with observation weights (fixes #2572) #2573
Conversation
Please rebase to the latest master.
Force-pushed from 6214e77 to 49b4c87
Sorry if my comments are not very insightful, but since a review was requested from me, let me point out the things that look confusing from the point of view of a first-time reader of this code.
R-package/R/lgb.cv.R (outdated)

setinfo(dtest, "group", group)
setinfo(dtrain, "group", group)
Is the group field shared across the train and test sets?
It was in the original code. Let me double-check what that is doing.
Ok, I see now that I misunderstood how group works. I haven't worked with the learning-to-rank side of LightGBM at all, and I'm struggling to understand this documentation.

It seems that group is a vector like c(10, 15, 20), which means "the first 10 records are from one query, the next 15 are samples from another query, etc.". If that's true, I don't understand what getinfo(data, "group")[-folds[[k]]$group] could be doing. We don't have any R learning-to-rank examples that I could find, so I need some help from @guolinke or someone else.
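For my own understanding, here is a small sketch of what that group encoding means (in Python, since that's where the reference implementation lives; rows_for_query is a hypothetical helper of mine, not library code):

```python
# Sketch of LightGBM's "group" field semantics for learning-to-rank.
# A group vector like [10, 15, 20] partitions consecutive rows into
# queries: rows 0-9 belong to query 0, rows 10-24 to query 1, and
# rows 25-44 to query 2.

def rows_for_query(group, q):
    """Return the row indices belonging to the q-th query (0-based)."""
    start = sum(group[:q])
    return list(range(start, start + group[q]))

group = [10, 15, 20]
print(rows_for_query(group, 0))  # rows 0..9 (the first query)
print(rows_for_query(group, 1))  # rows 10..24 (the second query)
```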
Please refer to the Python implementation: https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/engine.py#L308-L324
I think there is similar logic in R: https://github.com/microsoft/LightGBM/blob/master/R-package/R/lgb.cv.R#L409-L438
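If I'm reading the R expression right, getinfo(data, "group")[-folds[[k]]$group] drops the group-size entries for the queries held out in fold k (negative indexing in R removes elements). A hedged Python sketch of that idea, with made-up variable names:

```python
# Hedged sketch of subsetting a ranking dataset's group vector for CV.
# In R, group[-held_out] removes elements by index; the equivalent here
# keeps every query whose index is not in the held-out set.
group = [10, 15, 20]  # sizes of three consecutive query groups
held_out = {1}        # query indices assigned to the validation fold (0-based)

train_group = [g for i, g in enumerate(group) if i not in held_out]
valid_group = [g for i, g in enumerate(group) if i in held_out]

print(train_group)  # [10, 20]
print(valid_group)  # [15]
```

The point being that folds are formed at the query level, so every row of a query stays on the same side of the split and the two group vectors still sum to the total row count.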
Sorry, I have been staring at this code for a while and am still confused about what it's doing. I am going to try to put together a learning-to-rank example in R (since we don't have any in the documentation) and use it to step through the code. I'll try to adapt it from the Python tests that use lambdarank: https://github.com/microsoft/LightGBM/blob/master/tests/python_package_test/test_engine.py#L635
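As a starting point, a tiny synthetic learning-to-rank dataset might look like the following (my own sketch, loosely modeled on the Python lambdarank test, not code from this repository):

```python
# Hedged sketch: a minimal synthetic learning-to-rank dataset.
import random

random.seed(42)
group = [10, 15, 20]  # query sizes; their sum is the number of rows
n_rows = sum(group)
X = [[random.random() for _ in range(3)] for _ in range(n_rows)]  # 3 features
y = [random.randint(0, 4) for _ in range(n_rows)]  # graded relevance labels

# With the actual package one would then build a Dataset and train, roughly:
#   dtrain = lgb.Dataset(X, label=y, group=group)
#   lgb.train({"objective": "lambdarank"}, dtrain)
assert len(X) == len(y) == n_rows
```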
I remember this was written by @Laurae2
I think this is working! Thanks to @Laurae2 for pointing me to the examples of learning-to-rank code for the R package in #791 and #397. Based on those, I've added unit tests on lgb.train() and lgb.cv() that at least cover the LTR-specific branches of that code.
Reviewers, could you take another look at this PR when you have a chance?
Thanks for the review! I really appreciate it. They are good comments 😀
Very sorry for the late response, I will take a look today.
Force-pushed from f3b6906 to 519735f
Force-pushed from a6526bf to 4b19bc6
@jameslamb Thank you very much for finishing this PR! And special thanks for adding the test! But I think the tests could be more thorough, or at least check more values. Testing that something runs is only one half; the other half is testing that it works as expected.
Force-pushed from 4b19bc6 to f14458b
I see there are some lingering linting issues. Will fix shortly!
You can merge now and fix linting afterwards in a separate PR.
Thanks for the review! I'm going to fix the linting issues right now. I think it's better that we don't get into the habit of skipping linting and coming back to it later.
Force-pushed from f14458b to ead17bc
Thanks for the reviews @Laurae2 and @StrikerRUS . I think we are really close! I've addressed linting and other issues in the tests in this commit: ead17bc
Force-pushed from ead17bc to ee037c9
The failures in CI on the previous commit were caused by this interesting behavior, where basically I was using
Thanks for the reviews @Laurae2 and @StrikerRUS . I'm happy to get this one merged and fix this issue.
See #2572 for details.