Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] fixed sorting issues in lgb.cv() when using a model with observation weights (fixes #2572) #2573

Merged
merged 9 commits into from
Jan 16, 2020

Conversation

jameslamb
Copy link
Collaborator

See #2572 for details.

@StrikerRUS
Copy link
Collaborator

Please rebase to the latest master to spot some linting issues.

@jameslamb jameslamb changed the title [R-package] fixed sorting issues in lgb.cv when using a model with observation weights (fixes #2572) [R-package] fixed sorting issues in lgb.cv() when using a model with observation weights (fixes #2572) Nov 17, 2019
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if my comments are not so smart, but since review was requested from me, let me show which things look confusing from point of view of first-time code reader.

R-package/R/lgb.cv.R Outdated Show resolved Hide resolved
Comment on lines 309 to 310
setinfo(dtest, "group", group)
setinfo(dtrain, "group", group)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is group fields shared across train and test sets?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was in the original code. Let me double-check what that is doing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I see now that I misunderstood how group is working. I haven't worked with the learning-to-rank side of LightGBM at all, and I'm struggling to understand this documentation.

It seems that group going to a vector like c(10, 15, 20) which means "the first 10 records are from one query, the next 15 are samples from another query, etc.".

If that's true, I don't understand what getinfo(data, "group")[-folds[[k]]$group] could be doing. We don't have any R learning-to-rank examples I could find, so I need some help from @guolinke or someone else.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have been staring at this code for a while and am still confused what it's doing. I am going to try to figure out a learning-to-rank example in R (since we don't have any in the documentation) and use that to step through the code. Going to try to figure out how to do this from the Python tests using lambdarank: https://github.com/microsoft/LightGBM/blob/master/tests/python_package_test/test_engine.py#L635

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember this is written by @Laurae2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is working! Thanks to @Laurae2 for pointing me to the examples of learning-to-rank code for the R package in #791 and #397. Based on those, I've added unit tests on lgb.train() and lgb.cv() that at least cover the LTR-specific branches of that code.

Reviewers, could you take another look at this PR when you have a chance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh also I did rebase to most recent master, to get the recent minor changes to the R package included (#2652 and #2654) in the tests

@jameslamb
Copy link
Collaborator Author

Sorry if my comments are not so smart, but since review was requested from me, let me show which things look confusing from point of view of first-time code reader.

thanks for the review! I really appreciate it. They are good comments 😀

@guolinke
Copy link
Collaborator

Very sorry for the late response, I will take a look today

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb Thank you very much for finishing this PR! And special thanks for adding the test! But I think that tests can be more complicated or at least check more values. Because testing that something is working is only one half, another half is to test that it's working as it was expected.

R-package/tests/testthat/test_learning_to_rank.R Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

I see there are some lingering linting issues. Will fix shortly!

Copy link
Contributor

@Laurae2 Laurae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can merge and lint after on a separate PR.

@jameslamb
Copy link
Collaborator Author

You can merge and lint after on a separate PR.

Thanks for the review! I'm going to fix the linting issues right now. I think it's better that we don't get in the habit of skipping linting and coming back to it later.

@jameslamb
Copy link
Collaborator Author

Thanks for the reviews @Laurae2 and @StrikerRUS . I think we are really close! I've addressed linting and other issues in the tests in this commit: ead17bc

@jameslamb
Copy link
Collaborator Author

The failures in CI on the previous commit were caused by this interesting behavior, where basically I was using && when I should have been using &, and the code range fine on my machine and failed with r-devel releases.

&& should only be used with inputs that are length-1, and I was using it (incorrectly) to compare to vectors. Just pushed a fix and hopefully that will work.

@jameslamb
Copy link
Collaborator Author

Thanks for the reviews @Laurae2 and @StrikerRUS . I'm happy to get this one merged and fix this issue.

@jameslamb jameslamb merged commit dd16fa9 into microsoft:master Jan 16, 2020
@jameslamb jameslamb deleted the bugfix/ordering branch January 27, 2020 00:14
@lock lock bot locked as resolved and limited conversation to collaborators Mar 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants