[R-package] fixed sorting issues in lgb.cv() when using a model with observation weights (fixes #2572) #2573
Conversation
Please rebase to the latest master.
Force-pushed from 6214e77 to 49b4c87
Sorry if my comments are not very insightful, but since a review was requested from me, let me point out the things that look confusing from the point of view of a first-time reader of this code.
R-package/R/lgb.cv.R (outdated)

setinfo(dtest, "group", group)
setinfo(dtrain, "group", group)
Is the group field shared across the train and test sets?
It was in the original code. Let me double-check what that is doing.
Ok, I see now that I misunderstood how group works. I haven't worked with the learning-to-rank side of LightGBM at all, and I'm struggling to understand this documentation.

It seems that group is a vector like c(10, 15, 20), which means "the first 10 records are from one query, the next 15 are samples from another query, etc.". If that's true, I don't understand what getinfo(data, "group")[-folds[[k]]$group] could be doing. We don't have any R learning-to-rank examples that I could find, so I need some help from @guolinke or someone else.
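For my own understanding, here is a small sketch of what that group encoding means (in Python, since that's where the reference implementation lives; rows_for_query is a hypothetical helper of mine, not library code):

```python
# Sketch of LightGBM's "group" field semantics for learning-to-rank.
# A group vector like [10, 15, 20] partitions consecutive rows into
# queries: rows 0-9 belong to query 0, rows 10-24 to query 1, and
# rows 25-44 to query 2.

def rows_for_query(group, q):
    """Return the row indices belonging to the q-th query (0-based)."""
    start = sum(group[:q])
    return list(range(start, start + group[q]))

group = [10, 15, 20]
print(rows_for_query(group, 0))  # rows 0..9 (the first query)
print(rows_for_query(group, 1))  # rows 10..24 (the second query)
```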
Please refer to the Python implementation: https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/engine.py#L308-L324
I think there is similar logic in R: https://github.com/microsoft/LightGBM/blob/master/R-package/R/lgb.cv.R#L409-L438
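If I'm reading the R expression right, getinfo(data, "group")[-folds[[k]]$group] drops the group-size entries for the queries held out in fold k (negative indexing in R removes elements). A hedged Python sketch of that idea, with made-up variable names:

```python
# Hedged sketch of subsetting a ranking dataset's group vector for CV.
# In R, group[-held_out] removes elements by index; the equivalent here
# keeps every query whose index is not in the held-out set.
group = [10, 15, 20]  # sizes of three consecutive query groups
held_out = {1}        # query indices assigned to the validation fold (0-based)

train_group = [g for i, g in enumerate(group) if i not in held_out]
valid_group = [g for i, g in enumerate(group) if i in held_out]

print(train_group)  # [10, 20]
print(valid_group)  # [15]
```

The point being that folds are formed at the query level, so every row of a query stays on the same side of the split and the two group vectors still sum to the total row count.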
Sorry, I have been staring at this code for a while and am still confused about what it's doing. I am going to try to put together a learning-to-rank example in R (since we don't have any in the documentation) and use it to step through the code. I'll try to adapt it from the Python tests that use lambdarank: https://github.com/microsoft/LightGBM/blob/master/tests/python_package_test/test_engine.py#L635
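As a starting point, a tiny synthetic learning-to-rank dataset might look like the following (my own sketch, loosely modeled on the Python lambdarank test, not code from this repository):

```python
# Hedged sketch: a minimal synthetic learning-to-rank dataset.
import random

random.seed(42)
group = [10, 15, 20]  # query sizes; their sum is the number of rows
n_rows = sum(group)
X = [[random.random() for _ in range(3)] for _ in range(n_rows)]  # 3 features
y = [random.randint(0, 4) for _ in range(n_rows)]  # graded relevance labels

# With the actual package one would then build a Dataset and train, roughly:
#   dtrain = lgb.Dataset(X, label=y, group=group)
#   lgb.train({"objective": "lambdarank"}, dtrain)
assert len(X) == len(y) == n_rows
```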
I remember this was written by @Laurae2
I think this is working! Thanks to @Laurae2 for pointing me to the examples of learning-to-rank code for the R package in #791 and #397. Based on those, I've added unit tests on lgb.train() and lgb.cv() that at least cover the LTR-specific branches of that code.
Reviewers, could you take another look at this PR when you have a chance?
Thanks for the review! I really appreciate it. They are good comments 😀
Very sorry for the late response, I will take a look today.
Force-pushed from f3b6906 to 519735f
Force-pushed from a6526bf to 4b19bc6
@jameslamb Thank you very much for finishing this PR! And special thanks for adding the test! But I think the tests could be more thorough, or at least check more values. Testing that something runs is only one half; the other half is testing that it works as expected.
Force-pushed from 4b19bc6 to f14458b
I see there are some lingering linting issues. Will fix shortly!
You can merge now and fix linting afterwards in a separate PR.
Thanks for the review! I'm going to fix the linting issues right now. I think it's better that we don't get into the habit of skipping linting and coming back to it later.
Force-pushed from f14458b to ead17bc
Thanks for the reviews @Laurae2 and @StrikerRUS . I think we are really close! I've addressed linting and other issues in the tests in this commit: ead17bc
Force-pushed from ead17bc to ee037c9
The failures in CI on the previous commit were caused by this interesting behavior, where basically I was using
Thanks for the reviews @Laurae2 and @StrikerRUS . I'm happy to get this one merged and fix this issue.
See #2572 for details.