-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] sorting of observation weights and initial scores is broken in lgb.cv() #2572
Comments
…servation weights (fixes microsoft#2572)
…servation weights (fixes microsoft#2572)
…servation weights (fixes microsoft#2572)
@jameslamb Is this still being worked on? Anything I can do to help? I'm anxious to get this fixed :) |
@rgranvil yes it's still being worked on, sorry for the delay! I hope to have this fixed w ithin the next day |
…servation weights (fixes microsoft#2572)
Hey @rgranvil , I think I have this working! It will be a bit of time until reviewers are able to give feedback on my latest revisions and merge them, but if you're ok with living on an unmerged version of the R package then you can try the fix I just pushed to #2573 . |
…servation weights (fixes microsoft#2572)
…servation weights (fixes microsoft#2572)
…servation weights (fixes microsoft#2572)
In #2503 , it was reported that using certain sampling strategies could cause incorrect results if the indices for subsetting weren't sorted. This was "fixed" in the R package by #2524 , however that PR introduced a bug noted in #2524 (comment).
Essentially,
slice.lgb.Dataset()
takes in a vector of indices (row numbers) and returns a newDataset
with that subset. On its way to doing that, it sorts those indices. If a model had observation weights or initial scores, they don't get sorted the same way, meaning they'll refer to the wrong observations.To fix this, need to ensure that tuples of indices, weights, and initial scores are all kept together when re-sorting indices.
The text was updated successfully, but these errors were encountered: