
avoid most_freq_bin to be 0 in categorical features #2824

Merged
merged 6 commits into from
Feb 27, 2020

guolinke
Collaborator

To fix #2742.

@guolinke guolinke requested a review from chivee as a code owner February 26, 2020 03:44
src/io/bin.cpp Outdated
src/io/bin.cpp Outdated
static_cast<double>(cnt_in_bin[most_freq_bin_]) / total_sample_cnt;
// When most_freq_bin_ != default_bin_, there are some additional data loading costs,
// so use most_freq_bin_ = default_bin_ when the feature is not very sparse.
if (most_freq_bin_ != default_bin_ && max_sparse_rate <= 0.7f) {
Collaborator

kSparseThreshold?

const double kSparseThreshold = 0.7;
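A minimal sketch of the sparsity check from the diff with the suggested named constant applied. The helper name `ChooseMostFreqBin` and its parameters are hypothetical, and the loop that finds the most frequent bin is an assumption, since the excerpt starts after `most_freq_bin_` has already been computed. This is not the code that was merged.

```cpp
#include <cassert>
#include <vector>

// Named threshold, as suggested in the review (value 0.7 taken from the diff).
const double kSparseThreshold = 0.7;

// Hypothetical helper, not LightGBM's API: choose which bin to treat as the
// most frequent bin of a feature, given per-bin sample counts.
int ChooseMostFreqBin(const std::vector<int>& cnt_in_bin, int default_bin,
                      int total_sample_cnt) {
  assert(!cnt_in_bin.empty() && total_sample_cnt > 0);
  // Assumed step: pick the bin with the largest count (ties -> lowest index).
  int most_freq_bin = 0;
  for (int i = 1; i < static_cast<int>(cnt_in_bin.size()); ++i) {
    if (cnt_in_bin[i] > cnt_in_bin[most_freq_bin]) most_freq_bin = i;
  }
  // Fraction of samples that fall into the most frequent bin.
  double max_sparse_rate =
      static_cast<double>(cnt_in_bin[most_freq_bin]) / total_sample_cnt;
  // A most_freq_bin different from default_bin adds data loading cost, so
  // keep default_bin unless the feature is sparse enough to justify it.
  if (most_freq_bin != default_bin && max_sparse_rate <= kSparseThreshold) {
    most_freq_bin = default_bin;
  }
  return most_freq_bin;
}
```

With counts {5, 90, 5} and default bin 0, bin 1 holds 90% of the samples (above 0.7), so it is kept; with {10, 60, 30} the rate is only 0.6, so the sketch falls back to the default bin 0.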

src/io/bin.cpp Outdated
@guolinke guolinke merged commit e502ed0 into master Feb 27, 2020
@guolinke guolinke deleted the fix-cat-with-most-freq-bin branch February 27, 2020 15:13
CHECK(default_bin_ > 0);
if (most_freq_bin_ == 0) {
CHECK(num_bin_ > 1);
// FIXME: how to enable `most_freq_bin_ = 0` for categorical features
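A hypothetical sketch of the invariant in this excerpt: for a categorical feature `most_freq_bin_` must not be 0, which can only be satisfied when there is more than one bin. The helper name `MostFreqCategoricalBin` and the fallback policy (use the most frequent of bins 1..num_bin-1) are assumptions for illustration; the excerpt stops at the FIXME and does not show what the merged code actually does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not LightGBM's API: most frequent bin of a categorical
// feature, constrained to never be bin 0 (the invariant this PR enforces).
int MostFreqCategoricalBin(const std::vector<int>& cnt_in_bin) {
  const int num_bin = static_cast<int>(cnt_in_bin.size());
  assert(num_bin > 0);
  // Plain argmax over all bins (ties -> lowest index).
  int most_freq_bin = 0;
  for (int i = 1; i < num_bin; ++i) {
    if (cnt_in_bin[i] > cnt_in_bin[most_freq_bin]) most_freq_bin = i;
  }
  if (most_freq_bin == 0) {
    // Mirrors CHECK(num_bin_ > 1): another bin must exist to fall back to.
    assert(num_bin > 1);
    // Assumed fallback policy: the most frequent of bins 1..num_bin-1.
    most_freq_bin = 1;
    for (int i = 2; i < num_bin; ++i) {
      if (cnt_in_bin[i] > cnt_in_bin[most_freq_bin]) most_freq_bin = i;
    }
  }
  return most_freq_bin;
}
```

For counts {50, 30, 20}, bin 0 is the raw argmax, so the sketch returns bin 1 instead; for {10, 20, 70} it returns bin 2 directly.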
Collaborator

@guolinke Should we create an issue for this FIXME so we don't lose track of it?

Collaborator

^ I agree; we should be using issues instead of TODO, FIXME, etc.

Collaborator Author

I think this is not trivial, so I left a FIXME here.
BTW, PR #2463 will rewrite the handling of categorical features, after which this code will no longer be needed.

@guolinke guolinke added the fix label Mar 1, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 5, 2020

Successfully merging this pull request may close these issues.

LightGBMError: Check failed: best_split_info.left_count > 0 for ranking task
3 participants