avoid most_freq_bin to be 0 in categorical features #2824
src/io/bin.cpp
static_cast<double>(cnt_in_bin[most_freq_bin_]) / total_sample_cnt;
// When most_freq_bin_ != default_bin_, there are some additional data loading costs.
// so use most_freq_bin_ = default_bin_ when there is not so sparse
if (most_freq_bin_ != default_bin_ && max_sparse_rate <= 0.7f) {
Use kSparseThreshold here instead of the hard-coded 0.7f?
LightGBM/include/LightGBM/bin.h
Line 36 in e676af2
const double kSparseThreshold = 0.7;
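For illustration, a minimal standalone sketch of the check with the named constant in place of the literal. `ChooseMostFreqBin` and its signature are hypothetical and not part of LightGBM; only the `kSparseThreshold` value and the shape of the condition come from the snippets in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same value as the constant referenced in include/LightGBM/bin.h.
const double kSparseThreshold = 0.7;

// Hypothetical helper (assumption, not LightGBM API): picks the bin to treat
// as the most frequent one. Falls back to default_bin unless the most
// frequent bin holds more than kSparseThreshold of the samples.
int ChooseMostFreqBin(const std::vector<int>& cnt_in_bin, int default_bin) {
  assert(!cnt_in_bin.empty());
  assert(default_bin >= 0 &&
         static_cast<std::size_t>(default_bin) < cnt_in_bin.size());

  long long total_sample_cnt = 0;
  int most_freq_bin = 0;
  for (std::size_t i = 0; i < cnt_in_bin.size(); ++i) {
    total_sample_cnt += cnt_in_bin[i];
    if (cnt_in_bin[i] > cnt_in_bin[most_freq_bin]) {
      most_freq_bin = static_cast<int>(i);
    }
  }
  assert(total_sample_cnt > 0);

  const double max_sparse_rate =
      static_cast<double>(cnt_in_bin[most_freq_bin]) / total_sample_cnt;
  // Not sparse enough to justify the extra data loading cost.
  if (most_freq_bin != default_bin && max_sparse_rate <= kSparseThreshold) {
    most_freq_bin = default_bin;
  }
  return most_freq_bin;
}
```

With counts {10, 80, 10} and default bin 0, the most frequent bin holds 80% of the samples, so bin 1 is kept. With {40, 50, 10} it holds only 50%, so the helper falls back to bin 0.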
CHECK(default_bin_ > 0);
if (most_freq_bin_ == 0) {
CHECK(num_bin_ > 1);
// FIXME: how to enable `most_freq_bin_ = 0` for categorical features
@guolinke Create issue for FIXME to not lose it?
^ I agree, we should be using issues instead of TODO, FIXME, etc.
I think this is not trivial, so I left a FIXME here.
BTW, PR #2463 will rewrite the categorical feature handling, so this code will no longer be needed.
to fix #2742