fix predict with header #2643
@araman0 can you try this fix?
}

if (label_idx < 0) {
if (output_label_index < 0 && label_idx >= 0) {
@guolinke
It doesn't output "Data file drop_label.tsv doesn't contain a label column."
$ head tmp.tsv
label qid f1 f2 f3 f4
0 100 0.13835814976295646 0.4402769613464669 0.946869449084728 0.5359206091455541
0 100 0.01386793487399507 0.7304609364663149 0.6480159106639183 0.41084065861637076
0 100 0.36082263352323185 0.19255932163030642 0.5832995411454666 0.4231199512157455
1 100 0.24434117994194782 0.40654813181749727 0.5760772294720323 0.5689696349004272
2 100 0.2640113555206345 0.9101757126052573 0.1468593731027389 0.6202184617895963
0 101 0.6445492878877913 0.9393752647865824 0.8782811206461665 0.06155760291074741
0 101 0.0391478117208669 0.8289305808544266 0.15732825570638398 0.810192132529246
0 101 0.41636558182363037 0.06300000298506869 0.19768374800057087 0.8448096990814711
1 101 0.7191740187766613 0.945746842262128 0.8570465518925168 0.2125836005675651
$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=tmp.tsv header=true output_result=pred_tmp.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction
$ head pred_tmp.txt
-0.02745108988247158
-0.39943097926145144
0.63986234187669611
1.0541387019913457
1.6918502361811527
-1.0965348952564287
0.81806604138389494
-1.1307052283260748
-0.97471278396427918
1.3047786665317376
$ head drop_label.tsv
qid f1 f2 f3 f4
100 0.13835814976295646 0.4402769613464669 0.946869449084728 0.5359206091455541
100 0.01386793487399507 0.7304609364663149 0.6480159106639183 0.41084065861637076
100 0.36082263352323185 0.19255932163030642 0.5832995411454666 0.4231199512157455
100 0.24434117994194782 0.40654813181749727 0.5760772294720323 0.5689696349004272
100 0.2640113555206345 0.9101757126052573 0.1468593731027389 0.6202184617895963
101 0.6445492878877913 0.9393752647865824 0.8782811206461665 0.06155760291074741
101 0.0391478117208669 0.8289305808544266 0.15732825570638398 0.810192132529246
101 0.41636558182363037 0.06300000298506869 0.19768374800057087 0.8448096990814711
101 0.7191740187766613 0.945746842262128 0.8570465518925168 0.2125836005675651
$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=drop_label.tsv header=true output_result=pred_drop_label.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction
$ head pred_drop_label.txt
-0.02745108988247158
-0.39943097926145144
0.63986234187669611
1.0541387019913457
1.6918502361811527
-1.0965348952564287
0.81806604138389494
-1.1307052283260748
-0.97471278396427918
1.3047786665317376
Yes, this message is intended for files without a header.
When there is no header, we need to exclude the label column; otherwise we would read the wrong features. The message is there to let the user confirm whether the input was processed correctly.
When there is a header, we don't need to exclude the label column, because we can look up each feature by its name.
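To illustrate why both files above give identical predictions when `header=true`, here is a minimal sketch of name-based lookup. It is not LightGBM's actual code: `MapFeaturesByName` is a hypothetical helper, and the feature names `f1`..`f4` are assumed to be what the model was trained on.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (an assumption, not LightGBM's implementation):
// for each model feature, find the header column that carries the same name.
// Returns model-feature-index -> file-column-index. Features whose names
// are absent from the header are left out of the result.
std::unordered_map<int, int> MapFeaturesByName(
    const std::vector<std::string>& model_features,
    const std::vector<std::string>& header_words) {
  // Index the header once so each lookup is O(1) on average.
  std::unordered_map<std::string, int> column_of;
  for (int j = 0; j < static_cast<int>(header_words.size()); ++j) {
    column_of.emplace(header_words[j], j);  // emplace keeps the first occurrence
  }
  std::unordered_map<int, int> feature_to_column;
  for (int i = 0; i < static_cast<int>(model_features.size()); ++i) {
    auto it = column_of.find(model_features[i]);
    if (it != column_of.end()) {
      feature_to_column[i] = it->second;
    }
  }
  return feature_to_column;
}
```

With the header `label qid f1 f2 f3 f4`, `f1` resolves to column 2; with `qid f1 f2 f3 f4` it resolves to column 1. Either way the same values are read, so an extra or missing label column does not shift the features.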
feature_names_map_[i] = j;
break;
}
if (header_mapper.count(header_words[i]) > 0) {
Will this ever be used?
It is there to avoid duplicated feature names.
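A rough sketch of such a duplicate check follows. `BuildHeaderMapper` is a hypothetical name, and throwing an exception is an assumption made for illustration; the excerpt does not show how the PR itself reacts to a duplicate.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: build a column-name -> column-index map and reject
// headers in which the same name appears twice, since a name-based lookup
// would otherwise be ambiguous.
std::unordered_map<std::string, int> BuildHeaderMapper(
    const std::vector<std::string>& header_words) {
  std::unordered_map<std::string, int> header_mapper;
  for (int i = 0; i < static_cast<int>(header_words.size()); ++i) {
    if (header_mapper.count(header_words[i]) > 0) {
      // Assumed reaction: fail loudly instead of silently picking one column.
      throw std::invalid_argument("duplicated column name: " + header_words[i]);
    }
    header_mapper[header_words[i]] = i;
  }
  return header_mapper;
}
```

The `count(...) > 0` branch only triggers for a malformed header, which is why it can look unused on well-formed input.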
@guolinke I've confirmed that it outputs the same predictions. Thanks!
to fix #2642