fix predict with header #2643

Merged
guolinke merged 3 commits into master from predict-with-header
Dec 20, 2019

Conversation

guolinke
Collaborator

Fixes #2642.

@guolinke guolinke requested a review from chivee as a code owner December 19, 2019 10:25
@guolinke
Collaborator Author

@araman0 can you try this fix?

}

if (label_idx < 0) {
if (output_label_index < 0 && label_idx >= 0) {

@guolinke
It doesn't output "Data file drop_label.tsv doesn't contain a label column."

$ head tmp.tsv 
label	qid	f1	f2	f3	f4
0	100	0.13835814976295646	0.4402769613464669	0.946869449084728	0.5359206091455541
0	100	0.01386793487399507	0.7304609364663149	0.6480159106639183	0.41084065861637076
0	100	0.36082263352323185	0.19255932163030642	0.5832995411454666	0.4231199512157455
1	100	0.24434117994194782	0.40654813181749727	0.5760772294720323	0.5689696349004272
2	100	0.2640113555206345	0.9101757126052573	0.1468593731027389	0.6202184617895963
0	101	0.6445492878877913	0.9393752647865824	0.8782811206461665	0.06155760291074741
0	101	0.0391478117208669	0.8289305808544266	0.15732825570638398	0.810192132529246
0	101	0.41636558182363037	0.06300000298506869	0.19768374800057087	0.8448096990814711
1	101	0.7191740187766613	0.945746842262128	0.8570465518925168	0.2125836005675651

$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=tmp.tsv header=true output_result=pred_tmp.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction

$ head pred_tmp.txt 
-0.02745108988247158
-0.39943097926145144
0.63986234187669611
1.0541387019913457
1.6918502361811527
-1.0965348952564287
0.81806604138389494
-1.1307052283260748
-0.97471278396427918
1.3047786665317376

$ head drop_label.tsv 
qid	f1	f2	f3	f4
100	0.13835814976295646	0.4402769613464669	0.946869449084728	0.5359206091455541
100	0.01386793487399507	0.7304609364663149	0.6480159106639183	0.41084065861637076
100	0.36082263352323185	0.19255932163030642	0.5832995411454666	0.4231199512157455
100	0.24434117994194782	0.40654813181749727	0.5760772294720323	0.5689696349004272
100	0.2640113555206345	0.9101757126052573	0.1468593731027389	0.6202184617895963
101	0.6445492878877913	0.9393752647865824	0.8782811206461665	0.06155760291074741
101	0.0391478117208669	0.8289305808544266	0.15732825570638398	0.810192132529246
101	0.41636558182363037	0.06300000298506869	0.19768374800057087	0.8448096990814711
101	0.7191740187766613	0.945746842262128	0.8570465518925168	0.2125836005675651

$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=drop_label.tsv header=true output_result=pred_drop_label.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction

$ head pred_drop_label.txt 
-0.02745108988247158
-0.39943097926145144
0.63986234187669611
1.0541387019913457
1.6918502361811527
-1.0965348952564287
0.81806604138389494
-1.1307052283260748
-0.97471278396427918
1.3047786665317376

Collaborator Author

Yes, this message is intended for files without a header.
When there is no header, we need to exclude the label column; otherwise we would read the wrong features. The message lets the user confirm whether the input was processed correctly.

When there is a header, we don't need to exclude the label column, because each feature can be looked up by its name.
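
The name-based lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LightGBM's actual code: the helper `MapModelFeaturesToColumns` is hypothetical, and so is the assumption that the model was trained on the features `f1` to `f4`.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of LightGBM): for each feature name the model
// expects, return the column index of that name in the header, or -1 if the
// header does not contain it.
std::vector<int> MapModelFeaturesToColumns(
    const std::vector<std::string>& header_words,
    const std::vector<std::string>& model_features) {
  // Column name -> position in the data file.
  std::unordered_map<std::string, int> column_of;
  for (int i = 0; i < static_cast<int>(header_words.size()); ++i) {
    column_of[header_words[i]] = i;
  }

  std::vector<int> columns;
  columns.reserve(model_features.size());
  for (const auto& name : model_features) {
    auto it = column_of.find(name);
    // Columns the model does not know (for example a label column) are
    // simply never referenced; missing features are reported as -1.
    columns.push_back(it == column_of.end() ? -1 : it->second);
  }
  return columns;
}
```

With the header `label qid f1 f2 f3 f4` this sketch resolves `f1` to `f4` to columns 2 to 5, and with `qid f1 f2 f3 f4` to columns 1 to 4, so the same feature values reach the model whether or not the label column is present.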

feature_names_map_[i] = j;
break;
}
if (header_mapper.count(header_words[i]) > 0) {
Collaborator

Is this ever used?

Collaborator Author

@guolinke guolinke Dec 20, 2019

It is there to guard against duplicated feature names.
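
A duplicate-name guard of this kind might look like the sketch below. It is only an illustration of the idea: the function name `BuildHeaderMapper` and the choice to throw `std::invalid_argument` are assumptions made here, and the merged code may report the problem differently.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not LightGBM's API): build a name -> column index map
// from the header, rejecting headers in which a name appears twice.
std::unordered_map<std::string, int> BuildHeaderMapper(
    const std::vector<std::string>& header_words) {
  std::unordered_map<std::string, int> header_mapper;
  for (int i = 0; i < static_cast<int>(header_words.size()); ++i) {
    // A repeated name would make lookup by name ambiguous, so fail early
    // instead of silently keeping only one of the columns.
    if (header_mapper.count(header_words[i]) > 0) {
      throw std::invalid_argument("duplicated column name in header: " +
                                  header_words[i]);
    }
    header_mapper[header_words[i]] = i;
  }
  return header_mapper;
}
```

Failing early is a design choice in this sketch: once two columns share a name, there is no principled way to decide which one a model feature refers to.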

@araman0

araman0 commented Dec 20, 2019

@guolinke I've confirmed that it outputs the same predictions. Thanks!

$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=tmp.tsv header=true output_result=pred_tmp.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction
$ head pred_tmp.txt 
-0.67235347824716607
-0.32698080395337359
-0.28923179728330589
0.18131331367689701
1.018185089266975
-1.2743996252961152
-1.3109041864684552
-0.81249028101368681
-0.27020354133159391
0.12807112089932743
$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=shuffle1.tsv header=true output_result=pred_shuffle1.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction
$ head pred_shuffle1.txt 
-0.67235347824716607
-0.32698080395337359
-0.28923179728330589
0.18131331367689701
1.018185089266975
-1.2743996252961152
-1.3109041864684552
-0.81249028101368681
-0.27020354133159391
0.12807112089932743
$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=shuffle2.tsv header=true output_result=pred_shuffle2.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction
$ head pred_shuffle2.txt 
-0.67235347824716607
-0.32698080395337359
-0.28923179728330589
0.18131331367689701
1.018185089266975
-1.2743996252961152
-1.3109041864684552
-0.81249028101368681
-0.27020354133159391
0.12807112089932743
$ LightGBM/lightgbm task=predict input_model=lgbm_model.txt data=shuffle3.tsv header=true output_result=pred_shuffle3.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction
$ head pred_shuffle3.txt 
-0.67235347824716607
-0.32698080395337359
-0.28923179728330589
0.18131331367689701
1.018185089266975
-1.2743996252961152
-1.3109041864684552
-0.81249028101368681
-0.27020354133159391
0.12807112089932743

$ LightGBM/lightgbm task=predict input_model=lgbm_ignore_model.txt data=tmp.tsv header=true output_result=pred_tmp_ignore.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Info] Finished prediction
$ head pred_tmp_ignore.txt
-0.43058259683609235
-0.098525361939025474
-0.2048474171242694
0.026815875041871856
0.94411773547211675
-0.89330324800246408
-1.3645662587572376
-0.5720641564103417
-0.46146592181818258
0.069007612700339171
$ LightGBM/lightgbm task=predict input_model=lgbm_ignore_model.txt data=drop_column.tsv header=true output_result=pred_drop_column_ignore.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Warning] Feature (f4) is missed in data file. If it is weight/query/group/ignore_column, you can ignore this warning.
[LightGBM] [Info] Finished prediction
$ head pred_drop_column_ignore.txt
-0.43058259683609235
-0.098525361939025474
-0.2048474171242694
0.026815875041871856
0.94411773547211675
-0.89330324800246408
-1.3645662587572376
-0.5720641564103417
-0.46146592181818258
0.069007612700339171
$ LightGBM/lightgbm task=predict input_model=lgbm_ignore_model.txt data=add_column.tsv header=true output_result=pred_add_column_ignore.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished initializing prediction, total used 100 iterations
[LightGBM] [Warning] Feature (f4) is missed in data file. If it is weight/query/group/ignore_column, you can ignore this warning.
[LightGBM] [Info] Finished prediction
$ head pred_add_column_ignore.txt
-0.43058259683609235
-0.098525361939025474
-0.2048474171242694
0.026815875041871856
0.94411773547211675
-0.89330324800246408
-1.3645662587572376
-0.5720641564103417
-0.46146592181818258
0.069007612700339171

@guolinke guolinke merged commit ae320e5 into master Dec 20, 2019
@guolinke guolinke deleted the predict-with-header branch December 20, 2019 03:53
@guolinke guolinke added the fix label Mar 1, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020
3 participants