Summary
lgb.convert() and lgb.convert_with_rules() should tell users in a log message if any columns remain unconverted because they are of a type that the function does not support. The names of those columns should be logged.
Motivation
The R package currently exports two functions that can be used to convert tabular datasets into model-ready form:
lgb.convert(): converts columns of type "character" and "factor" to "integer"
lgb.convert_with_rules(): similar to lgb.convert(), but returns a set of "rules" describing how non-numeric values were mapped to integer values. It also accepts user-provided rules, which is useful when you want to be sure the encoding is the same across multiple datasets (e.g. training, test, and validation datasets)
These functions are intended to make it easier to create a model-ready dataset (all numeric or integer). The user expectation is likely that, after calling one of these functions on a dataset, that dataset is ready to use in a model. If the dataset still contains columns of other types (not integer or numeric), the user should be notified in a log message.
Column types that these functions are unlikely to support:
POSIX*
Date
list
data.frame
data.table
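To make the request concrete, here is a minimal sketch in base R of the kind of check being proposed. It is not the package's implementation: `warn_unconverted_columns()` is a hypothetical helper name, the message wording is invented for illustration, and it assumes that "model-ready" means `is.numeric()` returns TRUE for the column (which covers integer and double columns).

```r
# Hypothetical helper, NOT part of the lightgbm package.
# Reports the names of columns that are neither integer nor numeric.
warn_unconverted_columns <- function(data) {
    # is.numeric() is TRUE for integer and double columns and FALSE for
    # Date, POSIXct, factor, character, list, and nested data.frame columns
    is_model_ready <- vapply(data, is.numeric, logical(1L))
    unconverted <- names(data)[!is_model_ready]

    if (length(unconverted) > 0L) {
        # use [[ ]] so the lookup works for both data.frame and data.table
        col_classes <- vapply(
            unconverted,
            function(col_name) class(data[[col_name]])[1L],
            character(1L)
        )
        message(sprintf(
            "%d column(s) were not converted because their types are not supported: %s",
            length(unconverted),
            paste0(unconverted, " (", col_classes, ")", collapse = ", ")
        ))
    }

    # return the names invisibly so callers can act on them if they want
    return(invisible(unconverted))
}

# Example: a dataset that still holds a Date and a POSIXct column
df <- data.frame(
    x = c(1.5, 2.5),
    y = c(1L, 2L),
    when = as.Date(c("2020-01-01", "2020-01-02")),
    stamp = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-02 00:00:00"), tz = "UTC")
)
unconverted <- warn_unconverted_columns(df)
```

Using `message()` rather than `warning()` or `stop()` is a choice made for this sketch, in line with the request for a log message; the conversion itself would still succeed and return the dataset.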
Closed in favor of being in #2302. We decided to keep all feature requests in one place.
Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.
jameslamb changed the title from "[R-package] lgb.prepare functions should warn on unconverted columns of unsupported types" to "[R-package] lgb.convert functions should warn on unconverted columns of unsupported types" on Aug 1, 2020