-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interaction constraints #3126
Interaction constraints #3126
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @btrotta ! I left some minor comments, but will defer to C++ maintainers so I'll just leave a "Comment" review.
docs/Parameters.rst
Outdated
|
||
- for Python, specify as list of lists, e.g. ``[[0, 1, 2], [2, 3]]`` | ||
|
||
- for R, specify as a string similar to the CLI format (ensuring there are no spaces), e.g. ``"[0,1,2],[2,3]"`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For R, I think that we should make the interface "a list of character vectors, where each vector is a list of feature names", and then inside the R package turn the feature names into indices.
I think passing a string of an array in a comma-delimited list puts too much burden on R users. @Laurae2 what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb Good suggestion, thanks! I've tried to implement this, but I'm not that familiar with R so would be great if you could take a look when you get a chance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be ok to say here that it is not supported in the R package yet, and then create an issue like [R-package] add support for interaction_constraints
and add it to #2302 . Then I or another R maintainer can get to it without delaying this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, see #3130
python-package/lightgbm/basic.py
Outdated
pairs.append(str(key) + '=' + ','.join(map(str, val))) | ||
def to_string(x): | ||
if isinstance(x, list): | ||
return '[' + ','.join(map(str, x)) + ']' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return '[' + ','.join(map(str, x)) + ']' | |
return "[{}]".format(','.join(map(str, x))) |
I think this would be a little easier to read using string interpolation. I recommend .format()
since it's an option supported in Python 2 and 3. @StrikerRUS what do you think?
@@ -124,6 +124,60 @@ lgb.train <- function(params = list(), | |||
end_iteration <- begin_iteration + nrounds - 1L | |||
} | |||
|
|||
# Convert interaction constraints to feature numbers | |||
if (!is.null(params[["interaction_constraints"]])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this!
@btrotta I'll leave some comments here and you can decide, but I think it would make sense to do this in a separate pull request so it doesn't hold this one up.
- Unfortunately we have a lot of duplication between
lgb.train()
andlgb.cv()
today
(https://github.com/microsoft/LightGBM/blob/master/R-package/R/lgb.cv.R), so handling for this parameter needs to be added in both places. I think it would make sense to create a new internal function like.check_interaction_constraints()
, similar to how we havelgb.check.obj()
, that is then called by both functions. That would also make it easier to unit test the handling of this parameter. - We should add some validation that
interaction_constraints
is a list of either numeric or character vectors. Like this
interaction_constraints <- params[["interaction_constraints"]]
if (!is.null(interaction_constraints)){
# validation
if (!methods::is(interaction_constraints, "list")){
stop("interaction_constraints must be a list")
}
if (!all(sapply(interaction_constraints, function(x){is.character(x) || is.numeric(x)}))){
stop("every element in interaction_constraints must be a character vector or numeric vector)
}
# rest of the code
}
- I think the uses of
as.list()
in this code should be replaced withas.integer()
- I'd like to have unit tests, for both
lgb.train()
andlgb.cv()
. That will catch some issues and prevent us from breaking this functionality in the future.- validation that you get an informative error if you don't pass a
list
object - validation that you get an informative error if you pass a
list
of something other than character vectors and numeric vectors - validation that you get an informative error for indices that are greater than
max_features
- test training with only passing indices
- test training with only passing feature names
- test training with passing a mix of indices and feature names
- validation that you get an informative error if you don't pass a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sure, I have moved these changes to new PR #3136.
@btrotta is this ready to review? |
@guolinke Yes |
return std::vector<int8_t>(train_data_->num_features(), 1); | ||
std::vector<int8_t> GetByNode(const Tree* tree, int leaf) { | ||
// get interaction constraints for current branch | ||
std::unordered_set<int> allowed_features; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that possible to cache the used features of all branches, rather than re-generate them every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the code so that we keep track of the features in each branch as we build the tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much! However, it seems the tracking is still enabled even when interaction constraint is not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I changed it now so it only keeps track if interaction constraint is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much! Looks good to me
@StrikerRUS Thanks, have updated according to your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks!
@btrotta: Simply fantastic, thank you so much for digging into this and the great implementation. I made some tests in R and all make perfect sense:
Example 1: Additive model with depth > 1
Okay -> pairwise H-statistic values are all 0. H measures the interaction strength in a model agnostic way based on bivariate partial dependence curves. Some additive, some pairwise
Okay -> there is only one interacting variable pair. Overlapping sets
-> Okay as well! X2 and X3 are not interacting, as intended. And note to myself: Unlisted variables are modelled without interaction. |
@mayer79 Thanks for doing those additional tests, good to know it's working as expected! |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Implement interaction constraints feature requested in #2884.