
documentation: Clarify n_subfeatures in build_tree? #224

Closed
mlesnoff opened this issue Mar 22, 2023 · 4 comments

Comments

@mlesnoff

I have a question about a point that I could not find in the documentation of DecisionTree.jl.

For function build_tree, it is indicated for argument n_subfeatures:

n_subfeatures: number of features to select at random (default: 0, keep all)

Is the random feature selection done at each split of the tree, or only once before building the tree?

For the function build_forest, it is indicated that the selection is done at each split (as in a usual random forest):

n_subfeatures: number of features to consider at random per split (default: -1, sqrt(# features))

Therefore I presume that it is the same for build_tree, but I am not sure. Could you confirm (and possibly add it to the doc)?

Another question: what is the method used to split, the "exact" method or an approximate histogram-based method? I did not find any indication in the doc.
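
For context, here is a minimal sketch of how I am calling both functions, using the positional-argument form from the package README as I understand it (the synthetic data, seed, and parameter values below are just placeholders):

```julia
using DecisionTree, Random

# Placeholder classification data.
features = rand(100, 4)
labels   = rand(["a", "b"], 100)

# Positional arguments as in the README; n_subfeatures = 2 here.
n_subfeatures = 2; max_depth = -1; min_samples_leaf = 1
min_samples_split = 2; min_purity_increase = 0.0

tree = build_tree(labels, features,
                  n_subfeatures, max_depth,
                  min_samples_leaf, min_samples_split,
                  min_purity_increase;
                  rng = 42)

# build_forest documents the per-split selection explicitly.
forest = build_forest(labels, features, n_subfeatures, 10)  # 10 trees
```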

@ablaom
Member

ablaom commented Mar 22, 2023

Is the random feature selection done at each split of the tree, or only once before building the tree?

Yes, at each split.

What is the method used to split: the "exact" method or an approximate histogram-based method? I did not find any indication in the doc.

The implementation is CART, which means exact.

You may want to keep in mind that every split assumes the feature is ordered and uses that ordering in the splitting algorithm. This means certain splits are never considered if the feature is actually unordered but is accepted by the algorithm because it is encoded with an ordered type, such as Int.

Another common gotcha is that setting n_subfeatures to the maximum number of features does not strictly recreate the classic CART algorithm because the features are still shuffled, leading to RNG-dependent resolution of feature ties (draws).
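
A rough sketch of that last point. I'm assuming here that the default n_subfeatures = 0 ("keep all") skips the subsampling step entirely, as the docstring suggests; the data and seeds are placeholders:

```julia
using DecisionTree, Random

rng = MersenneTwister(0)
features = rand(rng, 200, 6)
labels   = rand(rng, ["x", "y"], 200)

p = size(features, 2)   # total number of features

# n_subfeatures = p: every feature is a candidate at every split,
# but the candidates are still shuffled, so tied splits can be
# resolved differently under different seeds.
t1 = build_tree(labels, features, p; rng = 1)
t2 = build_tree(labels, features, p; rng = 2)   # may differ from t1 on ties

# Default n_subfeatures = 0 ("keep all"): no subsampling step,
# closest to classic deterministic CART (assumption).
t0 = build_tree(labels, features, 0)
```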

@ablaom
Member

ablaom commented Mar 22, 2023

Closed as tracked.

@ablaom ablaom closed this as completed Mar 22, 2023
@mlesnoff
Author

Thanks for the info @ablaom, and congrats on this package, which has become very fast.

@ablaom
Member

ablaom commented Mar 22, 2023

You're welcome.

congrats on this package, which has become very fast

I'm just a maintainer. The main work was carried out by @bensadeghi and others.
