Improve best batch and random forest classes #29
Conversation
Force-pushed from 9734f0f to ce6c690
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main      #29      +/-   ##
==========================================
+ Coverage   97.72%   97.74%   +0.01%
==========================================
  Files          23       23
  Lines        1230     1240      +10
==========================================
+ Hits         1202     1212      +10
  Misses         28       28
self.a = a
self.b = b
self.perturbation_range = perturbation_range
It is OK for me. To make the new feature perfect, I would add some value checks on a, b, and perturbation_range. In particular (see the sketch after this list):
- a > 0.0
- b > 0.0
- perturbation_range > 1
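A minimal sketch of what such checks could look like in the constructor; the class name and error messages are illustrative only, not the project's actual implementation:

```python
class BestBatchSampler:  # illustrative name, not necessarily the real class
    """Sketch of a best-batch constructor with the suggested value checks."""

    def __init__(self, a: float, b: float, perturbation_range: int) -> None:
        # Fail fast on invalid beta-binomial parameters and perturbation range.
        if a <= 0.0:
            raise ValueError(f"'a' must be > 0.0, got {a}")
        if b <= 0.0:
            raise ValueError(f"'b' must be > 0.0, got {b}")
        if perturbation_range <= 1:
            raise ValueError(f"'perturbation_range' must be > 1, got {perturbation_range}")
        self.a = a
        self.b = b
        self.perturbation_range = perturbation_range
```

With guards like these, a misconfigured sampler fails at construction time rather than deep inside the sampling loop.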
Thanks, it makes sense. Probably perturbation_range should be greater than or equal to 1.
If perturbation_range = 1, integers(1, self.perturbation_range) gives an error:
np.random.default_rng().integers(1, 1)  # raises ValueError: low >= high
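For reference, a minimal standalone reproduction of the numpy behaviour described here (the rng variable is introduced only for this snippet):

```python
import numpy as np

rng = np.random.default_rng(0)

# Generator.integers uses an exclusive upper bound by default, so the
# half-open range [1, perturbation_range) must contain at least one value.
print(rng.integers(1, 2))  # always 1: the only value in [1, 2)

try:
    rng.integers(1, 1)  # empty range when perturbation_range == 1
except ValueError as err:
    print(err)  # "low >= high"
```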
thanks!
""" | ||
super().__init__(batch_size, random_state, max_deduplication_passes) | ||
|
||
self._n_estimators = n_estimators | ||
self._criterion = criterion | ||
self._n_classes = n_classes |
Given the implementation of prepare_data_for_classifier, n_classes should be at least X, where at the bare minimum 1 should not break the code, but ideally we could require more bins, e.g. at least 2?
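A possible guard along those lines; _validate_n_classes is a hypothetical helper shown only to illustrate the check, not code from this repository:

```python
def _validate_n_classes(n_classes: int) -> int:
    """Hypothetical helper: reject degenerate binnings before fitting."""
    # At least two bins are needed for the classifier to have more than one target class.
    if n_classes < 2:
        raise ValueError(f"'n_classes' must be at least 2, got {n_classes}")
    return n_classes


print(_validate_n_classes(10))  # OK
# _validate_n_classes(1) would raise ValueError
```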
thanks!
Force-pushed from ce6c690 to cba47fd
Thank you @marcofavoritobi for your feedback. I implemented the checks you mentioned.
Proposed changes
- I generalise the best batch constructor by letting it take the a and b parameters of the beta-binomial distribution (briefly illustrated below) as well as the perturbation_range applied to the parameter values.
- I generalise the random forest constructor by letting it take the number of classes (n_classes) used by the classifier in the fitting phase.
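For readers unfamiliar with the distribution named above, a quick illustration of a beta-binomial with shape parameters a and b, using scipy; the numbers are arbitrary and this is not the package's own sampling code:

```python
from scipy.stats import betabinom

# Beta-binomial over {0, ..., n} with shape parameters a and b; larger a
# skews the probability mass towards the upper end of the support,
# larger b towards the lower end.
a, b, n = 3.0, 1.0, 5
dist = betabinom(n=n, a=a, b=b)

print(dist.pmf(range(n + 1)))            # probability of each outcome 0..n
print(dist.rvs(size=5, random_state=0))  # a few sample draws
```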