
Improve best batch and random forest classes #29

Merged: 3 commits into main on Dec 1, 2022
Conversation

@AldoGl (Contributor) commented Nov 30, 2022

Proposed changes

I generalise the best batch constructor by letting it take the a and b parameters of the beta-binomial distribution, as well as the perturbation_range applied to the parameter values.

I generalise the random forest constructor by letting it take the number of classes (n_classes) used by the classifier in the fitting phase.
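As a rough sketch of the two generalised constructors (only the parameter names a, b, perturbation_range, and n_classes come from the PR description; the class skeletons and default values are illustrative assumptions, not the actual black-it code):

```python
# Hypothetical sketch of the generalised constructors described above.
# Parameter names follow the PR text; defaults and structure are assumed.

class BestBatchSampler:
    def __init__(self, batch_size: int, a: float = 3.0, b: float = 1.0,
                 perturbation_range: int = 6):
        self.batch_size = batch_size
        self.a = a  # first shape parameter of the beta-binomial distribution
        self.b = b  # second shape parameter of the beta-binomial distribution
        self.perturbation_range = perturbation_range  # max perturbation of parameter values


class RandomForestSampler:
    def __init__(self, batch_size: int, n_classes: int = 10):
        self.batch_size = batch_size
        self._n_classes = n_classes  # number of classes used by the classifier when fitting
```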

@AldoGl force-pushed the improve_BB_RF_classes branch from 9734f0f to ce6c690 (November 30, 2022, 14:36)
@codecov-commenter

Codecov Report

Merging #29 (ce6c690) into main (f7a49f9) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #29      +/-   ##
==========================================
+ Coverage   97.72%   97.74%   +0.01%     
==========================================
  Files          23       23              
  Lines        1230     1240      +10     
==========================================
+ Hits         1202     1212      +10     
  Misses         28       28              
Impacted Files                          Coverage Δ
black_it/samplers/best_batch.py         100.00% <100.00%> (ø)
black_it/samplers/random_forest.py      100.00% <100.00%> (ø)

Comment on lines +62 to +78
self.a = a
self.b = b
self.perturbation_range = perturbation_range
Contributor commented:

It is OK for me. To make the new feature perfect, I would add some value checks on a, b, and perturbation_range. In particular:

  • a > 0.0
  • b > 0.0
  • perturbation_range > 1
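A minimal sketch of the suggested checks (hypothetical code; the function name and error messages are assumptions, only the constraints come from the review comment):

```python
# Sketch of the suggested value checks for the best batch sampler parameters.

def validate_params(a: float, b: float, perturbation_range: int) -> None:
    if a <= 0.0:
        raise ValueError("'a' must be greater than 0.0")
    if b <= 0.0:
        raise ValueError("'b' must be greater than 0.0")
    if perturbation_range <= 1:
        raise ValueError("'perturbation_range' must be greater than 1")
```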

@AldoGl (Contributor Author) replied Nov 30, 2022:

Thanks, it makes sense. Probably perturbation_range should be greater than or equal to 1.

Contributor replied:

> Thanks, it makes sense. Probably perturbation_range should be greater than or equal to 1.

If perturbation_range = 1, then integers(1, self.perturbation_range) raises an error:

np.random.default_rng().integers(1, 1) # raises ValueError: low >= high
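The failure mode can be reproduced in isolation with numpy (a standalone check, not code from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)

# Generator.integers(low, high) samples from the half-open range [low, high),
# so low == high leaves an empty range and numpy raises ValueError ("low >= high").
raised = False
try:
    rng.integers(1, 1)
except ValueError:
    raised = True

# With perturbation_range >= 2 the call is well defined:
sample = int(rng.integers(1, 2))  # always 1, since the range is [1, 2)
```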

@AldoGl (Contributor Author) replied:

thanks!

"""
super().__init__(batch_size, random_state, max_deduplication_passes)

self._n_estimators = n_estimators
self._criterion = criterion
self._n_classes = n_classes
Contributor commented:

Given the implementation of prepare_data_for_classifier, n_classes should be at least X; at the bare minimum a value of 1 should not break the code, but ideally we could require more bins, e.g. at least 2?
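The stricter option, requiring at least 2 bins, might look like this (hypothetical sketch; the function name and message are assumptions, only the lower bound comes from the review comment):

```python
# Hypothetical check for n_classes, following the reviewer's suggestion
# that the classifier needs more than one bin to be meaningful.

def check_n_classes(n_classes: int) -> None:
    if n_classes < 2:
        raise ValueError("'n_classes' must be at least 2")
```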

@AldoGl (Contributor Author) replied:

thanks!

@AldoGl force-pushed the improve_BB_RF_classes branch from ce6c690 to cba47fd (November 30, 2022, 15:55)
@AldoGl (Contributor Author) commented Nov 30, 2022:

Thank you @marcofavoritobi for your feedback. I implemented the checks you mentioned.

@AldoGl AldoGl merged commit 40f314f into main Dec 1, 2022
@AldoGl AldoGl deleted the improve_BB_RF_classes branch December 1, 2022 09:27