Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce BaseScheduler abstraction #52

Merged
merged 2 commits into from
Mar 21, 2023
Merged

Introduce BaseScheduler abstraction #52

merged 2 commits into from
Mar 21, 2023

Conversation

marcofavoritobi
Copy link
Contributor

@marcofavoritobi marcofavoritobi commented Mar 19, 2023

Proposed changes

Introduce BaseScheduler abstraction, with RoundRobinScheduler as default scheduler for Calibrator.

Fixes

n/a

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING doc
  • I am making a pull request against the main branch (left side). Also you should start your branch off our main.
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works

Further comments

To be merged after #39

Comment on lines +170 to +174
self.scheduler.random_state = self.random_state

# "burn" seeds from the calibrator seed generator for backward compatibility
for _ in self.scheduler.samplers:
self._get_random_seed()
Copy link
Contributor Author

@marcofavoritobi marcofavoritobi Mar 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really to show how the proposed change could, in theory, preserve the same behaviour.

Clearly "burning random seeds" is a bit odd thing to do.

Moreover, self.scheduler.random_state = self.random_state should be changed to self.scheduler.random_state = self._get_random_seed(), which necessarily detours from the old (exactly reproducible) behaviour.

@codecov-commenter
Copy link

codecov-commenter commented Mar 19, 2023

Codecov Report

Merging #52 (4b10e30) into main (c0e2e68) will increase coverage by 0.06%.
The diff coverage is 98.90%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   96.86%   96.92%   +0.06%     
==========================================
  Files          31       34       +3     
  Lines        1499     1563      +64     
==========================================
+ Hits         1452     1515      +63     
- Misses         47       48       +1     
Impacted Files Coverage Δ
black_it/calibrator.py 97.00% <97.50%> (-0.30%) ⬇️
black_it/schedulers/__init__.py 100.00% <100.00%> (ø)
black_it/schedulers/base.py 100.00% <100.00%> (ø)
black_it/schedulers/round_robin.py 100.00% <100.00%> (ø)
black_it/utils/json_pandas_checkpointing.py 100.00% <100.00%> (ø)

Comment on lines 216 to +217
# overwrite the list of samplers
self.samplers = samplers
self.scheduler._samplers = tuple(samplers) # pylint: disable=protected-access
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be illegal, and we inherit it from the not-so-clean-design of including samplers_id_table into calibrator. Anyway, topic for another PR, let's put this feature in first while keeping the rest to work as usual.

@@ -476,7 +500,7 @@ def create_checkpoint(self, file_name: Union[str, os.PathLike]) -> None:
self.random_state,
self.random_generator.bit_generator.state,
model_name,
self.samplers,
self.scheduler,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

breaking change: we now pickle scheduler rather than samplers.

@@ -107,7 +107,7 @@ def save_calibrator_state( # pylint: disable=too-many-arguments,too-many-locals
initial_random_seed: Optional[int],
random_generator_state: Mapping,
model_name: str,
samplers: Sequence[BaseSampler],
scheduler: BaseScheduler,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

breaking change on checkpointing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO update sqlite checkpointing as well

@@ -174,7 +174,7 @@ def save_calibrator_state( # pylint: disable=too-many-arguments,too-many-locals

# save instantiated samplers and loss functions
with open(checkpoint_path / "samplers_pickled.pickle", "wb") as fb:
pickle.dump(samplers, fb)
pickle.dump(scheduler, fb)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should samplers_pickle be renamed? maybe not...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably yes actually, I'll take care of that

@@ -222,7 +231,7 @@ def test_calibrator_restore_from_checkpoint_and_set_sampler() -> None:
type(vars_cal["param_grid"]).__name__
== type(cal_restored.param_grid).__name__ # noqa
)
elif key == "_random_generator":
elif key == f"_{BaseSeedable.__name__}__random_generator":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name mangling... duh!

…ility

In terms of functionality, we only had to change the number of batches,
since nnow each batch only runs one sampler.

This commit will be probably amended and/or splitted in smaller commits.
@AldoGl AldoGl marked this pull request as ready for review March 21, 2023 10:14
new_simulated_data: NDArray[np.float64],
) -> None:
"""Update the state of the scheduler after each batch."""
self._batch_id += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcofavoritobi what about setting this update function to be the default (non abstract) method for all schedulers? What's your opinion on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Actually, would be a first step toward a "shared (read-only) calibration state object" that is accessible both from each sampler and from the scheduler.

@AldoGl AldoGl force-pushed the feat/scheduler branch 3 times, most recently from b5dcde5 to 4b10e30 Compare March 21, 2023 11:11
Copy link
Contributor

@AldoGl AldoGl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! A very useful feature to extend the flexibility of the package in the adaptive selection of specific samplers!

@marcofavoritobi marcofavoritobi merged commit 84903a1 into main Mar 21, 2023
@marcofavoritobi marcofavoritobi deleted the feat/scheduler branch March 21, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants