Introduce `BaseScheduler` abstraction #52

marcofavoritobi · 2023-03-19T23:27:50Z

Proposed changes

Introduce BaseScheduler abstraction, with RoundRobinScheduler as default scheduler for Calibrator.
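As a rough illustration of the idea, and not the code introduced by this PR, a scheduler abstraction with a round-robin default could look like the sketch below. The method name `get_next_sampler` and the use of plain strings in place of sampler objects are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Sequence, Tuple


class BaseScheduler(ABC):
    """Decides which sampler runs in each calibration batch (illustrative only)."""

    def __init__(self, samplers: Sequence[str]) -> None:
        # Real samplers would be sampler objects; strings keep the sketch short.
        self._samplers: Tuple[str, ...] = tuple(samplers)

    @property
    def samplers(self) -> Tuple[str, ...]:
        return self._samplers

    @abstractmethod
    def get_next_sampler(self, batch_id: int) -> str:
        """Return the sampler to run for the given batch (hypothetical method)."""


class RoundRobinScheduler(BaseScheduler):
    """Cycle through the samplers in order, one sampler per batch."""

    def get_next_sampler(self, batch_id: int) -> str:
        return self._samplers[batch_id % len(self._samplers)]


scheduler = RoundRobinScheduler(["random", "halton", "best_batch"])
order = [scheduler.get_next_sampler(i) for i in range(5)]
print(order)
```

With three samplers, batches 0 to 4 visit them in order and then wrap around, so running every sampler once takes three batches instead of one.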

Fixes

n/a

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING doc
I am making a pull request against the main branch (left side). Also you should start your branch off our main.
Lint and unit tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works

Further comments

To be merged after #39

marcofavoritobi · 2023-03-19T23:30:24Z

black_it/calibrator.py

+        self.scheduler.random_state = self.random_state
+
+        # "burn" seeds from the calibrator seed generator for backward compatibility
+        for _ in self.scheduler.samplers:
+            self._get_random_seed()


This is really just to show how the proposed change could, in theory, preserve the old behaviour.

Clearly, "burning" random seeds is a bit of an odd thing to do.

Moreover, `self.scheduler.random_state = self.random_state` should be changed to `self.scheduler.random_state = self._get_random_seed()`, which necessarily departs from the old (exactly reproducible) behaviour.
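To make the "burning" trick concrete, here is a small standalone sketch. `SeedSource` is a hypothetical stand-in for the calibrator's seed generator, built on the standard library rather than on the generator the calibrator actually uses; it only shows why discarding one draw per sampler keeps later seeds aligned with the old stream.

```python
import random


class SeedSource:
    """Hypothetical stand-in for the calibrator's seed generator."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def get_random_seed(self) -> int:
        return self._rng.randrange(2**32)


n_samplers = 3

# Old behaviour: one seed is drawn per sampler, then further seeds are drawn later.
old = SeedSource(42)
old_sampler_seeds = [old.get_random_seed() for _ in range(n_samplers)]
old_next_seed = old.get_random_seed()

# With "burning": the draws are discarded, but the generator still advances
# by the same number of steps, so the next seed matches the old one.
new = SeedSource(42)
for _ in range(n_samplers):
    new.get_random_seed()
burned_next_seed = new.get_random_seed()

# Without burning, the next draw is the value the old code gave to the first sampler.
fresh = SeedSource(42)
unburned_next_seed = fresh.get_random_seed()

print(old_next_seed == burned_next_seed)
print(unburned_next_seed == old_sampler_seeds[0])
```

Both comparisons hold, which is exactly why the loop is needed for backward compatibility and why drawing one extra seed for the scheduler would shift every seed that follows.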

codecov-commenter · 2023-03-19T23:31:10Z

Codecov Report

Merging #52 (4b10e30) into main (c0e2e68) will increase coverage by 0.06%.
The diff coverage is 98.90%.


Additional details and impacted files

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   96.86%   96.92%   +0.06%     
==========================================
  Files          31       34       +3     
  Lines        1499     1563      +64     
==========================================
+ Hits         1452     1515      +63     
- Misses         47       48       +1

| Impacted Files | Coverage Δ |
|---|---|
| black_it/calibrator.py | `97.00% <97.50%> (-0.30%)` ⬇️ |
| black_it/schedulers/__init__.py | `100.00% <100.00%> (ø)` |
| black_it/schedulers/base.py | `100.00% <100.00%> (ø)` |
| black_it/schedulers/round_robin.py | `100.00% <100.00%> (ø)` |
| black_it/utils/json_pandas_checkpointing.py | `100.00% <100.00%> (ø)` |

marcofavoritobi · 2023-03-19T23:32:43Z

black_it/calibrator.py

        # overwrite the list of samplers
-        self.samplers = samplers
+        self.scheduler._samplers = tuple(samplers)  # pylint: disable=protected-access


This should be illegal; we inherit it from the not-so-clean design of including `samplers_id_table` in the calibrator. Anyway, that is a topic for another PR: let's get this feature in first while keeping everything else working as usual.

marcofavoritobi · 2023-03-19T23:35:14Z

black_it/calibrator.py

@@ -476,7 +500,7 @@ def create_checkpoint(self, file_name: Union[str, os.PathLike]) -> None:
            self.random_state,
            self.random_generator.bit_generator.state,
            model_name,
-            self.samplers,
+            self.scheduler,


Breaking change: we now pickle the scheduler rather than the samplers.

marcofavoritobi · 2023-03-19T23:35:32Z

black_it/utils/json_pandas_checkpointing.py

@@ -107,7 +107,7 @@ def save_calibrator_state(  # pylint: disable=too-many-arguments,too-many-locals
    initial_random_seed: Optional[int],
    random_generator_state: Mapping,
    model_name: str,
-    samplers: Sequence[BaseSampler],
+    scheduler: BaseScheduler,


Breaking change in checkpointing.

TODO: update the sqlite checkpointing as well.

marcofavoritobi · 2023-03-19T23:35:50Z

black_it/utils/json_pandas_checkpointing.py

@@ -174,7 +174,7 @@ def save_calibrator_state(  # pylint: disable=too-many-arguments,too-many-locals

    # save instantiated samplers and loss functions
    with open(checkpoint_path / "samplers_pickled.pickle", "wb") as fb:
-        pickle.dump(samplers, fb)
+        pickle.dump(scheduler, fb)


Should `samplers_pickled.pickle` be renamed? Maybe not...

Probably yes, actually. I'll take care of that.
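For reference, a minimal sketch of what pickling the scheduler to a checkpoint folder amounts to. The `SimpleNamespace` object is only a stand-in for a real scheduler instance, and the file name `scheduler_pickled.pickle` is a hypothetical choice for the rename discussed above.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Stand-in for a scheduler: it holds the samplers plus its own scheduling state.
scheduler = SimpleNamespace(samplers=("random", "halton"), batch_id=3)

with tempfile.TemporaryDirectory() as tmp:
    checkpoint_path = Path(tmp)
    # Hypothetical file name; the existing checkpoint uses "samplers_pickled.pickle".
    scheduler_file = checkpoint_path / "scheduler_pickled.pickle"

    with open(scheduler_file, "wb") as fb:
        pickle.dump(scheduler, fb)

    with open(scheduler_file, "rb") as fb:
        restored = pickle.load(fb)

print(restored.samplers, restored.batch_id)
```

Pickling the whole scheduler restores the batch counter together with the samplers, which a pickled list of samplers alone would not carry. It also means checkpoints written before this change contain a different object under the same file name.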

marcofavoritobi · 2023-03-19T23:40:28Z

tests/test_calibrator.py

@@ -222,7 +231,7 @@ def test_calibrator_restore_from_checkpoint_and_set_sampler() -> None:
                type(vars_cal["param_grid"]).__name__
                == type(cal_restored.param_grid).__name__  # noqa
            )
-        elif key == "_random_generator":
+        elif key == f"_{BaseSeedable.__name__}__random_generator":


name mangling... duh!
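The rename in the test follows from Python's name mangling of double-underscore attributes. The classes below are bare stand-ins that reuse the names from the diff; only the attribute naming matters.

```python
class BaseSeedable:
    """Stand-in class; only the double-underscore attribute matters here."""

    def __init__(self) -> None:
        # Inside the class body, "__random_generator" is rewritten by the
        # compiler to "_BaseSeedable__random_generator".
        self.__random_generator = object()


class Calibrator(BaseSeedable):
    pass


attribute_names = list(vars(Calibrator()))
mangled_name = f"_{BaseSeedable.__name__}__random_generator"
print(attribute_names)
print(mangled_name in attribute_names)
```

The key in `vars()` carries the name of the class where the attribute was defined, not the subclass, so a test that iterates over `vars(calibrator)` has to compare against the mangled form.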

AldoGl · 2023-03-21T10:16:02Z

black_it/schedulers/round_robin.py

+        new_simulated_data: NDArray[np.float64],
+    ) -> None:
+        """Update the state of the scheduler after each batch."""
+        self._batch_id += 1


@marcofavoritobi what about setting this update function to be the default (non abstract) method for all schedulers? What's your opinion on this?

That's a good point. Actually, it would be a first step toward a "shared (read-only) calibration state object" that is accessible both from each sampler and from the scheduler.
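A possible shape for that suggestion, as a sketch only: the base class provides a concrete `update` that advances the batch counter, and subclasses override it only when they need more. The `get_next_sampler` method and the simplified `update` signature are assumptions for the example, not the signatures in `black_it/schedulers`.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class BaseScheduler(ABC):
    def __init__(self, samplers: Sequence[str]) -> None:
        self._samplers = tuple(samplers)
        self._batch_id = 0

    @abstractmethod
    def get_next_sampler(self) -> str:
        """Hypothetical method: pick the sampler for the current batch."""

    def update(self, *batch_results: object) -> None:
        """Default update shared by all schedulers: advance the batch counter."""
        self._batch_id += 1


class RoundRobinScheduler(BaseScheduler):
    # No update() override is needed: the inherited default is enough.
    def get_next_sampler(self) -> str:
        return self._samplers[self._batch_id % len(self._samplers)]


scheduler = RoundRobinScheduler(["random", "halton"])
picked = []
for _ in range(3):
    picked.append(scheduler.get_next_sampler())
    scheduler.update()
print(picked)
```

An adaptive scheduler could still override `update` to inspect the batch results and call `super().update(...)` to keep the counter in sync.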

AldoGl

Great! A very useful feature to extend the flexibility of the package in the adaptive selection of specific samplers!

marcofavoritobi force-pushed the feat/scheduler branch from f64738f to adf42bb Compare March 19, 2023 23:41

feat: add BaseScheduler and RoundRobinScheduler for backward compatib…

aa89fc0

…ility In terms of functionality, we only had to change the number of batches, since each batch now runs only one sampler. This commit will probably be amended and/or split into smaller commits.

AldoGl force-pushed the feat/scheduler branch from adf42bb to aa89fc0 Compare March 21, 2023 09:56

AldoGl marked this pull request as ready for review March 21, 2023 10:14

AldoGl reviewed Mar 21, 2023

AldoGl force-pushed the feat/scheduler branch 3 times, most recently from b5dcde5 to 4b10e30 Compare March 21, 2023 11:11

base scheduler feature: add set_scheduler method and change 'samplers…

f19c113

…' to 'scheduler' in json checkpointing

AldoGl force-pushed the feat/scheduler branch from 4b10e30 to f19c113 Compare March 21, 2023 12:33

AldoGl approved these changes Mar 21, 2023

marcofavoritobi merged commit 84903a1 into main Mar 21, 2023

marcofavoritobi deleted the feat/scheduler branch March 21, 2023 14:57
