IM-1787: Improve memory efficiency of threshold plugin #1913

bayliffe · 2023-06-16T13:48:19Z

This PR changes how the threshold plugin works in order to reduce its memory requirements. The motivation is the need to handle the larger 51-member ECMWF ensembles, which without this work are prohibitively expensive in terms of memory.

The changes are designed to avoid the need to hold a thresholded, multi-realization array in memory. Prior to this change the threshold plugin thresholded each realization in turn and, if collapsing realizations, performed an iris.analysis.MEAN over the realization dimension. The rewritten version, when collapsing realizations, handles each realization in turn, summing the contributions to each threshold, and then divides this sum by the total number of contributions, which is accrued at the same time.
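The accumulation described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the plugin's code: the function name `collapse_realizations`, the strict `>` comparison, and the handling of fully masked points are all assumptions made for the example.

```python
import numpy as np


def collapse_realizations(realizations, threshold):
    """Mean exceedance of `threshold` over realizations, one at a time.

    `realizations` is any iterable of 2D arrays (optionally masked), so
    only one realization has to be held in memory at once. Assumes the
    iterable yields at least one array.
    """
    total = None  # running sum of exceedance indicators per grid point
    count = None  # running number of unmasked contributions per grid point
    for field in realizations:
        valid = ~np.ma.getmaskarray(field)
        exceeds = np.ma.getdata(field) > threshold
        if total is None:
            total = np.zeros(exceeds.shape, dtype=np.float32)
            count = np.zeros(exceeds.shape, dtype=np.int32)
        total += exceeds & valid  # masked points contribute nothing
        count += valid
    # Divide by the accrued contributions; points with none stay masked.
    return np.ma.masked_where(count == 0, total / np.maximum(count, 1))


realization_0 = np.array([[1.0, 3.0]])
realization_1 = np.ma.masked_array([[3.0, 3.0]], mask=[[True, False]])
probabilities = collapse_realizations(iter([realization_0, realization_1]), 2.0)
```

Because the contributions are counted per grid point, a point that is masked in some realizations is averaged only over the realizations in which it is valid.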

More discussion can be found here: #1787

This PR also makes some simplifying changes that made this work more tractable:

Move almost all the CLI functionality into the plugin. Some of this was required so that vicinity processing could be applied to each realization-threshold pair without having to retain both coordinates throughout the process, which led to large memory usage. Other functionality may have been moved simply to yield a simpler CLI, which is desirable in itself.
Removal of the ability to provide an arbitrary function to the plugin. We have only ever used realization collapse, so that option is now all that is left. This changes the CLI and plugin interfaces, meaning we need to notify collaborators and ensure we make a suitable suite change to accompany this PR.
Removal of the option to provide fuzzy-bounds as an argument to the plugin. This was used by the CLI, which would interpret a threshold config provided by the user and then pass in the bounds. Nearly all of the CLI functionality has now been moved into the plugin to simplify the CLI, meaning the threshold config is instead passed to the plugin, making the fuzzy-bounds argument redundant.
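To illustrate what interpreting a threshold config involves, here is a hedged sketch. The config shape (threshold strings mapped to a list of bounds) and the error message are taken from the test quoted later in this conversation; the helper name `parse_threshold_config`, the treatment of `None` as "no fuzziness", and the check that the threshold lies within its bounds are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def parse_threshold_config(
    threshold_config: Dict[str, Optional[Sequence[float]]]
) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Hypothetical helper: split a config into thresholds and fuzzy bounds."""
    thresholds: List[float] = []
    fuzzy_bounds: List[Tuple[float, float]] = []
    for key, bounds in threshold_config.items():
        threshold = float(key)
        if bounds is None:
            # Assumption: no bounds means a sharp (non-fuzzy) threshold.
            bounds = (threshold, threshold)
        bounds = tuple(float(value) for value in bounds)
        if len(bounds) != 2:
            raise ValueError(f"Invalid bounds for one threshold: {bounds}.")
        lower, upper = bounds
        if not lower <= threshold <= upper:
            # Assumption: a threshold must lie within its own bounds.
            raise ValueError(
                f"Threshold {threshold} is not within bounds {bounds}."
            )
        thresholds.append(threshold)
        fuzzy_bounds.append((lower, upper))
    return thresholds, fuzzy_bounds


thresholds, bounds = parse_threshold_config({"0.6": [0.4, 0.8], "1.0": None})
```

With the config handled in one place like this, a caller only needs to pass the config through, which is the simplification the last item above describes.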

There could be more unit tests added to cover the CLI functionality that has now moved into the plugin. I've added a little to cover the vicinity processing, but would appreciate a list of additional functionality that needs testing from the reviewers.

Testing:

Ran tests and they passed OK
Added new tests for the new feature(s)

codecov · 2023-06-16T13:54:25Z

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (e1523c2) 98.37% compared to head (ffd3fb9) 97.13%.
Report is 19 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1913      +/-   ##
==========================================
- Coverage   98.37%   97.13%   -1.25%     
==========================================
  Files         122      123       +1     
  Lines       11707    11938     +231     
==========================================
+ Hits        11517    11596      +79     
- Misses        190      342     +152

| Files | Coverage Δ | |
|---|---|---|
| ...precipitation_type/shower_condition_probability.py | `100.00% <100.00%> (ø)` | |
| improver/regrid/landsea.py | `99.21% <100.00%> (ø)` | |
| improver/threshold.py | `100.00% <100.00%> (ø)` | |
| improver/utilities/spatial.py | `97.44% <100.00%> (-1.65%)` | ⬇️ |
| improver/utilities/textural.py | `100.00% <100.00%> (ø)` | |

... and 18 files with indirect coverage changes


MoseleyS

I think the changes are basically ok, it just needs tightening up on the documentation and testing side.

improver/threshold.py

improver/utilities/spatial.py

improver/threshold.py

improver/utilities/spatial.py

improver_tests/acceptance/test_threshold.py

improver_tests/threshold/test_BasicThreshold.py

MoseleyS · 2023-06-30T14:59:10Z

improver/utilities/spatial.py

@@ -393,6 +393,101 @@ def process(self, cube: Cube) -> Tuple[Cube, Cube]:
        return tuple(gradients)


+def maximum_within_vicinity(


These new methods in spatial need unit tests
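As a rough idea of the kind of unit test being asked for, the sketch below tests a stand-in neighbourhood-maximum function. The signature of `maximum_within_vicinity` is truncated in the diff above, so `max_in_square_vicinity`, its `radius` argument (in grid cells), and its NumPy-only implementation are hypothetical and are not the code under review.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_in_square_vicinity(data, radius):
    """Hypothetical stand-in: maximum within a square of +/- radius cells."""
    data = np.asarray(data, dtype=float)
    # Pad with -inf so that edge points only see values inside the domain.
    padded = np.pad(data, radius, mode="constant", constant_values=-np.inf)
    window = 2 * radius + 1
    windows = sliding_window_view(padded, (window, window))
    return windows.max(axis=(-2, -1))


def test_single_point_spreads_to_neighbours():
    """A single non-zero point should fill the 3x3 square around it."""
    data = np.zeros((5, 5))
    data[2, 2] = 1.0
    expected = np.zeros((5, 5))
    expected[1:4, 1:4] = 1.0
    result = max_in_square_vicinity(data, radius=1)
    np.testing.assert_array_equal(result, expected)


def test_zero_radius_returns_input():
    """With no vicinity the data should be returned unchanged."""
    data = np.arange(12, dtype=float).reshape(3, 4)
    np.testing.assert_array_equal(max_in_square_vicinity(data, radius=0), data)
```

Tests of this shape exercise the utility directly, rather than relying on acceptance tests that reach it through the CLI.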

* upstream/master:
  Fix to the wind vertical displacement adjustment implementation (metoppv#1927)
  Add function which normalises input cubes according to a reference (metoppv#1919)
  Skip ECC bounds usage when converting probabilities to percentiles (metoppv#1926)
  Add CLIs to support rescaling of the forecast based on altitude difference (metoppv#1917)
  Changes to the modal code to increase the percentage to 30% and change the groupings to provide a more representative daily summary symbol. (metoppv#1925)
  Add plugins to support rescaling of the forecast based on altitude difference (metoppv#1916)
  Support conversion from percentiles to probabilities (metoppv#1924)
  Correct handling of reference time in weather_code plugin (metoppv#1920)
  Add CLI for clipping cubes (metoppv#1918)
  Update cbh ecc name (metoppv#1922)
  Updates Broadcast and expand_bounds in Combine Plugin (metoppv#1914)
  Mobt515 cloud base height spot extraction (metoppv#1911)
  MOBT-494: Cube title setting in weather symbol code (metoppv#1912)
  MOBT512-masking percentiles for cloud base height (metoppv#1908)
  Mobt 496 enforce forecast between references (metoppv#1907)

This commit also addresses many of the review comments though the expansion / rewriting of the unit tests will be undertaken separately. The BasicThreshold plugin is also renamed to Threshold.

bayliffe · 2023-08-04T11:04:29Z

@MoseleyS I have added some tests (a lot of tests!). These ensure the unit tests now capture the vicinity processing and realization collapse behaviour that was previously only lightly tested by acceptance tests, as the functionality sat within the CLI.

To retain evidence that the efficiency changes in this PR have not changed any results, I have for now left the original unit tests as they were when you last reviewed them. These are the tests we've had for a long time, with small interface tweaks as required by the changes, and they show that values are unaffected. I've then added a new test_ThresholdPytest.py file which will replace them in time. I've not tried to reuse the same values in the new tests; they are a wholesale replacement.

I've yet to add unit tests for the spatial.py utilities that got spun out of the OccurrenceWithinVicinity plugin, but there is plenty here to review without those for now.

MoseleyS

Review not completed. I've got as far as line 217 in test_ThresholdPytest.py

improver/precipitation_type/shower_condition_probability.py

improver/regrid/landsea.py

improver/utilities/textural.py

MoseleyS · 2023-08-04T14:20:56Z

improver/threshold.py

+        if landmask is not None:
+            landmask = np.where(landmask.data >= 0.5, True, False)


This point still stands.
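The review point itself is not reproduced in this thread. Purely as an illustration of what the quoted lines compute, the sketch below converts a fractional land mask to booleans on a plain NumPy array; the array values are invented. For a plain (unmasked) array the comparison alone already yields the same boolean result as the `np.where` form.

```python
import numpy as np

# Invented example values standing in for landmask.data.
landmask_data = np.array([[0.0, 0.4], [0.5, 1.0]])

via_where = np.where(landmask_data >= 0.5, True, False)  # form used in the diff
via_comparison = landmask_data >= 0.5  # comparison on its own
```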

improver_tests/threshold/conftest.py

improver/threshold.py

improver_tests/threshold/test_ThresholdPytest.py

MoseleyS · 2023-08-04T14:46:26Z

improver_tests/threshold/test_ThresholdPytest.py

+        # fuzzy_bounds contains one value
+        (
+            {"threshold_config": {"0.6": [0.4]}},
+            "Invalid bounds for one threshold: \\(0.4,\\).",


[Optional] You can reduce the double-escapes to single-escapes by using raw strings (the r"..." prefix). It isn't a massive improvement though.

Suggested change

"Invalid bounds for one threshold: \\(0.4,\\).",

r"Invalid bounds for one threshold: \(0.4,\).",

New-fangled (by which I mean unfamiliar). I'll leave this as is so I understand what I've done.
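For reference, the two spellings in the suggestion are the same string, so either behaves identically when used as a regular expression (for example as the `match` argument of `pytest.raises`, which applies `re.search`). A small check, using only the standard library:

```python
import re

escaped = "Invalid bounds for one threshold: \\(0.4,\\)."
raw = r"Invalid bounds for one threshold: \(0.4,\)."

# Both literals produce exactly the same characters.
same_pattern = escaped == raw

# The escaped parentheses match the literal "(" and ")" in the message.
message = "Invalid bounds for one threshold: (0.4,)."
matches = re.search(raw, message) is not None
```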

improver_tests/threshold/test_ThresholdPytest.py

improver/cli/threshold.py

improver/threshold.py

improver_tests/threshold/test_ThresholdPytest.py

MoseleyS

I've thought of a couple of ways this could be very slightly improved.

improver_tests/threshold/test_ThresholdPytest.py

improver_tests/threshold/conftest.py

improver/threshold.py

gavinevans

Thanks for all this work, @bayliffe. This re-write is clearly a significant undertaking 👍

I've added some comments and queries.

improver/cli/threshold.py

improver/threshold.py

improver_tests/threshold/test_ThresholdPytest.py

improver/threshold.py

improver_tests/threshold/test_ThresholdPytest.py

improver/threshold.py

gavinevans

Thanks for the updates, @bayliffe 👍

* Making threshold more efficient.
* Getting the contributions right when using masked data.
* Correct unit setting bug.
* Beginning to restructure.
* Changing interfaces, simplifying CLI, and updating unit tests. Incomplete.
* Break up the occurence within vicinity plugin to allow use of components within the threshold plugin.
* Figuring out where the vicinity processing differs.
* Add new exception. May be temporary.
* more incremental changes.
* Final fixes to vicinity processing from within the threshold plugin.
* Format changes.
* Remove defunct test. Add array slicing to avoid swelling memory when broadcasting arrays.
* Use a slice iterator rather than list to reduce memory usage. As suggested by Stephen Moseley.
* Set self.fill_masked which was lost in rebase.
* Remove assumption of realization coordinate added when changing to use iterator.
* Modify threshold such that it can once again accept a single threshold value rather than a list. Work through interface changes in other plugins.
* Remove doc-strings relating to fuzzy-bounds, the setting of which using an argument has been removed.
* Add unit tests to cover the vicinity processing applied within the threshold plugin. This was previously all performed in the CLI and thus not covered by unit tests in this specific context.
* Format fixes.
* Replicate percentile collapse functionality. This commit also addresses many of the review comments though the expansion / rewriting of the unit tests will be undertaken separately. The BasicThreshold plugin is also renamed to Threshold.
* Added back land mask without vicinity exception.
* Rewriting threshold unit tests using pytest and extending coverage. Incomplete.
* Unit test replacement complete. Doc-strings and formatting changes to come.
* Add docstrings and typing to the conftest functions.
* Formatting corrections.
* Remove broken new test added to BasicThreshold tests which is not needed due to new pytests.
* Add vicinity tests using a landmask.
* Partial review response.
* Further review changes.
* Further review tweaks.
* Tweak to type checking.
* Isort reorder
* Review changes.
* Fix my test for bounds set to None.

bayliffe added 18 commits June 16, 2023 09:55

Making threshold more efficient.

8b68ae7

Getting the contributions right when using masked data.

83513a4

Correct unit setting bug.

589ae1a

Beginning to restructure.

c3b9670

Changing interfaces, simplifying CLI, and updating unit tests. Incomplete.

8e2dc97

Break up the occurence within vicinity plugin to allow use of components within the threshold plugin.

9689768

Figuring out where the vicinity processing differs.

174258f

Add new exception. May be temporary.

037fdc5

more incremental changes.

5005977

Final fixes to vicinity processing from within the threshold plugin.

6910855

Format changes.

3a9a669

Remove defunct test. Add array slicing to avoid swelling memory when broadcasting arrays.

c9934f3

Use a slice iterator rather than list to reduce memory usage. As suggested by Stephen Moseley.

1d87141

Set self.fill_masked which was lost in rebase.

5e371f8

Remove assumption of realization coordinate added when changing to use iterator.

193bca2

Modify threshold such that it can once again accept a single threshold value rather than a list. Work through interface changes in other plugins.

ba3e291

Remove doc-strings relating to fuzzy-bounds, the setting of which using an argument has been removed.

aff1499

Add unit tests to cover the vicinity processing applied within the threshold plugin. This was previously all performed in the CLI and thus not covered by unit tests in this specific context.

33c3529

bayliffe mentioned this pull request Jun 16, 2023

Make threshold more memory efficient #1787

Closed

Format fixes.

80f96c8

MoseleyS requested changes Jun 30, 2023

bayliffe added 2 commits July 27, 2023 16:37

Replicate percentile collapse functionality.

19083bf

This commit also addresses many of the review comments though the expansion / rewriting of the unit tests will be undertaken separately. The BasicThreshold plugin is also renamed to Threshold.

bayliffe force-pushed the im1787 branch from 3b6903d to 19083bf July 31, 2023 12:36

bayliffe added 5 commits August 2, 2023 16:01

Added back land mask without vicinity exception.

9cbfccb

Rewriting threshold unit tests using pytest and extending coverage. Incomplete.

1bfc213

Unit test replacement complete. Doc-strings and formatting changes to come.

94a7885

Add docstrings and typing to the conftest functions.

e63df9f

Formatting corrections.

5ae1450

Remove broken new test added to BasicThreshold tests which is not needed due to new pytests.

f926509

Add vicinity tests using a landmask.

24f24ee

MoseleyS requested changes Aug 4, 2023

MoseleyS requested changes Aug 10, 2023

bayliffe added 2 commits August 25, 2023 08:52

Partial review response.

5f54444

Further review changes.

7d84c96

MoseleyS requested changes Sep 11, 2023

improver_tests/threshold/test_ThresholdPytest.py

improver_tests/threshold/conftest.py

improver/threshold.py

bayliffe added 3 commits September 12, 2023 09:19

Further review tweaks.

2e6bb36

Tweak to type checking.

8767a61

Isort reorder

6cdcb10

MoseleyS previously approved these changes Sep 18, 2023

gavinevans requested changes Oct 6, 2023

Review changes.

ab7edfa

bayliffe dismissed MoseleyS’s stale review via ab7edfa October 17, 2023 16:13

Fix my test for bounds set to None.

ffd3fb9

gavinevans approved these changes Oct 20, 2023

gavinevans merged commit 27531d9 into metoppv:master Oct 20, 2023
10 checks passed

bayliffe mentioned this pull request Apr 30, 2024

Remove old threshold tests and retain only the pytest versions #1991

Merged

1 task
