-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IM-1787: Improve memory efficiency of threshold plugin #1913
Conversation
…nts within the threshold plugin.
…broadcasting arrays.
…ested by Stephen Moseley.
…d value rather than a list. Work through interface changes in other plugins.
…ng an argument has been removed.
…reshold plugin. This was previously all performed in the CLI and thus not covered by unit tests in this specific context.
Codecov ReportAll modified lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1913 +/- ##
==========================================
- Coverage 98.37% 97.13% -1.25%
==========================================
Files 122 123 +1
Lines 11707 11938 +231
==========================================
+ Hits 11517 11596 +79
- Misses 190 342 +152
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes are basically ok, it just needs tightening up on the documentation and testing side.
@@ -393,6 +393,101 @@ def process(self, cube: Cube) -> Tuple[Cube, Cube]: | |||
return tuple(gradients) | |||
|
|||
|
|||
def maximum_within_vicinity( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new methods in spatial
need unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
* upstream/master: Fix to the wind vertical displacement adjustment implementation (metoppv#1927) Add function which normalises input cubes according to a reference (metoppv#1919) Skip ECC bounds usage when converting probabilities to percentiles (metoppv#1926) Add CLIs to support rescaling of the forecast based on altitude difference (metoppv#1917) Changes to the modal code to increase the percentage to 30% and change the groupings to provide a more representative daily summary symbol. (metoppv#1925) Add plugins to support rescaling of the forecast based on altitude difference (metoppv#1916) Support conversion from percentiles to probabilities (metoppv#1924) Correct handling of reference time in weather_code plugin (metoppv#1920) Add CLI for clipping cubes (metoppv#1918) Update cbh ecc name (metoppv#1922) Updates Broadcast and expand_bounds in Combine Plugin (metoppv#1914) Mobt515 cloud base height spot extraction (metoppv#1911) MOBT-494: Cube title setting in weather symbol code (metoppv#1912) MOBT512-masking percentiles for cloud base height (metoppv#1908) Mobt 496 enforce forecast between references (metoppv#1907)
This commit also addresses many of the review comments though the expansion / rewriting of the unit tests will be undertaken separately. The BasicThreshold plugin is also renamed to Threshold.
…ded due to new pytests.
@MoseleyS I have added some tests (a lot of tests!). This ensures the unit tests now capture the vicinity processing and realization collapse behaviour that was previously only lightly tested by acceptance tests as the functionality sat within the CLI. To try and retain the evidence that the efficiency changes in this PR have not changed any results I have, as things stand, left the original unit tests as they were when you last reviewed them. These are the tests we've had for a long time, with small interface tweaks as required by the changes, that show values are unaffected. I've then added a new I've yet to add unit test for the spatial.py utilities that got spun out of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review not completed. I've got as far as line 217 in test_ThresholdPytest.py
improver/threshold.py
Outdated
if landmask is not None: | ||
landmask = np.where(landmask.data >= 0.5, True, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This point still stands.
# fuzzy_bounds contains one value | ||
( | ||
{"threshold_config": {"0.6": [0.4]}}, | ||
"Invalid bounds for one threshold: \\(0.4,\\).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Optional] You can reduce the double-escapes to single-escapes with regex-strings. It isn't a massive improvement though.
"Invalid bounds for one threshold: \\(0.4,\\).", | |
r"Invalid bounds for one threshold: \(0.4,\).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New-fangled (by which I mean unfamiliar). I'll leave this as is so I understand what I've done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought of a couple of ways this could be very slightly improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all this work, @bayliffe. This re-write is clearly a significant undertaking 👍
I've added some comments and queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, @bayliffe 👍
* Making threshold more efficient. * Getting the contributions right when using masked data. * Correct unit setting bug. * Beginning to restructure. * Changing interfaces, simplifying CLI, and updating unit tests. Incomplete. * Break up the occurence within vicinity plugin to allow use of components within the threshold plugin. * Figuring out where the vicinity processing differs. * Add new exception. May be temporary. * more incremental changes. * Final fixes to vicinity processing from within the threshold plugin. * Format changes. * Remove defunct test. Add array slicing to avoid swelling memory when broadcasting arrays. * Use a slice iterator rather than list to reduce memory usage. As suggested by Stephen Moseley. * Set self.fill_masked which was lost in rebase. * Remove assumption of realization coordinate added when changing to use iterator. * Modify threshold such that it can once again accept a single threshold value rather than a list. Work through interface changes in other plugins. * Remove doc-strings relating to fuzzy-bounds, the setting of which using an argument has been removed. * Add unit tests to cover the vicinity processing applied within the threshold plugin. This was previously all performed in the CLI and thus not covered by unit tests in this specific context. * Format fixes. * Replicate percentile collapse functionality. This commit also addresses many of the review comments though the expansion / rewriting of the unit tests will be undertaken separately. The BasicThreshold plugin is also renamed to Threshold. * Added back land mask without vicinity exception. * Rewriting threshold unit tests using pytest and extending coverage. Incomplete. * Unit test replacement complete. Doc-strings and formatting changes to come. * Add docstrings and typing to the conftest functions. * Formatting corrections. * Remove broken new test added to BasicThreshold tests which is not needed due to new pytests. * Add vicinity tests using a landmask. * Partial review response. * Further review changes. * Further review tweaks. * Tweak to type checking. * Isort reorder * Review changes. * Fix my test for bounds set to None.
This PR changes the way in which the thresholding plugin works to reduce its memory requirements. The reason for this work is the requirement to handle the larger 51 member ECMWF ensembles, which without this work is prohibitively expensive in terms of required memory resources.
The changes are designed to avoid the need to hold a thresholded, multi-realization array in memory. Prior to this change the threshold plugin will threshold each realization in turn, and if collapsing realizations, perform an iris.analysis.MEAN over that realization dimension. The rewritten version, when collapsing realizations, will handle each realization in turn, summing up the contributions to each threshold, and then divide this sum by the total number of contributions which is also accrued.
More discussion can be found here: #1787
This PR makes some other changes which are simplifying and made this work more tractable:
There could be more unit tests added to cover the CLI functionality that has now moved into the plugin. I've added a little to cover the vicinity processing, but would appreciate a list of additional functionality that needs testing from the reviewers.
Testing: