Skip to content

Commit

Permalink
IM-1787: Improve memory efficiency of threshold plugin (metoppv#1913)
Browse files Browse the repository at this point in the history
* Making threshold more efficient.

* Getting the contributions right when using masked data.

* Correct unit setting bug.

* Beginning to restructure.

* Changing interfaces, simplifying CLI, and updating unit tests. Incomplete.

* Break up the occurence within vicinity plugin to allow use of components within the threshold plugin.

* Figuring out where the vicinity processing differs.

* Add new exception. May be temporary.

* more incremental changes.

* Final fixes to vicinity processing from within the threshold plugin.

* Format changes.

* Remove defunct test. Add array slicing to avoid swelling memory when broadcasting arrays.

* Use a slice iterator rather than list to reduce memory usage. As suggested by Stephen Moseley.

* Set self.fill_masked which was lost in rebase.

* Remove assumption of realization coordinate added when changing to use iterator.

* Modify threshold such that it can once again accept a single threshold value rather than a list. Work through interface changes in other plugins.

* Remove doc-strings relating to fuzzy-bounds, the setting of which using an argument has been removed.

* Add unit tests to cover the vicinity processing applied within the threshold plugin. This was previously all performed in the CLI and thus not covered by unit tests in this specific context.

* Format fixes.

* Replicate percentile collapse functionality.

This commit also addresses many of the review comments
though the expansion / rewriting of the unit tests will
be undertaken separately.

The BasicThreshold plugin is also renamed to Threshold.

* Added back land mask without vicinity exception.

* Rewriting threshold unit tests using pytest and extending coverage. Incomplete.

* Unit test replacement complete. Doc-strings and formatting changes to come.

* Add docstrings and typing to the conftest functions.

* Formatting corrections.

* Remove broken new test added to BasicThreshold tests which is not needed due to new pytests.

* Add vicinity tests using a landmask.

* Partial review response.

* Further review changes.

* Further review tweaks.

* Tweak to type checking.

* Isort reorder

* Review changes.

* Fix my test for bounds set to None.
  • Loading branch information
bayliffe authored Oct 20, 2023
1 parent 0b1463e commit 79fbd3b
Show file tree
Hide file tree
Showing 12 changed files with 1,862 additions and 334 deletions.
2 changes: 1 addition & 1 deletion doc/source/Code-Style-Guide.rst
Original file line number Diff line number Diff line change
Expand Up @@ -342,7 +342,7 @@ Plugins (classes) should be an example of a non-trivial algorithm or set
of algorithms for a particular purpose. They should be set up via the
``__init__`` method and then invoked on a particular iris Cube ``cube``
using a ``process`` method - e.g. using ``process(cube)``. See
e.g. `BasicThreshold <https://github.com/metoppv/improver/blob/master/lib/improver/threshold.py>`_
e.g. `Threshold <https://github.com/metoppv/improver/blob/master/lib/improver/threshold.py>`_
class. In some limited cases an iris ``CubeList`` may be preferable.
Avoid writing code that can do both. Class names use
`PascalCase <https://en.wikipedia.org/wiki/PascalCase>`_ whilst
Expand Down
99 changes: 22 additions & 77 deletions improver/cli/threshold.py
Original file line number Diff line number Diff line change
Expand Up @@ -58,20 +58,31 @@ def process(
Args:
cube (iris.cube.Cube):
A cube to be processed.
land_sea_mask (Cube):
Binary land-sea mask data. True for land-points, False for sea.
Restricts in-vicinity processing to only include points of a
like mask value.
threshold_values (list of float):
Threshold value or values about which to calculate the truth
values; e.g. 270,300. Must be omitted if 'threshold_config'
is used.
Threshold value or values (e.g. 270K, 300K) to use when calculating
the probability of the input relative to the threshold value(s).
These are provided as a comma separated list, e.g. 270,300
The units of these values, e.g. K in the example can be defined
using the threshold_units argument or are otherwise assumed to
match the units of the diagnostic being thresholded.
threshold_values and and threshold_config are mutually exclusive
arguments, defining both will lead to an exception.
threshold_config (dict):
Threshold configuration containing threshold values and
(optionally) fuzzy bounds. Best used in combination with
'threshold_units' It should contain a dictionary of strings that
'threshold_units'. It should contain a dictionary of strings that
can be interpreted as floats with the structure:
"THRESHOLD_VALUE": [LOWER_BOUND, UPPER_BOUND]
e.g: {"280.0": [278.0, 282.0], "290.0": [288.0, 292.0]},
or with structure "THRESHOLD_VALUE": "None" (no fuzzy bounds).
Repeated thresholds with different bounds are ignored; only the
last duplicate will be used.
threshold_values and and threshold_config are mutually exclusive
arguments, defining both will lead to an exception.
threshold_units (str):
Units of the threshold values. If not provided the units are
assumed to be the same as those of the input cube. Specifying
Expand Down Expand Up @@ -104,89 +115,23 @@ def process(
List of distances in metres used to define the vicinities within
which to search for an occurrence. Each vicinity provided will
lead to a different gridded field.
land_sea_mask (Cube):
Binary land-sea mask data. True for land-points, False for sea.
Restricts in-vicinity processing to only include points of a
like mask value.
fill_masked (float):
If provided all masked points in cube will be replaced with the
provided value before thresholding.
Returns:
iris.cube.Cube:
Cube of probabilities relative to the given thresholds
Raises:
ValueError: If threshold_config and threshold_values are both set
ValueError: If threshold_config is used for fuzzy thresholding
ValueError: Cannot apply land-mask cube without in-vicinity processing.
ValueError: Can only collapse over a realization coordinate or a percentile
coordinate that has been rebadged as a realization coordinate.
"""
from improver.ensemble_copula_coupling.ensemble_copula_coupling import (
RebadgePercentilesAsRealizations,
)
from improver.threshold import BasicThreshold
from improver.utilities.cube_manipulation import collapse_realizations
from improver.utilities.spatial import OccurrenceWithinVicinity

if threshold_config and threshold_values:
raise ValueError(
"--threshold-config and --threshold-values are mutually exclusive "
"- please set one or the other, not both"
)
if threshold_config and fuzzy_factor:
raise ValueError("--threshold-config cannot be used for fuzzy thresholding")
if threshold_config:
thresholds = []
fuzzy_bounds = []
for key in threshold_config.keys():
# Ensure thresholds are float64 to avoid rounding errors during
# possible unit conversion.
thresholds.append(float(key))
# If the first threshold has no bounds, fuzzy_bounds is
# set to None and subsequent bounds checks are skipped
if threshold_config[key] == "None":
fuzzy_bounds = None
continue
fuzzy_bounds.append(tuple(threshold_config[key]))
else:
# Ensure thresholds are float64 to avoid rounding errors during possible
# unit conversion.
thresholds = [float(x) for x in threshold_values]
fuzzy_bounds = None

each_threshold_func_list = []

if vicinity is not None:
vicinity = [float(x) for x in vicinity]
# smooth thresholded occurrences over local vicinity
each_threshold_func_list.append(
OccurrenceWithinVicinity(radii=vicinity, land_mask_cube=land_sea_mask)
)
elif land_sea_mask:
raise ValueError("Cannot apply land-mask cube without in-vicinity processing")

if collapse_coord == "realization":
each_threshold_func_list.append(collapse_realizations)
elif collapse_coord == "percentile":
cube = RebadgePercentilesAsRealizations()(cube)
each_threshold_func_list.append(collapse_realizations)
elif collapse_coord is not None:
raise ValueError(
"Can only collapse over a realization coordinate or a percentile "
"coordinate that has been rebadged as a realization coordinate."
)

if fill_masked is not None:
fill_masked = float(fill_masked)
from improver.threshold import Threshold

return BasicThreshold(
thresholds,
return Threshold(
threshold_values=threshold_values,
threshold_config=threshold_config,
fuzzy_factor=fuzzy_factor,
fuzzy_bounds=fuzzy_bounds,
threshold_units=threshold_units,
comparison_operator=comparison_operator,
each_threshold_func=each_threshold_func_list,
collapse_coord=collapse_coord,
vicinity=vicinity,
fill_masked=fill_masked,
)(cube)
)(cube, land_sea_mask)
12 changes: 6 additions & 6 deletions improver/precipitation_type/shower_condition_probability.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@
create_new_diagnostic_cube,
generate_mandatory_attributes,
)
from improver.threshold import BasicThreshold
from improver.threshold import Threshold
from improver.utilities.cube_manipulation import collapse_realizations

from .utilities import make_shower_condition_cube
Expand Down Expand Up @@ -171,12 +171,12 @@ def process(self, cubes: CubeList) -> Cube:
cloud, convection = self._extract_inputs(cubes)

# Threshold cubes
cloud_thresholded = BasicThreshold(
self.cloud_threshold, comparison_operator="<="
cloud_thresholded = Threshold(
threshold_values=self.cloud_threshold, comparison_operator="<="
).process(cloud)
convection_thresholded = BasicThreshold(self.convection_threshold).process(
convection
)
convection_thresholded = Threshold(
threshold_values=self.convection_threshold
).process(convection)

# Fill any missing data in the convective ratio field with zeroes.
if np.ma.is_masked(convection_thresholded.data):
Expand Down
6 changes: 3 additions & 3 deletions improver/regrid/landsea.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@
from improver.metadata.constants.attributes import MANDATORY_ATTRIBUTE_DEFAULTS
from improver.metadata.constants.mo_attributes import MOSG_GRID_ATTRIBUTES
from improver.regrid.landsea2 import RegridWithLandSeaMask
from improver.threshold import BasicThreshold
from improver.threshold import Threshold
from improver.utilities.cube_checker import spatial_coords_match
from improver.utilities.spatial import OccurrenceWithinVicinity

Expand Down Expand Up @@ -288,9 +288,9 @@ def _get_matches(
# Identify nearby points on regridded input_land that match the
# selector_value
if selector_val > 0.5:
thresholder = BasicThreshold(0.5)
thresholder = Threshold(threshold_values=0.5)
else:
thresholder = BasicThreshold(0.5, comparison_operator="<=")
thresholder = Threshold(threshold_values=0.5, comparison_operator="<=")
in_vicinity = self.vicinity(thresholder(self.input_land))

# Identify those points sourced from the opposite mask that are
Expand Down
Loading

0 comments on commit 79fbd3b

Please sign in to comment.