
Commit

Merge branch 'feature_branch_nbhood_refactor' into mobt_157_nbhood_refactor_tidy_CLIs

* feature_branch_nbhood_refactor:
  Mobt 157 nbhood refactor consolidate unit tests rebased (metoppv#1664)
  Mobt 157 nbhood refactor consolidate unit tests part1 (metoppv#1665)
  Adds a filter to the combine CLI for mismatching realizations (metoppv#1656)
  Reduce the memory requirements for read-the-docs (metoppv#1672)
  Further doc-building fixes. (metoppv#1671)
  DOC Fix intersphinx links for docs (metoppv#1668)
  Mobt 157 nbhood refactor sort out base classes (metoppv#1653)
  Modifies wxcode check_tree utility function to report issues with unreachable nodes (metoppv#1637)
  remove cycle (metoppv#1657)
  Minor edits to remove raising unnecessary warnings. (metoppv#1646)
  Change pandas DataFrame.at to DataFrame.loc (metoppv#1655)
  MOBT-154: Reunification of wx decision trees (metoppv#1639)
  Consolidate scale parameter usage across EMOS and ECC (metoppv#1642)
  Adds handling of a model-id-attr to wxcode-modal (metoppv#1634)
fionaRust committed Feb 23, 2022
2 parents a3ada3c + 62d5f05 commit dd3b727
Showing 46 changed files with 1,048 additions and 376 deletions.
15 changes: 9 additions & 6 deletions .readthedocs.yml
@@ -1,9 +1,12 @@
-conda:
-  file: doc/rtd_environment.yml
+version: 2

build:
-  image: latest
+  os: "ubuntu-20.04"
+  tools:
+    python: "mambaforge-4.10"

-python:
-  version: 3.6
-  system_packages: true
+conda:
+  environment: doc/rtd_environment.yml
+
+formats:
+  - htmlzip
8 changes: 4 additions & 4 deletions doc/source/conf.py
@@ -394,12 +394,12 @@

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
-    "https://docs.python.org/": None,
+    "https://docs.python.org/3/": None,
    "https://scitools-iris.readthedocs.io/en/latest/": None,
    "https://scitools.org.uk/cartopy/docs/latest/": None,
-    "https://scitools.org.uk/cf-units/docs/latest/": None,
-    "https://docs.scipy.org/doc/numpy/": None,
-    "https://docs.scipy.org/doc/scipy/reference/": None,
+    "https://cf-units.readthedocs.io/en/stable/": None,
+    "https://numpy.org/doc/stable/": None,
+    "https://docs.scipy.org/doc/scipy-1.6.2/reference/": None,
    "https://pandas.pydata.org/pandas-docs/dev/": None,
}
@@ -56,16 +56,16 @@ A normal (Gaussian) distribution is often represented using the syntax:

where :math:`\mu` is mean and :math:`\sigma^{2}` is the variance. The normal
distribution is a special case, where :math:`\mu` can be interpreted as both
the mean and the location parameter and :math:`\sigma^{2}` can be interpreted
-as both the variance and the scale parameter. For an alternative distribution,
-such as a truncated normal distribution that has been truncated to lie within
-0 and infinity, the distribution can be represented as:
+as both the variance and the square of the scale parameter. For an alternative
+distribution, such as a truncated normal distribution that has been truncated
+to lie within 0 and infinity, the distribution can be represented as:

.. math::

    \mathcal{N^0}(\mu,\,\sigma^{2})

In this case, the :math:`\mu` is strictly interpreted as the location parameter
-and :math:`\sigma^{2}` is strictly interpreted as the scale parameter.
+and :math:`\sigma^{2}` is strictly interpreted as the square of the scale parameter.

===============================
What is the location parameter?
@@ -82,6 +82,29 @@ The scale parameter indicates the width in the distribution. If the scale
parameter is large, then the distribution will be broader. If the scale is
smaller, then the distribution will be narrower.
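As a quick numerical illustration of the scale parameter's effect (a standard-library sketch, not taken from the commit):

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=1.0)
broad = NormalDist(mu=0.0, sigma=3.0)

# A larger scale parameter spreads probability mass further from the mean:
# a lower peak density and more mass in the tails.
assert broad.pdf(0.0) < narrow.pdf(0.0)
assert broad.cdf(5.0) < narrow.cdf(5.0)
```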

+****************************************************
+Implementation details
+****************************************************
+
+In this implementation, we will choose to define the distributions
+using the scale parameter (as this matches scipy's expectation),
+rather than the square of the scale parameter:
+
+.. math::
+
+    \mathcal{N}(\mu,\,\sigma)
+
+The full equation when estimating the EMOS coefficients using
+the ensemble mean is therefore:
+
+.. math::
+
+    \mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})
+
+This matches the equations noted in `Allen et al., 2021`_.
+
+.. _Allen et al., 2021: https://doi.org/10.1002/qj.3983
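This parameterisation can be sketched numerically; the coefficient values below are purely illustrative (not taken from the commit or the paper):

```python
import math

# Illustrative EMOS coefficients (a, b, c, d are made-up values).
a, b, c, d = 0.1, 1.02, 0.05, 0.9

ensemble_mean = 12.0     # raw ensemble mean, x-bar
ensemble_variance = 4.0  # raw ensemble variance, S^2

# Location and scale parameters of the calibrated distribution:
# N(a + b*x-bar, sqrt(c + d*S^2)). The scale is a standard deviation,
# matching scipy's parameterisation of the normal distribution.
location = a + b * ensemble_mean
scale = math.sqrt(c + d * ensemble_variance)
```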

****************************************************
Estimating EMOS coefficients using the ensemble mean
****************************************************
@@ -91,7 +114,7 @@ If the predictor is the ensemble mean, coefficients are estimated as

.. math::

-    \mathcal{N}(a + \bar{X}, c + dS^{2})
+    \mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})
where N is a chosen distribution and values of a, b, c and d are solved in the
format of :math:`\alpha, \beta, \gamma` and :math:`\delta`, see the equations
@@ -121,7 +144,7 @@ If the predictor is the ensemble realizations, coefficients are estimated for

.. math::

-    \mathcal{N}(a + b_1X_1 + ... + b_mX_m, c + dS^{2})
+    \mathcal{N}(a + b_1X_1 + ... + b_mX_m, \sqrt{c + dS^{2}})
where N is a chosen distribution, the values of a, b, c and d relate
to alpha, beta, gamma and delta through the equations above with
@@ -140,14 +163,14 @@ The EMOS coefficients represent adjustments to the ensemble mean and ensemble
variance, in order to generate the location and scale parameters that, for the
chosen distribution, minimise the CRPS. The coefficients can therefore be used
to construct the location parameter, :math:`\mu`, and scale parameter,
-:math:`\sigma^{2}`, for the calibrated forecast from today's ensemble mean, or
+:math:`\sigma`, for the calibrated forecast from today's ensemble mean, or
ensemble realizations, and the ensemble variance.

.. math::

    \mu = a + b\bar{X}

-    \sigma^{2} = c + dS^{2}
+    \sigma = \sqrt{c + dS^{2}}
Note here that this procedure holds whether the distribution is normal, i.e.
where the application of the EMOS coefficients to the raw ensemble mean results
@@ -12,7 +12,7 @@ The reliability calibration tables returned by this plugin are structured as shown below.

    Auxiliary coordinates:
         table_row_name - x - - -
    Scalar coordinates:
-        cycle_hour: 22
+        forecast_reference_time: 2017-11-11 00:00:00, bound=(2017-11-10 00:00:00, 2017-11-11 00:00:00)
         forecast_period: 68400 seconds
    Attributes:
         institution: Met Office
14 changes: 10 additions & 4 deletions doc/source/extended_documentation/wxcode/build_a_decision_tree.rst
@@ -77,10 +77,16 @@ accessed with this key contains the essentials that make the node function.

  condition_combination. Alternatively, if they are being manipulated within
  the node (e.g. added together), they must be separated by the desired
  operators (e.g. 'diagnostic1', '+', 'diagnostic2').
-- **diagnostic_thresholds** (List(List(float, str))): The diagnostic threshold
-  value and units being used in the test. A threshold [value, units] pair must
-  be provided for each diagnostic field with the same nested list structure; as
-  the basic unit is a list of value and unit, the overall nested structure is
+- **diagnostic_thresholds** (List(List(float, str, Optional(int)))): The
+  diagnostic threshold value and units being used in the test. An optional
+  third value provides a period in seconds that is associated with the
+  threshold value. For example, a precipitation accumulation threshold might
+  be given for a 1-hour period (3600 seconds). If instead 3-hour symbols are
+  being produced using 3-hour precipitation accumulations then the threshold
+  value will be scaled up by a factor of 3. Only thresholds with an
+  associated period will be scaled in this way. A threshold [value, units] pair
+  must be provided for each diagnostic field with the same nested list structure;
+  as the basic unit is a list of value and unit, the overall nested structure is
  one list deeper.
- **diagnostic_conditions** (as diagnostic_fields): The expected inequality
that has been used to construct the input probability field. This is checked
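The period-scaling behaviour described in the bullet above can be illustrated with a small sketch (`scale_threshold` is a hypothetical helper, not the IMPROVER implementation):

```python
def scale_threshold(value, period=None, target_period=None):
    """Scale a threshold to a target period.

    Only thresholds defined with an associated period (in seconds) are
    rescaled; thresholds without a period are returned unchanged.
    """
    if period is None or target_period is None:
        return value
    return value * target_period / period

# A 1 mm accumulation threshold defined for 1 hour (3600 s) is scaled up by a
# factor of 3 when producing 3-hour symbols from 3-hour accumulations.
scaled = scale_threshold(1.0, period=3600, target_period=10800)  # -> 3.0
unscaled = scale_threshold(1.0)  # no associated period -> 1.0
```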
8 changes: 3 additions & 5 deletions improver/calibration/dataframe_utilities.py
@@ -326,17 +326,15 @@ def _prepare_dataframes(
        keep="last",
    )
    # Sort to ensure a consistent ordering after removing duplicates.
-    forecast_df.sort_values(
-        by=["blend_time", "percentile", "wmo_id"], inplace=True, ignore_index=True,
+    forecast_df = forecast_df.sort_values(
+        by=["blend_time", "percentile", "wmo_id"], ignore_index=True,
    )

    # Remove truth duplicates.
    truth_cols = ["diagnostic", "time", "wmo_id"]
    truth_df = truth_df.drop_duplicates(subset=truth_cols, keep="last",)
    # Sort to ensure a consistent ordering after removing duplicates.
-    truth_df.sort_values(
-        by=truth_cols, inplace=True, ignore_index=True,
-    )
+    truth_df = truth_df.sort_values(by=truth_cols, ignore_index=True)

# Find the common set of WMO IDs.
common_wmo_ids = sorted(
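The change above swaps pandas' `inplace=True` sorting for assignment; a minimal sketch of the resulting pattern (illustrative data, not the real forecast tables):

```python
import pandas as pd

forecast_df = pd.DataFrame(
    {
        "blend_time": [2, 1, 1],
        "percentile": [50.0, 50.0, 50.0],
        "wmo_id": ["03772", "03772", "03772"],
        "forecast": [1.0, 2.0, 3.0],
    }
)

# Remove duplicates, keeping the last occurrence, then sort to ensure a
# consistent ordering; assignment avoids the inplace=True idiom that pandas
# has been moving away from.
forecast_df = forecast_df.drop_duplicates(
    subset=["blend_time", "percentile", "wmo_id"], keep="last"
)
forecast_df = forecast_df.sort_values(
    by=["blend_time", "percentile", "wmo_id"], ignore_index=True
)
```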
12 changes: 6 additions & 6 deletions improver/calibration/ensemble_calibration.py
@@ -1130,7 +1130,7 @@ def mask_cube(cube: Cube, landsea_mask: Cube) -> None:
        IndexError: if the cube and landsea_mask shapes are not compatible.
    """
    try:
-        cube.data[..., ~landsea_mask.data.astype(np.bool)] = np.nan
+        cube.data[..., ~landsea_mask.data.astype(bool)] = np.nan
    except IndexError as err:
        msg = "Cube and landsea_mask shapes are not compatible. {}".format(err)
        raise IndexError(msg)
@@ -1380,7 +1380,6 @@ def process(
                forecast_var,
                number_of_realizations,
            )
-
        return coefficients_cubelist


@@ -1581,9 +1580,10 @@ def _calculate_scale_parameter(self) -> ndarray:
        )

-        # Calculating the scale parameter, based on the raw variance S^2,
-        # where predicted variance = c + dS^2, where c = (gamma)^2 and
-        # d = (delta)^2
-        scale_parameter = (
+        # Calculating the scale parameter, based on the raw variance S^2,
+        # where predicted scale parameter (or equivalently standard deviation
+        # for a normal distribution) = sqrt(c + dS^2), where c = (gamma)^2 and
+        # d = (delta)^2.
+        scale_parameter = np.sqrt(
            self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
            * self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
            + self.coefficients_cubelist.extract_cube("emos_coefficient_delta").data
@@ -1622,7 +1622,7 @@ def _create_output_cubes(
        )
        scale_parameter_cube = create_new_diagnostic_cube(
            "scale_parameter",
-            f"({template_cube.units})^2",
+            template_cube.units,
            template_cube,
            template_cube.attributes,
            data=scale_parameter,
13 changes: 7 additions & 6 deletions improver/calibration/reliability_calibration.py
@@ -233,10 +233,11 @@ def _create_reliability_table_cube(
    ) -> Cube:
        """
        Construct a reliability table cube and populate it with the provided
-        data. The returned cube will include a cycle hour coordinate, which
-        describes the model cycle hour at which the forecast data was produced.
-        It will further include the forecast period, threshold coordinate,
-        and spatial coordinates from the forecast cube.
+        data. The returned cube will include a forecast_reference_time
+        coordinate, which will be the maximum range of bounds of the input
+        forecast reference times, with the point value set to the latest
+        of those in the inputs. It will further include the forecast period,
+        threshold coordinate, and spatial coordinates from the forecast cube.
Args:
forecast:
@@ -443,11 +444,11 @@ def process(self, historic_forecasts: Cube, truths: Cube) -> Cube:
            whether the data is thresholded below or above a given diagnostic
            threshold.
+        `historic_forecasts` and `truths` should have matching validity times.
+
        Args:
            historic_forecasts:
                A cube containing the historical forecasts used in calibration.
-                These are expected to all have a consistent cycle hour, that is
-                the hour in the forecast reference time.
            truths:
                A cube containing the thresholded gridded truths used in
                calibration.
32 changes: 16 additions & 16 deletions improver/cli/combine.py
@@ -37,7 +37,11 @@
@cli.clizefy
@cli.with_output
def process(
-    *cubes: cli.inputcube, operation="+", new_name=None, broadcast_to_threshold=False,
+    *cubes: cli.inputcube,
+    operation="+",
+    new_name=None,
+    broadcast_to_threshold=False,
+    minimum_realizations=None,
):
r"""Combine input cubes.
@@ -58,26 +62,22 @@ broadcast_to_threshold (bool):
            If True, broadcast input cubes to the threshold coord prior to combining -
            a threshold coord must already exist on the first input cube.
+        minimum_realizations (int):
+            If specified, the input cubes will be filtered to ensure that only realizations that
+            include all available lead times are combined. If the number of realizations that
+            meet this criteria are fewer than this integer, an error will be raised.
    Returns:
        result (iris.cube.Cube):
            Returns a cube with the combined data.
    """
    from iris.cube import CubeList

-    from improver.cube_combiner import CubeCombiner, CubeMultiplier
+    from improver.cube_combiner import Combine

    if not cubes:
        raise TypeError("A cube is needed to be combined.")
-    if new_name is None:
-        new_name = cubes[0].name()
-
-    if operation == "*" or operation == "multiply":
-        result = CubeMultiplier()(
-            CubeList(cubes), new_name, broadcast_to_threshold=broadcast_to_threshold,
-        )
-
-    else:
-        result = CubeCombiner(operation)(CubeList(cubes), new_name)
-
-    return result
+    return Combine(
+        operation,
+        broadcast_to_threshold=broadcast_to_threshold,
+        minimum_realizations=minimum_realizations,
+        new_name=new_name,
+    )(CubeList(cubes))
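The minimum_realizations filtering described in the docstring can be sketched roughly as follows (`filter_realizations` is a hypothetical stand-in, not the improver.cube_combiner implementation):

```python
def filter_realizations(realizations_by_lead_time, minimum_realizations):
    """Keep only realizations present at every lead time.

    Raises a ValueError if fewer realizations than minimum_realizations
    survive the filtering.
    """
    common = set.intersection(*(set(r) for r in realizations_by_lead_time.values()))
    if len(common) < minimum_realizations:
        raise ValueError(
            f"Only {len(common)} realizations cover all lead times; "
            f"at least {minimum_realizations} are required."
        )
    return sorted(common)

# Realization 1 is missing at the 7200 s lead time, so it is dropped.
filter_realizations({3600: [0, 1, 2], 7200: [0, 2]}, minimum_realizations=2)  # -> [0, 2]
```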
2 changes: 1 addition & 1 deletion improver/cli/nbhood.py
@@ -152,7 +152,7 @@ def process(
            lead_times=lead_times,
            weighted_mode=weighted_mode,
            sum_only=area_sum,
-            re_mask=True,
+            re_mask=remask,
        )(cube, mask_cube=mask)
    elif neighbourhood_output == "percentiles":
        result = GeneratePercentilesFromANeighbourhood(
24 changes: 17 additions & 7 deletions improver/cli/wxcode.py
@@ -40,6 +40,7 @@ def process(
    *cubes: cli.inputcube,
    wxtree: cli.inputjson = None,
    model_id_attr: str = None,
+    target_period: int = None,
    check_tree: bool = False,
):
""" Processes cube for Weather symbols.
@@ -54,12 +55,19 @@
            Name of attribute recording source models that should be
            inherited by the output cube. The source models are expected as
            a space-separated string.
+        target_period:
+            The period in seconds that the weather symbol being produced should
+            represent. This should correspond with any period diagnostics, e.g.
+            precipitation accumulation, being used as input. This is used to scale
+            any threshold values that are defined with an associated period in
+            the decision tree. It will only be used if the decision tree
+            provided has threshold values defined with an associated period.
        check_tree (bool):
-            If set the decision tree will be checked to see if it conforms to
-            the expected format; the only other argument required is the path
-            to the decision tree. If the tree is found to be valid the required
-            inputs will be listed. Setting this flag will prevent the CLI
-            performing any other actions.
+            If set, the decision tree will be checked to see if it conforms to
+            the expected format and that all nodes can be reached; the only other
+            argument required is the path to the decision tree. If the tree is found
+            to be valid the required inputs will be listed. Setting this flag will
+            prevent the CLI performing any other actions.
Returns:
iris.cube.Cube:
@@ -68,7 +76,7 @@
    if check_tree:
        from improver.wxcode.utilities import check_tree

-        return check_tree(wxtree)
+        return check_tree(wxtree, target_period=target_period)

from iris.cube import CubeList

@@ -77,4 +85,6 @@
    if not cubes:
        raise RuntimeError("Not enough input arguments. See help for more information.")

-    return WeatherSymbols(wxtree, model_id_attr=model_id_attr)(CubeList(cubes))
+    return WeatherSymbols(
+        wxtree, model_id_attr=model_id_attr, target_period=target_period
+    )(CubeList(cubes))
8 changes: 6 additions & 2 deletions improver/cli/wxcode_modal.py
@@ -36,7 +36,7 @@

@cli.clizefy
@cli.with_output
-def process(*cubes: cli.inputcube):
+def process(*cubes: cli.inputcube, model_id_attr: str = None):
"""Generates a modal weather symbol for the period covered by the input
weather symbol cubes. Where there are different weather codes available
for night and day, the modal code returned is always a day code, regardless
@@ -46,6 +46,10 @@ def process(*cubes: cli.inputcube, model_id_attr: str = None):
        cubes (iris.cube.CubeList):
            A cubelist containing weather symbols cubes that cover the period
            over which a modal symbol is desired.
+        model_id_attr (str):
+            Name of attribute recording source models that should be
+            inherited by the output cube. The source models are expected as
+            a space-separated string.
Returns:
iris.cube.Cube:
@@ -56,4 +60,4 @@ def process(*cubes: cli.inputcube, model_id_attr: str = None):
    if not cubes:
        raise RuntimeError("Not enough input arguments. See help for more information.")

-    return ModalWeatherCode()(cubes)
+    return ModalWeatherCode(model_id_attr=model_id_attr)(cubes)