
Commit

Merge branch 'feature_branch_nbhood_refactor' into mobt_157_nbhood_refactor_tidy_CLIs

* feature_branch_nbhood_refactor:
  Mobt 157 nbhood refactor consolidate unit tests rebased (metoppv#1664)
  Mobt 157 nbhood refactor consolidate unit tests part1 (metoppv#1665)
  Adds a filter to the combine CLI for mismatching realizations (metoppv#1656)
  Reduce the memory requirements for read-the-docs (metoppv#1672)
  Further doc-building fixes. (metoppv#1671)
  DOC Fix intersphinx links for docs (metoppv#1668)
  Mobt 157 nbhood refactor sort out base classes (metoppv#1653)
  Modifies wxcode check_tree utility function to report issues with unreachable nodes (metoppv#1637)
  remove cycle (metoppv#1657)
  Minor edits to remove raising unnecessary warnings. (metoppv#1646)
  Change pandas DataFrame.at to DataFrame.loc (metoppv#1655)
  MOBT-154: Reunification of wx decision trees (metoppv#1639)
  Consolidate scale parameter usage across EMOS and ECC (metoppv#1642)
  Adds handling of a model-id-attr to wxcode-modal (metoppv#1634)
fionaRust committed Feb 23, 2022
2 parents a3ada3c + 62d5f05 commit dd3b727
Showing 46 changed files with 1,048 additions and 376 deletions.
15 changes: 9 additions & 6 deletions .readthedocs.yml
@@ -1,9 +1,12 @@
-conda:
-  file: doc/rtd_environment.yml
+version: 2

build:
-  image: latest
+  os: "ubuntu-20.04"
+  tools:
+    python: "mambaforge-4.10"

-python:
-  version: 3.6
-  system_packages: true
+conda:
+  environment: doc/rtd_environment.yml
+
+formats:
+  - htmlzip
8 changes: 4 additions & 4 deletions doc/source/conf.py
@@ -394,12 +394,12 @@

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
-    "https://docs.python.org/": None,
+    "https://docs.python.org/3/": None,
    "https://scitools-iris.readthedocs.io/en/latest/": None,
    "https://scitools.org.uk/cartopy/docs/latest/": None,
-    "https://scitools.org.uk/cf-units/docs/latest/": None,
-    "https://docs.scipy.org/doc/numpy/": None,
-    "https://docs.scipy.org/doc/scipy/reference/": None,
+    "https://cf-units.readthedocs.io/en/stable/": None,
+    "https://numpy.org/doc/stable/": None,
+    "https://docs.scipy.org/doc/scipy-1.6.2/reference/": None,
    "https://pandas.pydata.org/pandas-docs/dev/": None,
}
@@ -56,16 +56,16 @@ A normal (Gaussian) distribution is often represented using the syntax:

where :math:`\mu` is mean and :math:`\sigma^{2}` is the variance. The normal
distribution is a special case, where :math:`\mu` can be interpreted as both
the mean and the location parameter and :math:`\sigma^{2}` can be interpreted
-as both the variance and the scale parameter. For an alternative distribution,
-such as a truncated normal distribution that has been truncated to lie within
-0 and infinity, the distribution can be represented as:
+as both the variance and the square of the scale parameter. For an alternative
+distribution, such as a truncated normal distribution that has been truncated
+to lie within 0 and infinity, the distribution can be represented as:

.. math::

    \mathcal{N^0}(\mu,\,\sigma^{2})

In this case, the :math:`\mu` is strictly interpreted as the location parameter
-and :math:`\sigma^{2}` is strictly interpreted as the scale parameter.
+and :math:`\sigma^{2}` is strictly interpreted as the square of the scale parameter.

===============================
What is the location parameter?
@@ -82,6 +82,29 @@ The scale parameter indicates the width in the distribution. If the scale
parameter is large, then the distribution will be broader. If the scale is
smaller, then the distribution will be narrower.
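As a quick numerical illustration of the scale parameter's effect (a standard-library sketch, not taken from the commit):

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=1.0)
broad = NormalDist(mu=0.0, sigma=3.0)

# A larger scale parameter spreads probability mass further from the mean:
# a lower peak density and more mass in the tails.
assert broad.pdf(0.0) < narrow.pdf(0.0)
assert broad.cdf(5.0) < narrow.cdf(5.0)
```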

+****************************************************
+Implementation details
+****************************************************
+
+In this implementation, we will choose to define the distributions
+using the scale parameter (as this matches scipy's expectation),
+rather than the square of the scale parameter:
+
+.. math::
+
+    \mathcal{N}(\mu,\,\sigma)
+
+The full equation when estimating the EMOS coefficients using
+the ensemble mean is therefore:
+
+.. math::
+
+    \mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})
+
+This matches the equations noted in `Allen et al., 2021`_.
+
+.. _Allen et al., 2021: https://doi.org/10.1002/qj.3983
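This parameterisation can be sketched numerically; the coefficient values below are purely illustrative (not taken from the commit or the paper):

```python
import math

# Illustrative EMOS coefficients (a, b, c, d are made-up values).
a, b, c, d = 0.1, 1.02, 0.05, 0.9

ensemble_mean = 12.0     # raw ensemble mean, x-bar
ensemble_variance = 4.0  # raw ensemble variance, S^2

# Location and scale parameters of the calibrated distribution:
# N(a + b*x-bar, sqrt(c + d*S^2)). The scale is a standard deviation,
# matching scipy's parameterisation of the normal distribution.
location = a + b * ensemble_mean
scale = math.sqrt(c + d * ensemble_variance)
```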

****************************************************
Estimating EMOS coefficients using the ensemble mean
****************************************************
@@ -91,7 +114,7 @@ If the predictor is the ensemble mean, coefficients are estimated as

.. math::

-    \mathcal{N}(a + \bar{X}, c + dS^{2})
+    \mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})
where N is a chosen distribution and values of a, b, c and d are solved in the
format of :math:`\alpha, \beta, \gamma` and :math:`\delta`, see the equations
@@ -121,7 +144,7 @@ If the predictor is the ensemble realizations, coefficients are estimated for

.. math::

-    \mathcal{N}(a + b_1X_1 + ... + b_mX_m, c + dS^{2})
+    \mathcal{N}(a + b_1X_1 + ... + b_mX_m, \sqrt{c + dS^{2}})
where N is a chosen distribution, the values of a, b, c and d relate
to alpha, beta, gamma and delta through the equations above with
@@ -140,14 +163,14 @@ The EMOS coefficients represent adjustments to the ensemble mean and ensemble
variance, in order to generate the location and scale parameters that, for the
chosen distribution, minimise the CRPS. The coefficients can therefore be used
to construct the location parameter, :math:`\mu`, and scale parameter,
-:math:`\sigma^{2}`, for the calibrated forecast from today's ensemble mean, or
+:math:`\sigma`, for the calibrated forecast from today's ensemble mean, or
ensemble realizations, and the ensemble variance.

.. math::

    \mu = a + b\bar{X}

-    \sigma^{2} = c + dS^{2}
+    \sigma = \sqrt{c + dS^{2}}
Note here that this procedure holds whether the distribution is normal, i.e.
where the application of the EMOS coefficients to the raw ensemble mean results
@@ -12,7 +12,7 @@ The reliability calibration tables returned by this plugin are structured as shown below.

    Auxiliary coordinates:
         table_row_name - x - - -
    Scalar coordinates:
-        cycle_hour: 22
+        forecast_reference_time: 2017-11-11 00:00:00, bound=(2017-11-10 00:00:00, 2017-11-11 00:00:00)
         forecast_period: 68400 seconds
    Attributes:
         institution: Met Office
14 changes: 10 additions & 4 deletions doc/source/extended_documentation/wxcode/build_a_decision_tree.rst
@@ -77,10 +77,16 @@ accessed with this key contains the essentials that make the node function.

  condition_combination. Alternatively, if they are being manipulated within
  the node (e.g. added together), they must be separated by the desired
  operators (e.g. 'diagnostic1', '+', 'diagnostic2').
-- **diagnostic_thresholds** (List(List(float, str))): The diagnostic threshold
-  value and units being used in the test. A threshold [value, units] pair must
-  be provided for each diagnostic field with the same nested list structure; as
-  the basic unit is a list of value and unit, the overall nested structure is
+- **diagnostic_thresholds** (List(List(float, str, Optional(int)))): The
+  diagnostic threshold value and units being used in the test. An optional
+  third value provides a period in seconds that is associated with the
+  threshold value. For example, a precipitation accumulation threshold might
+  be given for a 1-hour period (3600 seconds). If instead 3-hour symbols are
+  being produced using 3-hour precipitation accumulations then the threshold
+  value will be scaled up by a factor of 3. Only thresholds with an
+  associated period will be scaled in this way. A threshold [value, units] pair
+  must be provided for each diagnostic field with the same nested list structure;
+  as the basic unit is a list of value and unit, the overall nested structure is
  one list deeper.
- **diagnostic_conditions** (as diagnostic_fields): The expected inequality
that has been used to construct the input probability field. This is checked
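The period-scaling behaviour described in the bullet above can be illustrated with a small sketch (`scale_threshold` is a hypothetical helper, not the IMPROVER implementation):

```python
def scale_threshold(value, period=None, target_period=None):
    """Scale a threshold to a target period.

    Only thresholds defined with an associated period (in seconds) are
    rescaled; thresholds without a period are returned unchanged.
    """
    if period is None or target_period is None:
        return value
    return value * target_period / period

# A 1 mm accumulation threshold defined for 1 hour (3600 s) is scaled up by a
# factor of 3 when producing 3-hour symbols from 3-hour accumulations.
scaled = scale_threshold(1.0, period=3600, target_period=10800)  # -> 3.0
unscaled = scale_threshold(1.0)  # no associated period -> 1.0
```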
8 changes: 3 additions & 5 deletions improver/calibration/dataframe_utilities.py
@@ -326,17 +326,15 @@ def _prepare_dataframes(
        keep="last",
    )
    # Sort to ensure a consistent ordering after removing duplicates.
-    forecast_df.sort_values(
-        by=["blend_time", "percentile", "wmo_id"], inplace=True, ignore_index=True,
+    forecast_df = forecast_df.sort_values(
+        by=["blend_time", "percentile", "wmo_id"], ignore_index=True,
    )

    # Remove truth duplicates.
    truth_cols = ["diagnostic", "time", "wmo_id"]
    truth_df = truth_df.drop_duplicates(subset=truth_cols, keep="last",)
    # Sort to ensure a consistent ordering after removing duplicates.
-    truth_df.sort_values(
-        by=truth_cols, inplace=True, ignore_index=True,
-    )
+    truth_df = truth_df.sort_values(by=truth_cols, ignore_index=True)

# Find the common set of WMO IDs.
common_wmo_ids = sorted(
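The change above swaps pandas' `inplace=True` sorting for assignment; a minimal sketch of the resulting pattern (illustrative data, not the real forecast tables):

```python
import pandas as pd

forecast_df = pd.DataFrame(
    {
        "blend_time": [2, 1, 1],
        "percentile": [50.0, 50.0, 50.0],
        "wmo_id": ["03772", "03772", "03772"],
        "forecast": [1.0, 2.0, 3.0],
    }
)

# Remove duplicates, keeping the last occurrence, then sort to ensure a
# consistent ordering; assignment avoids the inplace=True idiom that pandas
# has been moving away from.
forecast_df = forecast_df.drop_duplicates(
    subset=["blend_time", "percentile", "wmo_id"], keep="last"
)
forecast_df = forecast_df.sort_values(
    by=["blend_time", "percentile", "wmo_id"], ignore_index=True
)
```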
12 changes: 6 additions & 6 deletions improver/calibration/ensemble_calibration.py
@@ -1130,7 +1130,7 @@ def mask_cube(cube: Cube, landsea_mask: Cube) -> None:
        IndexError: if the cube and landsea_mask shapes are not compatible.
    """
    try:
-        cube.data[..., ~landsea_mask.data.astype(np.bool)] = np.nan
+        cube.data[..., ~landsea_mask.data.astype(bool)] = np.nan
    except IndexError as err:
        msg = "Cube and landsea_mask shapes are not compatible. {}".format(err)
        raise IndexError(msg)
@@ -1380,7 +1380,6 @@ def process(
                forecast_var,
                number_of_realizations,
            )
-
        return coefficients_cubelist


@@ -1581,9 +1580,10 @@ def _calculate_scale_parameter(self) -> ndarray:
        )

-        # Calculating the scale parameter, based on the raw variance S^2,
-        # where predicted variance = c + dS^2, where c = (gamma)^2 and
-        # d = (delta)^2
-        scale_parameter = (
+        # Calculating the scale parameter, based on the raw variance S^2,
+        # where predicted scale parameter (or equivalently standard deviation
+        # for a normal distribution) = sqrt(c + dS^2), where c = (gamma)^2 and
+        # d = (delta)^2.
+        scale_parameter = np.sqrt(
            self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
            * self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
            + self.coefficients_cubelist.extract_cube("emos_coefficient_delta").data
@@ -1622,7 +1622,7 @@ def _create_output_cubes(
        )
        scale_parameter_cube = create_new_diagnostic_cube(
            "scale_parameter",
-            f"({template_cube.units})^2",
+            template_cube.units,
            template_cube,
            template_cube.attributes,
            data=scale_parameter,
13 changes: 7 additions & 6 deletions improver/calibration/reliability_calibration.py
@@ -233,10 +233,11 @@ def _create_reliability_table_cube(
    ) -> Cube:
        """
        Construct a reliability table cube and populate it with the provided
-        data. The returned cube will include a cycle hour coordinate, which
-        describes the model cycle hour at which the forecast data was produced.
-        It will further include the forecast period, threshold coordinate,
-        and spatial coordinates from the forecast cube.
+        data. The returned cube will include a forecast_reference_time
+        coordinate, which will be the maximum range of bounds of the input
+        forecast reference times, with the point value set to the latest
+        of those in the inputs. It will further include the forecast period,
+        threshold coordinate, and spatial coordinates from the forecast cube.
Args:
forecast:
@@ -443,11 +444,11 @@ def process(self, historic_forecasts: Cube, truths: Cube) -> Cube:
            whether the data is thresholded below or above a given diagnostic
            threshold.
+        `historic_forecasts` and `truths` should have matching validity times.
+
        Args:
            historic_forecasts:
                A cube containing the historical forecasts used in calibration.
-                These are expected to all have a consistent cycle hour, that is
-                the hour in the forecast reference time.
            truths:
                A cube containing the thresholded gridded truths used in
                calibration.
32 changes: 16 additions & 16 deletions improver/cli/combine.py
@@ -37,7 +37,11 @@
@cli.clizefy
@cli.with_output
def process(
-    *cubes: cli.inputcube, operation="+", new_name=None, broadcast_to_threshold=False,
+    *cubes: cli.inputcube,
+    operation="+",
+    new_name=None,
+    broadcast_to_threshold=False,
+    minimum_realizations=None,
):
r"""Combine input cubes.
@@ -58,26 +62,22 @@ broadcast_to_threshold (bool):
            If True, broadcast input cubes to the threshold coord prior to combining -
            a threshold coord must already exist on the first input cube.
+        minimum_realizations (int):
+            If specified, the input cubes will be filtered to ensure that only realizations that
+            include all available lead times are combined. If the number of realizations that
+            meet this criteria are fewer than this integer, an error will be raised.
    Returns:
        result (iris.cube.Cube):
            Returns a cube with the combined data.
    """
    from iris.cube import CubeList

-    from improver.cube_combiner import CubeCombiner, CubeMultiplier
+    from improver.cube_combiner import Combine

    if not cubes:
        raise TypeError("A cube is needed to be combined.")
-    if new_name is None:
-        new_name = cubes[0].name()
-
-    if operation == "*" or operation == "multiply":
-        result = CubeMultiplier()(
-            CubeList(cubes), new_name, broadcast_to_threshold=broadcast_to_threshold,
-        )
-
-    else:
-        result = CubeCombiner(operation)(CubeList(cubes), new_name)
-
-    return result
+    return Combine(
+        operation,
+        broadcast_to_threshold=broadcast_to_threshold,
+        minimum_realizations=minimum_realizations,
+        new_name=new_name,
+    )(CubeList(cubes))
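The minimum_realizations filtering described in the docstring can be sketched roughly as follows (`filter_realizations` is a hypothetical stand-in, not the improver.cube_combiner implementation):

```python
def filter_realizations(realizations_by_lead_time, minimum_realizations):
    """Keep only realizations present at every lead time.

    Raises a ValueError if fewer realizations than minimum_realizations
    survive the filtering.
    """
    common = set.intersection(*(set(r) for r in realizations_by_lead_time.values()))
    if len(common) < minimum_realizations:
        raise ValueError(
            f"Only {len(common)} realizations cover all lead times; "
            f"at least {minimum_realizations} are required."
        )
    return sorted(common)

# Realization 1 is missing at the 7200 s lead time, so it is dropped.
filter_realizations({3600: [0, 1, 2], 7200: [0, 2]}, minimum_realizations=2)  # -> [0, 2]
```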
2 changes: 1 addition & 1 deletion improver/cli/nbhood.py
@@ -152,7 +152,7 @@ def process(
            lead_times=lead_times,
            weighted_mode=weighted_mode,
            sum_only=area_sum,
-            re_mask=True,
+            re_mask=remask,
        )(cube, mask_cube=mask)
    elif neighbourhood_output == "percentiles":
        result = GeneratePercentilesFromANeighbourhood(
24 changes: 17 additions & 7 deletions improver/cli/wxcode.py
@@ -40,6 +40,7 @@ def process(
    *cubes: cli.inputcube,
    wxtree: cli.inputjson = None,
    model_id_attr: str = None,
+    target_period: int = None,
    check_tree: bool = False,
):
""" Processes cube for Weather symbols.
@@ -54,12 +55,19 @@
            Name of attribute recording source models that should be
            inherited by the output cube. The source models are expected as
            a space-separated string.
+        target_period:
+            The period in seconds that the weather symbol being produced should
+            represent. This should correspond with any period diagnostics, e.g.
+            precipitation accumulation, being used as input. This is used to scale
+            any threshold values that are defined with an associated period in
+            the decision tree. It will only be used if the decision tree
+            provided has threshold values defined with an associated period.
        check_tree (bool):
-            If set the decision tree will be checked to see if it conforms to
-            the expected format; the only other argument required is the path
-            to the decision tree. If the tree is found to be valid the required
-            inputs will be listed. Setting this flag will prevent the CLI
-            performing any other actions.
+            If set, the decision tree will be checked to see if it conforms to
+            the expected format and that all nodes can be reached; the only other
+            argument required is the path to the decision tree. If the tree is found
+            to be valid the required inputs will be listed. Setting this flag will
+            prevent the CLI performing any other actions.
Returns:
iris.cube.Cube:
@@ -68,7 +76,7 @@
    if check_tree:
        from improver.wxcode.utilities import check_tree

-        return check_tree(wxtree)
+        return check_tree(wxtree, target_period=target_period)

from iris.cube import CubeList

@@ -77,4 +85,6 @@
    if not cubes:
        raise RuntimeError("Not enough input arguments. See help for more information.")

-    return WeatherSymbols(wxtree, model_id_attr=model_id_attr)(CubeList(cubes))
+    return WeatherSymbols(
+        wxtree, model_id_attr=model_id_attr, target_period=target_period
+    )(CubeList(cubes))
8 changes: 6 additions & 2 deletions improver/cli/wxcode_modal.py
@@ -36,7 +36,7 @@

@cli.clizefy
@cli.with_output
-def process(*cubes: cli.inputcube):
+def process(*cubes: cli.inputcube, model_id_attr: str = None):
"""Generates a modal weather symbol for the period covered by the input
weather symbol cubes. Where there are different weather codes available
for night and day, the modal code returned is always a day code, regardless
@@ -46,6 +46,10 @@ def process(*cubes: cli.inputcube, model_id_attr: str = None):
        cubes (iris.cube.CubeList):
            A cubelist containing weather symbols cubes that cover the period
            over which a modal symbol is desired.
+        model_id_attr (str):
+            Name of attribute recording source models that should be
+            inherited by the output cube. The source models are expected as
+            a space-separated string.
Returns:
iris.cube.Cube:
@@ -56,4 +60,4 @@ def process(*cubes: cli.inputcube, model_id_attr: str = None):
    if not cubes:
        raise RuntimeError("Not enough input arguments. See help for more information.")

-    return ModalWeatherCode()(cubes)
+    return ModalWeatherCode(model_id_attr=model_id_attr)(cubes)