Support altering the sites calibrated using EMOS #1706
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1706 +/- ##
==========================================
+ Coverage 98.14% 98.23% +0.08%
==========================================
Files 111 115 +4
Lines 10244 10764 +520
==========================================
+ Hits 10054 10574 +520
Misses 190 190
Continue to review full report at Codecov.
Force-pushed from 32cf229 to be684b7
It would be useful if someone from BoM could review the changes in dataframe_utilities.py to ensure that this is compatible with your work. Other changes in this PR (e.g. in calibration/utilities.py) don't require a BoM review, unless you'd like to.
The logic looks ok to me, and I think the changes are consistent with BoM's intended use of the station_id column. I have made a few small suggestions for code clarity and performance.
self.expected_period_forecast.data[:, :2, -1] = np.nan
self.expected_period_truth.data[:2, -1] = np.nan
Will modifying properties of self cause problems if the data is used in other tests (if not now, maybe for tests added in future)? Should this be a copy instead?
Edit: actually, after some googling I see that a new instance of the class is created for each test, so modifying its properties does not affect other tests. In that case, why do the tests make copies of the dataframe properties before modifying them? (e.g. in the line just below this, df = self.forecast_df.copy())
I think that the copying is out of an abundance of caution about tests conflicting. I think you're right that quite a lot of the copying isn't really necessary, so I've removed most of this in 0026589.
This looks ok to me, but I'm not an expert on test frameworks, so possibly there is some good reason for the copies that I'm not aware of. Would be good if another reviewer can comment.
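For reference, the per-test-instance behaviour the reviewers are discussing can be demonstrated with a minimal sketch (hypothetical class and attribute names, not IMPROVER code): unittest constructs a fresh TestCase instance for each test method, so attributes set in setUp are rebuilt before every test and mutations do not leak between tests.

```python
import unittest


class MutationDemo(unittest.TestCase):
    """Sketch: each test method runs on its own freshly constructed instance."""

    def setUp(self):
        # Rebuilt for every test, because unittest instantiates a new
        # MutationDemo object per test method before calling setUp.
        self.data = [1, 2, 3]

    def test_mutate_without_copy(self):
        # Mutating the attribute in place is safe here...
        self.data.append(4)
        self.assertEqual(self.data, [1, 2, 3, 4])

    def test_sees_fresh_state(self):
        # ...because this test gets a new instance with a fresh self.data.
        self.assertEqual(self.data, [1, 2, 3])
```

Running this with `python -m unittest` passes both tests regardless of execution order, which supports the conclusion that the defensive `.copy()` calls are not strictly required.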
Force-pushed from 985c4d2 to 338fd03
Force-pushed from 338fd03 to 0026589
Thanks a lot for the review comments @btrotta-bom. I think that I've now responded to these.
Thanks @gavinevans, changes look fine.
Force-pushed from 8333c63 to eb6ecf0
Ran unit and acceptance tests.
Just a couple of questions and comments.
Can't really contribute to the copy question as I follow the convention set in IMPROVER; maybe someone in the green team could help?
Thanks
Happy to approve.
Note: I have not addressed the unit test copy question.
Happy with the changes made in this PR; the tests run properly.
Thanks for your previous review on this PR, @btrotta-bom. I've recently pushed up a couple more commits to this PR following some further testing. These commits relate to handling the station_id column when new sites are added to the parquet files. Would you be able to, or interested in, reviewing these latest commits?
Looks fine to me, I made a couple of suggestions for clarity.
# Add wmo_id as a static column, if station ID is present in both the
# forecast and truth DataFrames.
static_cols.append("wmo_id")
elif ("station_id" in forecast_df.columns) and not include_station_id:
I think you can remove the second part of the condition and just use elif "station_id" in forecast_df.columns: (and similarly for truth_df below).
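As an illustration of why the clause is redundant, here is a simplified, hypothetical reconstruction of the branching (not the actual IMPROVER code): if the preceding branch already tests include_station_id, the elif can only be reached when include_station_id is False, so repeating "and not include_station_id" adds nothing.

```python
import pandas as pd


def choose_static_cols(forecast_df: pd.DataFrame, include_station_id: bool) -> list:
    """Hypothetical sketch of the static-column selection logic under review."""
    static_cols = []
    if include_station_id:
        # station_id requested: wmo_id is treated as a static column.
        static_cols.append("wmo_id")
    elif "station_id" in forecast_df.columns:
        # "and not include_station_id" would be redundant here: this branch
        # is only reachable when include_station_id is False.
        static_cols.append("station_id")
    return static_cols


df = pd.DataFrame({"wmo_id": ["03002"], "station_id": ["030020"]})
print(choose_static_cols(df, include_station_id=True))   # ['wmo_id']
print(choose_static_cols(df, include_station_id=False))  # ['station_id']
```

The two calls exercise both branches and behave identically with or without the extra clause.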
Thanks. Yes, you're correct that the second part of this condition isn't actually required here.
self.forecast_df_multi_station_id["station_id"] = (
    self.forecast_df_multi_station_id["wmo_id"] + "0"
)
It looks like this station_id column is added in all the tests that use self.forecast_df_multi_station_id, and it is always defined the same way, so maybe this could be done in the setup instead. (Similarly for self.truth_df_multi_station_id.)
I've centralised this as suggested.
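The suggestion amounts to something like the following sketch (hypothetical data and class names, assuming a standard unittest layout): the per-test column assignment moves into setUp so every test sees the same derived station_id column without repeating the definition.

```python
import unittest

import pandas as pd


class TestMultiStationId(unittest.TestCase):
    """Sketch: derive the station_id column once, in setUp."""

    def setUp(self):
        self.forecast_df_multi_station_id = pd.DataFrame(
            {"wmo_id": ["03002", "03005"]}
        )
        # Previously repeated in each test that used this dataframe;
        # defined once here instead, as suggested in review.
        self.forecast_df_multi_station_id["station_id"] = (
            self.forecast_df_multi_station_id["wmo_id"] + "0"
        )

    def test_station_id_derived(self):
        self.assertEqual(
            list(self.forecast_df_multi_station_id["station_id"]),
            ["030020", "030050"],
        )
```

Each test still receives a fresh dataframe because setUp runs on a new instance per test method, so centralising the assignment does not couple the tests together.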
Thanks for the review comments, @btrotta-bom. Could you take another look at this please?
Thanks @gavinevans, this looks fine.
* Initial commit to add code with the aim of handling instances where only a single forecast and truth exist for a site.
* Add functionality for checking whether there are sufficient historic forecast and truth data pairs for calibration with EMOS.
* Remove commented out code.
* Minor formatting edits.
* Add additional unit test.
* Use representation_type, rather than percentile.
* Modifications related to handling the possible presence of a station_id column when trying to add and remove new sites.
* Minor edits following review.
* Remove copy statements.
* Switch to raise a warning, rather than an error, if station_id is present only on one of the forecast or truth dataframes.
* Minor edits following review comments.
* Minor edit to ensure that the station_id column is filled correctly when filling missing entries.
* Modifications following further testing to support the presence of a station_id column when new sites are added.
* Minor edits following review.
* Add comments for clarification.
* Minor edits to comments.
Addresses https://github.com/metoppv/mo-blue-team/issues/239
Description
The aim of this PR is to support adding new sites for use in estimating EMOS coefficients for site calibration. The changes in this PR primarily consist of:
* A fill_missing_entries function to pad a dataframe, if a site is only present at some validity times. This then allows the construction of a cube with consistent dimensions at different validity times.
* A check_data_sufficiency function to consider whether there is sufficient valid data, rather than NaNs, following the padding.
Following #1698, this PR has been updated to support the possible presence of a station_id column. The primary amendments related to this are:
* A warning is raised if the station_id column is present only on one of the forecast or truth dataframes.
* Checks involving the station_id column have been moved into a single check to avoid any unforeseen circumstances from having different checks at different points in the code.
A few other minor changes have also been made, mainly to correct the ordering of the dimension coordinates within unit tests to match the expectation that the forecast representation coordinate will always be the first dimension coordinate.
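The padding idea described above can be sketched with plain pandas (a hypothetical stand-in for fill_missing_entries, not the IMPROVER implementation): reindex over the full cross-product of sites and validity times, so that every site appears at every time, with NaN filling the entries for which no data exists.

```python
import pandas as pd


def pad_missing_entries(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: ensure every (wmo_id, time) pair has a row."""
    full_index = pd.MultiIndex.from_product(
        [df["wmo_id"].unique(), df["time"].unique()],
        names=["wmo_id", "time"],
    )
    # Rows absent from the original dataframe become NaN-filled entries,
    # giving consistent dimensions at every validity time.
    return (
        df.set_index(["wmo_id", "time"])
        .reindex(full_index)
        .reset_index()
    )


# Site "03005" is missing at the second validity time.
df = pd.DataFrame(
    {
        "wmo_id": ["03002", "03002", "03005"],
        "time": ["2023-01-01", "2023-01-02", "2023-01-01"],
        "temperature": [280.1, 281.3, 279.5],
    }
)
padded = pad_missing_entries(df)
print(len(padded))  # 4 rows: 2 sites x 2 validity times
```

After this padding, a data-sufficiency check along the lines of check_data_sufficiency would then count the valid (non-NaN) entries per site rather than the raw row count.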
Testing: