Support altering the sites calibrated using EMOS #1706

Merged
gavinevans merged 16 commits into metoppv:master from mobt_239_update_sites
Jul 4, 2022

@gavinevans gavinevans commented Apr 29, 2022

Addresses https://github.com/metoppv/mo-blue-team/issues/239

Description
The aim of this PR is to support adding new sites for use in estimating EMOS coefficients for site calibration. The changes in this PR primarily consist of:

  • Adding a fill_missing_entries function to pad a dataframe when a site is present at only some validity times. This allows a cube with consistent dimensions to be constructed across validity times.
  • Adding a check_data_sufficiency function to check whether enough valid (non-NaN) data remains after the padding.
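
As a rough illustration of the two functions described above, the sketch below pads a small dataframe so that every site has a row at every validity time, then checks what fraction of the padded rows hold valid data. The helper names, the column names (`time`, `wmo_id`, `forecast`) and the 0.5 threshold are assumptions made for this example; they are not taken from the IMPROVER implementation.

```python
import pandas as pd


def pad_missing_entries(df, time_col="time", site_col="wmo_id"):
    """Hypothetical helper: add NaN rows so each site has a row at each time."""
    full_index = pd.MultiIndex.from_product(
        [df[time_col].unique(), df[site_col].unique()],
        names=[time_col, site_col],
    )
    # Reindexing onto the full (time, site) grid inserts NaN for missing pairs.
    return df.set_index([time_col, site_col]).reindex(full_index).reset_index()


def has_sufficient_data(df, value_col="forecast", min_valid_fraction=0.5):
    """Hypothetical check: is the fraction of non-NaN values high enough?"""
    return df[value_col].notna().mean() >= min_valid_fraction


# Site 03002 only reports at the first validity time.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-02"]),
        "wmo_id": ["03001", "03002", "03001"],
        "forecast": [280.0, 281.5, 279.0],
    }
)
padded = pad_missing_entries(df)
```

After padding, each validity time has the same set of sites, which is what allows the data to be reshaped into an array with consistent dimensions. The sufficiency check then has to be run on the padded data, since padding is what introduces the NaNs.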

Following #1698, this PR has been updated to support the possible presence of a station_id column. The primary amendments related to this are:

  • The PR now causes an error to be raised if the station_id column is present only on one of the forecast or truth dataframes.
  • The checks on the presence of a station_id column have been consolidated into a single check, to avoid unexpected behaviour arising from different checks at different points in the code.
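
A single check of this kind might look like the sketch below. The function name and its return convention are assumptions for illustration, not the code in this PR. It raises an error when the column is on only one input, as described above; one of the later commit messages on this PR mentions a warning instead.

```python
def station_id_in_both(forecast_columns, truth_columns):
    """Hypothetical single check on the station_id column.

    Returns True if both inputs have station_id, False if neither does,
    and raises ValueError if only one of them does.
    """
    in_forecast = "station_id" in forecast_columns
    in_truth = "station_id" in truth_columns
    if in_forecast != in_truth:
        raise ValueError(
            "station_id must be present on both the forecast and truth "
            "dataframes, or on neither."
        )
    return in_forecast


include_station_id = station_id_in_both(
    ["wmo_id", "station_id", "forecast"], ["wmo_id", "station_id", "ob_value"]
)
```

The same membership test works on a pandas `df.columns` object, so the result can be computed once and passed to the rest of the code rather than re-tested at each point.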

A few other minor changes have also been made, mainly to correct the ordering of the dimension coordinates within unit tests so that it matches the expectation that the forecast representation coordinate will always be the first dimension coordinate.

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

@gavinevans gavinevans self-assigned this Apr 29, 2022

codecov bot commented Apr 29, 2022

Codecov Report

Merging #1706 (3eeca5e) into master (6f6e334) will increase coverage by 0.08%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1706      +/-   ##
==========================================
+ Coverage   98.14%   98.23%   +0.08%     
==========================================
  Files         111      115       +4     
  Lines       10244    10764     +520     
==========================================
+ Hits        10054    10574     +520     
  Misses        190      190              
Impacted Files Coverage Δ
improver/calibration/dataframe_utilities.py 100.00% <100.00%> (ø)
improver/calibration/ensemble_calibration.py 100.00% <100.00%> (ø)
improver/calibration/utilities.py 100.00% <100.00%> (ø)
...ometric_calculations/psychrometric_calculations.py 98.85% <0.00%> (-0.35%) ⬇️
improver/constants.py 100.00% <0.00%> (ø)
improver/regrid/landsea.py 99.21% <0.00%> (ø)
improver/utilities/solar.py 100.00% <0.00%> (ø)
improver/metadata/probabilistic.py 100.00% <0.00%> (ø)
improver/developer_tools/metadata_interpreter.py 99.35% <0.00%> (ø)
improver/calibration/rainforest_calibration.py 99.41% <0.00%> (ø)
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6f6e334...3eeca5e.

@gavinevans gavinevans force-pushed the mobt_239_update_sites branch from 32cf229 to be684b7 Compare May 20, 2022 10:49
@gavinevans

It would be useful if someone from BoM could review the changes in dataframe_utilities.py to ensure that this is compatible with your work. Other changes in this PR (e.g. in calibration/utilities.py) don't require a BoM review, unless you'd like to.

@gavinevans gavinevans added the BoM review required PRs opened by non-BoM developers that require a BoM review label May 20, 2022
@gavinevans gavinevans marked this pull request as ready for review May 20, 2022 14:20
@btrotta-bom btrotta-bom left a comment

The logic looks ok to me, and I think changes are consistent with BoM's intended use of the station_id column. I have made a few small suggestions for code clarity and performance.

improver/calibration/dataframe_utilities.py (outdated, resolved)
improver/calibration/dataframe_utilities.py (resolved)
improver/calibration/dataframe_utilities.py (outdated, resolved)
improver/calibration/dataframe_utilities.py (outdated, resolved)
improver/calibration/dataframe_utilities.py (outdated, resolved)
improver/calibration/utilities.py (outdated, resolved)
improver/calibration/utilities.py (outdated, resolved)
Comment on lines +709 to +710
self.expected_period_forecast.data[:, :2, -1] = np.nan
self.expected_period_truth.data[:2, -1] = np.nan
Contributor

Will modifying properties of self cause problems if the data is used in other tests (if not now, maybe for tests added in future)? Should this be a copy instead?
Edit: actually, after some googling I see that a new instance of the class is created for each test, so modifying its properties does not affect other tests. In that case, why do the tests make copies of the dataframe properties before modifying them? (e.g. in the line just below this, df = self.forecast_df.copy())

Contributor Author

I think that the copying is out of an abundance of caution about tests conflicting. I think you're right that quite a lot of the copying isn't really necessary, so I've removed most of this in 0026589.

Contributor

This looks ok to me, but I'm not an expert on test frameworks, so possibly there is some good reason for the copies that I'm not aware of. Would be good if another reviewer can comment.
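
The behaviour discussed in this thread can be demonstrated with a small standalone test case. This is an illustrative sketch using only the standard library `unittest` module, with a plain list standing in for the cubes and dataframes in the real tests; the class and attribute names are invented for the example.

```python
import unittest


class TestFixtureIsolation(unittest.TestCase):
    def setUp(self):
        # setUp runs before every test method, on a fresh TestCase instance.
        self.values = [1.0, 2.0, 3.0]

    def test_a_mutates_fixture(self):
        self.values[-1] = float("nan")
        self.assertEqual(len(self.values), 3)

    def test_b_sees_original_fixture(self):
        # Unaffected by test_a, because setUp rebuilt self.values.
        self.assertEqual(self.values, [1.0, 2.0, 3.0])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFixtureIsolation)
result = unittest.TestResult()
suite.run(result)
```

Copies remain necessary for objects shared across tests, such as anything built in `setUpClass` or at module level, since those are not recreated for each test.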

@gavinevans gavinevans force-pushed the mobt_239_update_sites branch from 985c4d2 to 338fd03 Compare May 23, 2022 15:49
@gavinevans gavinevans force-pushed the mobt_239_update_sites branch from 338fd03 to 0026589 Compare May 23, 2022 15:57
@gavinevans

Thanks a lot for the review comments @btrotta-bom. I think that I've now responded to these.

btrotta-bom previously approved these changes May 31, 2022

Thanks @gavinevans, changes look fine.

…sent only on one of the forecast or truth dataframes.
@gavinevans gavinevans force-pushed the mobt_239_update_sites branch from 8333c63 to eb6ecf0 Compare June 1, 2022 11:32
@Kat-90 Kat-90 left a comment

Ran unit and acceptance tests.
Just a couple of questions and comments.
Can't really contribute to the copy question as I follow convention set in IMPROVER, maybe someone in the green team could help?

Thanks

Kat-90 previously approved these changes Jun 1, 2022

Happy to approve.

Note: I have not addressed the unit test copy question.

Kat-90 previously approved these changes Jun 21, 2022

Happy with the changes made in this PR and the tests run properly.

@gavinevans

Thanks for your previous review on this PR, @btrotta-bom. I've recently pushed up a couple more commits to this PR following some further testing. These commits relate to handling the station_id column when new sites are added to the parquet files. Would you be able to, or interested in, reviewing these latest commits?

btrotta-bom previously approved these changes Jun 22, 2022

Looks fine to me, I made a couple of suggestions for clarity.

# Add wmo_id as a static column, if station ID is present in both the
# forecast and truth DataFrames.
static_cols.append("wmo_id")
elif ("station_id" in forecast_df.columns) and not include_station_id:
Contributor

I think you can remove the second part of the condition and just use elif "station_id" in forecast_df.columns: (and similarly for truth_df below).

Contributor Author

Thanks. Yes, you're correct that the second part of this condition isn't actually required here.
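
The redundancy can be checked exhaustively. The sketch below assumes that the branch preceding the `elif` tests `include_station_id`, which the quoted snippet suggests but does not show; the function names and returned labels are placeholders invented for the comparison.

```python
from itertools import product


def branch_with_extra_condition(include_station_id, has_station_id):
    if include_station_id:
        return "use station_id"
    elif has_station_id and not include_station_id:
        return "drop station_id"
    return "no station_id"


def branch_simplified(include_station_id, has_station_id):
    if include_station_id:
        return "use station_id"
    elif has_station_id:
        # Only reached when include_station_id is already False.
        return "drop station_id"
    return "no station_id"


# Compare the two versions over every combination of the two booleans.
all_equal = all(
    branch_with_extra_condition(a, b) == branch_simplified(a, b)
    for a, b in product([True, False], repeat=2)
)
```

Under that assumption the two versions agree for all four input combinations, because an `elif` is only evaluated when the preceding condition was false.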

Comment on lines 812 to 814
self.forecast_df_multi_station_id["station_id"] = (
self.forecast_df_multi_station_id["wmo_id"] + "0"
)
Contributor

It looks like this station_id column is added in all the tests that use self.forecast_df_multi_station_id, and it is always defined the same way, so maybe this could be done in the setup instead. (Similarly for self.truth_df_multi_station_id.)

Contributor Author

I've centralised this as suggested.
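
A minimal sketch of what centralising the column in the setup could look like is given below. The class name, attribute names and dataframe contents are simplified stand-ins rather than the actual IMPROVER test fixtures; only the `wmo_id + "0"` construction comes from the snippet quoted above.

```python
import unittest

import pandas as pd


class TestMultiStationId(unittest.TestCase):
    def setUp(self):
        self.forecast_df = pd.DataFrame({"wmo_id": ["03001", "03002"]})
        self.truth_df = pd.DataFrame({"wmo_id": ["03001", "03002"]})
        # Derive station_id once here rather than repeating it in each test.
        for df in (self.forecast_df, self.truth_df):
            df["station_id"] = df["wmo_id"] + "0"

    def test_station_id_on_forecast(self):
        self.assertEqual(list(self.forecast_df["station_id"]), ["030010", "030020"])

    def test_station_id_on_truth(self):
        self.assertEqual(list(self.truth_df["station_id"]), ["030010", "030020"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMultiStationId)
result = unittest.TestResult()
suite.run(result)
```

Tests that need a dataframe without the column can still drop it locally, which keeps the common case in one place.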

@gavinevans gavinevans dismissed stale reviews from btrotta-bom and Kat-90 via acf056d June 23, 2022 08:17
@gavinevans gavinevans assigned btrotta-bom and unassigned gavinevans Jun 23, 2022
@gavinevans

Thanks for the review comments, @btrotta-bom. Could you take another look at this please?

btrotta-bom previously approved these changes Jun 24, 2022

Thanks @gavinevans, this looks fine.

@gavinevans gavinevans merged commit 571412f into metoppv:master Jul 4, 2022
@gavinevans gavinevans deleted the mobt_239_update_sites branch July 4, 2022 15:39
MoseleyS pushed a commit to MoseleyS/improver that referenced this pull request Aug 22, 2024
* Initial commit to add code with the aim of handling instances where only a single forecast and truth exist for a site.

* Add functionality for checking whether there are sufficient historic forecast and truth data pairs for calibration with EMOS.

* Remove commented out code.

* Minor formatting edits.

* Add additional unit test.

* Use representation_type, rather than percentile.

* Modifications related to handling the possible presence of a station_id column when trying to add and remove new sites.

* Minor edits following review.

* Remove copy statements.

* Switch to raise a warning, rather than an error, if station_id is present only on one of the forecast or truth dataframes.

* Minor edits following review comments.

* Minor edit to ensure that the station_id column is filled correctly when filling missing entries.

* Modifications following further testing to support the presence of a station_id column when new sites are added.

* Minor edits following review.

* Add comments for clarification.

* Minor edits to comments.