Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ch. 4.4 text on reference time vs. UDUNITS #405

Closed
larsbarring opened this issue Oct 7, 2022 · 22 comments · Fixed by #446
Closed

Update ch. 4.4 text on reference time vs. UDUNITS #405

larsbarring opened this issue Oct 7, 2022 · 22 comments · Fixed by #446
Labels
change agreed Issue accepted for inclusion in the next version and closed enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@larsbarring
Copy link
Contributor

larsbarring commented Oct 7, 2022

Update ch. 4.4 text on reference time vs. UDUNITS

Pull request 446

Moderator

@JonathanGregory

Moderator Status Review [last updated: YYYY-MM-DD]

Brief comment on current status, update periodically

Requirement Summary

The conventions text should be aligned with UDUNITS documentation and functionality, or departures should be specified.

Technical Proposal Summary

There are two parts to this proposal: (1) align how the since phrase is used in CF in relation to what UDUNITS accepts, and (2) remove a citation that is not present in the UDUNITS documentation.

Detailed Proposal

Point (1): Currently chapter 4.4 begins as follows:

Variables representing reference time must always explicitly include the units attribute; there is no default value. The units attribute takes a string value formatted as per the recommendations in the [UDUNITS] package.

In the UDUNIT-2 documentation at least I can not find any specific recommendation regarding formatting of a reference time unit. What however is specified in the unit grammar is several alternatives to the "since" string:

<shift_op>: one of
"@"
"after"
"from"
"since"
"ref"

From discussions at the hackaton it is clear that at least a few exist files using the "@" alternative. However, in the light of the overwhelming dominance of the since alternative it seems as an unnecessary complication to freely allow the use of any alternative. Hence I suggest the text is changed to "strongly encourage" the use of since, or even "strongly discourage" the use of the alternatives to since.

Point (2): The CF text continues thus:

The following excerpt from the UDUNITS documentation explains the time unit encoding by example:

"The specification seconds since 1992-10-8 15:15:42.5 -6:00 indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of Coordinated Universal Time (i.e. Mountain Daylight Time). The time zone specification can also be written without a colon using one or two digits (indicating hours) or three or four digits (indicating hours and minutes)."

Again, at least I cannot find the cited text in the UDUNITS-2 documentation. If the cited text is indeed not in the UDUNITS-2 documentation I suggest to simply drop the citation marks and rephrase the first line to drop the reference to UDUNITS.

@larsbarring larsbarring added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Oct 7, 2022
@taylor13
Copy link

taylor13 commented Oct 7, 2022

I was not aware that there were synonyms for "since" allowed by udunits. I do not see the advantage of having several options, so I would strongly support Lars in discouraging use of anything but "since" in a reference time. (I wish we could forbid the other options, but it sounds like at least "@" is in use.)

@JonathanGregory
Copy link
Contributor

JonathanGregory commented Oct 7, 2022

Dear @larsbarring

(1) Like Karl I did not know about this! Indeed, UDUNITS supports it:

You have: day after 2-1-2022
You want: day after 1-1-2022
    1 day after 2-1-2022 = 366 (day after 1-1-2022)
    x/(day after 1-1-2022) = (x/(day after 2-1-2022)) + 365

I agree with you and Karl about preferring since. I would say that since "should be" used and the alternatives allowed by UDUNITS "should not be" used. That makes it a recommendation for the conformance document, and the cf-checker would give a warning about it.

(2) This text appears in the udunits(3) *nix man page e.g. at https://linux.die.net/man/3/udunits. Even so, it does not need to be given as a quotation, I agree.

Thanks for raising these points. Happy weekend

Jonathan

@JonathanGregory
Copy link
Contributor

JonathanGregory commented Oct 7, 2022

Reading my example again, I see my mistake from the absurd answer. 😄 Sorry! It should say

You have: day since 2022-1-2
You want: day since 2022-1-1
    1 day since 2022-1-2 = 2 (day since 2022-1-1)
    x/(day since 2022-1-1) = (x/(day since 2022-1-2)) + 1

and the same with after.

@larsbarring
Copy link
Contributor Author

Dear @JonathanGregory

Regarding (2) I think that we need to be a bit careful when thinking about references and links. The text cited is not [to my knowledge] in the UDUNITS2 documentation, and the link you give to *nix man pages refers to the previous UDUNITS version. For example, it refers to the old formatted text file holding units definitions (and mentions that on "October 9, 1992, it contained 446 entries".)

Anyway, the explanation as such is fine and can as you say remain without quotation.

@JonathanGregory
Copy link
Contributor

Dear @larsbarring

Yes, I think you're right. The man page is out of date. Since the three of us who have contributed so far agree, and no-one's disagreed yet, will you propose exact changes to the wording?

Best wishes and thanks

Jonathan

@larsbarring
Copy link
Contributor Author

Dear @JonathanGregory,

Yes, I can do that, and almost started already. But when thinking about the wording in concrete terms I wonder, @RosalynHatcher, how the cf-checker currently handle the synonyms to since?

In principle I agree with Karl that it would be preferable to forbid the alternatives, but we cannot do this altogether for historic reasons. But I think that just stating that "since should be used" and the "alternatives should not be used" is not strong enough, because in principle it leaves the decision in the hands of the data writers, and I imagine that a fair number do not even know about the cf-checker. Hence I think that this is a good example of when deprecation is useful: files already produced according to previous CF versions are acceptable (more or less, depending on what the cf-checker does) but new files produced according to next CF version will have to use since.

Thanks,
Lars

@larsbarring
Copy link
Contributor Author

And after having revisited #328 I should probably rephrase the previous comment to state that I would prefer the use of since to be a requirement (and thus to disallow the alternatives as of next CF Conventions version.)

@taylor13
Copy link

I would support that unless "@" is really commonly used by some segment of our community (not just sporadically).

@JonathanGregory
Copy link
Contributor

Dear @larsbarring and @taylor13

I would be comfortable with deprecating all the alternatives to since. That means the checker would give a warning if they were used in new data (from the next version of CF) but not with existing data. That's the usual way it works.

Just above you say you'd prefer it to be a requirement, which is the same except that checker would give an error rather than a warning. I'm afraid we're going to disagree as we did on a very similar point in some other issue - I can't remember which one. 😞 The rule-of-thumb I am applying (my own rule, not a CF one) is that we should prohibit things that make the metadata impossible to interpret in some way, because they produce a contradiction or ambiguity, for instance. We deprecate things which make the metadata inconvenient or less satisfactory, but still useable. I don't think we should use more compulsion than necessary, because we don't want the convention to be onerous (that's one of the principles), but we should insist that the metadata is useable.

Best wishes

Jonathan

@larsbarring
Copy link
Contributor Author

Dear @JonathanGregory

I have just spent a couple of hours exploring how different tools react to the synonyms, and it seems that they mostly work (ncgen, nco, ncview, and a few others). So, I am not going to push this any further. At the same time I must say that I feel a bit at loss regarding how to formulate a suggested update. I still would prefer, as @taylor13 might do (?), a strong discouragement for the alternatives, I think that you might not want this.

While I understand your [own] rule-of-thumb, and in general agree, I also argue that the degree of onerousness depends on the context. And offering alternatives for no other reason than it is possible to have alternatives may in fact add to the onerousness on all parties involved (data producers, consumers and software developers).

@taylor13
Copy link

As @larsbarring implies, allowing aliases seems more difficult for those writing software to interpret data (though perhaps easier for those writing data). I have never seen "@" used in a file, so it might be worthwhile pinging those at the hackathon who were aware of such files. If "@" has only been used by a few individuals, but is not routinely used by any community (however small), then I would think we could outlaw it. How can we poll folks to find out?

@larsbarring
Copy link
Contributor Author

Hi Antonio, @cofinoa

Sorry for also pinging you this way, but do you have any further information/recollection regarding the files having "@" instead of "since" in the reference time unit?

Many thanks,
Lars

@larsbarring
Copy link
Contributor Author

larsbarring commented Oct 13, 2022

Just to add weight to the problem of software development: I did some further checks of software tools and it appears that the pretty central python package, cftime (ping @jswhit) requires "since", see below. Of course not everyone use python and tools that builds on cftime. Still, due to its popularity I think that this adds some weight to the idea that files using the "@" are rare.

>>> import numpy as np
>>> import cftime
>>> print(cftime.__version__)
1.5.2
>>> time = np.array([1])
>>> s = "days since 1970-01-01"
>>> f = "days from 1970-01-01"
>>> a = "days after 1970-01-01"
>>> r = "days ref 1970-01-01"
>>> at = "days @ 1970-01-01"
>>> calendar =  "standard"

>>> print(cftime.num2date(time , s, calendar))
[cftime.DatetimeGregorian(1970, 1, 2, 0, 0, 0, 0, has_year_zero=False)]

>>> print(cftime.num2date(time , f, calendar))
....
ValueError: no 'since' in unit_string

>>> print(cftime.num2date(time , a, calendar))
....
ValueError: no 'since' in unit_string

>>> print(cftime.num2date(time , at, calendar))
....
ValueError: no 'since' in unit_string

@JonathanGregory
Copy link
Contributor

Dear @larsbarring

I think discouragement, as you say, would be OK. For example, after the para in 4.4 that starts "The acceptable units for time", we could put

UDUNITS permits a number of alternatives to the word since in the units of time coordinates. All the alternatives have exactly the same meaning in UDUNITS. For compatibility with other software, CF recommends that since should be used.

That recommendation would appear in the conformance document, and the CF-checker would give a warning if any word other than since appeared in this position.

Best wishes

Jonathan

@taylor13
Copy link

This seems o.k. to me. Thanks.

@larsbarring
Copy link
Contributor Author

As we are more or less in agreement I think that we should move on to draft an updated text. Here is a first go at section 4.4 Time coordinate:

Variables representing reference time must always explicitly include the units attribute; there is no default value.
The units attribute takes a string value that follows UDUNITS formatting requirements.

For example, the time unit specification seconds since 1992-10-8 15:15:42.5 -6:00 indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of Coordinated Universal Time (i.e. Mountain Daylight Time). The time zone specification can also be written without a colon using one or two digits (indicating hours) or three or four digits (indicating hours and minutes).

The acceptable units for time are given by the UDUNITS package <>.
The most commonly used of these strings (and their abbreviations) includes day (d), hour (hr, h), minute (min) and second (sec, s).
Plural forms are also acceptable.

UDUNITS permits a number of alternatives to the word since in the units of time coordinates. All the alternatives have exactly the same meaning in UDUNITS. For compatibility with other software, CF strongly recommends that since should be used.

The reference date/time string (appearing after the identifier since) is required.
.... ....

@taylor13
Copy link

This looks fine to me except the missing link to udunits ("<>") in the above.

@larsbarring
Copy link
Contributor Author

larsbarring commented Aug 14, 2023

Thank you @taylor13. Regarding the missing link; in the draft pull request it is present as in the original text, it just did not render well when copy-pasted over to this issue.

@larsbarring larsbarring mentioned this issue Aug 15, 2023
4 tasks
@larsbarring
Copy link
Contributor Author

And now there is (above) a draft pull request.

@JonathanGregory
Copy link
Contributor

Dear @larsbarring

Since enough support has been expressed according to the rules, and much longer than three weeks has passed, I declare the proposal for change to be accepted. Thanks for preparing the pull request. I have a couple of comments:

  • In the sentence "The units attribute takes a string value that follows UDUNITS formatting requirements.", is there a reason for UDUNITS not to be a reference to the UDUNITS doc, as it is in the existing version?
  • Please add yourself to the additional authors in cf-conventions.adoc.

Best wishes

Jonathan

@larsbarring
Copy link
Contributor Author

Thank you Jonathan @JonathanGregory
Regarding your first point I added the reference link (and removed the link in a later sentence). Also I added a new sentence to better connect the first paragraph to the second:
OLD:

Variables representing reference time must always explicitly include the units attribute; there is no default value.
The units attribute takes a string value that follows UDUNITS formatting requirements.

For example, the time unit specification ....

UPDATED:

Variables representing reference time must always explicitly include the units attribute; there is no default value. The units attribute takes a string value that follows the formatting requirements of the <> package. These requirements can best be described by an example with explanatory comments:

The time unit specification ....

I have updated the authors list as per your second point.

@JonathanGregory JonathanGregory linked a pull request Aug 21, 2023 that will close this issue
@JonathanGregory JonathanGregory removed a link to a pull request Aug 21, 2023
@JonathanGregory JonathanGregory linked a pull request Aug 21, 2023 that will close this issue
4 tasks
@JonathanGregory
Copy link
Contributor

Thanks, Lars. I will merge now. This update will be included in 1.11.

JonathanGregory added a commit that referenced this issue Aug 21, 2023
@JonathanGregory JonathanGregory added the change agreed Issue accepted for inclusion in the next version and closed label Sep 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change agreed Issue accepted for inclusion in the next version and closed enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants