-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new integer types to CF #243
Comments
Meant to ping a few others involved in original discussions: @JimBiardCics @Dave-Allured @rsignell-usgs (didn't find user names for all from email discussion). I wasn't involved in earlier discussions and happy to be moderator. |
Thanks Ethan. I support this proposal as currently written. It is overdue. |
Me two. |
@ethanrd I'm happy to moderate this if you like. I also support this proposal. |
Hi Ethan, I support this .. but we need to check other references to "integer". There are 9 statements in the convention that something "must be type integer" (e.g. "In this representation, the file contains an index variable , which must be of type integer, and must have the sample dimension as its single dimension"). Perhaps all occurrences of this phrase can be replaced with "must have an integer type"? It might be worth stating the link between |
I wholeheartedly approve of this proposal. @martinjuckes makes an excellent point regarding "must have an integer type". |
@martinjuckes Notice that the examples where |
@ethanrd The CF documentation regarding constants of various types references the NUG, which gives some examples of CDL syntax, but which is by no means a rigorous description of how to represent each data type in an attribute. The ncgen man page has a discussion of this, but that sure seems like a multiply-removed place for a basic definition to reside. Perhaps we need to have the NUG or CF go into greater detail about this. (I think the NUG is probably the more appropriate place.) |
I support this, it will be very welcome. |
Summary3 weeks have passed since any contribution and it seems we're pretty much at consensus. Supporting voices, including some from the Conventions Committee:
In my opinion one item should be addressed before this merge, however: |
The original, simple proposal as currently drafted by @ethanrd will have no bearing on other defects or ambiguities in other references to "integer", flag values, or type agreement between attributes and variables. For the sake of efficient process, I request that these other issues be split off as separate Github issues, as needed. I request consensus on the current draft with no further effort. I think this is in good shape to move. Thanks for your consideration. |
As noted by @Dave-Allured, @ethanrd is attempting to remove restrictions on data types in general. This has broad support. The concern of @martinjuckes is that it would be possible to lift the current restriction on index variables, as is used for ragged arrays. However, this could also be addressed in a separate issue - under the current proposal, ragged arrays would still require integers, but we could change that in the future and in the meantime make use of the new types for other variables. Is this OK for the group? Barring objections I will merge this after the three week deadline, namely 24 March 2020. |
I think readers of the document would probably want to understand "integer" to mean any integer type (any number of bytes, signed or unsigned). There is one reference to integer in regard to I agree with @martinjuckes that it would be sensible to replace "type integer" with "any integer type" (or something like that) everywhere, and it would be helpful in sect 2.2 to list explicitly all the datatypes which we mean by "any integer type" in the document. I too support the change being made. |
@JonathanGregory @martinjuckes I agree that these are small changes and support both their implementation and the adoption of this (small) change as a whole. Unfortunately I'm about to be offline for several weeks so I won't be able to pursue implementing these changes myself. @Dave-Allured perhaps you'd be interested in contributing those to @ethanrd 's PR? If they've been implemented before my return and there are no further objections I am happy to merge the changes upon my return at the end of March. |
No thank you. I believe that the original proposal is sufficient and elegant. I would prefer to leave further tuning to those contributors who are unsatisfied and have changes to offer. |
While I agree with @Dave-Allured on keeping PRs focused, I can also see @martinjuckes point that the "must be type integer" statements are linked to this PR. It is also a pretty minimal change, so I will update the "must be type integer" statements and push that change to this PR. The flag_values section might, I think, need some discussion so I will break that into a separate issue/PR. |
I am adding the following sentences to the end of the first paragraph in Section 2.2 "Data Types":
The wording seems a bit awkward, so please make suggestions and I will update the PR. |
Expanding to
Anything else? Or again, better wording all around? |
(Just an idea, take it or leave it.) I would add the following text after the next-to-last sentence in the paragraph (which first uses the phrase "integer types"): "In this document, the term "integer type" includes all of the following types: …" Could be in parentheses, or not. I think that means the same as what your text expresses. |
The definition of "integer types" could perhaps be worked into the first sentence, so as to avoid repeating the list of 8 types:
Is "supported" (taken from the existing convention text) the right word here? Perhaps it would be better to say "Data variables must be assigned one of the following NetCDF external data types: ..". NetCDF also allows user defined data types -- but I don't think we want to allow these in CF at this point (just because no use case has been proposed so far). The last sentence of the first paragraph ("It is possible to treat the byte type as unsigned by using the NUG convention of indicating the unsigned range using the valid_min, valid_max, or valid_range attributes.") looks mis-leading to me. As we now have two explicit byte data types, I suggest we remove this. |
@ethanrd I like your verbiage. I think you could simplify it a bit by saying In many situations, any integer type may be used. This covers both "an integer type" and "any integer type". |
@graybeal @martinjuckes @JimBiardCics - Thanks for your input. @martinjuckes : I agree the first sentence in the Data Type section could be stronger. I would suggest simplifying your alternate sentence by dropping "assigned" and being more specific that these are all netCDF external data types supported by netCDF-4:
|
@ethanrd : yes, dropping "assigned" is a good idea and I like your latest reformulation. |
It looks like consensus has been reached on the text as it stands - I'm starting the countdown and will merge this PR, barring objections or further changes, on 2020-04-16. |
As noted in my previous comment, consensus has been reached on this proposal and the corresponding PR. Support has been expressed by
All concerns expressed have been duly addressed and they have not been further pursued. Therefore the prerequisites for acceptance are met and the 3 week think period is up; I am merging this PR. Thanks all for your contributions. |
Some final rewording agreed above (#243 (comment)) didn't get pushed to PR #244 before it was merged. |
Thank you everyone for your efforts with this! |
Title
Add new integer types to CF (From Trac ticket #166)
Moderator
@erget
Moderator Status Review [last updated: 10 Feb 2020 - @ethanrd ]
This enhancement was first suggested by @czender on the CF email list in September 2017 (here). That discussion was split and moved to two Trac tickets in Oct 2017:
After some discussion and a few minor changes, both Trac ticket 166 and 167 were, I believe, agreed upon without objection. Due to the move from Trac to GitHub the changes were not incorporated into the CF document. This GitHub issue addresses the remaining changes agreed on in Trac ticket #166.
[I will add a GitHub issue to address Trac ticket #167 shortly.]
Requirement Summary
Technical Proposal Summary
Add support in CF for the five “new” integer types supported by netCDF-4 (and now in netCDF-3 with the CDF5 encoding released in netCDF-C version 4.4.0). The five “new” integer types are:
unsigned byte
,unsigned short
,unsigned int
,int64
, andunsigned int64
.Benefits
Explicit support for unsigned integer types
Status Quo
All integer types are assumed to be signed unless one of several workarounds/conventions are followed.
Detailed Proposal
Change the first paragraph of Section 2.2 "Data Types" as follows:
See PR #244
The text was updated successfully, but these errors were encountered: