-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify what action to take when unpacking an int #374
Comments
Dear Ken @mankoff Thanks for raising this. I suppose the concern is that an I'm changing the label of this issue to Cheers Jonathan |
This sentence is about unpacking. Elsewhere packing says that packed data should be type |
Does anyone know what it's getting at, then? |
I would suggest re-writing section 8 from having one subsection Currently under
I would rewrite this
The last two sentences:
Should be in a section on |
Dear Ken @mankoff The preceding text says, "If the You may pack Then comes the sentence you commented on, "It is not advised to unpack an You may pack That is, it's not allowed to pack If the data was written (erroneously) as The text seems to allow the possibility that Best wishes Jonathan |
Coming from #427. I agree to rephrase as suggested, but with the enhancement (bold italics).
|
Dear Kai @kmuehlbauer Yes, you're right, we should mention the unsigned types, to accord with issue 427, tthat 8.1 should be consistent with the list of data types in 2.2. I propose that we should replace most of the second paragraph of 8.1, namely the following text:
with this text:
This text does not permit data to be packed into the same data type. The existing text seems to permit it, but why would you do it? If anyone has a use-case, we should consider it. Like the existing text, the new text does not permit integer data to be packed. However, if existing data is presented which does either of these things, it can still be unpacked with the rules given, so the new text does not invalidate any existing data. Is this OK? Best wishes Jonathan |
Dear Jonathan @JonathanGregory, thanks for the quick reply. I have no use-case for packing integer into integer or floating point into floating point either. But I have seen almost any weird combination of differing dtypes of scale_factor, add_offset, _FillValue etc. So it's good to have at guide to decode mismatched files. @mankoff You have already been deep into this matter, would this new text apply to your use-cases too? |
Yes, this text is more clear. I've changed jobs and am no longer working on the same issue however - I'm losing track of the depth I had on this last year. @JonathanGregory I apologize for not answering your #374 (comment) from 9 Aug 2022. I think the response there is that if you find it "quite hard to understand", what hope to the rest of us have? I agree with your suggested change to the packing rules. That is, to me, clear and specific. As for the more recent comment, your suggested paragraph on packing reads well. I suggest a change to unpacking. Currently,
According to the packing guidelines
So (if properly packed), you cannot have packed data as If we are adding recommendations about how to treat packed data that does not follow the spec, I suggest 1) a new paragraph (possibly a new section) and 2) there may be more scenarios than just this one described here, so we should strive to be more complete, and possibly more generic, in our recommended error handling. Maybe something about largest data type of the unpacked data and then increase by one data type or /n/ bits. |
Sorry to not be more explicit. This means everything including and after the |
Dear Ken @mankoff and Kai @kmuehlbauer Thanks for your comments. I think Ken is right to prefer "must", and I believe we all agree on
I included the further sentence, about how to unpack We have a choice. We could say nothing at all about how to deal with non-conforming data, or we could try to be complete. It would be fine to say nothing, insofar as CF is a convention for writing netCDF data and interpreting CF-conforming netCDF data, but it's not so helpful and may lead to questions like Ken's. We could work out comprehensive guidance, but is that appropriate for the CF standard? What do you think, Ken and Kai, and anyone else? If we don't want to give detailed guidance, maybe we could just say, "We suggest that packed data which does not conform to the rules of this section regarding the types of the data variable and attributes should be unpacked to Cheers Jonathan |
I agree that giving guidance for non-conformant data is a good thing. Being complete and specific may be too high effort at this point. It is easier to be complete if we offer more general and vague advice. I agree with your suggested sentence:
There may be more efficient ways to unpack non-conformant data, but the advice above is simple and implementable by anyone who implements an API. |
Dear all I have created this pull request to implement the changes shown in #374 (comment) and #374 (comment) in Section 8.1, and accordingly in the conformance document. In the conformance document I have deleted the single recommendation, which was the reason for this issue being raised. The new rules are more explicit and we have inserted a sentence of guidance about unpacking non-conformant data. Because this is a defect ticket and it is much more than three weeks since the changes were agreed, I believe this change can be implemented immediately. Please could someone check that I have got the pull request right and merge it if so? Thanks. In view of their valuable work on this issue, Ken Mankoff and Kai Mühlbauer should be added to the list of contributors to the CF conventions. Thanks to both. Jonathan |
As stated on the PR #456 I think this is a good change. Thank you. |
With this strong support from the original enhancement proposers I am happy to merge this PR. Many thanks to all involved! |
In Chapter 8. Reduction of Dataset Size section 8.1. Packed Data on line 18 https://github.com/cf-convention/cf-conventions/blob/main/ch08.adoc#L18 it states,
This is a warning about what not to do, but does not provide a suggestion of what should be done. Does this mean unpacked data should be type int? Or type double? I suggest re-wording to provide positive advice (if there is a simple thing that should be done), rather than negative advice. The 'not advised' is appropriate language if there are many possible correct actions that could be taken.
If someone can offer clarity, I will issue a pull request.
The text was updated successfully, but these errors were encountered: