Implementation of the schema file in all versions of the XML standard name file #470
The changes outlined above can (will) be implemented in all published versions of the standard name XML file by a simple Python program. In a comment @DocOtak suggested that the alias elements should be sorted in alphabetical order according to the aliased standard name. I think this is a good idea that should be easy to implement in the Python code.
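A minimal sketch of how the alias sorting could look, using only the standard-library `xml.etree.ElementTree`. It assumes that aliases are `<alias id="..."><entry_id>...</entry_id></alias>` children of the root element that come after all entries, and it sorts on the `id` attribute (the aliased, old name). The function name `sort_aliases` is hypothetical and not taken from the actual conversion scripts.

```python
import xml.etree.ElementTree as ET


def sort_aliases(root: ET.Element) -> None:
    """Reorder the <alias> children of `root` alphabetically by their `id`.

    Assumes the aliases follow all other children (header and <entry>
    elements), so moving them to the end keeps the overall layout.
    """
    aliases = [child for child in root if child.tag == "alias"]
    for alias in aliases:
        root.remove(alias)
    # Sort on the aliased (old) name held in the id attribute; use
    # a.findtext("entry_id") instead to sort on the current name.
    for alias in sorted(aliases, key=lambda a: a.get("id", "")):
        root.append(alias)


if __name__ == "__main__":
    xml = (
        "<standard_name_table>"
        '<entry id="b_name"/>'
        '<alias id="zeta_old"><entry_id>b_name</entry_id></alias>'
        '<alias id="alpha_old"><entry_id>b_name</entry_id></alias>'
        "</standard_name_table>"
    )
    root = ET.fromstring(xml)
    sort_aliases(root)
    print([a.get("id") for a in root.findall("alias")])  # ['alpha_old', 'zeta_old']
```

Moving elements also moves their trailing whitespace, so the indentation of a real file may need tidying afterwards, for example with `ET.indent` (Python 3.9+).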
Thanks for finding these mistakes. Actually I think you could regard all these as defects, which means correcting them could be treated as a defect issue, though sorting the entries alphabetically would be an enhancement. Are the standard names with no canonical units stated all string-valued quantities, I wonder? Empty string is fine to give in the |
Do any versions of the |
To answer your last comment first: yes, all XML files should get the new schema link. In fact this will happen in this issue, or in the associated PR. Irrespective of whether there actually is a unit specified or not, the tag |
I think it's correct to put the null string in the canonical units in the XML file for string-valued quantities. For dimensionless numerical quantities, we should put Do you think I'm right that we need to put some text in sect 3 about
Regarding cf-convention#469: Just to test the workflow the current XSD link in XML files points to my repo.
This time I have not thought too much (at all) about the actual units as such, because that is not something the XML syntax or XSD schema have influence over, and the XML syntax and schema are what this string of issues/PRs deals with. But I do agree that once these fundamental aspects are sorted, we could/should have a closer look at the units and other aspects that are related to CF compliance as such.
I am not sure how to do this: in my fork there is a branch/subdirectory that contains the Python code for injecting into all versions of the XML files the changes detailed in this issue and the preceding ones. When the code is run, the original XML files are kept (as "*_SAVED" copies). Moreover, there are log files detailing the changes made by each step. But the XML files are not in the branch, because of size considerations. I have spent some time trying to establish that the changes are as intended and do not corrupt any element, but this is not yet conclusive. Just to get things working, the branch includes the changes suggested in previous issues/PRs. But I am not sure how to proceed from here. Should a PR include the code and other details in the subdirectory linked above? Both the processed files and the original ("*_SAVED") versions are useful for verification, but that doubles the size. I should also say that I have done the final step by creating new HTML files, see "next issue" in this string of issues. Finally, as Andrew @DocOtak suggested, there is the option to sort both the standard name entries and the alias entries, but doing so hampers the possibility to compare the old and the new files. It would nevertheless be useful as a final step, because the aliases in particular are in a more or less random order now.
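The keep-the-original-and-log workflow described above could be sketched roughly as follows. This is not the code in the fork; the `<name>_SAVED` naming, the UTF-8 encoding, the directory layout in the `__main__` block and the name `process_with_backup` are all assumptions for illustration.

```python
import logging
import shutil
from pathlib import Path
from typing import Callable

log = logging.getLogger("standard_name_xml")


def process_with_backup(path: Path, transform: Callable[[str], str]) -> bool:
    """Rewrite `path` with `transform`, keeping the original as <name>_SAVED.

    The transform is always applied to the saved original, so re-running the
    script after fixing a bug does not stack changes on already-processed text.
    Returns True if the processed text differs from the original.
    """
    saved = path.with_name(path.name + "_SAVED")
    if not saved.exists():
        shutil.copy2(path, saved)  # keep the pristine original exactly once
    original = saved.read_text(encoding="utf-8")
    updated = transform(original)
    path.write_text(updated, encoding="utf-8")
    changed = updated != original
    log.info("%s: %s", path.name, "modified" if changed else "unchanged")
    return changed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Assumed layout: one directory per published version of the table.
    for xml_file in sorted(Path("Data/cf-standard-names").glob("*/src/cf-standard-name-table.xml")):
        process_with_backup(xml_file, lambda text: text)  # identity transform as a placeholder
```

Always starting from the saved copy makes the script idempotent, which matters when a later step (e.g. sorting) is added and everything has to be regenerated.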
Dear @larsbarring, I think the PR should replace all the xml and html files in the repo with the new versions. The size of the repo is not a problem; the 1 Gbyte limit refers to the total space that the files take up on the website. If I understand correctly, you would be replacing all the xml and html files that appear on the website, but not increasing the number of them. It would also be useful to put the scripts into the repo, for the record. I agree that sorting the entries, as @DocOtak suggested, is a good idea, but that could be done as a subsequent enhancement. There's no need to sort all the past versions, is there? Maybe there could be a future release of the table which did not change the entries, just put them in order, as a separate step. @DocOtak also demonstrated how to tag all the versions so that they did not have to be kept on the website as static files. I think this works well for the xml, but GitHub doesn't render the html upon retrieving it. Hence I think we could adopt this approach for xml, which will save a bit less than half the space per release, but we will need to keep the html files on the website. Again, changing the way it's stored should be a subsequent enhancement, I think. It could be done at the same time as moving the standard name table to its own

Best wishes
Jonathan
Yes, the PR will replace the existing xml and html files. When all the issues/PRs leading up to this one have been completed, I will do a more careful check that nothing odd is happening. I have a fair idea how these checks can be done, but the details are for later. It is here that the proverbial pudding will be eaten, and the previous string of issues/PRs will prove its worth and correctness. While I have no reason to think so, there is always the possibility that something surfaces that requires changes to the previous steps. I agree with putting the script in the repo (with the caveat that it is not a nice "self-installing" Python environment...). Regarding sorting I believe Andrew's @DocOtak's argument that when it is sorted it is easier to create I agree that the approach Andrew demonstrated in the discussion thread is promising. I will come back to this when I have made a bit more progress on this issue here.

Kind regards,
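One possible form of the "careful check" mentioned above is an entry-by-entry comparison of a processed file against its saved original, so that every difference can be reviewed. The sketch below uses only the standard library; `entry_map` and `compare_tables` are hypothetical names. Note that it keys entries on their `id`, so exact duplicate entries collapse into one and entries without an `id` are not supported.

```python
import xml.etree.ElementTree as ET


def entry_map(root):
    """Map each standard name (entry id) to {child tag: stripped text}."""
    return {
        entry.get("id"): {c.tag: (c.text or "").strip() for c in entry}
        for entry in root.iter("entry")
    }


def compare_tables(old_xml, new_xml):
    """List entries removed, added or changed between two versions of a table."""
    old = entry_map(ET.fromstring(old_xml))
    new = entry_map(ET.fromstring(new_xml))
    report = [f"removed: {name}" for name in sorted(old.keys() - new.keys())]
    report += [f"added: {name}" for name in sorted(new.keys() - old.keys())]
    for name in sorted(old.keys() & new.keys()):
        for tag in sorted(old[name].keys() | new[name].keys()):
            before, after = old[name].get(tag), new[name].get(tag)
            if before != after:
                report.append(f"changed: {name}/{tag}: {before!r} -> {after!r}")
    return report
```

An intended fix such as adding an empty `<description>` shows up as `changed: <name>/description: None -> ''`, so the report can be checked against the list of planned changes: anything else is suspicious.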
Now that the string of preparatory issues (see table in #457) has been completed, much thanks to @sadielbartholomew's quick input and skilful reviewing of PRs, it is time to activate this issue. In the coming few days I will post overview tables of various "technical/formal" issues in the different versions of the standard name table XML file.
There is now updated information available in a comment below
All the format changes leading up to this issue have been successfully implemented in the newly published version 85 of the standard name table. Excellent work @japamment, @efisher008, @feggleton! :-))
That's great, and thank you @larsbarring as well.

Jonathan
Given the recent resolution of how to deal with standard names having a spurious space, I have updated my branch that includes some Python code to implement the changes to the old versions of the table. Consequently, I have updated a couple of earlier comments in this thread. While the changes to the published tables should be made by @japamment and @efisher008 to keep the CEDA Vocabulary Editor in sync, this repo includes log files and error summaries that could help in this work. Here are two pointers to help with navigating the repo:
After the changes have been implemented, only very few formal XML errors remain (see here), and these need to be investigated in more detail.
This is one in a string of issues that aims to improve the format of the XML version of the standard name table files, see #457 for background and overview.
This particular issue implements the changes introduced by the following issues (and associated PRs):
#500 Standard names: Add "Conventions" string to the standard name xml table header
#509 In exceptional cases allow a standard name to be aliased into two alternatives
#511 Appendix B: New element in XML file header to record the "first published date"
#516 Update the XML format specification in Appendix B to provide a robust link to the XML schema file
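As an illustration of what the header changes from #500, #511 and #516 amount to in a single file, here is a minimal standard-library sketch. The `xsi:noNamespaceSchemaLocation` attribute is the standard XML Schema instance mechanism for pointing to a schema without a target namespace; the element names `<conventions>` and `<first_published>`, their placement after `<version_number>`, and the example values are assumptions, and the authoritative definitions are those in Appendix B. `update_header` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # serialise as xmlns:xsi / xsi:...


def update_header(root, schema_url, conventions, first_published):
    """Add the schema link and the new header elements if they are missing.

    Element names <conventions> and <first_published> are assumptions here.
    """
    root.set(f"{{{XSI}}}noNamespaceSchemaLocation", schema_url)
    anchor = root.find("version_number")
    for tag, text in (("conventions", conventions),
                      ("first_published", first_published)):
        elem = root.find(tag)
        if elem is None:
            elem = ET.Element(tag)
            elem.text = text
            # Place each new element directly after the previous header item.
            pos = list(root).index(anchor) + 1 if anchor is not None else 0
            root.insert(pos, elem)
        anchor = elem


if __name__ == "__main__":
    root = ET.fromstring(
        "<standard_name_table><version_number>12</version_number>"
        "<last_modified>2010-01-01T00:00:00Z</last_modified></standard_name_table>"
    )
    # Hypothetical values, for illustration only.
    update_header(root, "https://example.org/cf-standard-name-table.xsd",
                  "CF-StandardNameTable-12", "2010-01-01T00:00:00Z")
    print(ET.tostring(root, encoding="unicode"))
```

Running it twice leaves the file unchanged the second time, which suits the "apply to all published versions" workflow.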
By implementing a proper connection between the XML file and its corresponding original XSD file, it was easy to pinpoint a few formal XML errors that are easy to correct, and that would also remain with the updated schema file. As these errors in no way influence the material content related to the standard names and their definitions etc., I suggest that they are corrected. These are:
- Version 1: `<last_modified>` DateTime is missing, and is not defined in schema file version 1.0. Add this information.
- Version 71: `<last_modified>` DateTime string is malformed: the time component of the string is missing. Add this information.
- Version 12: Exact duplicate of the standard name entry `sea_surface_height_above_reference_ellipsoid`. Remove the duplicate entry.
- Versions 17–22: Several standard name entries lack the required tag `<description>`. Add empty tags.
- Versions 20–26: One or several standard name entries lack the required tag `<canonical_units>`. Add empty tags.
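Most of the fixes in the list above are mechanical, so they could be applied by a script along the following lines. This is a sketch, not the actual conversion code: the name `fix_formal_errors` is hypothetical, midnight UTC as the missing time component is an assumption, and placing `<canonical_units>` first and `<description>` last within an entry is also assumed.

```python
import re
import xml.etree.ElementTree as ET

# A <last_modified> value that has a date but no time component.
DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def fix_formal_errors(root):
    """Apply the mechanical fixes to a parsed table; return log messages."""
    log = []

    # Malformed <last_modified>: append a time (midnight UTC is an assumption).
    last_modified = root.find("last_modified")
    if last_modified is not None and last_modified.text:
        value = last_modified.text.strip()
        if DATE_ONLY.fullmatch(value):
            last_modified.text = value + "T00:00:00Z"
            log.append(f"last_modified: {value} -> {last_modified.text}")

    # Exact duplicate <entry> elements: keep only the first occurrence.
    seen = set()
    for entry in root.findall("entry"):  # findall returns a list, safe to remove
        key = (entry.get("id"),
               tuple((c.tag, (c.text or "").strip()) for c in entry))
        if key in seen:
            root.remove(entry)
            log.append(f"removed duplicate entry {entry.get('id')}")
        else:
            seen.add(key)

    # Missing required children: add them as empty tags.
    for entry in root.findall("entry"):
        name = entry.get("id")
        if entry.find("canonical_units") is None:
            entry.insert(0, ET.Element("canonical_units"))
            log.append(f"{name}: added empty <canonical_units>")
        if entry.find("description") is None:
            ET.SubElement(entry, "description")
            log.append(f"{name}: added empty <description>")
    return log
```

The missing `<last_modified>` in version 1 cannot be reconstructed this way; the date has to be supplied by hand. A second run returns an empty log, so the returned messages can go straight into the per-step log files.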