Introducing a CF domain variable #301
Comments
@davidhassell -- I'm in support of this in concept and would be willing to moderate the discussion. I will review the PR in detail soon. Others, please review. Comments on detailed aspects of the PR can be inline, but please put all general discussion here.
@dblodgett-usgs Thanks for moderating.
@davidhassell -- I left a couple of comments on your PR #302 to seed some further discussion. See #302 (comment). We need to be very aware that this change will loosen / modify the field-variable-centric nature of CF. I've always seen this as a central tenet of CF that was both annoying and super useful. This addition will make a lot of confusing things possible. Not really an argument for or against, but something to keep in mind. The other thing is that I'm a little confused about how a domain variable works when it doesn't reference any dimensions. @davidhassell points to scalar coordinate variables as a case where this is valid. This does make sense but calls out the need for inclusion of examples showing how this would work. A … Those are two general comments for people to consider. I do want to pose a potential change for consideration. Considering that this domain concept is new, we have an opportunity to require some things. I think it would be worth requiring all coordinate variables be declared in … Regards, Dave
Hi Dave, Thanks for the comments. I very much appreciate your having taken the time to look over it.
This is a good point, and we should be sure that the motives for introducing it are valid. It would be good to hear from some of the people whose use cases I ever so briefly mentioned above, to provide a better picture of why this domain variable will be worthwhile. My personal interest is just in helping CF along - so I'm not really qualified to speak on their behalf. For visibility, I'll flag @ajelenak @AndersMS @erget @oceandatalab @dblodgett-usgs who may be able to help with this (thanks!).
I wholly appreciate this stance, and indeed originally had that in mind. However, I came round to think that, as far as possible, the mechanics of the new domain variable should be identical to the equivalent mechanics of a data variable, e.g. that coordinate variables may be omitted from, or included in, the … This …
These points remove the need for duplicating parts of the conventions with partial modifications, and reduce the possibility of misunderstandings between two almost identical, but not quite the same, encodings. The second point also makes it much easier for developers who already deal with data variables to extract domains, because they already have the machinery for decoding the domain from a data variable. I can say from experience that this is the case, having just today implemented the reading of the proposed domain variable in a branch of the … Remember that compression (DSG ragged arrays, gathering) also has to be considered. By stating that "things are the same as for a data variable" we get compression "for free", in terms of documentation, on the domain variable.
Agreed. There is already an example showing a scalar coordinate variable, but not yet one without any named dimensions. The more examples the better, I think.
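The point about reusing the existing decoding machinery can be illustrated with a minimal Python sketch. It assumes, as the discussion later settles on, that a domain variable lists its dimensions in a `dimensions` attribute. The in-memory `{"dims": ..., "attrs": ...}` layout and the `decode_domain` helper are hypothetical and not part of any library; the only branch is where the dimensions come from, and everything else is shared.

```python
# Attributes that describe a domain; the same set is read from data and domain variables.
DOMAIN_ATTRS = ("cell_measures", "coordinates", "geometry", "grid_mapping")


def decode_domain(var):
    """Extract a domain description from a data variable or a domain variable.

    `var` is a hypothetical in-memory variable: a dict with "dims" (the
    variable's own netCDF dimension names) and "attrs" (its attributes).
    """
    attrs = var["attrs"]
    if "dimensions" in attrs:
        # Domain variable (assumed encoding): a scalar variable whose domain
        # dimensions are listed in its "dimensions" attribute.
        dims = attrs["dimensions"].split()
    else:
        # Data variable: the domain dimensions are the variable's own dimensions.
        dims = list(var["dims"])
    # Everything else is decoded identically for both kinds of variable.
    decoded = {"dimensions": dims}
    for name in DOMAIN_ATTRS:
        if name in attrs:
            decoded[name] = attrs[name]
    return decoded


data_var = {"dims": ("time", "lat", "lon"),
            "attrs": {"coordinates": "height", "grid_mapping": "crs"}}
domain_var = {"dims": (),
              "attrs": {"dimensions": "time lat lon",
                        "coordinates": "height", "grid_mapping": "crs"}}
print(decode_domain(data_var) == decode_domain(domain_var))  # True
```

An empty `dimensions` string decodes to a domain with no named dimensions, which covers the scalar-coordinates-only case raised above.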
Dear David, Thanks for this proposal, which I support in its current form, with some minor points:
for which would suggest
Jonathan
Dear @davidhassell, I also support this proposal. This will be of benefit to remote sensing users and would ensure that we can implement the ongoing work discussed in cf-convention/discuss#37 in a way that is compatible with existing processing systems.
@JonathanGregory Thanks for your thoughtful comments. Responses inline ...
Good idea
That's right.
It is already stated that "The presence of a
I like the idea of being clearer that if a scalar variable has the
That's better. (Aside: We should also update the text for other similar containers. I cut and pasted my text from grid mappings.)
OK
I see what you mean. I like your text better. We should also change Appendix A from saying "D for variables containing non-coordinate data" to "D for data variables", then.
I can only argue by counter example, here. Consider the domain of:
It would be:
There is some confusion here ... Are we saying that a domain variable must be a scalar? We don't insist on that for grid mapping and geometry variables (although it is recommended for grid mapping variables). If we say that for a domain variable, we should say the same for grid mapping and geometry variables - which I think would be a defect change (on the grounds that these variables were never intended to contain meaningful data arrays). Either way, this brings me back to @JonathanGregory's suggestion of disallowing
The phrase "variables that are to be interpreted as data variables" means that variables that are referenced by data variables (such as auxiliary coordinate variables) may indeed have a …
Dear @davidhassell,
A couple of comments on the proposed text: Possibly it could be made clearer in the text that multiple domain variables may exist in a file. The text uses the plural form in a few places, like in the heading 5.8 Domain Variables, but mostly uses the singular form. The current CF conventions document uses the plural in most places when describing variables, attributes and dimensions. Also, do any particular restrictions apply when having multiple domain variables? I would assume that for a particular domain, only one domain variable is permitted?
Dear @davidhassell
Dear @AndersMS, Thanks (and thanks to @erget) for the reference to the subsampled coordinates issue.
That would be fine. The last sentence in the new text could be changed to read: "Multiple domain variables may exist in a file, with or without other data
As @JonathanGregory says, this is no problem. I don't think that this needs special mention, as this has always been true of all types of variables. @JonathanGregory and all - I'll update the PR next week for the various new bits of text.
Hi @davidhassell and @JonathanGregory,
From a user point of view and for the ease of discovering the domains, it would appear attractive if:
It would also better support the use case stated by @oceandatalab in "Standard way to define subsampled coordinates" (#37), for accessing coordinate variables without accessing data variables. For other data variables, we do need to permit multiple instances for things to work, and cannot exclude that some of these are copies of each other. If there is no similar need for permitting copies of domain variables, I guess it would be cleaner and more beneficial not to permit copies.
Hi @AndersMS, I don't see the use case for restricting a dataset to have at most one domain variable, as datasets can already contain multiple implicit domains defined by data variables, so I think it makes sense to mirror that situation. I am open to saying that a file must contain either data variables or domain variables, but never both. What do others think?
Hi @davidhassell, It was not my intention to propose only one domain variable per file, but to suggest one domain variable per domain (so no copies) :-) So if …
and
then a user can easily search a file for all domain variables, and the resulting list of domains will be complete and without copies. It would just appear convenient for discoverability.
Hi @AndersMS, OK, I see. However, I don't see how we can enforce that any two domain variables in a dataset refer to different domains. It may be intentional and meaningful to have multiple domains, some of which happen to be equal. We would also have to define "equal", which is another problem ... I think that this comes down to a user community choice - for example, a project could insist that, for its outputs, a dataset containing domains must contain only one. This would be similar to, say, the CMIP project, which favours only one data variable per dataset - a local restriction that goes beyond CF but is useful to its users.
Should we instead identify a domain variable by having a …
It seems to me that the presence of a …
Thanks, @JonathanGregory. OK - that's fine by me. It's a good point about redundancy.
Thank you for the reply. I agree that it is better to keep that flexibility, and I withdraw my proposal regarding a single domain variable per domain.
Hello, I wrote before:
I have since learned that there is neither a current desire nor a use case for restricting a mixture of domain and data variables, so I withdraw the suggestion.
We are also in support of @davidhassell's proposal. Several field variables may share the same domain (output parameters computed on the same grid for numerical model simulations, measurements and derived geophysical data acquired by an instrument in remote sensing, etc.), but the current conventions define the domain only as an abstract concept, which is implemented with attributes on the field variables. In order to identify the domains available in a file (one of our use cases), you have to analyze all the field variables available in the file, parse their attributes to extract domain-related information, and then compare the extracted domains to remove duplicates. So listing domains is possible today, but it is certainly more involved than it should be. Materializing the domain variables as proposed here would make this process a lot easier and probably result in a clearer description of the data. In the changes proposed in #302, it is stated that:
and
Does it mean that when a domain construct is part of a field construct, it has to be stored exclusively via attributes (as is done with the current conventions), or is it possible to also have a reference to a domain variable? Something like the following pseudo-CDL:
It would still be compatible with existing software because domain information remains available as attributes on the field variables, but it would also clarify the relation between a domain variable and the field variables that use this domain.
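The current, attribute-based way of listing the domains in a file, as described by @oceandatalab, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the caller already knows which variables are data variables (working that out is itself part of the effort), variables are hypothetical `{"dims": ..., "attrs": ...}` dicts, and `domain_key` / `list_domains` are illustrative names. The key compares only the names referenced by the domain attributes, not the values of the referenced variables, which is a simplification.

```python
from collections import defaultdict

DOMAIN_ATTRS = ("cell_measures", "coordinates", "geometry", "grid_mapping")


def domain_key(var):
    """Hashable summary of the domain implied by one data variable."""
    attrs = var["attrs"]
    parts = []
    for name in DOMAIN_ATTRS:
        tokens = attrs.get(name, "").split()
        if name == "coordinates":
            # The order of names in "coordinates" carries no meaning.
            tokens = sorted(tokens)
        parts.append((name, tuple(tokens)))
    return (tuple(var["dims"]), tuple(parts))


def list_domains(variables, data_var_names):
    """Group data variables by the domain their attributes imply (removes duplicates)."""
    domains = defaultdict(list)
    for name in data_var_names:
        domains[domain_key(variables[name])].append(name)
    return dict(domains)


variables = {
    "tas": {"dims": ("time", "lat", "lon"),
            "attrs": {"coordinates": "height", "cell_measures": "area: areacella"}},
    "pr": {"dims": ("time", "lat", "lon"),
           "attrs": {"cell_measures": "area: areacella"}},
    "huss": {"dims": ("time", "lat", "lon"),
             "attrs": {"cell_measures": "area: areacella", "coordinates": "height"}},
}
domains = list_domains(variables, ["tas", "pr", "huss"])
print(list(domains.values()))  # [['tas', 'huss'], ['pr']]
```

With explicit domain variables, the same listing would reduce to finding the domain variables themselves.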
Hi @oceandatalab, Thank you for describing your use case, which would benefit from a domain variable.
I am proposing that domain variable references should not be allowed from a data variable. This is to preserve backwards compatibility and to avoid redundancy (in the senses of design principles 10 and 6). We must be careful not to confuse CF data model constructs with netCDF variables - the data model has been designed to be independent of the netCDF encoding. In the modified data model proposed here, a field construct may contain a domain construct, but that in no way forces the netCDF representation of the field construct to contain an explicit reference to a domain variable. Thanks,
Sorry for the confusion between field/data and construct/variable. My question was about having a reference to the domain variable in addition to the attributes that already describe the domain (implicitly) on the data variable, so I am not sure how it would break backwards compatibility. I agree that it introduces some redundancy, but I would argue that this is already the case when several data variables share the same domain and each of these data variables defines this very same domain implicitly with its attributes. Allowing a reference to the domain variable on data variables would add a way to check that the attributes on these data variables (that are meant to describe the same domain) are consistent with each other.
Hi @oceandatalab, OK - we're in a slightly grey area here! This is where the design principles can really help. Principle 6 says "To avoid potential inconsistency within the metadata, the conventions should minimise redundancy." and principle 10 says "... there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one)." So to minimise redundancy, we should not allow both a domain variable reference and the other data variable attributes to exist at the same time; and we shouldn't allow a domain variable reference anyway, because we already have adequate (even if improvable) means of conveying the same information. My original comments about backwards compatibility weren't strictly right, I realise. Allowing a domain variable reference instead of the usual data variable attributes would not be a CF backward compatibility issue (though it would be a little tough on software writers), but it would fall foul of principle 10. Thanks,
Point of order: I updated the moderator comments in the description above.
Thanks for the summary, Dave. @oceandatalab - are you OK with not allowing a domain variable reference from a data variable? There hasn't been any comment on the changes to the text of the data model. It would be great if someone could review the suggested changes to appendix I in PR #302. All the best,
If I'm reading the right thing, the text of Appendix I contains the statement "It is not a construct of the data model, but is an abstract concept that is useful for understanding it." That should be deleted now (since that's the whole point 😄)
I'm not sure what's going on here, but were you reading the rich diff? That seems to be having intermittent difficulties in showing the modified image caption (where that text was deleted from), and also isn't showing the modified image. The side-by-side diff is OK, though, I think.
Yes, I was reading the rich diff. That must be it, then. Jonathan
@davidhassell Sorry for the delay I just came back from vacation.
I think you meant that allowing a domain variable reference in addition to the usual data variable attributes would not be a CF backward compatibility issue, whereas replacing the usual attributes by a reference to a domain variable would break backward compatibility. Allowing a domain variable reference from a data variable is not strictly necessary for our use case, so I do not consider this to be a blocking point. However, I still think it should be discussed because I am not sure rule 10 applies here:
For me the reference to the domain variable does not serve the same purpose as the usual data variable attributes, because this reference is meant to identify the domain uniquely, and this information is not provided by the usual attributes, so I would consider the reference as additional information, not a replacement/competitor. Being able to clearly identify the domain of a data variable, and therefore the data variables that share a domain, is definitely an operation that could be made simpler, and this goal could be achieved very easily by a domain variable reference. If the reference is an issue due to its nature, then one could simply replace it by a unique identifier string, but if domain variables are available then it would be a shame not to use them for that purpose too.
Dear @oceandatalab, I think that allowing data variables to refer to the domain with a single reference instead of providing the domain information by various references on the data variable would be a drastic change to the convention. Although not backwards incompatible in the sense that it wouldn't invalidate existing conventions or data, it would require all software to be rewritten to support this different method. I think that would be a bad decision. Allowing a domain reference in addition to the other means of describing the domain by the data variable would be redundant, and therefore potentially inconsistent, which also doesn't sound good to me. I understand your argument that you want to use the domain reference as a way to identify the domain uniquely, but I would argue that you can't really depend on that method. It will only work within a single file (within which one can depend on variable names as references), and netCDF datasets aren't necessarily contained in single files. Hence you still need to be able to decide whether domains are equal by inspecting the metadata and coordinates. You would also have to be able to do that when assembling a dataset from various sources. Best wishes Jonathan
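Deciding whether two domains are equal "by inspecting the metadata and coordinates", rather than by comparing references, might look like the minimal Python sketch below. It is illustrative only: the domain layout, the helper names and the tolerance are assumptions, units are compared as plain strings rather than converted, and only one grid-mapping property is checked. Coordinates are keyed by `standard_name` because netCDF variable names need not agree between files, which is exactly why a file-local reference cannot be relied on.

```python
import math


def coords_equal(a, b, rel_tol=1e-9):
    """Compare two sequences of coordinate values within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def domains_equal(d1, d2):
    """Decide domain equality from metadata and coordinate values only.

    Each domain is a hypothetical dict:
      {"grid_mapping_name": str or None,
       "coords": {standard_name: (units, values)}}
    """
    if d1.get("grid_mapping_name") != d2.get("grid_mapping_name"):
        return False
    if d1["coords"].keys() != d2["coords"].keys():
        return False
    for key, (units1, values1) in d1["coords"].items():
        units2, values2 = d2["coords"][key]
        # Simplification: units are compared as strings, not converted.
        if units1 != units2 or not coords_equal(values1, values2):
            return False
    return True


file_a = {"grid_mapping_name": None,
          "coords": {"latitude": ("degrees_north", [-45.0, 45.0]),
                     "longitude": ("degrees_east", [0.0, 90.0])}}
file_b = {"grid_mapping_name": None,
          "coords": {"latitude": ("degrees_north", [-45.0, 45.00000000001]),
                     "longitude": ("degrees_east", [0.0, 90.0])}}
print(domains_equal(file_a, file_b))  # True
```

The tolerance is a design choice: exact floating-point equality would make domains written by different software look different.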
I never suggested using a single reference instead of the usual data variable attributes. This is something that is only mentioned in #301 (comment), and I think it was just due to a misunderstanding or a typo. So we agree that breaking backward compatibility would be a bad idea.
Here we disagree:
I admit I have no experience with multi-file netCDF datasets, so I may not fully grasp all the implications that adding the domain variable reference would have on this data structure. I quickly browsed the NcML documentation, and it seems to allow the creation and modification of attributes on the variables of the multi-file dataset, so someone who wants to aggregate files from several sources could write an NcML file that correctly defines the domains and their references in the view offered by the multi-file dataset. But again, I have never worked with this kind of dataset, so I may be completely wrong. Cheers, Sylvain
Hi Sylvain,
I did indeed mean "instead of" rather than "in addition to". Allowing a domain variable reference instead of the usual attributes would neither disallow the usual attributes, nor change their meaning, so there would be no backwards incompatibility. This is similar to the
We shouldn't allow a domain variable reference instead of the usual domain definition because a) there is no use case for it and b) it would require all software to be rewritten to support this different method. Even though allowing this would make it easier, in limited circumstances, to see "by eye" if two data variables shared a domain, I don't think that is a use case on its own. These limited circumstances only arise when informally comparing multiple data variables with domain references within the same file (as opposed to the same dataset). Library software would not generally benefit from this, as it has to store the constituent parts of the domain (cell measures, grid mappings, coordinates, etc.) regardless of how it was encoded. If a stronger use case were to present itself in the future I would welcome this being reviewed, but suggest that for now we do not allow this. With regard to the pre-existing redundancy issue, data variables are essentially independent entities. Therefore there is no redundancy if, say, two data variables have the same … Anyway, I think (if I've read everything correctly) we are in agreement that a domain variable reference should be used neither in addition to nor instead of the usual domain definition (data variable … Thanks,
Hi David,
OK, it was confusing because no one had talked about using references to domain variables instead of the usual attributes before, so I thought you were replying to my "in addition" question.
Agreed.
I get what you mean, but independence achieved by denormalization introduces redundancy as soon as two entities have some elements in common, and therefore makes the data prone to inconsistency issues. Even if each data variable has its own domain instance (i.e. its own set of
The idea is not to identify which definition is correct but to detect when two definitions of the same domain are incompatible, or not as complete as they could be. The goal is to offer a way for data producers to detect errors (multiple definitions of a single domain that are not compatible with each other) and consistency issues (when two variables share the same domain but one of them only provides a minimal definition while the other has a detailed description), and therefore the means to improve the overall quality of the files they generate before these files are distributed to end users. But again, it was just a suggestion for a small improvement of the proposal; it is absolutely not a blocking point for us. Cheers, Sylvain
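Without a domain variable reference, a data producer could still approximate the check Sylvain describes with a heuristic such as the minimal Python sketch below. It flags data variables that share the same netCDF dimensions but describe their domains with different attributes. Sharing dimensions does not imply sharing a domain, so the result is a list of candidates for human review, not a list of errors. The `{"dims": ..., "attrs": ...}` layout and `review_candidates` are hypothetical.

```python
from collections import defaultdict

DOMAIN_ATTRS = ("cell_measures", "coordinates", "geometry", "grid_mapping")


def review_candidates(variables):
    """Return, per set of dimensions, the groups of data variables whose
    domain-describing attributes differ from one another.

    `variables` maps data variable names to hypothetical dicts holding
    "dims" and "attrs".
    """
    by_dims = defaultdict(lambda: defaultdict(list))
    for name, var in variables.items():
        description = tuple(var["attrs"].get(k, "") for k in DOMAIN_ATTRS)
        by_dims[tuple(var["dims"])][description].append(name)
    # Only dimension sets with more than one distinct description are reported.
    return {dims: list(groups.values())
            for dims, groups in by_dims.items() if len(groups) > 1}


variables = {
    "sst": {"dims": ("y", "x"), "attrs": {"coordinates": "lat lon", "grid_mapping": "crs"}},
    "chl": {"dims": ("y", "x"), "attrs": {"coordinates": "lat lon"}},
    "wind": {"dims": ("t",), "attrs": {}},
}
print(review_candidates(variables))  # {('y', 'x'): [['sst'], ['chl']]}
```

Here `chl` omits the `grid_mapping` that `sst` provides on the same dimensions, which is the "minimal versus detailed definition" situation described above.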
Thanks for all of the discussion. I understand (from these comments and off-line conversations) that there are no objections to the pull request as it stands. @dblodgett-usgs would you agree? It would still be good to get some comment here on the data model changes. Many thanks,
I agree, @davidhassell, and I don't think the subsequent conversation warrants any further summary above. Thanks for the good conversation, all.
I've had a look at the latest draft and still support this proposal. The changes are mostly straightforward, as they enshrine as a construct what was until now a concept that has served the community well. Thank you @davidhassell for the painstaking work here; I believe this will be a benefit to the community.
Introducing a CF domain variable
Moderator: @dblodgett-usgs
Moderator Status Review [last updated: 2020-10-15]
The proposal has been submitted and preliminarily reviewed by the moderator. Attention should be called to the potential for this proposal to subtly but fundamentally alter how CF-NetCDF data fields and domains are treated. Review from authors of CF-NetCDF client software is necessary here.
- A `domain` variable could be identified either by `cf_role: domain` or by the presence of a `dimensions: "X Y Z ..."` attribute. Presence of a `dimensions` attribute has won out for its lack of redundancy.
- … "Coordinate systems and domain" …
- A `domain: domain_variable` attribute on a data variable was suggested. Adding it would introduce redundancy and seems to be the wrong path.

As of 2020-10-15, discussion is slow but ongoing. I will check back in around the beginning of November.
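With the identification rule summarised above, discovering every domain in a file becomes a single scan over its variables. The Python sketch below is a minimal illustration, assuming the hypothetical `{"dims": ..., "attrs": ...}` layout and a helper named `find_domain_variables`. A data variable's netCDF dimensions are not attributes, and CF defines no `dimensions` attribute for data variables, so data variables are not matched.

```python
def find_domain_variables(variables):
    """Map each domain variable's name to its list of domain dimensions.

    Assumes the identification rule summarised above: a domain variable is
    recognised by the presence of a "dimensions" attribute. An empty string
    denotes a domain with no named dimensions (e.g. only scalar coordinates).
    """
    return {
        name: var["attrs"]["dimensions"].split()
        for name, var in variables.items()
        if "dimensions" in var["attrs"]
    }


variables = {
    "tas": {"dims": ("time", "lat", "lon"), "attrs": {"coordinates": "height"}},
    "grid": {"dims": (), "attrs": {"dimensions": "time lat lon", "coordinates": "height"}},
    "site": {"dims": (), "attrs": {"dimensions": "", "coordinates": "station_name"}},
}
print(find_domain_variables(variables))  # {'grid': ['time', 'lat', 'lon'], 'site': []}
```

Compared with a `cf_role: domain` marker, the `dimensions` attribute carries information the reader needs anyway, so no extra flag has to be kept consistent with it.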
Requirement Summary
The concept of a domain that describes data locations and cell properties is not currently mentioned in the CF conventions, because it does not correspond to any single entity in the netCDF file. Instead, the domain is stored implicitly in a number of other variables and attributes that are linked to the data variable in various ways defined by the conventions.
The domain is, however, well defined in the CF data model as an abstract concept (as opposed to a data model construct) that provides the linkage between the field construct and the metadata constructs that describe the relevant data locations and cell properties. There is currently no "domain construct" in the data model, since there is no corresponding CF-netCDF entity.
There is a need to be able to describe a domain independently of any data variables, which is currently not possible. Use cases include:
- Curated data streaming services for which it is impractical to send very large domain descriptions with every file.
- Storing time-dependent coordinates from remote sensing applications.
- Storing geometries without any timeseries data.
For such use cases, it is not satisfactory to try to locate an appropriate multidimensional data variable that describes the required domain, nor to create a dummy data variable for this purpose, which has no physical meaning.
Therefore, the inclusion of CF-netCDF domain variables that can encode a domain independently of any data, and a corresponding data model domain construct, will enhance CF by meeting these use cases.
Technical Proposal Summary
NetCDF encoding
A new "domain variable" will be introduced that is of arbitrary type since it contains no data. This variable will act as a container to bind together other variables that collectively define a domain, in a similar manner to how a data variable performs the same task.
It will support the same CF attributes as are allowed on the data variable for describing a domain, with exactly the same meanings and syntaxes:
`cell_measures`, `coordinates`, `geometry`, and `grid_mapping`. These will be indicated as domain variable attributes by the additional "Do" indicator (short for Domain) in the "Use" column of Appendix A: Attributes.

Any future CF attributes that a data variable may use to describe its domain will be similarly transferred to the domain variable, meaning that keeping the domain variable up to date with other enhancements will be a well-defined and easy task.
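To make the encoding concrete, the hypothetical Python helper below renders a CDL declaration for a domain variable. It assumes the `dimensions` attribute agreed in the moderator summary, and it assumes `char` as the variable's type (the proposal only says the type is arbitrary, since the variable holds no data). It does not validate which attributes are allowed; it is a sketch of the shape of the encoding, not a CF writer.

```python
def domain_variable_cdl(name, dimensions, attrs, dtype="char"):
    """Render a CDL declaration for a scalar domain variable.

    Hypothetical helper. The variable has no netCDF dimensions of its own;
    its domain dimensions are listed in a "dimensions" attribute, and the
    remaining attributes (e.g. coordinates, grid_mapping) are written as given.
    """
    lines = [f"  {dtype} {name} ;"]
    lines.append(f'    {name}:dimensions = "{" ".join(dimensions)}" ;')
    for key, value in attrs.items():
        lines.append(f'    {name}:{key} = "{value}" ;')
    return "\n".join(lines)


cdl = domain_variable_cdl("grid", ["time", "lat", "lon"],
                          {"coordinates": "height", "grid_mapping": "crs"})
print(cdl)
```

Printing `cdl` gives a scalar `char grid` declaration followed by its `dimensions`, `coordinates` and `grid_mapping` attributes, each with exactly the syntax these attributes already have on a data variable.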
There is no mechanism for referencing a domain variable from a data variable, i.e. a data variable must still encode its domain in the current, implicit manner. This is to preserve backwards compatibility with all existing software libraries that understand the current structure of a data variable; and to reduce redundancy or incompatibility issues that may arise if a data variable encodes its own domain and references a domain variable.
A domain variable may exist in a file with or without other data variables.
Data model
The domain in the data model will be transformed from an abstract concept into a "top-level" construct, i.e. one that can exist in the absence of any other constructs. Currently, the field construct (corresponding to a CF-netCDF data variable) is the only top-level construct.
The new domain construct will replace the current domain concept, replicating it in every way except that it will be related to the field construct via an aggregation relationship, rather than by the current composition relationship of the abstract domain concept. This makes it clear that the domain construct can exist independently from the field construct.
It is of no consequence to the data model that a CF-netCDF data variable will not be able to explicitly reference a CF-netCDF domain variable. That is an encoding choice that does not affect the logical structure.
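The difference between aggregation and composition can be illustrated with a loose Python analogy using `dataclasses`. It is not the data model's UML, and Python does not enforce either relationship. Here a `Domain` object can be created with no field at all, and several `Field` objects can hold references to the same domain. Under composition, each field would instead own a private domain that could not exist without it. The class names and attributes are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for the proposed top-level domain construct."""
    dimensions: tuple
    coordinates: dict = field(default_factory=dict)


@dataclass
class Field:
    """Hypothetical stand-in for a field construct."""
    data: list
    domain: Domain  # aggregation: a reference to an independently existing Domain


# A domain can be created on its own, with no field at all ...
grid = Domain(dimensions=("lat", "lon"),
              coordinates={"lat": [-45.0, 45.0], "lon": [0.0, 90.0]})

# ... and the same domain object can be shared by several fields.
tas = Field(data=[[280.0, 281.0], [282.0, 283.0]], domain=grid)
pr = Field(data=[[0.1, 0.0], [0.3, 0.2]], domain=grid)
print(tas.domain is pr.domain)  # True
```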
Location in the conventions document
The domain variable will be described in a new section: 5.8 Domain Variables
The following appendices and documents will be updated:
Appendix A: Attributes
Appendix I: The CF data model
CF Conformance Requirements and Recommendations
Benefits
All those who meet the use cases described in the Requirements summary will benefit from the new domain variable.
Status Quo
At present, a domain can only be encoded implicitly via a data variable, leading to ambiguities when retrieving a domain from a dataset.
Associated pull request
#302
Detailed Proposal
Conventions text has been proposed in chapter 5, appendices A and I, and the conformance document in pull request #302.