Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify Geometry Dimension on Geometry Container #155

Closed
twhiteaker opened this issue Mar 1, 2019 · 21 comments
Closed

Specify Geometry Dimension on Geometry Container #155

twhiteaker opened this issue Mar 1, 2019 · 21 comments
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@twhiteaker
Copy link
Contributor

Title: Specify Geometry Dimension on Geometry Container

Moderator: @davidhassell

Requirement Summary:
This is in regard to geometries in Chapter 7. For single part point geometries, the node_count variable is omitted since it is not needed. However, node_count is dimensioned using the instance dimension of the geometries, so without node_count, we don't know the geometry instance dimension.

Example of ambiguous geometry instance dimension (could be foo or instance):

dimensions:
    foo = 3 ;
    instance = 3 ;
    node = 3 ;
variables:
    int foo(foo);
        time:long_name = "bar" ;
    int geometry_container ;
        geometry_container:geometry_type = "point" ;
        geometry_container:node_coordinates = "x y z" ;
    double x(node) ;
        x:units = "degrees_east" ;
        x:standard_name = "longitude" ;
        x:axis = "X" ;
    double y(node) ;
        y:units = "degrees_north" ;
        y:standard_name = "latitude" ;
        y:axis = "Y" ;
    double z(node) ;
        z:units = "m" ;
        z:standard_name = "altitude" ;
        z:axis = "Z" ;
    double someData(instance, foo) ;
        someData:geometry = "geometry_container" ;

Technical Proposal Summary:
Add a geometry_dimension attribute to geometry container, which indicates the geometry dimension, as in:

dimensions:
    foo = 3 ;
    instance = 3 ;
    node = 3 ;
variables:
    int foo(foo);
        time:long_name = "bar" ;
    int geometry_container ;
        geometry_container:geometry_type = "point" ;
        geometry_container:node_coordinates = "x y z" ;
        geometry_container:geometry_dimension = "instance" ;
    double x(node) ;
        x:units = "degrees_east" ;
        x:standard_name = "longitude" ;
        x:axis = "X" ;
    double y(node) ;
        y:units = "degrees_north" ;
        y:standard_name = "latitude" ;
        y:axis = "Y" ;
    double z(node) ;
        z:units = "m" ;
        z:standard_name = "altitude" ;
        z:axis = "Z" ;
    double someData(instance, foo) ;
        someData:geometry = "geometry_container" ;

Benefits: Who or what will benefit from this proposal?
Software developers can now properly parse the file; they know to which dimension the x, y, and z nodes apply.

Status Quo: Discussion of the current state CF and other standards.
CF already has an instance_dimension, but the existence of that attribute normally identifies the variable as a DSG index variable. Therefore, we use geometry_dimension.

Detailed Proposal: Complete proposal
Require a geometry_dimension attribute on geometry container, which identifies the instance dimension of the geometries. This will require updating Ch7 text and examples.

@twhiteaker
Copy link
Contributor Author

geometry_dimension could be optional for geometries other than single point geometries, but I suggest we require it in all cases to be consistent, and since an attribute doesn't take up much space. If we do that, we should probably state that the geometry_dimension value must match the dimension used for the node_count variable if node_count is present.

@twhiteaker
Copy link
Contributor Author

Conformance edits:
twhiteaker/Conformance@9098401

Ch7 edits:
twhiteaker@ca41968

@davidhassell
Copy link
Contributor

Making "geometry_dimension" mandatory is fine with me - it simplifies the conventions (fewer "if X then Y" clauses) and is easy to check for consistency in library code.

@davidhassell davidhassell added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Mar 12, 2019
@twhiteaker
Copy link
Contributor Author

It's better to see the pull requests for the latest edits.

#158

cf-convention/Conformance#6

@JonathanGregory
Copy link
Contributor

In the case of point geometry, isn't it necessarily the case that the instance dimension equals the number of nodes (both are 3 in your example)? If so, you don't really need the geometry_dimension attribute, because you can get the size by examining the dimension of any of the node coordinate variable. In fact, it doesn't have to be a different dimension anyway; you could drop the instance dimension and put someData(node,foo), couldn't you? If you do wish to introduce the geometry_dimension attribute, perhaps because it would be convenient, there should be a conformance rule that it must equal the node dimension in the case of point geometry, if I have got this right. Jonathan

@davidhassell
Copy link
Contributor

The key to this that it is the node count variable, rather than the node coordinate variable, that shares its dimension with the data variable. In the special case that there is only one node point per cell, the node count variable may be omitted, leaving us in the situation described at the top of issue, for which you can't know if x and y apply to the foo or instance dimensions.

I don't like using node as a data dimension in this special case because it only works in this one special case; and it really refers to the size of a ragged array, and not a (CF data model) domain axis.

I agree that there should be something in the conformance - but it should be that that the geometry_dimension should be equal to the node count dimension (if it exists), and also a dimension of the data variable.

@twhiteaker
Copy link
Contributor Author

That conformance update is included in this pull request with these bullets:

  • The geometry container variable must hold geometry_type, geometry_dimension, and node_coordinates attributes.
  • The geometry_dimension attribute must name a dimension that is shared by the data variable. This dimension represents the total number of geometries.
  • The variable indicated by the node_count attribute must use the same dimension as specified by the geometry_dimension attribute.

@JonathanGregory
Copy link
Contributor

The proposed geometry_dimension attribute is only needed in the special case of single-part point geometries, because the node count variable can be omitted. In other cases, the instance dimension is the dimension of the node count variable, isn't it. If the node count variable were included in this special case, it would be all ones, and that seems like a waste of space. However, defining a new attribute solely to name the dimension of the omitted node count variable feels a bit complicated to me, especially if you require it to appear in all the other cases, which are more common, when it is redundant. It would be simpler (requiring no new convention) to require the node count variable to be included in all cases, even though it's obvious in the special case.

I understand that you want to maintain two dimensions (instance and node), which should be required to be equal by the conformance document in this case, because one relates to the data model and the other to storage. But since you know they must be equal in this case, you can find out the size of the instance dimension by inspecting the node dimension of any of the node coordinate variables named by the geometry container. That needs no new machinery, just an if statement. An if statement is already needed to deal with the missing node count variable, so it's hardly more complex to process, I would say.

@davidhassell
Copy link
Contributor

I would support making the node count variable mandatory (and thus not having the geometry_dimension variable), but I don't have strong feelings about it: if it is a real convenience and expectation of users for it to be omited it if it just contains ones, then that's OK with me. This seems, in principle, a bit like our good friends scalar coordinate variables. @twhiteaker can you give any use cases when this is useful?

I don't, I'm afraid, see how just knowing the size of the instance dimension is necessarily enough. In the top example, the data variable has two dimensions with the same size as the point node coordinate variable. We still need to know which of these two dimension to attach the code coordinates. This, I reckon, either requires a node count variable or a geometry_dimension attribute.

@twhiteaker
Copy link
Contributor Author

The use case for omission of the node_count variable is when your dataset has millions of instances and you're trying to keep the footprint on disk small. It could also help from a programmatic standpoint, as some GIS implementations treat points and multipoints as separate geometry types, and so knowing the geometry type ahead of time by noticing the absence of a node_count variable can speed up importing geometries from netCDF into an Esri geodatabase point feature class for example. Otherwise, you'd have to test whether every node_count value is 1, which may be fast anyway; I haven't tested this.

Taking a step back, I suspect actual usage of the simple geometry specification for single point features will be very rare, since there's already the discrete sampling geometry specification which can accommodate points and is already in use by the community as an archiving standard, a la NCEI's netCDF templates. So whatever we decide, I don't think we should bend over backwards for a use case that may never be realized.

@JonathanGregory
Copy link
Contributor

I agree that the use-case for omitting the node_count variable is persuasive.

Maybe I've misunderstood the top example. It has three dimensions which are all equal to 3. foo could be different, but instance and node must be equal, mustn't they? Each instance is a single-point geometry, and hence has only one node. I now realise that the problem is you don't know which dimension of someData is the geometry dimension. If, in the case of geometry_type="point", we had only one dimension instead of the separate instance and node dimensions, since they must be equal, you would look at node_coordinates, take any of the variables listed, and look up its dimension, which would be the instance dimension. David argues against making the two dimensions identical because they have different logical purposes. I understand that argument, but is it really necessary? I mean, what logical problem would be caused by using the same dimension for both purposes?

@twhiteaker
Copy link
Contributor Author

@davidhassell Any comments on what @JonathanGregory said about:

what logical problem would be caused by using the same dimension for both purposes?

@davidhassell
Copy link
Contributor

davidhassell commented May 14, 2019

If I understand correctly, @JonathanGregory is proposing that when the node_count variable is omitted, the dimension of the node coordinates must be the same as the data variable dimension to which they apply?

E.g.

dimensions:
    foo = 3 ;
    instance = 3 ;
variables:
    int foo(foo);
        time:long_name = "bar" ;
    int geometry_container ;
        geometry_container:geometry_type = "point" ;
        geometry_container:node_coordinates = "x y z" ;
    double x(instance) ;
        x:units = "degrees_east" ;
        x:standard_name = "longitude" ;
        x:axis = "X" ;
    double y(instance) ;
        y:units = "degrees_north" ;
        y:standard_name = "latitude" ;
        y:axis = "Y" ;
    double z(instance) ;
        z:units = "m" ;
        z:standard_name = "altitude" ;
        z:axis = "Z" ;
    double someData(instance, foo) ;
        someData:geometry = "geometry_container" ;

If we make that a rule for the conventions, I'm happy with this.

@davidhassell
Copy link
Contributor

The logical issue I had was, I think, misplaced. We are discussing the encoding here, and if it is clear that these variable share a dimension for convienience, removing the need for a new attribute, then that is fine.

@JonathanGregory
Copy link
Contributor

Yes, that's what I meant. Thanks. You're right that it should be stated as a rule: for geometry_type="point", the dimension of the node coordinate variables must be one of the dimensions of the data variable (because it serves also as the instance dimension).

twhiteaker added a commit to twhiteaker/cf-conventions that referenced this issue May 15, 2019
When node_count attribute is missing, require the dimension of the node coordinate variables to be one of the dimensions of the data variable.
@twhiteaker
Copy link
Contributor Author

I've made that change to my pull request. I reverted the change about requiring a geometry dimension attribute, and revised the following paragraph to include @JonathanGregory 's sentence.

When more than one geometry instance is present, the geometry container variable must have a node_count attribute that contains the name of a variable indicating the count of nodes per geometry. The node count is the total number of nodes in all the parts. The exception is when all geometries are single part point geometries, in which case a node count is not needed since each geometry contains a single node. However in that case, the dimension of the node coordinate variables must be one of the dimensions of the data variable (because it serves also as the instance dimension for geometries).

@davidhassell
Copy link
Contributor

Thanks, Tim. I am happy with all of this.

@twhiteaker
Copy link
Contributor Author

Great! According to the contributing guidelines:

Final review should take place in the pull request and the issue closed when the pull request is merged.

If "I am happy with all of this" counts as the final review, then is it merging time?

@wchen329
Copy link

Hello all,

While I don't want to be "that guy" who stands afar nudging everybody to hastily complete things, I just want to state that in my implementation for CF-1.8 geometries for OGR, corresponding with this pull request I decided to enforce the presence of the geometry_dimension attribute within geometry containers and error otherwise, so it'd be nice if we could get this thing pulled up soon to avoid confusion.

@twhiteaker
Copy link
Contributor Author

@wchen329 we shifted away from requiring geometry_dimension in favor of this rule, since this rule doesn't require a new attribute:

When more than one geometry instance is present, the geometry container variable must have a node_count attribute that contains the name of a variable indicating the count of nodes per geometry. The node count is the total number of nodes in all the parts. The exception is when all geometries are single part point geometries, in which case a node count is not needed since each geometry contains a single node. However in that case, the dimension of the node coordinate variables must be one of the dimensions of the data variable (because it serves also as the instance dimension for geometries).

@wchen329
Copy link

wchen329 commented May 31, 2019

@twhiteaker Ah rats, well ever the more reason why we should get this thing pulled up :) thanks for the clarification @twhiteaker I'll make sure to make this is the case in OGR.

erget pushed a commit that referenced this issue Dec 27, 2019
When node_count attribute is missing, require the dimension of the node coordinate variables to be one of the dimensions of the data variable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format
Projects
None yet
Development

No branches or pull requests

4 participants