Recording deployment positions #428
Comments
Dear Fernando @fmanzano-pde Chapter 9 about DSGs contains a provision for recording nominal position as well as the actual, in Section 9.5 and example H5 (on p159 in the working version - it says "example A9.2.3.2" in Sect 9.5 - I don't know why). Would this serve your purpose? Best wishes Jonathan |
Dear @JonathanGregory |
Dear Fer @fmanzano-pde I see, thanks for explaining. (I don't know why my page number is different - it's probably better not to rely on them!) While I understand your design, it isn't consistent with CF practice, in which date-times are always numbers stored in variables, not text in attributes. If you regard it as "discovery" metadata following ACDD practice, that's fine, but not in the realm of CF, which is about "use" metadata. On the other hand, you could add another latitude,longitude pair of auxiliary coordinate variables with the Best wishes Jonathan |
Dear Jonathan @JonathanGregory Thank you very much for your answer. I understand what you mean about "use" metadata. My proposal tried to be aligned with the "history" CF global attribute, which also contains date-times, but you are right: the scope of "history" and "deployment_positions" (the global attribute I proposed) is completely different. We've already thought about the two extra auxiliary coordinate variables (nominal_lat / nominal_lon) you mentioned, but, as you said, we don't like having so many repeated values, as a deployment happens very occasionally. The alternative of having many missing values is not valid either because, in my opinion, it is conceptually misleading. So, in conclusion, I'll take the ACDD way :-) Thank you very much for your help and clarification! All the best, |
Dear Fer @fmanzano-pde The contents of the If your deployment position is really for discovery and not use, then ACDD is the right approach. If you want to include it in CF, another possibility has occurred to me. We could regard the deployment position as a kind of bounds variable for the nominal position. Of course, it's not literally "bounds", but it is similar in that it specifies a series of points, traversed in a particular order, associated with a nominal location, like this (based on example H5):
We don't need to specify the Would this approach be suitable? Best wishes Jonathan |
Continuing the last posting: you also need the time of deployment. For the same arrangement, you would need a nominal time, of which this could be the bounds.
Jonathan |
Sorry to be late to the discussion - it's been a rough last few months. In our OceanSITES long-timeseries (merged deployments) files, we have more fields that are represented by nominal values but which we have deployment-level values. Aside from lat/long, we provide: sensor make/model, measurement depth, watch circle, magnetic correction applied, etc. We thought having just the start time of each deployment would make for too much burden on the user, and might lead them to lose information about instrumentation, measurement depth, etc. To simplify access to these deployment-level fields, we added a time series variable that contains the deployment number, hoping to make it straightforward to index into the relevant fields. Our convention hasn't been adopted by OceanSITES, which is pretty flexible on this type of file, but it works well for us. I'm curious about whether I can claim that this is CF-compliant. Selected details: --- scalar, nominal lat/long --- time series fields --- deployment level fields |
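As a rough illustration of the indexing idea described above (a time series variable holding the deployment number, used to look up deployment-level fields), here is a minimal Python sketch. The variable names, the 1-based numbering and all values are assumptions made for illustration; they are not taken from an actual OceanSITES file.

```python
# Hypothetical, illustrative values; none of these come from a real file.
deployment_number = [1, 1, 1, 2, 2, 3]  # one entry per time step

# Deployment-level fields: one entry per deployment.
deployment_fields = {
    "deploy_latitude": [44.1432, 44.1421, 44.1425],
    "sensor_depth_m": [10.0, 12.0, 10.0],
}


def deployment_value(field, time_index):
    """Return the deployment-level value that applies at a given time step."""
    number = deployment_number[time_index]  # assumed to be 1-based
    return deployment_fields[field][number - 1]


# Sensor depth in effect at the fifth time step (index 4, deployment 2).
depth_at_step_4 = deployment_value("sensor_depth_m", 4)
```

The point of the extra time series variable is that a user never has to compare time stamps against deployment start times; a single lookup per time step is enough.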
Dear Nan @ngalbraith I understand the convention. It's CF-compliant but CF would not make the link between your deployment variables and your time-dependent variables, using the It occurs to me that the OceanSITES convention is a bit like compression by gathering in CF section 8.2. A CF-compliant version, simplifying your example, would be:
The CF attribute Best wishes Jonathan |
Dear all The DSG ragged array representations (section 9) are ways of saving space in data variables as well as reducing the number of coordinate variables. The compression by gathering is another way of saving space in variables. In the way it's described and shown (section 8.2), its aim is to reduce the size of data variables, but it would work equally well for coordinate variables that have the same dimensions as data variables. I don't think that would be a change in the convention. Therefore it seems to me that we could use the DSG ragged array representation for a set of ocean data timeseries from moored stations in combination with compression by gathering for information about the deployment and redeployment of the stations. Here's an example, based on H.6 for timeseries of station data in the contiguous ragged array representation:
The If there's only one timeseries, we don't need the DSG ragged array mechanism. We can just use I wonder if that seems like a good way to deal with the issue raised by Fer @fmanzano-pde and Ludovic @ludo-ifr. Best wishes Jonathan |
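To make the idea of a gathered (compressed) deployment coordinate more concrete, here is a minimal Python sketch of how a reader might expand it back onto the observation dimension. It assumes that the list variable holds the zero-based index of the first observation of each deployment, and that a deployment position stays valid until the next one starts; that reading, the function name and the sample values are assumptions, not text from the convention.

```python
from bisect import bisect_right


def expand_deployment_coordinate(first_obs_index, deploy_values, n_obs):
    """Expand a per-deployment coordinate onto every observation.

    first_obs_index: sorted indices of the first observation of each deployment
                     (the assumed content of the list variable).
    deploy_values:   one coordinate value per deployment.
    n_obs:           length of the uncompressed observation dimension.
    """
    expanded = []
    for i in range(n_obs):
        # Latest deployment that started at or before observation i.
        k = bisect_right(first_obs_index, i) - 1
        expanded.append(deploy_values[k] if k >= 0 else None)
    return expanded


# Illustrative values only: three deployments over seven observations.
deployment = [0, 3, 5]
deploy_lon = [-7.7122, -7.7118, -7.7120]
lon_per_obs = expand_deployment_coordinate(deployment, deploy_lon, 7)
```

Only three longitudes are stored here instead of seven, which is the saving the compression is meant to provide when redeployments are rare.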
@JonathanGregory -- This is an interesting suggestion. Wouldn't the coordinates of the temperature observation actually just be the This does seem like a valid approach and is a way to use ragged arrays to encode these three sets of spatial information. I would just hesitate to overload the |
Dear Dave @dblodgett-usgs Yes, the Best wishes Jonathan |
Thanks for reminding me of the nominal station location. I'm still not sure about the coordinates attribute of the temperature variable having the lon and deploy_lon in it. Is that according to CF as written? How would some software make the connection to the nominal and deployment positions using that information? |
Thanks @JonathanGregory for this example. So lon(station) will be the last known nominal position. It will be important to know the answer of the question raised by @dblodgett-usgs Just to concretise the last example:
|
Dear @dblodgett-usgs and @ludo-ifr Probably my proposed use of compression by gathering is an extension to the existing convention. In 8.2, we expect that the data variable will be compressed as well as its coordinate variable, in which case the compressed dimension will be a dimension of the data variable. I'm proposing to use this convention to compress an auxiliary coordinate variable (to avoid lots of repetition) although the data variable isn't compressed. A new exception is needed for the Referring to @ludo-ifr's example: The data variable Best wishes Jonathan |
Overloading coordinates like this makes me a little uneasy. I think it would be cleaner to express the same information in a new attribute (or two) designed specifically for this use case. e.g.
The original use case doesn't seem to require a separate precise_lon, correct? (I think the word "nominal" is getting in our way here) In that case, our contiguous ragged array representation could be:
A similar approach could be taken for indexed ragged array if needed... but I think having a stronger separation of concerns for the typical single-position / time series and the more nuanced sampling-position that varies over time is probably going to be cleaner? |
I'm really sorry for not writing until now, but I had a good excuse: this week is bananas because I'm moving... and trying to survive in the new house, which is still a jungle of boxes... What a nice surprise to find out that the issue has raised enormous interest. Thank you very much, all of you! I'd like to mention that the option provided by @JonathanGregory and concretised by @ludo-ifr suits the case I wanted to reflect pretty well, and seems to be perfectly aligned with the CF Conventions, which is my main concern. On the other hand, I perfectly understand @dblodgett-usgs: the option of splitting the "coordinates" attribute into "coordinates"/"precise_coordinates"/"deployment_coordinates" attributes also suits well. However, I find this solution more ad hoc. So, if the aforementioned solution is accepted... For one station:
...I'd really like to include an additional example in the documentation to show how to record deployments' positions. All the best, Fer |
I just want to add that we could have 3 different cases:
In the third case we have to choose what lat & lon will be (it could be the first position received, the last one, a mean of the positions, ...). |
@fmanzano-pde -- no worries at all to be a little asynchronous.
Is that necessarily a negative? By overloading coordinates, we require a person to look at a dataset to confirm that certain variables are being handled according to their use case. What I was trying to do is get to a place where a computer would be able to unambiguously and correctly interpret the data without human intervention. @ludo-ifr -- I think the word nominal is being misinterpreted here. There is already an accommodation for a "nominal" position with moving "precise location(s)" -- so 1 is ok. I don't think there is anything wrong with having a single time varying X and Y axis coordinate variable -- so option 2 is ok. Isn't case 3 a special case of 2? The mechanisms we've discussed above are all viable to some degree. @JonathanGregory -- I think the trade off is that adding variables with explicit metadata for these use cases as I've suggested can be layered on top of the existing convention pretty easily. Further overloading the coordinates with a third set of spatial coordinates requires a modification of the existing convention or, to my eye, it introduces ambiguity regarding what the role of each coordinate variable is. e.g. How do you tell the difference between:
Are you expecting implementing software to use shared dimensions and standard name to get everything figured out? In the case of example H.5, we have:
Note that axis = "X" is only on one of the two spatial coordinate variables. If we have two additional X coordinate variables, I feel like we should be adding a cf_role or other custom attribute that distinguishes these from each other. I'm fine being wrong here, but having written some software to tease apart the coordinate variable relations with this part of the spec, it often feels like we are "divining" the relationships more than determining them. All the best -- happy we are on this topic! |
Dear Dave @dblodgett-usgs If I have correctly understood the use case, the three types of position are needed for an anchored floating platform which may drift around to some extent:
In my example, three different mechanisms are used for attaching these coordinates (scalar coordinate variable, auxiliary coordinate variable with the time or observation dimension, auxiliary coordinate variable with a compressed time or obs dimension), but I wouldn't use the way they're attached to distinguish them. I think we ought to define standard names for the nominal and deployment location coordinates. I prefer that to Best wishes Jonathan |
Hi @JonathanGregory -- Thanks for the clarification on the use case. Overloading If the consensus is to have Regards -- Dave |
Dear Dave
Standard names can be more or less precise, in order to draw distinctions as required. For example, we have both "time" and "forecast_reference_time", which can both be present as coordinate variables of different dimensions of a data variable, and are distinguished by standard_name.
Best wishes
Jonathan
|
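The point about telling coordinate variables apart by standard_name can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names and the dictionary layout are hypothetical, and "deployment_longitude" is used purely as a placeholder for a more precise standard name of the kind discussed in this thread.

```python
# Hypothetical metadata for the variables named in a "coordinates" attribute.
variables = {
    "time": {"standard_name": "time"},
    "reftime": {"standard_name": "forecast_reference_time"},
    "lon": {"standard_name": "longitude", "axis": "X"},
    "deploy_lon": {"standard_name": "deployment_longitude"},  # placeholder name
}
coordinates = "time reftime lon deploy_lon"


def group_by_standard_name(coordinates, variables):
    """Map each standard_name to the coordinate variables that carry it."""
    roles = {}
    for name in coordinates.split():
        standard_name = variables[name]["standard_name"]
        roles.setdefault(standard_name, []).append(name)
    return roles


roles = group_by_standard_name(coordinates, variables)
```

With distinct standard names each role maps to exactly one variable, so software does not need to infer the role from dimensions or from the order of names in the attribute.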
Dear @dblodgett-usgs and @JonathanGregory Thank you very much for the discussion. If I understood correctly, the winner option is to use standard names for the different latitudes and longitudes, isn't it? Rewriting my example...
If it's correct...
|
Dear @fmanzano-pde I would be happy with this approach, unless there's a better idea. Dave @dblodgett-usgs may be unconvinced - I'm not sure. Other opinions would be welcome. If we go this way: (1) Yes, we could add it as a new example, which could be elaborated in this issue, (2) New standard names should be proposed as a new issue in the Best wishes Jonathan |
If the decision to use I may open a separate issue to discuss that point because it has rubbed me the wrong way for a long long time. |
Hi all, I've been following the discussion for a while now. I am not sure I understand what looks to be the main issue here, which is the overcrowding of the coordinates attribute that would prevent "unambiguously and correctly interpret the data without human intervention" (@dblodgett-usgs). For instance I do see that only one out of the now several variables bearing latitude values listed in ":coordinates" must be unambiguously identified as the "nominal latitude", X axis of a feature instance. I tend to think that the already existing variable attribute ":axis" on the variable that defines the "nominal latitude" should be sufficient to prescribe the variable with the X axis for a feature instance. But I don't see why the other latitude variables need to be assigned a specific "role" other than what the user wants to make out of them. Probably it is here where I am missing literacy. What are reading packages, machines, trying to do with those other latitude variables listed in ":coordinates"? Would it be a matter of CF Conventions having to turn more assertive on the use of ":axis" more than just recommending? Best, IPG |
Dear all, thank you very much for your responses. I have to say that I agree with all of you... In my opinion, the situation is getting out of hand. Let me explain myself. @dblodgett-usgs I think you did it well creating a separate issue to discuss the differences between cf_role and standard_name @IPerezGonzalez I agree with you, the most important attribute regarding the coordinates is the attribute axis So... Should we use standard_name and/or cf_role to distinguish between the different latitudes/longitudes? In my opinion, none of them. @JonathanGregory I was really satisfied with your solution using the dimension deployment, the variable deployment (deployment) as a :compress of time, and deploy_lon(deployment). Why don't we forget about adding more information? At the moment, "H.2.3. Single time series, including deviations from a nominal fixed spatial location" is including:
And the text:
(something like that) What do you think? All the best, |
Dear Fer @fmanzano-pde Thanks for the example. We will also need to make a small change in the section about compression by gathering, to allow an auxiliary coordinate variable to be compressed when the data variable isn't. I had thought that you wanted three different kinds of location, but your example shows that you need only two. You don't have an unchanging nominal location for the station. Do you, or does anyone, have a need for a fixed (nominal) location and an infrequently changing deployment location? If we only need one of them, we may need only one new standard name. Best wishes Jonathan |
I'm sorry @JonathanGregory, but I'm confused, I didn't understand your last post. It's very important to show clearly the last known deployment position as it will be the nominal position, that is the coordinates X and Y. It's true that in my example both lat and deploy_lat are deployment positions, but lat plays the role of nominal. I understand you mean that you still think that it's important to distinguish between a measured lat (precise_lat) and a not measured lat (lat and deploy_lat), don't you? But in the end all the positions are measured, because even in the deployments GPS sensors are used to set the position. By the way, I've edited my previous post to clearly talk about "deployment" positions, and reserve the word "nominal" only for the one containing the axis attribute, that is the coordinates. All the best, |
Dear Fer @fmanzano-pde Sorry for confusion. I think I misunderstood you regarding "nominal". The example is fine with three kinds of location. The nominal location does not change, the deployment position changes occasionally, the precise location could be different for every observation. Is that correct? We can distinguish the three kinds of location by their standard names. I think it's logical for the precise location to be plain Best wishes Jonathan |
Dear JonathanGregory - I most definitely need a fixed nominal position (which identifies the site) and an infrequently changing position (that's more nearly correct for a given deployment). We usually also have GPS units recording on our buoys, but we don't publish that data because, for subsurface data, it's still not actually correct on a slack mooring line.
I'm very concerned that removing the option of using latitude and longitude standard names for nominal positions will make our data less usable. These 3 levels of position can be considered as similar to data where,say, air temperature is presented at different intervals - 1 minute being observed, but hourly and daily averages being provided for long time series use. These are all given the air_temperature standard name and the difference between them is noted by the time stamp, the long name, and maybe cell methods. That approach seems more straightforward to me. |
Dear all, I feel that we are very close to an agreement. We all understand the matter, and the mechanism to do it is more or less clear. The only pending issue would be the standard_names to be used for the different types of location, right? For me the last proposal @JonathanGregory made was OK, but I also understand the concern @ngalbraith mentioned in the last post. However, I think that adding the new standard names proposed by Jonathan shouldn't be a problem. In the specific case shown by @ngalbraith, the GPS position is not distributed as it is confusing (for subsurface data, it's still not actually correct on a slack mooring line). Perhaps, in that case, the nominal position provided could be kept as just latitude and longitude (as it is now), because there are no other types of location to distinguish. That way backward compatibility would be kept. By the way, I'd really appreciate it if @dblodgett-usgs could express his opinion on this specific topic about standard names and the last proposal I made based on Jonathan's. ... I can smell the final agreement on this. All the best, |
Hi Fer -- I'm afraid I've kind of lost track here. Without a stable conceptual basis for all this, the words aren't holding a stable meaning and I'm struggling to keep all these things straight. Let me see if I understand where we are at... We have three sets of locations.
@JonathanGregory has suggested additional I have concerns about this approach because I feel that overloading So, unless others want to argue otherwise, I think the right thing to do is define new Regards,
|
Thank you very much Dave, clear as water. The only pending issue here is the suitability for the situation Nan brought to the fore. I insist it shouldn't be a problem as the new standard names could be used, but also "latitude" and "longitude" could be kept as there are no other locations provided. Anyway, I'll wait for her consent before closing the discussion and moving on. All the best, |
Dear Fer @fmanzano-pde I agree that
Since we can use the Best wishes Jonathan |
Dear @JonathanGregory I've created a new branch for the pull request: https://github.com/fmanzano-pde/cf-conventions-deployment_position Would you mind including in this branch the "small change in the section about compression by gathering, to allow an auxiliary coordinate variable to be compressed when the data variable isn't" that you mentioned? That way (I guess) the pull request would include all the related changes at once. Anything else? Thank you very much! All the best, PS Regarding the pull request, I don't know what info I have to add in the # Release checklist. Could you help me? |
Dear Fer It's easier for me to draft some text here, which you could copy into your branch.
For the pull request, you have to add a line at the start of Best wishes Jonathan |
Thank you very much @JonathanGregory for the support and all your remarks. I've added everything to the pull request just opened (#431). I've changed the name of this issue 428 as the original title could mislead. |
Dear all That looks fine to me, thanks, Fer @fmanzano-pde. To summarise: the pull request amends an example in Appendix H to show deployment locations for a DSG timeseries of ocean observations, in addition to the nominal location and the precise location. Both of the latter still have standard names of Nan @ngalbraith and Dave @dblodgett-usgs, are you willing to support the proposal in this form? (I am aware that Dave has reservations more generally, but not specifically about this proposal.) Do others have comments to make? Best wishes Jonathan |
Dear all, I just have a very minor comment. It looks like there is good agreement on an approach that lets the user access deployment locations in a simple way. However, I see a risk of a slight incoherence with the introduction of the deployment_latitude|longitude standard_name(s): the nominal latitude and longitude for the time series will be those of a deployment (it looks like the last deployment position is the one of choice). These nominal positions are therefore, by nature, instances of the new deployment_latitude|longitude standard_name(s). I fully agree with keeping the standard name latitude|longitude for the nominal positions, as I think everyone does. I just wanted to flag that by introducing deployment_latitude|longitude as standard names, there is now a bit of an incoherence. Maybe I am more aligned with @ngalbraith and feel that the long_name would be the place to tell them apart. Best, Irene |
Dear @IPerezGonzalez, It's unquestionable that there are many ways to do it; in fact, over the last months we've been discussing them to find the best solution. Last night I went to bed really happy as I thought we had finally reached an agreement. My joy in a well... I understand your concern, but I don't completely agree with you. It's true that most of the time the nominal latitude and longitude for the time series will be those of the last deployment, but not necessarily: imagine that a provider decides that the nominal latitude and longitude values of the whole time series are not those of the last deployment but of the first one, with the objective of keeping the same nominal position without ever changing it (it also makes sense, because these changes in the deployment position don't have significant implications - otherwise, inevitably, the time series would have to be split). So, I still defend the adopted solution. The nominal position is conceptually different from the deployment positions, although it's true that the nominal position and a deployment position can coincide. Using All the best, Fer |
Dear @IPerezGonzalez @fmanzano-pde I too understand Irene's concern but I agree with Fer's argument. (Unlike both of you, I'm not an expert in this area.) Nan @ngalbraith wrote, "I'm very concerned that removing the option of using latitude and longitude standard names for nominal positions will make our data less usable." Following her comment, I changed what I had earlier suggested, so that Nan @ngalbraith, do you think the current proposal is OK? The Best wishes Jonathan |
Dear @JonathanGregory, @fmanzano-pde, Learning that for the nominal position there might be cases where you don't have the actual deployment information, and therefore have to build the nominal position somehow from the GPS coordinates, has helped me be more at ease with the idea of the deployment coordinates having a different standard name. I agree with the review of Chapter 9 and its examples (in a different issue). I was surprised not to find the axis attribute on latitude and longitude in the examples for the trajectory DSG, for instance. Maybe in this case there is a reason I'm not aware of, but I would expect :axis to be there. Kind regards, Irene |
@IPerezGonzalez
This is not the way I'd use the terms in my data sets. In sequentially redeployed buoys, there's a nominal position for a "site" and an exact deployment position every time a buoy is redeployed. If we want a new set of terms for nominal positions, I'd be inclined to use nominal_latitude etc. |
When I'm looking at an observational dataset that I've never seen before and it claims to conform to some CF version, I'm going to expect that the lon/lat/z variables conform to Chapter 4 and would contain at least one variable each with standard names I think these new standard names should be in addition to, and not replace, the existing standard names in a dataset. In a situation where a deployment position is the only one you have, I think the existing Very related, my group has a small need for recovery positions as well. The three locations for CTD/Rosette casts we used to report in WOCE:
Just to be explicit:
I do not support:
|
Dear Andrew @DocOtak Does the present form of the proposal about these standard names look OK to you? Please feel free to propose names for recovery position in another Because @fmanzano-pde opened the separate issue about the standard names, in this issue we need only consider the changes to the convention. These are minor, I would say, and relate to allowing compression to be applied to an auxiliary coordinate variable in a discrete sampling geometry. Please see example H.5 in the modified Appendix H. Would you support it? Best wishes Jonathan |
@JonathanGregory I do support the modified example in H.5 as it clearly shows:
My data are not ready for a |
Dear Andrew @DocOtak Thanks for your support. At present, then, I don't think there are any concerns that have not been addressed, and sufficient support has been expressed. Therefore this proposal will be accepted in 21 days (5th April) if no further problems are raised before then. Best wishes Jonathan |
Three weeks have passed without further comment, so this change is now accepted according to the rules. Thanks for proposing it and seeing it to a conclusion, Fer @fmanzano-pde. I'm going to merge the pull request, which will close this issue. |
I've created a new pull request: #436 to add the author to the header in addition to the list. Thank you very much! |
Title
Recording changes in nominal position (new deployments).
Moderator
@user
Moderator Status Review [last updated: YYYY-MM-DD]
Initial: 2023-01-24
Requirement Summary
Providing a mechanism to record nominal position changes (new deployments) for timeSeries (and other representations) Discrete Sampling Geometries.
Technical Proposal Summary
Creating a new global attribute to record new deployment positions, in order to track the slight differences in nominal position over the history of the stations.
Benefits
As the technical coleader of the Copernicus Marine In Situ TAC, we're analysing an evolution of our NetCDF implementation to fully comply with the CF Conventions. We've realized that our data fit the Discrete Sampling Geometries perfectly (except for HF radars, which are actually gridded data). One of the most significant obstacles to moving forward is the lack of a mechanism to record information about changes in nominal position.
At the moment, Copernicus Marine In Situ TAC reports the nominal position in LATITUDE and LONGITUDE coordinate variables with TIME as dimension.
Status Quo
I've not been able to find any reference to changes in nominal position in the main standards, such as the CF Conventions or ACDD.
OceanSITES includes a platform_deployment_date attribute, considering a deployment as an instrumented platform performing observations for a period of time, considering changes to the instrumentation or to the spatial characteristics of the platform or its instruments constitute the end of the deployment.
This is not the case for the Copernicus Marine In Situ TAC, as we consider that the slight differences in position over time, due to maintenance or repositioning after drifts, do not affect the continuity of the time series in the long term.
Associated pull request
#431
Detailed Proposal
The proposal consists of adding a new recommended global attribute officially in the CF Conventions documentation, specifically to "Appendix A: Attributes".
The proposed name would be "deployment_positions". The attribute will be multi-valued, a comma-separated list. Each value will be a date, latitude and longitude (blank-separated).
The date format will follow the Attribute Content Guidance of ACDD, that is YYYY-MM-DDThh:mm:ss
Example:
attributes:
:deployment_positions = "2013-06-22T12:30:00Z 44.1432 -7.7122,2017-11-23T10:00:00Z 44.1421 -7.7118";
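To show how a reader might consume the proposed attribute, here is a minimal Python sketch that splits the value into (time, latitude, longitude) records. The function and class names are hypothetical, and treating the time stamps as UTC (with an optional trailing "Z", as in the example above) is an assumption.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class DeploymentPosition(NamedTuple):
    time: datetime
    latitude: float
    longitude: float


def parse_deployment_positions(value):
    """Parse a comma-separated list of 'date latitude longitude' entries."""
    records = []
    for entry in value.split(","):
        stamp, lat, lon = entry.split()
        # Assumption: time stamps are UTC; a trailing "Z" is optional.
        when = datetime.strptime(stamp.rstrip("Z"), "%Y-%m-%dT%H:%M:%S")
        when = when.replace(tzinfo=timezone.utc)
        records.append(DeploymentPosition(when, float(lat), float(lon)))
    return records


attribute = "2013-06-22T12:30:00Z 44.1432 -7.7122,2017-11-23T10:00:00Z 44.1421 -7.7118"
positions = parse_deployment_positions(attribute)
```

Each record can then be compared against the time coordinate to find the deployment in effect for a given observation.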
Additionally, an update of the documentation is required so that the example "Example H.5. A single timeseries with time-varying deviations from a nominal point spatial location" includes the aforementioned new attribute.