-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IOOS:1.2 platform
check
#748
Comments
UPDATE Upon further investigation, the
The dataset was obtained via
This may be why the checker thinks it's a gridded model. |
UPDATE Indeed - after getting the full
, the check yields this result regarding the platform:
This is the desired behavior, as dimensionality of the variable containing
where
@mwengren this is what you expect, correct? |
@daltonkell Yes, that's what we expect to happen currently. Regarding the SECOORA dataset, this an issue I raised to Axiom last week. It gets a little tricky with ERDDAP though, because there are different netCDF representations of the same tabledap dataset. It looks like you requested the .nc format: this actually is the 'flattened' version of the file I believe, and won't ever be CF DSG compliant. Instead, can you request the .ncCF or .ncCFMA formats and test those and see if the results match? See documentation here: https://coastwatch.pfeg.noaa.gov/erddap/tabledap/documentation.html. I think in this case they will, but those would actually be the correct tests for it to do. How does CC account for different output formats in ERDDAP? It may need to request either the .ncCF or .ncCFMA in order to be able to apply these tests. Glad to see it caught it the issue with the cf_role=timeseries_id variable being > 1 in any case, I think we're on the right track. Maybe the error message can be improved though?
Can we say something to the effect of I'd also be curious how it handles one of the compound DSG datatypes. Have you tested it on a GliderDAC trajectoryProfile dataset? In that case, dimension of variable with cf_role=trajectory_id should be 1, whereas variable with cf_role=profile_id can be unlimited. Does the error in your first comment still apply for this dataset? A global platform attribute is required as well, it should check existence of that separately from dimensionality of the platform variable. |
@mwengren I'm doing a bit of updating to that check now, and would be happy to update the message. The first comment isn't necessarily applicable, but I think that has more to do with the format and how that subsetted the data. Apparently using the I'll be testing extensively with all the applicable formats, so hopefully that will reveal some strange edge cases before the end users. Regarding how the CC accounts for different ERDDAP formats: it currently doesn't. It only takes NetCDF, .cdl, or OPeNDAP endpoints. We'll have to formulate a plan about that. |
@daltonkell take a look at the table of output formats/fileTypes here: https://coastwatch.pfeg.noaa.gov/erddap/tabledap/documentation.html. There are only two that I think can be expected to be DSG-compliant: .ncCF and .ncCFMA and therefore be used for that check. I guess we'll have to vary what output format the checker retrieves from ERDDAP if it's passed an ERDDAP URL and is performing the IOOS 1.2 check? For example, CC should be able to extract the base URL for an ERDDAP dataset if the user passes a URL to a specific format like .csv, and it should instead request one of the two formats above to run checks against. That's an approach anyway. Not sure if that affects other checks or not though. |
Following up with a specific example on how to check the 'Platform Variable' dimensionality in ERDDAP. For the moment, let's hold off on making any code changes related to this check - we need to regroup with NDBC to better understand their needs for harvesting, and we may have to relax the requirement for 'Platform Variable' to have dimension=1 as a result. TBD. Regardless, if we were to check this in ERDDAP, we can use the following dataset as an example for how the different ERDDAP netCDF output formats adhere to CF DSG guidelines or not: CF DSG-compliant ERDDAP netCDF formats include .ncCF and .ncCFMA. Below are their corresponding headers (to preview each variable's dimensionality):
Non-CF DSG-compliant ERDDAP netCDF format (aka ERDDAP table-like structure) is .nc: Search for 'char station' in each of these and you will see how the dimensionality varies: for the first two, it is 1, for the last, 14786. In order for Compliance Checker to accurately test something like 'Platform Variable' dimension, it will need to request one of the first two formats from ERDDAP. |
while in it, can you also clarify with NDBC what data qualifies to be GTS worthy? |
@fgayanilo @mwengren In short, NDBC harvests many variables from the RAs, and only a subset are sent on to the GTS. Variables NDBC accepts: Variables NDBC sends to the GTS: |
@daltonkell I think after last week's webinar, we've settled on the modified rules for CF featureType checking for the 'Platform' check here. If this doesn't make sense, please let me know. Can we adapt this check to vary depending on the value of the Here's what the check should do (depending on the value of the
|
@mwengren Looking into this now, are there a few datasets you have in mind I could test on? |
@mwengren additionally, is the same messaging and logic intended for |
@daltonkell Based on what you said in #760 (comment), for the
Yes, same rules for For examples, these test datasets should both pass the test:
This SECOORA dataset looks like it should pass as well: Any GliderDAC dataset should all pass as
I think a lot of the datasets I'd been testing with before that would have failed the |
Cool. I'm thinking that for the instances where the |
So, the way I have the logic set up right now is:
It seems that both http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_profile and http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah don't have the |
Actually, this specific check is pretty critical to this system design, so I'd want it to appear as a 'high priority' message. Now that I think of it, we really need this to be part of the GTS ingest test and associated metadata profile guidance. NDBC will be harvesting ERDDAP datasets based on the assumption they all represent one 'platform'. If a provider puts multiple platforms in a single dataset and accidentally gives it a proper As for:
Can you clarify what you mean by that? Global They're both requirements according to the profile, and the latter one should always point to the correct 'Platform Variable' to check, but it's possible there could be inconsistencies: what if a dataset had differing variable-level I think it should key off of the
Those two test datasets should fail the global/variable |
A far as
goes, my logic was this: If the dataset doesn't have a platform global attribute but has platform variable(s), it should fail the check regardless of variable dimensions, If we want to check the platform variables for dimensions, featureType, and cf_role like stated above regardless of whether the dataset has a |
Yes, let's test each separately (existing platform checks and platform dimension). I think that will be less confusing. I can think of three:
If this is confusing, let me know. |
Those are great -- I'm all for smaller, more compact checks. I'll even toss some pseudo-code in here so we can work through that if it looks strange |
I think this pseudo-code summarizes the strategy of what we're looking for:
Shout out any questions or inputs you may have |
Looks good to me. We could add to the check_single_platform criteria that the name detected for the |
Testing resultsTimeSeriesProfile: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_profile
TimeSeries: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah
http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550
TrajectoryProfile: https://gliders.ioos.us/erddap/tabledap/amelia-20180501T0000
@mwengren the above results seem to be in line with the |
@daltonkell Looks like a good start. There are a few issues though.
For the timeSeries tests you ran, this output for this one looks correct: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah - the 'timseries' dimension is 2 here: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah.ncCFHeader But for this one: http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550, the timeseries dimension should actually be 1: http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550.ncCFHeader The Glider DAC dataset shouldn't fail in the trajectory dimension either: https://gliders.ioos.us/erddap/tabledap/amelia-20180501T0000.ncCFHeader |
Your comments check out for
and for the
where it should only be checking the
I'll get on this and re-test |
Testing results, round 2edu_usf_marine_comps_1407d550.ncCF.nc: ✔️
amelia-20180501T0000.ncCF.nc: ✔️ (This dataset should have a global "platform" variable though)
|
@daltonkell: Protip, making checkboxes in Markdown rather than unicode/emoji should automatically create a task list with completed/out of in the GitHub issues. See https://help.github.com/en/github/managing-your-work-on-github/about-task-lists |
Oh, well that's cool! |
@mwengren how do the most recent test results look to you? |
@daltonkell sorry, got swamped with other things recently. On edu_usf_marine_comps_1407d550.ncCF.nc, I think since that dataset has 'station' dimension of 1, it shouldn't alert with the message (only if >1). For amelia-20180501T0000.ncCF.nc, I think that one looks pretty good. What are we checking exactly on the test that produces this message though? Do we check that the variable-level Other than that, this looks to be good to me. |
Not a problem, that's why I sent an extra ping your way.
When the checker looks to see if the dataset contains a single platform (explicitly, not the DSG check), there are several combinations of results:
I'll ensure that if the message for the |
Great! Otherwise it is a bit confusing, but it is relevant for any timeSeries dataset with 'station' dimension > 1. Re: platform checks, those all look good to me. I guess what threw me off was the message |
@daltonkell I'm doing a few checks with latest master for the platform check. I think one remaining issue to fix if possible is the format CC requests from ERDDAP for the ioos-1.2 test to one of the .ncCF of .ncCFMA types. More info: #748 (comment). If I run the following on the http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550 dataset, I get errors about the CF DSG check dimensions:
Ideally, we want data providers to run the ioos-1.2 test directly against ERDDAP datasets as in the command above, without first downloading them. Can this change be made without too much difficulty? |
@daltonkell I've run a few tests today after #801 was merged against some of the sample ERDDAP datasets from #723, and so far it looks like the platform dimension check has been resolved mostly. It seems to work correctly for the PacIOOS datasets:
Both show a list of 'accepted' GTS ingest variables such as:
Which they didn't before - good news. This PacIOOS dataset doesn't include the Issue:
The issue with the test I think is that they both have a https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/AWS-HIMB.das Is there a way we can make that message more informative about what the issue is and how it might be fixed? Hopefully the variable name references are available in the code, but something like 'Attribute cf_role in variable platform1 not present. The Platform Variable 'platform1' should match the variable containing the cf_role attribute according to CF/IOOS profile guidelines'? |
The test uses a preset attribute check that
I can combine the messages in the two steps. |
Closing on behalf of changes in #813 |
Calling the
ioos:1.2
checker on this SECOORA dataset yields the following messages:This dataset is not a gridded model, and contains the attribute
platform: buoy
.The text was updated successfully, but these errors were encountered: