IOOS:1.2 platform check #748

Closed
daltonkell opened this issue Feb 26, 2020 · 35 comments

@daltonkell
Contributor

Calling the ioos:1.2 checker on this SECOORA dataset yields the following messages:

platform
* Gridded model datasets are not required to declare a platform
* The global attribute "platform" must be a single string containing no blank characters; it is None

This dataset is not a gridded model, and contains the attribute platform: buoy.

@daltonkell daltonkell added bug IOOS:1.2 Issues relating to the IOOS Metadata Profile v1.2 labels Feb 26, 2020
@daltonkell
Contributor Author

daltonkell commented Feb 26, 2020

UPDATE

Upon further investigation, the ncdump -h of this dataset reveals that platform is actually not included in the header:

netcdf secoora_edu_usf_marine_comps {
dimensions:
        row = 647 ;
        Row_Type_strlen = 9 ;
        Variable_Name_strlen = 56 ;
        Attribute_Name_strlen = 28 ;
        Data_Type_strlen = 6 ;
        Value_strlen = 452 ;
variables:
        char Row_Type(row, Row_Type_strlen) ;
                Row_Type:_Encoding = "ISO-8859-1" ;
        char Variable_Name(row, Variable_Name_strlen) ;
                Variable_Name:_Encoding = "ISO-8859-1" ;
        char Attribute_Name(row, Attribute_Name_strlen) ;
                Attribute_Name:_Encoding = "ISO-8859-1" ;
        char Data_Type(row, Data_Type_strlen) ;
                Data_Type:_Encoding = "ISO-8859-1" ;
        char Value(row, Value_strlen) ;
                Value:_Encoding = "ISO-8859-1" ;

// global attributes:
                :id = "edu_usf_marine_comps_1407d550_info_1480046041" ;
}

The dataset was obtained via

$ wget http://erddap.secoora.org/erddap/info/edu_usf_marine_comps_1407d550/index..nc -O secoora_edu_usf_marine_comps.nc

This may be why the checker thinks it's a gridded model.
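A minimal sketch of how that situation could be recognised up front. It assumes that a file whose only variables are the five columns seen in the header above is an ERDDAP metadata table rather than the dataset itself; the function name and the rule are hypothetical, not part of the checker.

```python
# Column names taken from the ncdump header above. Treating them as the
# signature of an ERDDAP "info" table is an assumption of this sketch.
INFO_TABLE_VARIABLES = {
    "Row_Type",
    "Variable_Name",
    "Attribute_Name",
    "Data_Type",
    "Value",
}


def looks_like_erddap_info_table(variable_names):
    """Return True when the variables are exactly the info-table columns."""
    return set(variable_names) == INFO_TABLE_VARIABLES


# The header above would be flagged; an ordinary data file would not.
print(looks_like_erddap_info_table(
    ["Row_Type", "Variable_Name", "Attribute_Name", "Data_Type", "Value"]
))
print(looks_like_erddap_info_table(["time", "latitude", "longitude", "station"]))
```

A guard like this would let a tool report "this is a metadata listing, not data" instead of falling through to the gridded-model message.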

@daltonkell
Contributor Author

daltonkell commented Feb 26, 2020

UPDATE

Indeed - after getting the full .nc file

wget "http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550.nc?time%2Clatitude%2Clongitude%2Cz%2Cair_temperature%2Cair_temperature_qc_agg%2Cair_temperature_qc_tests%2Cair_pressure%2Cair_pressure_qc_agg%2Cair_pressure_qc_tests%2Csea_water_electrical_conductivity%2Csea_water_electrical_conductivity_qc_agg%2Csea_water_electrical_conductivity_qc_tests%2Clwe_thickness_of_precipitation_amount%2Clwe_thickness_of_precipitation_amount_qc_agg%2Clwe_thickness_of_precipitation_amount_qc_tests%2Crelative_humidity%2Crelative_humidity_qc_agg%2Crelative_humidity_qc_tests%2Csea_surface_height_above_sea_level_geoid_navd88%2Csea_surface_height_above_sea_level_geoid_navd88_qc_agg%2Csea_surface_height_above_sea_level_geoid_navd88_qc_tests%2Csea_water_temperature%2Csea_water_temperature_qc_agg%2Csea_water_temperature_qc_tests%2Csea_water_temperature_lower_well%2Csea_water_temperature_lower_well_qc_agg%2Csea_water_temperature_lower_well_qc_tests%2Csea_water_temperature_upper_well%2Csea_water_temperature_upper_well_qc_agg%2Csea_water_temperature_upper_well_qc_tests%2Cwind_speed_of_gust%2Cwind_speed_of_gust_qc_agg%2Cwind_speed_of_gust_qc_tests%2Cwind_speed_of_gust_sonic%2Cwind_speed_of_gust_sonic_qc_agg%2Cwind_speed_of_gust_sonic_qc_tests%2Cwind_speed_samples_sonic%2Cwind_speed_samples_sonic_qc_agg%2Cwind_speed_samples_sonic_qc_tests%2Cwind_speed%2Cwind_speed_qc_agg%2Cwind_speed_qc_tests%2Cwind_from_direction%2Cwind_from_direction_qc_agg%2Cwind_from_direction_qc_tests%2Cwind_speed_sonic%2Cwind_speed_sonic_qc_agg%2Cwind_speed_sonic_qc_tests%2Cwind_from_direction_sonic%2Cwind_from_direction_sonic_qc_agg%2Cwind_from_direction_sonic_qc_tests%2Cstation&time%3E=2020-02-19T00%3A00%3A00Z&time%3C=2020-02-26T11%3A54%3A00Z" -O secoora_edu_usf_marine_comps_1407d550_FULL.nc

, the check yields this result regarding the platform:

platform variables
* With a single platform provided, the dimension of the cf_role variable station (cf_role==timeseries_id) should also be equal to 1 (it is 1800) 

This is the desired behavior, as dimensionality of the variable containing cf_role suggests that multiple stations are used:

        char station(row, station_strlen) ;
                station:_Encoding = "ISO-8859-1" ;
                station:cf_role = "timeseries_id" ;
                station:ioos_category = "Identifier" ;
                station:ioos_code = "urn:ioos:station:com.axiomdatascience:75364" ;
                station:long_name = "Big Carlos Pass (Active)" ;
                station:short_name = "edu_usf_marine_comps_1407d550" ;
                station:type = "buoy" ;

where

netcdf secoora_edu_usf_marine_comps_1407d550_FULL {
dimensions:
        row = 1800 ;

@mwengren this is what you expect, correct?

@mwengren
Member

@daltonkell Yes, that's what we expect to happen currently.

Regarding the SECOORA dataset, this is an issue I raised with Axiom last week. It gets a little tricky with ERDDAP though, because there are different netCDF representations of the same tabledap dataset. It looks like you requested the .nc format: this actually is the 'flattened' version of the file I believe, and won't ever be CF DSG compliant. Instead, can you request the .ncCF or .ncCFMA formats and test those and see if the results match? See documentation here: https://coastwatch.pfeg.noaa.gov/erddap/tabledap/documentation.html. I think in this case they will, but those would actually be the correct tests for it to do.

How does CC account for different output formats in ERDDAP? It may need to request either the .ncCF or .ncCFMA in order to be able to apply these tests.

Glad to see it caught the issue with the cf_role=timeseries_id variable being > 1 in any case, I think we're on the right track. Maybe the error message can be improved though?

With a single platform provided, the dimension of the cf_role variable station (cf_role==timeseries_id) should also be equal to 1 (it is 1800) 

Can we say something to the effect of "The IOOS metadata profile restricts datasets to a single CF DSG platform per dataset. For timeSeries featureType, the dimension of the cf_role variable station..."?

I'd also be curious how it handles one of the compound DSG datatypes. Have you tested it on a GliderDAC trajectoryProfile dataset? In that case, dimension of variable with cf_role=trajectory_id should be 1, whereas variable with cf_role=profile_id can be unlimited.

Does the error in your first comment still apply for this dataset? A global platform attribute is required as well, it should check existence of that separately from dimensionality of the platform variable.

@daltonkell
Contributor Author

@mwengren I'm doing a bit of updating to that check now, and would be happy to update the message.

The first comment isn't necessarily applicable, but I think that has more to do with the format and how that subsetted the data. Apparently using the .../index.nc does not get the actual dataset (?)... I'm a bit unfamiliar with the plethora of ERDDAP data formats.

I'll be testing extensively with all the applicable formats, so hopefully that will reveal some strange edge cases before the end users.

Regarding how the CC accounts for different ERDDAP formats: it currently doesn't. It only takes NetCDF, .cdl, or OPeNDAP endpoints. We'll have to formulate a plan about that.

@mwengren
Member

@daltonkell take a look at the table of output formats/fileTypes here: https://coastwatch.pfeg.noaa.gov/erddap/tabledap/documentation.html. There are only two that I think can be expected to be DSG-compliant: .ncCF and .ncCFMA and therefore be used for that check.

I guess we'll have to vary what output format the checker retrieves from ERDDAP if it's passed an ERDDAP URL and is performing the IOOS 1.2 check? For example, CC should be able to extract the base URL for an ERDDAP dataset if the user passes a URL to a specific format like .csv, and it should instead request one of the two formats above to run checks against. That's an approach anyway. Not sure if that affects other checks or not though.
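The URL rewriting described above could look roughly like the sketch below. The helper name `to_dsg_url` is hypothetical and this is not how Compliance Checker currently behaves. It assumes a tabledap URL whose last path segment is the dataset ID, optionally followed by a file type, and that the dataset ID itself contains no dot.

```python
from urllib.parse import urlsplit, urlunsplit


def to_dsg_url(url, file_type=".ncCF"):
    """Swap the file type of an ERDDAP tabledap URL for a DSG-capable one.

    Assumption: the last path segment is "<datasetID>[.<fileType>]" and the
    dataset ID contains no "." character.
    """
    parts = urlsplit(url)
    if "/tabledap/" not in parts.path:
        # Not a tabledap URL; leave it untouched.
        return url
    head, _, last = parts.path.rpartition("/")
    dataset_id = last.split(".", 1)[0]  # drop any existing file type
    new_path = f"{head}/{dataset_id}{file_type}"
    # The query string (variables and constraints) is kept as-is.
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


print(to_dsg_url(
    "http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550.csv"
))
```

Passing `file_type=".ncCFMA"` would request the other DSG-compliant format instead.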

@mwengren
Member

mwengren commented Mar 9, 2020

Following up with a specific example on how to check the 'Platform Variable' dimensionality in ERDDAP. For the moment, let's hold off on making any code changes related to this check - we need to regroup with NDBC to better understand their needs for harvesting, and we may have to relax the requirement for 'Platform Variable' to have dimension=1 as a result. TBD.

Regardless, if we were to check this in ERDDAP, we can use the following dataset as an example for how the different ERDDAP netCDF output formats adhere to CF DSG guidelines or not:

CF DSG-compliant ERDDAP netCDF formats include .ncCF and .ncCFMA. Below are their corresponding headers (to preview each variable's dimensionality):

Non-CF DSG-compliant ERDDAP netCDF format (aka ERDDAP table-like structure) is .nc:

Search for 'char station' in each of these and you will see how the dimensionality varies: for the first two, it is 1, for the last, 14786.

In order for Compliance Checker to accurately test something like 'Platform Variable' dimension, it will need to request one of the first two formats from ERDDAP.
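The "search for 'char station'" step can be automated with a small header scan. This is only a sketch: the function name is hypothetical, and it assumes the two header layouts that appear in this thread (ncdump style with a separate `dimensions:` block, and the ERDDAP header style with inline `name=size` dimensions).

```python
import re


def leading_dim_length(header, var_name="station"):
    """Return the length of the first dimension of `var_name`, or None."""
    # Matches declarations such as "char station(row, station_strlen)".
    decl = re.search(rf"\b\w+\s+{re.escape(var_name)}\s*\(([^)]*)\)", header)
    if decl is None:
        return None
    first_dim = decl.group(1).split(",")[0].strip()
    if "=" in first_dim:
        # Inline style: "trajectory=1"
        return int(first_dim.split("=")[1])
    # ncdump style: look the dimension up, e.g. "row = 1800 ;"
    size = re.search(
        rf"^\s*{re.escape(first_dim)}\s*=\s*(\d+)\s*;", header, re.MULTILINE
    )
    return int(size.group(1)) if size else None


flat_header = """
dimensions:
        row = 1800 ;
variables:
        char station(row, station_strlen) ;
"""
print(leading_dim_length(flat_header))
```

Running the same function over the headers of each output format gives a quick way to compare the dimensionality each one reports.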

@fgayanilo

While at it, can you also clarify with NDBC what data qualifies as GTS-worthy?

@kbailey-noaa

@fgayanilo @mwengren In short, NDBC harvests many variables from the RAs, and only a subset are sent on to the GTS.
GTS-worthy = variables useful for weather and ocean models and forecasters.

Variables NDBC accepts:
Meteorological: Winds (direction, speed, gust), air temperature, dew point, relative humidity, barometer, sea surface temperature, short wave and long wave radiation
Oceanographic: Sub-surface water temperature and salinity (30 depths), Ocean currents (70 depths/bins), Dissolved oxygen (near surface and near bottom), turbidity, chlorophyll, water levels, pH, EH
Waves: Wave height, dominant period, mean wave direction; Directional and non-directional (bulk and spectral);

Variables NDBC sends to the GTS:
They send 3 types of GTS messages: Meteorological, Wave, and TESAC messages
Meteorological = winds (direction, speed), air temperature, dew point, sea level pressure, water temperature, wave significant height, dominant period
(yes, waves are included in 'meteorological', and I think it's because their met buoys also collect wave data).
Wave = Directional and non-directional wave data
TESAC= water temp, salinity, ocean currents (speed/dir)

@mwengren
Member

@daltonkell I think after last week's webinar, we've settled on the modified rules for CF featureType checking for the 'Platform' check here. If this doesn't make sense, please let me know.

Can we adapt this check to vary depending on the value of the featureType attribute for a dataset (including different warning messages for each)? Reason is that we decided to allow multiple features (dimension >1) for featureType=timeSeries only.

Here's what the check should do (depending on the value of the featureType global attribute):

  • featureType=timeSeries: if dimension of variable with cf_role=’timeseries_id’ > 1, display message:
    • Dimension length of the variable with cf_role=’timeseries_id’ (the ‘station’ dimension) is XX. Note that the IOOS profile restricts timeSeries datasets with multiple features to share the same lat/lon position (ie. to exist on the same platform). Datasets that include multiple platforms are not valid and will cause harvesting errors.
  • featureType=’timeSeriesProfile’: if dimension of variable with cf_role=’timeseries_id’ > 1, display message:
    • Dimension length of the variable with cf_role=’timeseries_id’ (the ‘station’ dimension) must be equal to 1 (it is XX). The IOOS profile restricts timeSeriesProfile datasets to a single platform (ie. station) per dataset.
  • featureType=’trajectoryProfile’: if dimension of variable with cf_role=’trajectory_id’ > 1, display message:
    • Dimension length of the variable with cf_role=’trajectory_id’ (the ‘trajectory’ dimension) must be equal to 1 (it is XX). The IOOS profile restricts trajectoryProfile datasets to a single platform (ie. trajectory) per dataset.
  • featureType='profile': if dimension of variable with cf_role=’profile_id’ > 1, display message:
    • Dimension length of the variable with cf_role=’profile_id’ (the ‘profile’ dimension) must be equal to 1 (it is XX). The IOOS profile restricts profile datasets to a single platform (ie. profile) per dataset.
  • featureType='point': do nothing
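The rules in the list above can be written as a small lookup table. This is a sketch rather than the checker's implementation: the names `RULES` and `dimension_message` are hypothetical, and the timeSeries text is abbreviated with respect to the wording proposed above.

```python
# featureType (lower-cased) -> (cf_role to inspect, dimension label, must be 1?)
RULES = {
    "timeseries": ("timeseries_id", "station", False),
    "timeseriesprofile": ("timeseries_id", "station", True),
    "trajectoryprofile": ("trajectory_id", "trajectory", True),
    "profile": ("profile_id", "profile", True),
}


def dimension_message(feature_type, dim_length):
    """Return the message to display, or None when nothing should be shown."""
    rule = RULES.get(feature_type.lower())
    if rule is None or dim_length <= 1:
        # featureType=point (or anything unlisted) and length-1 dimensions
        # produce no message.
        return None
    cf_role, label, must_be_one = rule
    prefix = (
        f"Dimension length of the variable with cf_role='{cf_role}' "
        f"(the '{label}' dimension)"
    )
    if must_be_one:
        return (
            f"{prefix} must be equal to 1 (it is {dim_length}). The IOOS "
            f"profile restricts {feature_type} datasets to a single platform "
            f"(ie. {label}) per dataset."
        )
    # timeSeries: multiple features are allowed, so this is only a note.
    return (
        f"{prefix} is {dim_length}. Note that the IOOS profile restricts "
        "timeSeries datasets with multiple features to share the same "
        "lat/lon position."
    )


print(dimension_message("trajectoryProfile", 1397))
```

Keeping the rules in a table means a new featureType needs one extra entry rather than another branch.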

@daltonkell
Contributor Author

@mwengren Looking into this now, are there a few datasets you have in mind I could test on?

@daltonkell
Contributor Author

@mwengren additionally, is the same messaging and logic intended for featureType==trajectory as featureType==trajectoryProfile above?

@mwengren
Member

mwengren commented Apr 1, 2020

@daltonkell Based on what you said in #760 (comment), for the timeSeries case above, we might have to fail the check in order to display the message, even though it would technically be passing (it's more of a warning message, which I don't think we have the concept of in CC). For all the others, these would be actual test failures.

additionally, is the same messaging and logic intended for featureType==trajectory as featureType==trajectoryProfile above?

Yes, same rules for trajectory as trajectoryProfile, forgot that one.

For examples, these test datasets should both pass the test:

This SECOORA dataset looks like it should pass as well:

Any GliderDAC dataset should all pass as trajectoryProfile type. For example:

I think a lot of the datasets I'd been testing with before that would have failed the timeSeries check have already been fixed. Might have to dig one up if we need an example to test failure.

@daltonkell
Contributor Author

Cool. I'm thinking that for the instances where the timeSeries check "fails", we can use a lower-priority message so it appears more as a warning than as an explicit failure. Depending on which strictness criterion is specified when invoking the test, the message might not even show up -- how does that sound?
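To make the trade-off concrete, here is a sketch of how a strictness setting could hide lower-priority messages. The numeric priority levels, the criteria names, and the thresholds are all assumptions made for illustration, not a statement of how Compliance Checker is implemented.

```python
# Hypothetical priority levels and strictness thresholds.
HIGH, MEDIUM, LOW = 3, 2, 1
MINIMUM_PRIORITY = {"lenient": HIGH, "normal": MEDIUM, "strict": LOW}


def visible_messages(results, criteria="normal"):
    """Keep only the messages whose priority meets the chosen strictness."""
    limit = MINIMUM_PRIORITY[criteria]
    return [message for priority, message in results if priority >= limit]


results = [
    (HIGH, "platform global attribute is missing"),
    (LOW, "timeseries_id dimension is 2"),
]
print(visible_messages(results, "normal"))
print(visible_messages(results, "strict"))
```

Under these assumptions a low-priority timeSeries note would only appear in the strictest mode, which is the risk with demoting a message that matters for harvesting.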

@daltonkell
Contributor Author

So, the way I have the logic set up right now is:

if dataset has platform attr:
    - test the conditions

It seems that both http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_profile and http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah don't have the platform global attribute -- at least, it doesn't look like it from the DAS. Does that check out with you?

@mwengren
Member

mwengren commented Apr 1, 2020

I'm thinking that for the instances where the timeSeries check "fails", we can use a lower-priority message so it appears more as a warning than as an explicit failure.

Actually, this specific check is pretty critical to this system design, so I'd want it to appear as a 'high priority' message.

Now that I think of it, we really need this to be part of the GTS ingest test and associated metadata profile guidance. NDBC will be harvesting ERDDAP datasets based on the assumption they all represent one 'platform'. If a provider puts multiple platforms in a single dataset and accidentally gives it a proper wmo_platform_code and gts_ingest attributes, it's going to cause problems.

As for:

if dataset has platform attr:
- test the conditions

Can you clarify what you mean by that? Global platform attribute or variable-level platform attribute pointing to a global variable (of whatever name) that represents the 'Platform Variable'?

They're both requirements according to the profile, and the latter one should always point to the correct 'Platform Variable' to check, but it's possible there could be inconsistencies: what if a dataset had differing variable-level platform attributes so it wasn't clear which the right 'Platform Variable' was? That case should fail that check, but would lead to problems running this platform dimensionality check.

I think it should key off of the featureType and then variable-level cf_role attributes directly instead as described above. With the addition of:

featureType=’trajectoryProfile' > cf_role=trajectory_id > 1:

  • Dimension length of the variable with cf_role=’trajectory_id’ (the ‘trajectory’ dimension) must be equal to 1 (it is XX). The IOOS profile restricts trajectory datasets to a single platform (ie. trajectory) per dataset.

Those two test datasets should fail the global/variable platform attribute checks, but not this one. Probably should revise the title of this issue to make this all clearer and differentiate from the other tests.

@daltonkell
Contributor Author

As far as

if dataset has platform attr:
- test the conditions

goes, my logic was this:

If the dataset doesn't have a platform global attribute but has platform variable(s), it should fail the check regardless of variable dimensions, featureType, or cf_role.

If we want to check the platform variables for dimensions, featureType, and cf_role like stated above regardless of whether the dataset has a platform global attribute or not, I can make that happen.

@mwengren
Member

mwengren commented Apr 1, 2020

Yes, let's test each separately (existing platform checks and platform dimension). I think that will be less confusing.

I can think of three:

  1. global attribute platform
  2. variable attribute(s) platform - point to single 'Platform Variable' name
  3. dimension of CF DSG station|profile|trajectory variable (aka 'Platform Variable') - name of this should match value of # 2, but may not (this actually should be a separate test, or could be built into # 2)

If this is confusing, let me know.

@daltonkell
Contributor Author

Those are great -- I'm all for smaller, more compact checks. I'll even toss some pseudo-code in here so we can work through that if it looks strange

@daltonkell
Contributor Author

daltonkell commented Apr 2, 2020

I think this pseudo-code summarizes the strategy of what we're looking for:


def check_global_platform:
    - get platform name
    - if it exists:
      - is it a compliant string?
        - if not, fail

def check_single_platform:
    - loop through each variable
      - get value of "platform" attribute if it exists
    - is there more than one platform?
      - if so, fail

def check_cf_dsg:
    - loop through all variables which contain a cf_role attribute
      - if featureType==timeSeries and cf_role==timeseries_id
        - if dimension > 1
          - display message
      - if featureType==timeSeriesProfile and cf_role==timeseries_id
        - if dimension > 1
          - fail, with message
      - if featureType==trajectory and cf_role==trajectory_id
        - if dimension > 1
          - fail, with message
      - if featureType==trajectoryProfile and cf_role==trajectory_id
        - if dimension > 1
          - fail, with message
      - if featureType==profile and cf_role==profile_id
        - if dimension > 1
          - fail, with message
      - if featureType==point
        - do nothing

Shout out any questions or inputs you may have
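For reference, the first two checks in the pseudo-code could be sketched in plain Python as follows. Attributes are modelled as ordinary dictionaries rather than a netCDF dataset object, and the function bodies are illustrative only. One deliberate difference from the pseudo-code: a missing global attribute is treated as a failure here, following the report text quoted at the top of this issue ("it is None").

```python
def check_global_platform(global_attrs):
    """Pass only if 'platform' is a non-empty string without whitespace."""
    platform = global_attrs.get("platform")
    if not isinstance(platform, str) or platform == "":
        return False
    return not any(character.isspace() for character in platform)


def check_single_platform(variables):
    """Pass if the variable-level 'platform' attributes agree on one name.

    `variables` maps variable name -> dict of that variable's attributes.
    """
    names = {
        attrs["platform"] for attrs in variables.values() if "platform" in attrs
    }
    return len(names) <= 1


print(check_global_platform({"platform": "buoy"}))
print(check_single_platform({
    "air_temperature": {"platform": "station"},
    "wind_speed": {"platform": "station"},
    "time": {},
}))
```

Keeping the two functions separate mirrors the decision above to test the global attribute, the variable-level attributes, and the DSG dimension independently.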

@mwengren
Member

mwengren commented Apr 2, 2020

Looks good to me.

We could add to the check_single_platform criteria that the name detected for the platform variable attribute match the attribute with the corresponding cf_role value for each featureType, but that is compounding tests, not sure if that would overcomplicate too much. I'd say this part is optional, depending.

@daltonkell
Copy link
Contributor Author

daltonkell commented Apr 2, 2020

Testing results

TimeSeriesProfile: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_profile

Running Compliance Checker on the datasets from: ['http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_profile']


--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                    ioos:1.2                                    
      https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html      
--------------------------------------------------------------------------------
                               Corrective Actions                               
sun2wave_timeseries_profile has 40 potential issues


                               Highly Recommended                               
--------------------------------------------------------------------------------
CF Discrete Sampling Geometry Compliance
* Dimension length of the variable `s.station` with cf_role=`timeseries_id` (the 'station/trajectory/profile' dimension) must be equal to 1 (it is 2621). The IOOS profile restricts timeseriesprofile datasets to a single platform (ie. station/trajectory/profile) per dataset.
* Dimension length of the variable `s.time` with cf_role=`profile_id` (the 'station/trajectory/profile' dimension) must be equal to 1 (it is 2621). The IOOS profile restricts timeseriesprofile datasets to a single platform (ie. station/trajectory/profile) per dataset.

TimeSeries: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                    ioos:1.2                                    
      https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html      
--------------------------------------------------------------------------------
                               Corrective Actions                               
sun2wave_timeseries_micah has 41 potential issues


                               Highly Recommended                               
--------------------------------------------------------------------------------
CF Discrete Sampling Geometry Compliance
* Dimension length of the variable with cf_role='timeseries_id (the 'station' dimension) is 2. Note that the IOOS profile restricts timeSeries datasets with multiple features to share the same lat/lon position (ie. to exist on the same platform). Datasets that include multiple platforms are not valid and will cause harvesting errors.

http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                    ioos:1.2                                    
      https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html      
--------------------------------------------------------------------------------
                               Corrective Actions                               
edu_usf_marine_comps_1407d550 has 21 potential issues


                               Highly Recommended                               
--------------------------------------------------------------------------------
CF Discrete Sampling Geometry Compliance
* Dimension length of the variable with cf_role='timeseries_id (the 'station' dimension) is 249370. Note that the IOOS profile restricts timeSeries datasets with multiple features to share the same lat/lon position (ie. to exist on the same platform). Datasets that include multiple platforms are not valid and will cause harvesting errors.

TrajectoryProfile: https://gliders.ioos.us/erddap/tabledap/amelia-20180501T0000

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                    ioos:1.2                                    
      https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html      
--------------------------------------------------------------------------------
                               Corrective Actions                               
amelia-20180501T0000 has 30 potential issues


                               Highly Recommended                               
--------------------------------------------------------------------------------
CF Discrete Sampling Geometry Compliance
* Dimension length of the variable `s.trajectory` with cf_role=`trajectory_id` (the 'station/trajectory/profile' dimension) must be equal to 1 (it is 1397). The IOOS profile restricts trajectoryprofile datasets to a single platform (ie. station/trajectory/profile) per dataset.
* Dimension length of the variable `s.profile_id` with cf_role=`profile_id` (the 'station/trajectory/profile' dimension) must be equal to 1 (it is 1397). The IOOS profile restricts trajectoryprofile datasets to a single platform (ie. station/trajectory/profile) per dataset.

@mwengren the above results seem to be in line with the .das of the data, but you had previously said they should pass. Were you referring to the .ncCF versions?

@mwengren
Member

mwengren commented Apr 3, 2020

@daltonkell Looks like a good start. There are a few issues though.

  1. The test should vary depending on the specific featureType. These error messages look as if it's checking each of the cf_role attributed variables regardless of which featureType the dataset is. This doesn't work because the restriction we're imposing applies only to one of the two dimensions in the case of the compound types (timeSeriesProfile and trajectoryProfile). See exact rules and accompanying messages here: IOOS:1.2 platform check #748 (comment). So for timeSeriesProfile, we only want to restrict the cf_role=timeseries_id dimension, not the cf_role=profile_id dimension. This test includes error messages for both. Same with trajectoryProfile: only test the cf_role=trajectory_id dimension. The error message should mention the specific restriction pertaining to each featureType as I put in that comment as well. Reason for this is that in both of these cases the 'profile' dimension doesn't actually pertain to the platform itself (ADCP or glider for example), it's either the timeseries or trajectory that dictates how many observing 'platforms' are in the dataset.

  2. Yes, it needs to request one of the .ncCF or .ncCFMA types from ERDDAP in order to get the properly dimensioned netCDF files to test. The .nc format is the 'flat table-like' structure of ERDDAP and will always have expanded dimensions. See earlier comment: IOOS:1.2 platform check #748 (comment)
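The first point, restricting only the platform-bearing cf_role for each featureType, could be sketched like this. The mapping and function names are hypothetical, and variables are modelled as a plain dictionary of attribute dictionaries rather than a netCDF dataset object.

```python
# featureType (lower-cased) -> the one cf_role whose dimension is restricted.
# For compound types, profile_id is deliberately absent.
RESTRICTED_ROLE = {
    "timeseries": "timeseries_id",
    "timeseriesprofile": "timeseries_id",
    "trajectory": "trajectory_id",
    "trajectoryprofile": "trajectory_id",
    "profile": "profile_id",
}


def variables_to_check(feature_type, variables):
    """Return the names of the variables whose dimension should be tested."""
    role = RESTRICTED_ROLE.get(feature_type.lower())
    if role is None:
        # featureType=point or unknown: nothing to test.
        return []
    return [
        name for name, attrs in variables.items()
        if attrs.get("cf_role") == role
    ]


glider = {
    "trajectory": {"cf_role": "trajectory_id"},
    "profile_id": {"cf_role": "profile_id"},
    "temperature": {},
}
print(variables_to_check("trajectoryProfile", glider))
```

With this selection step in front of the dimension test, a trajectoryProfile dataset is no longer penalised for having many profiles.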

For the timeSeries tests you ran, this output for this one looks correct: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah - the 'timeseries' dimension is 2 here: http://testing.erddap.axds.co/erddap/tabledap/sun2wave_timeseries_micah.ncCFHeader

But for this one: http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550, the timeseries dimension should actually be 1: http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550.ncCFHeader

The Glider DAC dataset shouldn't fail in the trajectory dimension either: https://gliders.ioos.us/erddap/tabledap/amelia-20180501T0000.ncCFHeader

@daltonkell
Contributor Author

Your comments check out for edu_usf_marine_comps_1407d550.ncCF:

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                    ioos:1.2                                    
      https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html      
--------------------------------------------------------------------------------
                               Corrective Actions                               
edu_usf_marine_comps_1407d550.ncCF.nc has 20 potential issues


                               Highly Recommended                               
--------------------------------------------------------------------------------
CF Discrete Sampling Geometry Compliance
* Dimension length of the variable with cf_role='timeseries_id (the 'station' dimension) is 1. Note that the IOOS profile restricts timeSeries datasets with multiple features to share the same lat/lon position (ie. to exist on the same platform). Datasets that include multiple platforms are not valid and will cause harvesting errors.

and for the amelia glider, it's choking on profile_id variable:

    int profile_id(profile=1397);
      :_FillValue = -999; // int
      :actual_range = 1, 1397; // int
      :cf_role = "profile_id";
      :comment = "Sequential profile number within the trajectory.  This value is unique in each file that is part of a single trajectory/deployment.";
      :ioos_category = "Identifier";
      :long_name = "Profile ID";
      :valid_max = 2147483647; // int
      :valid_min = 1; // int

where it should only be checking the trajectory variable:

    char trajectory(trajectory=1, trajectory_strlen=20);
      :_ChunkSizes = 20; // int
      :_Encoding = "ISO-8859-1";
      :cf_role = "trajectory_id";
      :comment = "A trajectory is one deployment of a glider.";
      :ioos_category = "Identifier";
      :long_name = "Trajectory Name";

I'll get on this and re-test

@daltonkell
Contributor Author

@mwengren

Testing results, round 2

edu_usf_marine_comps_1407d550.ncCF.nc: ✔️

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                    ioos:1.2                                    
      https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html      
--------------------------------------------------------------------------------
                               Corrective Actions                               
edu_usf_marine_comps_1407d550.ncCF.nc has 20 potential issues


                               Highly Recommended                               
--------------------------------------------------------------------------------
CF Discrete Sampling Geometry Compliance
* Dimension length of the variable with cf_role='timeseries_id (the 'station' dimension) is 1. Note that the IOOS profile restricts timeSeries datasets with multiple features to share the same lat/lon position (ie. to exist on the same platform). Datasets that include multiple platforms are not valid and will cause harvesting errors.

amelia-20180501T0000.ncCF.nc: ✔️ (This dataset should have a global "platform" attribute though)

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                    ioos:1.2                                    
      https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html      
--------------------------------------------------------------------------------
                               Corrective Actions                               
amelia-20180501T0000.ncCF.nc has 30 potential issues


                               Highly Recommended                               
--------------------------------------------------------------------------------
Conventions
* Conventions must contain the string "IOOS 1.2"

creator_country
* creator_country not present

creator_institution
* creator_institution not present

creator_sector
* creator_sector not present

platform
* If platform variables exist, a global attribute "platform" must also exist
* The global attribute "platform" must be a single string containing no blank characters; it is None

@benjwadams
Contributor

@daltonkell: Protip, making checkboxes in Markdown rather than with unicode/emoji should automatically create a task list, with a completed/total count shown in the GitHub issues. See https://help.github.com/en/github/managing-your-work-on-github/about-task-lists

@daltonkell
Contributor Author

Oh, well that's cool!

@daltonkell
Contributor Author

@mwengren how do the most recent test results look to you?

@mwengren
Member

mwengren commented Apr 7, 2020

@daltonkell sorry, got swamped with other things recently.

On edu_usf_marine_comps_1407d550.ncCF.nc, I think since that dataset has 'station' dimension of 1, it shouldn't alert with the message (only if >1).

For amelia-20180501T0000.ncCF.nc, I think that one looks pretty good.

What are we checking exactly on the test that produces this message though?
If platform variables exist, a global attribute "platform" must also exist

Do we check that the variable-level platform attributes all point to the same named 'Platform Variable'? E.g., commonly 'station' or 'platform'? Looking back at my comments here: #748 (comment)

Other than that, this looks to be good to me.

@daltonkell
Contributor Author

Not a problem, that's why I sent an extra ping your way.

What are we checking exactly on the test that produces this message though?
If platform variables exist, a global attribute "platform" must also exist

When the checker looks to see if the dataset contains a single platform (explicitly, not the DSG check), there are several combinations of results:

  1. The dataset has multiple platform variables and a global platform attribute (fail)
  2. The dataset has > 0 platform variables but not a global platform attribute (the message you are seeing above)
  3. The dataset has 0 platform variables and has the global platform attribute (fail)
  4. The dataset has 0 platform variables and no global platform attribute (assumed to be a gridded model, or otherwise data which does not need a platform) (pass)
  5. The dataset has one platform variable and a global platform attribute (pass)
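The five combinations above can be sketched as a small decision function. This is only an illustration of the logic as described in this comment, not the checker's actual code: the function name and its two parameters are hypothetical, and only the messages for cases 2 and 4 are taken from the checker output quoted in this thread (the others are placeholder wording).

```python
from typing import Tuple


def check_single_platform(n_platform_vars: int, has_global_platform: bool) -> Tuple[bool, str]:
    """Return (passed, message) for one combination of platform metadata.

    Hypothetical helper; mirrors the five cases listed in the comment above.
    """
    if n_platform_vars == 0:
        if has_global_platform:
            # Case 3: global attribute without any platform variable (fail).
            # Placeholder wording, not the checker's real message.
            return False, 'A global attribute "platform" exists but no platform variable was found'
        # Case 4: assumed to be a gridded model or other data without a platform (pass).
        return True, "Gridded model datasets are not required to declare a platform"
    if not has_global_platform:
        # Case 2: one or more platform variables but no global attribute (fail).
        return False, 'If platform variables exist, a global attribute "platform" must also exist'
    if n_platform_vars > 1:
        # Case 1: multiple platform variables plus the global attribute (fail).
        # Placeholder wording, not the checker's real message.
        return False, "A dataset may only describe a single platform"
    # Case 5: exactly one platform variable and the global attribute (pass).
    return True, ""
```

For example, `check_single_platform(1, False)` corresponds to the amelia dataset result shown earlier: it fails with the "If platform variables exist..." message.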

I'll ensure that the message for the timeseries only displays if the 'station' dimension is > 1 -- I had somehow construed our above discussion to mean "always show the message".

@mwengren
Member

mwengren commented Apr 7, 2020

I'll ensure that the message for the timeseries only displays if the 'station' dimension is > 1 -- I had somehow construed our above discussion to mean "always show the message".

Great! Otherwise it is a bit confusing, but it is relevant for any timeSeries dataset with 'station' dimension > 1.

Re: platform checks, those all look good to me. I guess what threw me off was the message If platform variables exist.... In my mind, it should be singular: If a platform variable exists.... What do you think?

@daltonkell
Contributor Author

Hey @mwengren, I addressed your above comment in b8e5370, so we should be all set now. Looking to merge that and release a new RC ASAP.

@mwengren
Member

mwengren commented Apr 14, 2020

@daltonkell I'm doing a few checks with latest master for the platform check.

I think one remaining issue to fix, if possible, is to change the format CC requests from ERDDAP for the ioos-1.2 test to one of the .ncCF or .ncCFMA types. More info: #748 (comment).

If I run the following on the http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550 dataset, I get errors about the CF DSG check dimensions:

$ compliance-checker --version
IOOS compliance checker version 4.3.3rc2+31.g7cd1a7f
$ compliance-checker -t ioos -f html http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550
Test | Message
CF Discrete Sampling Geometry Compliance | Dimension length of the variable with cf_role='timeseries_id (the 'station' dimension) is 252200. Note that the IOOS profile restricts timeSeries datasets with multiple features to share the same lat/lon position (ie. to exist on the same platform). Datasets that include multiple platforms are not valid and will cause harvesting errors.

Ideally, we want data providers to run the ioos-1.2 test directly against ERDDAP datasets as in the command above, without first downloading them. Can this change be made without too much difficulty?
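As a rough illustration of the request-format change, here is a minimal sketch of turning a bare tabledap dataset URL into one that asks ERDDAP for a specific file type by appending the extension. The helper name `erddap_file_url` is hypothetical and this is not how compliance-checker builds its requests; it only assumes the usual ERDDAP convention that the file type is the extension on the dataset ID.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def erddap_file_url(dataset_url: str, file_type: str = ".ncCF") -> str:
    """Return dataset_url with its last path segment given the file_type extension.

    Hypothetical helper: any extension already present (e.g. ".das") is replaced.
    """
    parts = urlsplit(dataset_url)
    # Drop a trailing slash and any existing extension on the dataset ID.
    base, _old_ext = posixpath.splitext(parts.path.rstrip("/"))
    return urlunsplit((parts.scheme, parts.netloc, base + file_type, parts.query, parts.fragment))


url = erddap_file_url(
    "http://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550"
)
print(url)
```

Passing `file_type=".ncCFMA"` gives the other variant mentioned above. A URL built this way carries no query constraints, so it may request far more data than a check needs.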

@mwengren
Member

mwengren commented May 7, 2020

@daltonkell I've run a few tests today, after #801 was merged, against some of the sample ERDDAP datasets from #723, and so far it looks like the platform dimension check has mostly been resolved.

It seems to work correctly for the PacIOOS datasets:

Both show a list of 'accepted' GTS ingest variables such as:

The following variables qualified for GTS Ingest: wind_speed, gust_speed, wind_from_direction, shortwave_radiation, photosynthetic_radiation, sea_water_temperature, air_temperature, rainfall_amount

Which they didn't before - good news.

This PacIOOS dataset doesn't include the gts_ingest attribute, so the message is produced (which is what we wanted, great!):

Issue:
The first two datasets do have a slight misconfiguration, perhaps, although it's not quite clear what it is from the output message:

Test | Message
cf_role | attribute cf_role in variable platform1 not present

The issue with the test I think is that they both have a platform1 'Platform Variable' that's referenced from each of the data variables' platform variable attributes, but the actual Platform Variable in the CF sense (the one with cf_role attribute) is called station_name. See this link for more info:

https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/AWS-HIMB.das

Is there a way we can make that message more informative about what the issue is and how it might be fixed? Hopefully the variable name references are available in the code, but something like 'Attribute cf_role in variable platform1 not present. The Platform Variable 'platform1' should match the variable containing the cf_role attribute according to CF/IOOS profile guidelines'?
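To make the suggestion concrete, here is a minimal sketch of how such a message could be assembled. It works on a plain dict of variable name to attribute dict instead of a real netCDF dataset, and both the function name and the sample attributes are hypothetical (the sample only mimics the situation described above, it is not copied from the AWS-HIMB dataset).

```python
from typing import Dict, List


def platform_variable_messages(variables: Dict[str, Dict[str, str]]) -> List[str]:
    """Report platform variables that lack cf_role, naming the variables that do have it."""
    # Names referenced by the data variables' "platform" attributes.
    referenced = {attrs["platform"] for attrs in variables.values() if "platform" in attrs}
    # Variables that actually carry a cf_role attribute.
    with_cf_role = sorted(name for name, attrs in variables.items() if "cf_role" in attrs)

    messages = []
    for name in sorted(referenced):
        if "cf_role" in variables.get(name, {}):
            continue
        msg = f"Attribute cf_role in variable {name} not present."
        if with_cf_role:
            msg += (
                f" The Platform Variable '{name}' should match the variable containing"
                f" the cf_role attribute ({', '.join(with_cf_role)})"
                " according to CF/IOOS profile guidelines."
            )
        messages.append(msg)
    return messages


# Made-up attributes mimicking the mismatch described above.
sample = {
    "station_name": {"cf_role": "timeseries_id"},
    "platform1": {},
    "air_temperature": {"platform": "platform1"},
}
for message in platform_variable_messages(sample):
    print(message)
```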

@benjwadams benjwadams self-assigned this May 14, 2020
@benjwadams
Contributor

The test uses a preset attribute check that

  1. checks if the attribute exists, and reports an error if it does not
  2. checks that the attribute's value is one of the values passed in.

I can combine the messages in the two steps.
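A minimal sketch of what combining the two steps might look like, so that the "not present" message already tells the user which values are expected. The function name, signature, and message wording are assumptions for illustration, not the checker's actual preset attribute check; the three cf_role values in the example are the ones defined by the CF conventions.

```python
from typing import Dict, Iterable, Tuple


def check_attr_in_values(
    attrs: Dict[str, str], attr_name: str, allowed: Iterable[str], var_name: str
) -> Tuple[bool, str]:
    """Existence check and allowed-value check with a single combined message."""
    expected = ", ".join(sorted(allowed))
    value = attrs.get(attr_name)
    if value is None:
        # Step 1 failed: attribute missing. Include the expected values right away.
        return False, (
            f"Attribute {attr_name} in variable {var_name} not present;"
            f" it must be one of: {expected}"
        )
    if value not in allowed:
        # Step 2 failed: attribute present but with an unexpected value.
        return False, (
            f"Attribute {attr_name} in variable {var_name} is '{value}';"
            f" it must be one of: {expected}"
        )
    return True, ""


cf_roles = {"timeseries_id", "profile_id", "trajectory_id"}
print(check_attr_in_values({}, "cf_role", cf_roles, "platform1"))
```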

@benjwadams
Contributor

Closing on behalf of changes in #813
