-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable ncCF format requests to TableDAP #799
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
import io | ||
import urllib.request | ||
import compliance_checker.protocols.opendap as opendap | ||
|
||
def is_tabledap(url): | ||
""" | ||
Identify a dataset as an ERDDAP TableDAP dataset. | ||
|
||
Parameters | ||
---------- | ||
url (str) : URL to dataset | ||
|
||
Returns | ||
------- | ||
bool | ||
""" | ||
|
||
if "tabledap" in url: | ||
return True | ||
return False | ||
|
||
def get_tabledap_bytes(url, ftype): | ||
""" | ||
ERDDAP TableDAP returns an OPeNDAP "sequence" response by default | ||
when no file extensions are provided. If a user wishes to get a dataset | ||
from an ERDDAP TableDAP URL, append the desired file extension and return | ||
a byte buffer object containing the data. | ||
|
||
Parameters | ||
---------- | ||
url (str) : URL to TableDAP dataset | ||
ftype (str) : file format extension | ||
|
||
Return | ||
------ | ||
io.BytesIO buffer object | ||
""" | ||
|
||
vstr = opendap.create_DAP_variable_str(url) # variable str from DDS | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor comment: you called There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Guilty as charged... I might be getting influenced by too many scientists 😆 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait! I'm a scientists!! Am I doing my job wrong ;-p There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you're cut from a different cloth felipe! |
||
_url = f'{".".join([url, ftype])}?{vstr}' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We want to maintain backwards compatibility with Python 3 versions less than 3.6. For this reason, |
||
with urllib.request.urlopen(_url) as resp: | ||
return io.BytesIO(resp.read()) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,8 +4,45 @@ | |
|
||
Functions to assist in determining if the URL is an OPeNDAP endpoint | ||
''' | ||
import io | ||
import requests | ||
import urllib.parse | ||
import urllib.request | ||
|
||
def create_DAP_variable_str(url): | ||
""" | ||
Create a URL-encoded string of variables for a given DAP dataset. | ||
Works on OPeNDAP datasets. | ||
|
||
Parameters | ||
---------- | ||
url (str): endpoint to *DAP dataset | ||
|
||
Returns | ||
------- | ||
str | ||
""" | ||
|
||
# get dds | ||
with urllib.request.urlopen(f"{url}.dds") as resp: | ||
strb = io.StringIO(resp.read().decode()) | ||
|
||
strb.seek(8) # remove "Dataset " | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider using a context manager/ |
||
x = strb.read() | ||
strb.close() | ||
|
||
# remove beginning and ending braces, split on newlines | ||
lst = list(filter(lambda x: "{" not in x and "}" not in x, x.split("\n"))) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Try to express this as a nested loop rather than reassigning to the same variable. If there are issues with the parsing, it makes it much easier to track down the issues because the variable reference isn't changed. |
||
|
||
# remove all the extra space used in the DDS string | ||
lst = list(filter(None, map(lambda x: x.strip(" "), lst))) | ||
|
||
# now need to split from type, grab only the variable and remove ; | ||
lst = list(map(lambda x: x.split(" ")[-1].strip(";"), lst)) | ||
|
||
# encode as proper URL characters | ||
varstr = urllib.parse.quote(",".join(lst)) | ||
return varstr | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if there is an ERDDAP trick to request the JSON with out any data, they this could be simply a request to get |
||
|
||
def is_opendap(url): | ||
''' | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,7 @@ | |
from compliance_checker.cf.cf import CFBaseCheck | ||
from owslib.sos import SensorObservationService | ||
from owslib.swe.sensor.sml import SensorML | ||
from compliance_checker.protocols import opendap, netcdf, cdl | ||
from compliance_checker.protocols import opendap, netcdf, cdl, erddap | ||
from compliance_checker.base import BaseCheck | ||
from compliance_checker import MemoizedDataset | ||
from collections import defaultdict | ||
|
@@ -723,7 +723,13 @@ def load_remote_dataset(self, ds_str): | |
:param str ds_str: URL to the remote resource | ||
''' | ||
|
||
if opendap.is_opendap(ds_str): | ||
if erddap.is_tabledap(ds_str): | ||
return Dataset( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is slower than downloading the file and running the checks locally as you say, could something like a temporary file be an alternative faster solution (I did not run any tests)? Downloading the file locally and removing it when done could be something like: @contextmanager
def tempnc(data: BinaryIO) -> Generator[str, None, None]:
from tempfile import NamedTemporaryFile
tmp = None
try:
tmp = NamedTemporaryFile(suffix=".nc", prefix="compliance-checker_")
tmp.write(data)
tmp.flush()
yield tmp.name
finally:
if tmp is not None:
tmp.close() There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need some way to limit the amount of data requested? The goal of the check is really to check the dataset structure, not so much the contents of the whole dataset. I'm not really sure how this part of CC works, but is there a row limit or something that can be included in the ERDDAP request, if we're not doing that already? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are a few tests that requires the data/coordinates but a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's true, need to check the data for some tests. Not sure about this one though, mostly attribute-only, with some dimension checks like this one. Possibly forgetting some things tho. There's the This is what Bob pointed me to as most useful for checking dataset metadata. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With the OPeNDAP protocol, dimension information is typically be found in the
However, a request to an ERDDAP source doesn't bring back the same structure:
I'll dig around in the documentation to see if that's a possibility. For the time being, the logic in this commit produces the desired results of requesting the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, yeah. Duh, I've been staring at those forever, should have thought of that. For the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ocefpaf: there's also the option to read a netCDF file into memory. It should probably be preferred over writing to a tempfile when possible and when the netCDF-C library has been compiled with in-memory support. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed. But if we are aiming for a broader audience we should not rely on an option that may not be available, right? PS: I believe that the conda-forge pkg is compiled with in-memory support but I'm not 100% sure. I'll check and fix it if not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @daltonkell where do we stand with merging this PR and resolving this dimension check issue? Interested to include this in the next RC, if possible. Can we implement the dimension check, and maybe most or all of the IOOS checks, by requesting the That may be too big a change though. If so, let's go with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think including this in the next RC is appropriate, but we'll have to leave the I'm currently working on merging concepts from this PR and Ben's latest, #800, because his implements a useful abstraction for handling any remote netCDF resource. |
||
ds_str, | ||
mode="r", | ||
memory=erddap.get_tabledap_bytes(ds_str, "ncCF").getbuffer() | ||
) | ||
elif opendap.is_opendap(ds_str): | ||
benjwadams marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return Dataset(ds_str) | ||
else: | ||
# Check if the HTTP response is XML, if it is, it's likely SOS so | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest using url.lower() here just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use
return "tabledap" in url.lower()
to simplify the return since it's returning boolean type.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also can't necessarily guarantee "tabledap" will be in the URL, e.g. Apache/Nginx setups. Usually it will though.