Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findNLDI returns error when API return empty json #623

Closed
mps9506 opened this issue Jul 6, 2022 · 23 comments
Closed

findNLDI returns error when API return empty json #623

mps9506 opened this issue Jul 6, 2022 · 23 comments
Labels

Comments

@mps9506
Copy link

mps9506 commented Jul 6, 2022

Hi,

This isn't fully a bug.

Currently findNLDI returns a somewhat cryptic message and errors when making a query that returns nothing (valid query but nothing is found).

For example I know the following will return nothing:
https://labs.waterdata.usgs.gov/api/nldi/linked-data/wqp/TCEQMAIN-10016/navigation/UM/nwissite?f=json&distance=2

{"type":"FeatureCollection"}

Here is what findNLDI returns when it encounters the same thing:

findNLDI(wqp = "TCEQMAIN-10016",
    nav = "UM",
    find = "nwissite",
    distance_km = 2,
    no_sf = FALSE)

Error: Cannot open "{"type":"FeatureCollection"}"; Check connection parameters.
In addition: Warning messages:
1: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Error 1: Invalid FeatureCollection object. Missing 'features' member.
2: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Error 1: Layer schema generation failed.
3: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Error 4: Failed to read GeoJSON data

I think for a user expecting a response, the current error and warning messages are hard to troubleshoot. The previous iteration of navigate_nldi in nhdPlusTools returned an empty tibble in a similar scenario: DOI-USGS/nhdplusTools#97. Having a message indicating that no features were returned and returning a NULL or empty object/list might be beneficial for users.

As always, thanks for this awesome package!!! Happy to make a pull request if there is a way you prefer this handled.

@mps9506 mps9506 added the bug label Jul 6, 2022
@ksonda
Copy link

ksonda commented Jul 8, 2022

Possibly related, or at least could be resolved in the same place in the code internetofwater/nldi-services#315

@ldecicco-USGS
Copy link
Collaborator

@dblodgett-usgs or @mikejohnson51 can one of you take a look?

mikejohnson51 pushed a commit to mikejohnson51/dataRetrieval that referenced this issue Jul 8, 2022
@mikejohnson51
Copy link

@ldecicco-USGS, @mps9506

Here is a working option that I can send as a PR:

library(dataRetrieval)
findNLDI(wqp = "TCEQMAIN-10016",
              nav = "UM",
              find = "nwissite",
              distance_km = 2,
              no_sf = FALSE)

#> Simple feature collection with 1 feature and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -101.3427 ymin: 35.7422 xmax: -101.3427 ymax: 35.7422
#> Geodetic CRS:  WGS 84
#> # A tibble: 1 × 7
#>   sourceName        identifier comid name      X     Y            geometry
#>   <chr>             <chr>      <chr> <chr> <dbl> <dbl>         <POINT [°]>
#> 1 Water Quality Po… TCEQMAIN-… 1997… DIXO… -101.  35.7 (-101.3427 35.7422)

Created on 2022-07-08 by the reprex package (v2.0.1)

The "design" choice I used in this NLDI client was that if data didn't exist it would not be returned. So in the above case, since no upstream gages were found in 2km, there is no nwis data. This is different than the nhdplusTools design in which an empty tibble is returned of all requested data. For my use cases, the existing option is ideal, but I can see many cases where it could raise confusion. I am not wed to one or the other and changing the output to an empty tibble would be easy, as would eliciting a message that no data was found.

What do you think?

@mps9506
Copy link
Author

mps9506 commented Jul 8, 2022

@mikejohnson51
Thanks! I think this approach plus a message makes sense.

@mikejohnson51
Copy link

mikejohnson51 commented Jul 8, 2022

Awesome - now we're looking at something like this:

library(dataRetrieval)
xx = findNLDI(wqp = "TCEQMAIN-10016",
              nav = "UM",
              find = "nwissite",
              distance_km = 2,
              no_sf = FALSE)
#> No data found for: nwissite?distance=2

xx
#> Simple feature collection with 1 feature and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -101.3427 ymin: 35.7422 xmax: -101.3427 ymax: 35.7422
#> Geodetic CRS:  WGS 84
#> # A tibble: 1 × 7
#>   sourceName        identifier comid name      X     Y            geometry
#>   <chr>             <chr>      <chr> <chr> <dbl> <dbl>         <POINT [°]>
#> 1 Water Quality Po… TCEQMAIN-… 1997… DIXO… -101.  35.7 (-101.3427 35.7422)

Created on 2022-07-08 by the reprex package (v2.0.1)

mikejohnson51 pushed a commit to mikejohnson51/dataRetrieval that referenced this issue Jul 8, 2022
@EthanGrahn
Copy link

The response of {"type":"FeatureCollection"} is an issue with how the API is assembling the response. I would expect it to return something like this for an empty result.

{
    "type": "FeatureCollection",
    "features": []
}

I would expect findNLDI to handle it normally from there. I will create an issue in the NLDI repository to correct this.

@ldecicco-USGS
Copy link
Collaborator

@mps9506 and @EthanGrahn , do you all want to try installing like this:

remotes::install_github("USGS-R/dataRetrieval")

and let us know if you think we can close this issue?

@EthanGrahn
Copy link

I've updated the NLDI to output my previous example for empty results. This seems to introduce a different set of errors. It may be an issue with my tests because I don't typically work with R. Can anybody else confirm this? From what I can tell, findNLDI checks for an empty response (i.e. an empty string) to determine whether no features were found. It should, instead, be accounting for empty GeoJSON.

@mps9506
Copy link
Author

mps9506 commented Jul 11, 2022

I updated to the development version and received the following error:

> findNLDI(wqp = "TCEQMAIN-10016",
+          nav = "UM",
+          find = "nwissite",
+          distance_km = 2,
+          no_sf = FALSE)
Error in if (sf::st_geometry_type(tmp)[1] == "POINT") { : 
  missing value where TRUE/FALSE needed

As @EthanGrahn notes, I believe this is because read_sf() now returns an empty sf object:

> df <- readLines("https://labs.waterdata.usgs.gov/api/nldi/linked-data/wqp/TCEQMAIN-10016/navigation/UM/nwissite?f=json&distance=2")
> sf::read_sf(df)
Simple feature collection with 0 features and 0 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
# A tibble: 0 × 1
# … with 1 variable: geometry <GEOMETRY [°]>

> sf::read_sf(df) |> sf::st_coordinates()
     [,1] [,2]

sessionInfo:

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dataRetrieval_2.7.11.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3       compiler_4.2.1     pillar_1.7.0       class_7.3-20       remotes_2.4.2     
 [6] tools_4.2.1        jsonlite_1.8.0     lifecycle_1.0.1    tibble_3.1.7       pkgconfig_2.0.3   
[11] rlang_1.0.3        DBI_1.1.3          cli_3.3.0          rstudioapi_0.13    curl_4.3.2        
[16] e1071_1.7-11       dplyr_1.0.9        httr_1.4.3         xml2_1.3.3         generics_0.1.3    
[21] vctrs_0.4.1        classInt_0.4-7     grid_4.2.1         tidyselect_1.1.2   glue_1.6.2        
[26] sf_1.0-7           R6_2.5.1           fansi_1.0.3        purrr_0.3.4        magrittr_2.0.3    
[31] ellipsis_0.3.2     units_0.8-0        assertthat_0.2.1   KernSmooth_2.23-20 utf8_1.2.2        

@EthanGrahn
Copy link

EthanGrahn commented Jul 11, 2022

This raises another question on my end. Would it be better for the end user to receive this empty collection, or be returned a 404 instead?

As a comparison, searching for an invalid source or ID will return a 404 with an error message. https://labs.waterdata.usgs.gov/api/nldi/linked-data/wqp/fake-id?f=json

@mikejohnson51
Copy link

@mps9506 I'll need to dig into why you get that error, on my end I get the desired output and dont see any glaring differences in the package versions:

library(dataRetrieval)

findNLDI(wqp = "TCEQMAIN-10016",
                     nav = "UM",
                     find = "nwissite",
                     distance_km = 2,
                     no_sf = FALSE)
#> No data found for: nwissite?f=json&distance=2
#> Simple feature collection with 1 feature and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -101.3427 ymin: 35.7422 xmax: -101.3427 ymax: 35.7422
#> Geodetic CRS:  WGS 84
#> # A tibble: 1 × 7
#>   sourceName        identifier comid name      X     Y            geometry
#>   <chr>             <chr>      <chr> <chr> <dbl> <dbl>         <POINT [°]>
#> 1 Water Quality Po… TCEQMAIN-… 1997… DIXO… -101.  35.7 (-101.3427 35.7422)


sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dataRetrieval_2.7.11.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3       pillar_1.7.0       compiler_4.1.0     highr_0.9         
#>  [5] class_7.3-20       R.methodsS3_1.8.2  R.utils_2.12.0     tools_4.1.0       
#>  [9] digest_0.6.29      jsonlite_1.8.0     evaluate_0.15      lifecycle_1.0.1   
#> [13] tibble_3.1.7       R.cache_0.15.0     pkgconfig_2.0.3    rlang_1.0.3       
#> [17] reprex_2.0.1       DBI_1.1.3          cli_3.3.0          rstudioapi_0.13   
#> [21] curl_4.3.2         yaml_2.3.5         xfun_0.31          fastmap_1.1.0     
#> [25] e1071_1.7-11       dplyr_1.0.9        withr_2.5.0        styler_1.7.0      
#> [29] stringr_1.4.0      httr_1.4.3         knitr_1.39         xml2_1.3.3        
#> [33] generics_0.1.3     fs_1.5.2           vctrs_0.4.1        tidyselect_1.1.2  
#> [37] grid_4.1.0         classInt_0.4-7     glue_1.6.2         sf_1.0-7          
#> [41] R6_2.5.1           fansi_1.0.3        rmarkdown_2.14     purrr_0.3.4       
#> [45] magrittr_2.0.3     units_0.8-0        ellipsis_0.3.2     htmltools_0.5.2   
#> [49] assertthat_0.2.1   KernSmooth_2.23-20 utf8_1.2.2         proxy_0.4-27      
#> [53] stringi_1.7.6      crayon_1.5.1       R.oo_1.25.0

Created on 2022-07-11 by the reprex package (v2.0.1)

@ldecicco-USGS
Copy link
Collaborator

I was just testing this and I think I can reproduce both results. Initially I didn't have sf installed on my computer (I recently updated to R 4.2). Without sf, I get @mikejohnson51 's output:

> findNLDI(wqp = "TCEQMAIN-10016",
+          nav = "UM",
+          find = "nwissite",
+          distance_km = 2,
+          no_sf = FALSE)
            sourceName     identifier    comid                     name
1 Water Quality Portal TCEQMAIN-10016 19970652 DIXON CREEK NE OF BORGER
          X       Y
1 -101.3427 35.7422

When I installed sf, ran the same thing:

> xx = findNLDI(wqp = "TCEQMAIN-10016",
+               nav = "UM",
+               find = "nwissite",
+               distance_km = 2,
+               no_sf = FALSE)
 Error in if (sf::st_geometry_type(tmp)[1] == "POINT") { : 
missing value where TRUE/FALSE needed

@mikejohnson51
Copy link

mikejohnson51 commented Jul 13, 2022

I have this on todays todo list. I have sf installed though so I think I'm missing something else 😭

Here is a reprex using when sf is installed locally:

library(dataRetrieval)

findNLDI(wqp = "TCEQMAIN-10016",
                     nav = "UM",
                     find = "nwissite",
                     distance_km = 2,
                     no_sf = TRUE)
#>             sourceName     identifier    comid                     name
#> 1 Water Quality Portal TCEQMAIN-10016 19970652 DIXON CREEK NE OF BORGER
#>           X       Y
#> 1 -101.3427 35.7422

findNLDI(wqp = "TCEQMAIN-10016",
         nav = "UM",
         find = "nwissite",
         distance_km = 2,
         no_sf = FALSE)
#> No data found for: nwissite?f=json&distance=2
#> Simple feature collection with 1 feature and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -101.3427 ymin: 35.7422 xmax: -101.3427 ymax: 35.7422
#> Geodetic CRS:  WGS 84
#> # A tibble: 1 × 7
#>   sourceName        identifier comid name      X     Y            geometry
#>   <chr>             <chr>      <chr> <chr> <dbl> <dbl>         <POINT [°]>
#> 1 Water Quality Po… TCEQMAIN-… 1997… DIXO… -101.  35.7 (-101.3427 35.7422)

Created on 2022-07-13 by the reprex package (v2.0.1)

Removing sf, running TRUE/FALSE:

library(dataRetrieval)
remove.packages('sf')
#> Removing package from '/Library/Frameworks/R.framework/Versions/4.2/Resources/library'
#> (as 'lib' is unspecified)
#> Error in find.package(pkgs, lib): there is no package called 'sf'
findNLDI(wqp = "TCEQMAIN-10016",
         nav = "UM",
              find = "nwissite",
              distance_km = 2,
              no_sf = TRUE)
#>             sourceName     identifier    comid                     name
#> 1 Water Quality Portal TCEQMAIN-10016 19970652 DIXON CREEK NE OF BORGER
#>           X       Y
#> 1 -101.3427 35.7422

findNLDI(wqp = "TCEQMAIN-10016",
         nav = "UM",
         find = "nwissite",
         distance_km = 2,
         no_sf = FALSE)
#>             sourceName     identifier    comid                     name
#> 1 Water Quality Portal TCEQMAIN-10016 19970652 DIXON CREEK NE OF BORGER
#>           X       Y
#> 1 -101.3427 35.7422

Created on 2022-07-13 by the reprex package (v2.0.1)

Source of message:

https://github.com/USGS-R/dataRetrieval/blob/7182605f9f45574b768cd21bca376f5678688da6/R/findNLDI.R#L118

@ldecicco-USGS
Copy link
Collaborator

@mps9506 and @EthanGrahn can you try installing again and see if it looks better?

remotes::install_github("USGS-R/dataRetrieval")

@mps9506
Copy link
Author

mps9506 commented Jul 14, 2022

I think this works well. I hesitate to ask since I'm not doing any of the work here. But should findNLDI always return a list or is a sf/tbl ok?

Valid result:

findNLDI(wqp = "TCEQMAIN-10032",
         nav = "UM",
         find = "nwissite",
         distance_km = 2,
         no_sf = TRUE) |> 
  str()

# List of 2
#  $ origin     :'data.frame':	1 obs. of  6 variables:
#   ..$ sourceName: chr "Water Quality Portal"
#   ..$ identifier: chr "TCEQMAIN-10032"
#   ..$ comid     : chr "19972126"
#   ..$ name      : chr "CANADIAN RIVER AT US 60-83"
#   ..$ X         : num -100
#   ..$ Y         : num 35.9
#  $ UM_nwissite:'data.frame':	1 obs. of  8 variables:
#   ..$ sourceName: chr "NWIS Surface Water Sites"
#   ..$ identifier: chr "USGS-07228000"
#   ..$ comid     : chr "19972126"
#   ..$ measure   : num 79.5
#   ..$ reachcode : chr "11090106000027"
#   ..$ name      : chr "Canadian Rv nr Canadian, TX"
#   ..$ X         : num -100
#   ..$ Y         : num 35.9

compared to:

findNLDI(wqp = "TCEQMAIN-10016",
         nav = "UM",
         find = "nwissite",
         distance_km = 2,
         no_sf = FALSE) |> 
  str()

# sf [1 × 7] (S3: sf/tbl_df/tbl/data.frame)
#  $ sourceName: chr "Water Quality Portal"
#  $ identifier: chr "TCEQMAIN-10016"
#  $ comid     : chr "19970652"
#  $ name      : chr "DIXON CREEK NE OF BORGER"
#  $ X         : num -101
#  $ Y         : num 35.7
#  $ geometry  :sfc_POINT of length 1; first list element:  'XY' num [1:2] -101.3 35.7
#  - attr(*, "sf_column")= chr "geometry"
#  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
#   ..- attr(*, "names")= chr [1:6] "sourceName" "identifier" "comid" "name" ...
# Warning message:
# No data returned for: https://labs.waterdata.usgs.gov/api/nldi/linked-data/wqp/TCEQMAIN-10016/navigation/UM/nwissite?f=json&distance=2 

@dblodgett-usgs
Copy link
Contributor

I think it probably should -- I've only ever worked with no_sf TRUE or FALSE in isolation though so never really thought about the difference. Good catch. What do you think @ldecicco-USGS ?

@mikejohnson51
Copy link

This issue here is that in findNLDI single element returns are given as an object while multi element returns are given as a list. The default find option is flowline. Flowlines (and any non-point objects) can only be parsed when sf is active. So when sf is off, the retuned object is just the origin (single element). When its on you get the origin and upper mainstem (multi element).

@dblodgett-usgs
Copy link
Contributor

Right -- but why not return a list with one element when you just have that origin to return?

@mikejohnson51
Copy link

🤷 personal preference haha. I don't like the extra structure when its not needed. I'm happy to change it though if there consensus it makes more sense.

@ldecicco-USGS
Copy link
Collaborator

Sorry I didn't really voice an opinion on this one. I think @dblodgett-usgs 's suggestion of the list of one makes sense because then scripts won't break if you have one or many returns. @mikejohnson51 does that sound OK? (and more importantly, can you make that change?)

@ldecicco-USGS
Copy link
Collaborator

@dblodgett-usgs @mikejohnson51 I'm hoping to do an update to dataRetrieval in the next couple of weeks. Should we switch this to a list of 1 element so that the user doesn't can assume a consistent type of return no matter what the size?

@dblodgett-usgs
Copy link
Contributor

I think that is the right approach, yeah.

@mikejohnson51
Copy link

PR incoming 😃

library(dataRetrieval)

findNLDI(wqp = "TCEQMAIN-10016",
         nav = "UM",
         find = "nwissite",
         distance_km = 20,
         no_sf = FALSE, 
         warn = FALSE)
#> $origin
#> Simple feature collection with 1 feature and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -101.3427 ymin: 35.7422 xmax: -101.3427 ymax: 35.7422
#> Geodetic CRS:  WGS 84
#> # A tibble: 1 × 7
#>   sourceName           ident…¹ comid name      X     Y            geometry
#>   <chr>                <chr>   <chr> <chr> <dbl> <dbl>         <POINT [°]>
#> 1 Water Quality Portal TCEQMA… 1997… DIXO… -101.  35.7 (-101.3427 35.7422)
#> # … with abbreviated variable name ¹​identifier
#> 
#> $UM_nwissite
#> Simple feature collection with 1 feature and 8 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -101.351 ymin: 35.66476 xmax: -101.351 ymax: 35.66476
#> Geodetic CRS:  WGS 84
#> # A tibble: 1 × 9
#>   sourceName               identifier    comid measure reach…¹ name      X     Y
#>   <chr>                    <chr>         <chr>   <dbl> <chr>   <chr> <dbl> <dbl>
#> 1 NWIS Surface Water Sites USGS-07227920 1997…    56.0 110901… Dixo… -101.  35.7
#> # … with 1 more variable: geometry <POINT [°]>, and abbreviated variable name
#> #   ¹​reachcode


findNLDI(wqp = "TCEQMAIN-10016",
         nav = "UM",
         find = "nwissite",
         distance_km = 2,
         no_sf = TRUE,
         warn = FALSE)
#> $origin
#>             sourceName     identifier    comid                     name
#> 1 Water Quality Portal TCEQMAIN-10016 19970652 DIXON CREEK NE OF BORGER
#>           X       Y
#> 1 -101.3427 35.7422

Created on 2022-11-01 by the reprex package (v2.0.1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants