Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

terrainr: Retrieve Data from the USGS National Map and Transform it for 3D Landscape Visualizations #416

Closed
17 of 27 tasks
mikemahoney218 opened this issue Dec 11, 2020 · 27 comments

Comments

@mikemahoney218
Copy link
Member

mikemahoney218 commented Dec 11, 2020

Submitting Author: Michael Mahoney (@mikemahoney218)
Repository: https://github.com/mikemahoney218/terrainr
Version submitted: 0.2.1
Editor: @ldecicco-USGS
Reviewer 1: @mikejohnson51
Reviewer 2: @sfoks
Archive: TBD
Version accepted: Feb. 19, 2021


  • Paste the full DESCRIPTION file inside a code block below:
Package: terrainr
Type: Package
Title: Retrieve Data from the 'USGS' National Map and Transform it for '3D' Landscape Visualizations
Version: 0.2.0
Authors@R: 
    person(given = "Michael",
           family = "Mahoney",
           role = c("aut", "cre"),
           email = "[email protected]",
           comment = c(ORCID = "0000-0003-2402-304X"))
Description: Functions for the retrieval and manipulation of 'USGS' National Map 
    data ('<https://viewer.nationalmap.gov/services/>'), enabling users to 
    download elevation and image data for areas of interest. Functions are also 
    provided to transform these data sources into formats that can be used to 
    create 3D elevation models in the Unity rendering engine.
URL: https://mikemahoney218.github.io/terrainr/, https://github.com/mikemahoney218/terrainr
BugReports: https://github.com/mikemahoney218/terrainr/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports: 
    base64enc,
    httr,
    raster,
    magick (>= 2.5.0),
    gdalUtils,
    methods,
    png,
    utils,
    gdalUtilities,
    sf,
    rlang
RoxygenNote: 7.1.1
Suggests: 
    testthat,
    covr,
    progressr,
    knitr,
    rmarkdown,
    progress,
    ggplot2,
    jpeg,
    tiff
Config/testthat/parallel: true
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

terrainr enables users to retrieve geospatial data from the USGS National Map family of APIs and process it into both generic rasters for use in analysis and a specialized format which may be imported into the Unity rendering engine.

  • Who is the target audience and what are scientific applications of this package?

The data retrieval functions within terrainr are designed to be useful to any researcher interested in geospatial data for areas within the United States, particularly digital elevation models and orthoimagery. The data processing aspect of terrainr is designed to enable more users to use game engines as GIS, producing high-resolution 3D models from spatial data.

I am only aware of two packages which mention the USGS National Map API:

  • quickmapr provides the ability to download a topographic basemap for use in plotting. The URL used does not overlap with the APIs included in this package, and I believe is not actually part of the National Map at all (as images from the National Map are released into the public domain, while the referenced URL indicates that downloaded data is copyright the National Geographic Society). This package was additionally last updated in 2018, and an entry in NEWS.md indicates the overlapping functionality may have been destined for deprecation.
  • A prior package of mine, spacey, provides access to the 3DEP Digital Elevation Model, one of the APIs included in this package. Future development of spacey will move to using terrainr for data retrieval.

The elevatr package also returns elevation data from the USGS Elevation Point Query Service and AWS Terrain Tiles, which are similar -- but not the same -- sources as used in this package.

This package provides access to 9 API endpoints which I do not believe are incorporated in any other R package. Transforming tiles to be imported into Unity as heightmaps depends on a newly-contributed feature to the magick package, which leads me to believe this package is the first to provide this functionality.

N/A

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

N/A

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@melvidoni
Copy link
Contributor

Hello @mikemahoney218, thanks for submitting your package. @ldecicco-USGS will be your Handling Editor. You will hear from her soon.

@ldecicco-USGS
Copy link

Great news, we have our first reviewer @mikejohnson51. I may or may not try to find another Mike be reviewer 2 😬

@ldecicco-USGS
Copy link

I had a few people respond that if I didn't find someone by the new year, they'd volunteer, so I expect a 2nd reviewer any day. In the meantime, I ran a check and goodpractice::gp(). The check was fine.

The gp() left this message:

Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
── GP terrainr ──────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all
    package code in general. 43% of code lines are covered by
    test cases.

    R/combine_overlays.R:119:NA
    R/georeference_overlay.R:76:NA
    R/get_bbox.R:48:NA
    R/get_bbox.R:49:NA
    R/get_tiles.R:133:NA
    ... and 458 more lines

  ✖ write short and simple functions. These
    functions have high cyclomatic complexity:get_tiles (55).
─────────────────────────────────────────────────────────────────────

Which is not bad! If you want to work on the code coverage, that's always good.

@mikemahoney218
Copy link
Member Author

Fantastic! Thank you so much for the update :)

One thing to mention -- a lot of the package tests are set to skip on CRAN, as they either take too long, require internet access, or require GDAL. They should still be able to run locally, however (and on CI) with Sys.setenv(NOT_CRAN='true') -- when I do so, I get the following output:

> Sys.setenv(NOT_CRAN='true'); goodpractice::gp()

Preparing: covr
Preparing: cyclocomp
<snip r cmd check>
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
── GP terrainr ──────────────────────────────────────────────────────
It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 96% of code lines are covered by test cases.

    R/combine_overlays.R:119:NA
    R/get_bbox.R:48:NA
    R/get_bbox.R:49:NA
    R/hit_api.R:160:NA
    R/hit_api.R:161:NA
    ... and 26 more lines

  ✖ write short and simple functions. These functions have high cyclomatic complexity:get_tiles (55).
─────────────────────────────────────────────────────────────────────

Thanks again for the update! 😄

@ldecicco-USGS
Copy link

Ah right! Thanks for the reminder, I just got a new computer and my old computer had NOT_CRAN=true set by default.

@mikemahoney218
Copy link
Member Author

Hi all! Just wanted to note here that I've changed the main branch from master to main in line with GitHub's guidance. There shouldn't be any code changes (well, a single call got restyled) or anything to interrupt review. Sorry if there are any issues associated with the change!

@mikemahoney218
Copy link
Member Author

mikemahoney218 commented Jan 18, 2021

I believe I've spoken too soon! It appears that the transportation API has moved servers, so version 0.2.0 no longer works for that endpoint. I've pushed a fix to the development branch (here's the relevant diff) -- it was only a two-line change to handle transportation in the same way as DEMs and orthoimagery.

I apologize for needing to make this update while the package is under review! The version originally submitted still exists as release v 0.2.0. Does it make sense to update the version under review to the version with this fix (would become version 0.2.1), or should I wait and roll this update in with any revisions requested after review?

@ldecicco-USGS
Copy link

I'd vote to update the version under review to 0.2.1 with the fix. If that causes any confusion, I'm sure we can deal with it.

@mikemahoney218
Copy link
Member Author

mikemahoney218 commented Jan 18, 2021

Great! I've cut a release of 0.2.1 and split it into a release branch, both of which should hopefully remain even with main until reviewer comments come in. All the changes from 0.2.0 should be documented on the release page and in NEWS.

Thanks!

@ldecicco-USGS
Copy link

We've got a second reviewer @sfoks 🎉

@mikejohnson51
Copy link

Hi @mikemahoney218 @ldecicco-USGS ,

Please find my review of terrainr attached:

Overall this is a very nice package. I want to emphasize this as, by nature, the following comments are more critique then complimentary. The package functions well, is stable and installed without problems.

I did not get to test any of the Unity functionality since I don't have it. Other then that here are some key points:

Key Points:

  • Really nice website and great documentation.

  • All functions and examples worked (except one noted below)

  • I think there is some nice utility here but there is also some cross over with others packages like FedData, elevatr.

  • Following this, other then elevation, this is not a "spatial data" retrieval package as it returns PNGs that are more similar to map tiles then to spatial data. Thus the functionality is also similar to rosm.

  • I cannot think of many cases for wanting the PNG files in R. That certainly does not mean they are not out there but I think more clarity about what the package is returning (maptiles) and there use would help it gain traction.

  • The major issue I have with the package as it stands is the package specific classes for common spatial structures like coordinates and bounding boxes. They add more confusion and complexity to things that are very well served by the sf package (which is already a dependency). Utilizing the sf utilities would not only allow for more flexible user input, but would drastically reduce the code in this package making it much slimmer and easier to maintain.

  • Aside from the "new" spatial classes, there is no need to reinvent the wheel for bbox buffering. Instead focus on the input/output coordinate reference systems and use st_buffer. Same thing for finding a centroid (st_centroid) and working between units (units::set_units())

  • I highly recommend using the sf utilities for GDAL rather then bringing in other dependencies like gdalUtilities and gdalUtils. Some examples are included here for reducing the clever, but not needed work arounds offered

  • Use httr::RETRY to handle much of your error catching and httr::write_disk() to avoid having to read images in as raw bytes/bits. This will save a large amount of code as well.

  • With all of these I think you can drop: base64enc, gdalUtils, methods, gdalUtilities, gdalUtils and rlang from your dependencies and many of your function can be slimmed down and/or removed. The result (in my option) will be a more lightweight package with higher utility, and more easy integration in spatial workflows.

Branding

  • It's a weird point but one that I think is important. terrainr as a name sells your package short while also being misleading. Terrain is one of 9 endpoints offered but is also anomalous in that it is the only "spatial" data available in your package (in the traditional vector/raster sense).

  • Arguably terrain is also the best served of the resources in the R ecosystem with elevatr and FedData. Elevatr on the surface is a better choice for getting elevation data simply because it offers multiple resolutions and the Mapzen server is quicker then the ArcGIS server.

  • My expectation of the package coming in given the description of "retrieve geospatial data" was that for resources like the NHD, NHDHighRes, WBD, and transportation, I would be getting the vector data. Particularly since these are offered by the National Map. Therefore, it is very important to be upfront that the package is offering endpoints to (effectively) mosaicked web tiles (with one spatial data source).

  • All of this is not to discount the utility of your package because I think it is one of the few to offer access to these resources which makes it very unique! But the functionality needs to be explicit because, as is, someone coming in with the expectation that they will be able to query spatial data will be disappointed, while those looking for map tile data will likely pass your package by.

  • One huge opportunity for your package, that I think you already have worked out, is how to integrated these resources with ggplot as a base-map. It is a constant frustration of mine (and many I know) to get nice base maps for ggplot.

  • Yes, tmap has some utilities for this and ggmap is out there (I have always struggled with it), there is room here for growth. If you could add a geom_terrainr() that played nicely with ggplot, I think that would be immediately useful to a many people.

Code Review:

Below are my notes as I went through the files and package:

Install

devtools::install_github("mikemahoney218/terrainr")
library(terrainr)

Great! 👍

add_bbox_buffer.R

There are a lot of functions in this package that recreate sf functionality. These could be simplified making the code much easier to maintain. While the implementation of these methods is obviously well considered, it is fragile compared to the GDAL/GEOS/PROJ backend supporting sf. For this package to attract wide use I think the package specific classes/structures must be removed For example, add_bbox_buffer:

Could be simplified to something like:

library(sf)
bb = st_as_sfc(st_bbox(c(ymin = 44.17609, ymax = 44.04905,
                         xmin = -74.01188, xmax = -73.83493), crs = 4326))
tmp = st_transform(bb, 5070) # EPSG ensures meters
tmp = st_buffer(tmp, 10, 
                joinStyle = 'MITRE', 
                endCapStyle = "SQUARE", 
                mitreLimit = 2)

out = st_transform(tmp, 4326) # Put back to Lat Long

https://github.com/jhollist/elevatr/blob/4c45d548231400284f72b92b42dff6d9fd9a0928/R/get_elev_raster.R#L9

classes.R

I think your toolset has a lot of value outside the terrainr environment, but it becomes very limited with the imposed classes for things like coordinate sets and bounding boxes which already have well defined and common class structures that play nicely in the R spatial environment. I would like to see all the specific classes removed from the package or else a strong justification for what they add. I am aware this will take some considerable rewrite in the package but think it is well worth it.

get_bbox.R

Again sf::st_bbox does everything needed. No need to write unique methods to extract bbox's from common spatial classes. This removes your dependency of rlang too.

For example:

st_bbox(c(ymin = 44.04905, ymax = 44.17609,
                         xmin = -74.01188, xmax = -73.83493), crs = 4326)

st_bbox(raster::raster(ncol=100, nrow=100))

get_bbox_centroid

Same thing. st_centroid() does what you need. No need for specific classes and logic. For example:

get_bbox_centroid(list(
     c(lat = 44.04905, lng = -74.01188),
     c(lat = 44.17609, lng = -73.83493)))

# Equals 

bb = st_as_sfc(st_bbox(c(ymin = 44.04905, ymax = 44.17609,
                         xmin = -74.01188, xmax = -73.83493), crs = 4326))

st_centroid(bb)

get_tiles.R

  • This function, even for the example, takes a while to run, it would be nice to have some messaging along the way to give confidence it is going if progressr is not installed. Consider using httr::progress() in hit_national_map_api. More on that below.

  • Its nice to be able to provide a tmpfile prefix, but would be equally nice to be able to set where the file is downloaded to (other then the tmp directory). This will come up again below with the GDAL discussion.

  • find a way to avoid the 3 in the beginning of your list name for 3DEP, maybe elevation3DEP?

hit_national_map_api.R

  • I suggest using httr::RETRY("GET", ...) rather then a tryCatch and counters, this will greatly simplify your logic. For example:
#Set UP
url = httr::parse_url("https://elevation.nationalmap.gov/arcgis/rest/services/3DEPElevation/ImageServer/exportImage")

query_arg <- list(
      bboxSR = 4326,
      imageSR = 4326,
      size = paste(8000, 8000, sep = ","),
      format = "tiff",
      pixelType = "F32",
      noDataInterpretation = "esriNoDataMatchAny",
      interpolation = "+RSP_BilinearInterpolation",
      f = "json"
    )

## 
first_corner <- bbox@bl
second_corner <- bbox@tr

bbox_arg <- list(bbox = paste(
    min(first_corner@lng, second_corner@lng),
    min(first_corner@lat, second_corner@lat),
    max(second_corner@lng, first_corner@lng),
    max(second_corner@lat, first_corner@lat),
    sep = ","
  ))

####======
#^all the above could be replaces with paste(st_bbox(...), collapse = ",")

You have a lot of logic wrapping your httr calls which is essentially:

res <- httr::GET(url, query = c(bbox_arg, query_arg))
content = httr::content(res, type = "application/json")

An identical, but way to build in some of that error catching and retrying would be to change it to:

res2 <- httr::RETRY("GET", url, 
                   query = c(bbox_arg, query_arg), 
                   times = 10, pause_cap = 10)

You could use this approach again to get the images and write them to disk, instead of httr::content(..., "raw"). For example:

tmpfile <-tempfile()
content <-httr::content(res2, type = "application/json")

httr::RETRY("GET", content$href, 
                   httr::progress(), 
                   httr::write_disk(tmpfile, overwrite = TRUE),
                   times = 10, pause_cap = 10)

raster::raster(tmpfile) %>% raster::plot()

merge_rasters.R

The GDAL stuff you have here is (reasonably!!) way more than needed. One big reason to avoid gdalUtils in a package is because it requires an instance of GDAL that is different than the version that comes with sf. For many users this can be an issue.
Here is an brief solution to these issues for much less code and much more utility:

The example you have only gives one raster which I assume wouldn't need to be merged? This should probably be changed, but I pulled an example from one of your other functions:

simulated_data <- data.frame(
  id = seq(1, 100, 1),
  lat = runif(100, 44.04905, 44.17609),
  lng = runif(100, -74.01188, -73.83493)
)

bbox <- get_coord_bbox(lat = simulated_data$lat, lng = simulated_data$lng)
bbox <- add_bbox_buffer(bbox, 100)
img_files = get_tiles(bbox, tempfile())

So the goal is to get this to a single raster right? If so try:

target_prj = st_crs(5070)$proj4string
method = "near"
destfile1 = tempfile()

sf::gdal_utils(util = "warp", 
                 source = unlist(img_files), 
                 destination = destfile1,
                 options = c("-t_srs", as.character(target_prj),
                             "-r", method))

raster::raster(destfile1)

# class      : RasterLayer 
# dimensions : 16782, 14984, 251461488  (nrow, ncol, ncell)
# resolution : 1.151324, 1.151324  (x, y)
# extent     : 1736879, 1754131, 2540862, 2560184  (xmin, xmax, ymin, ymax)
# crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 

The cool thing about above is that it is short-and-sweet, only relies on sf and it offers you a direct path to allow users to specify the output CRS and resampling method (exposing target_prj, method, destfile1 as function parameters.)

Another reason to keep your bbox objects in standard classes is that you can use then to quickly crop the merged raster in the same call by first writing the bbox object to a shp. So instead of the above you could do:

# From the same coordinates make a 'sf' bbox of your points:
bbox = st_as_sf(simulated_data, coords = c("lng", "lat"), crs = 4326) %>% 
  st_bbox() %>% 
  st_as_sfc() %>% 
  st_as_sf()

# Write temporary shp
tmp_shp = tempfile(fileext = ".shp")
write_sf(bbox, dsn = tmp_shp)

destfile2 = tempfile()

sf::gdal_utils(util = "warp", 
                 source = unlist(img_files), 
                 destination = destfile2,
                 options = c("-t_srs", as.character(target_prj),
                             "-r", method,
                             '-cutline', tmp_shp,
                             '-crop_to_cutline'))

raster::raster(destfile2)

# class      : RasterLayer 
# dimensions : 14095, 14505, 204447975  (nrow, ncol, ncell)
# resolution : 1.151359, 1.151313  (x, y)
# extent     : 1737200, 1753901, 2542188, 2558415  (xmin, xmax, ymin, ymax)
# crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
# source     : /private/var/folders/_d/jkzhpcss17v0sy0zjr7xhrxr0000gn/T/RtmpyEg8av/file220c7958114b 
# names      : file220c7958114b

raster_to_raw_tiles.R

Use sf::gdal_utils("translate", ...) here instead of gdalUtils/gdalUtilities for the same reasons as above.

It may be worth your time to look at this: https://github.com/ropensci/tiler

georeference

The given example does not run because the arguments are in the wrong order, also it doesn't seem to need alignment? The input and output are identical.

No need for TIFF package (I think) since raster reads tiffs already (raster will also read PNG and JPEG, just into RGBA bands)

utils.R

Check out the units package to handle all these unit conversions, but really, the unit given should match the CRS of the input data. Because of this, I might consider being more strict about the units you allow for a buffering call.

Notes:

  • Readme - you have an 'elev' hanging at the end of the 2nd 'plot' line

Again, there's a lot of good stuff here, I hope this is useful, and feel free to email for any clarification/follow up.

Mike

@ldecicco-USGS
Copy link

Thanks so much @mikejohnson51 ! I heard back from @sfoks that the second review should be done sometime early next week. So @mikemahoney218 , feel free to start responding to this first review or wait for both to be submitted.

@mikemahoney218
Copy link
Member Author

mikemahoney218 commented Feb 18, 2021

Thank you Mike for the thorough review! Below I've attempted to go through your comments line-by-line; I've collapsed comments that were repeated into single items (mostly httr/sf related) and removed the complements (though they were much appreciated :) ) for length. All the changes below are now merged to main as version 0.3.0.

Key Points:

* I think there is some nice utility here but there is also some cross over with others packages like `FedData`, `elevatr`.

* Following this, other then elevation, this is not a "spatial data" retrieval package as it returns PNGs that are more similar to map tiles then to spatial data. Thus the functionality is also similar to `rosm`.

* I cannot think of many cases for wanting the PNG files in R. That **certainly** does not mean they are not out there but I think more clarity about what the package is returning (maptiles) and there use would help it gain traction.

I fully agree that the API code overlaps slightly with a few other packages (particularly the FedData package, which uses two of the same services). As far as the API wrapper code is concerned, I see the primary differences as being:

  • 7 endpoints not serviced by any other R package.
  • USGS data is released to the public domain, unlike OSM (which uses a strong copyleft license with attribution requirements that I personally would not be willing to use in any project; link) and MapZen (which uses CC-BY-4.0 which at least removes the share-alike requirements; link).

With that said, I think the main differentiator of terrainr is the workflow it introduces for creating 3D terrain tiles with overlays in the Unity engine. There's a lot of buzz in visualization research about using game engines for interactive simulations of landscape dynamics, which is why I originally started working on terrainr; the get_tiles %>% merge_raster %>% raster_to_raw_tiles flow reduces what used to be an intensely manual process into an all-but-automated method (see for instance this five minute tutorial that walks through manually downloading a heightmap, wrangling the data in GIMP, and then importing it -- without any guidance on adding overlays).

I've clarified that most endpoints are returned as tiles. I've also attempted to rework documentation to emphasize that the data manipulation and export functionality are at least co-equal goals to the retrieval functions; unfortunately, as these functions need data to manipulate, the data retrieval functions have to before the manipulation functions in any given document. With that said, there are still a number of use cases for these image tiles within R -- see for instance this example with rayshader. I imagine the new geom_spatial_rgb (discussed below) will also be helpful here.

* Use `httr::RETRY` to handle much of your error catching and `httr::write_disk()` to avoid having to read images in as raw bytes/bits.  This will save a large amount of code as well.

I've attempted to use httr::RETRY in the place of my manual backoff before, with no success (I removed my last attempt in October). I attempted again following this review (documented here), but some challenges with this API (sometimes returning failures with a 200 status, returning HTML in lieu of JSON, sometimes returning successes with a 400, and I'm sure other weird aberrant behaviors) can't be handled as well in httr::RETRY as in the manual code. The end result is various CI failures, and retrieving each tile takes notably longer than the current manual backoff approach. As a result, unfortunately I don't see a way to incorporate httr::RETRY at this time.

Following last month's challenge with the transportation endpoint starting to return raw PNGs instead of base64 encoded images, I've changed the way get_tiles works so that instead of having a list of base64 vs raw image endpoints, get_tiles will check the type of the returned data instead (first relevant commit, current version). As a result, I believe I need to load the tiles into memory at some point in order to detect if they need to be decoded. Because of this, I don't think httr::write_disk will save system resources at the moment.

* With all of these I think you can drop: `base64enc`,  `gdalUtils`, `methods`, `gdalUtilities`, and `rlang` from your dependencies and many of your function can be slimmed down and/or removed. The result (in my option) will be a more lightweight package with higher utility, and more easy integration in spatial workflows.

At the end of the day, I was able to drop methods, gdalUtilities, and rlang; gdalUtils is still in use in raster_to_raw_tiles for reasons discussed below and base64enc is required for decoding endpoints that don't return data as a raw vector.

Branding

* It's a weird point but one that I think is important. `terrainr` as a name sells your package short while also being misleading. Terrain is one of 9 endpoints offered but is also anomalous in that it is the only "spatial" data available in your package (in the traditional vector/raster sense).

* Arguably terrain is also the best served of the resources in the R ecosystem with `elevatr` and `FedData`. `Elevatr` on the surface is a better choice for getting elevation data simply because it offers multiple resolutions and the Mapzen server is quicker then the ArcGIS server.

* My expectation of the package coming in given the description of "retrieve geospatial data" was that for resources like the NHD, NHDHighRes, WBD, and transportation, I would be getting the vector data. Particularly since these are offered by the National Map. Therefore, it is very important to be upfront that the package is offering endpoints to (effectively) mosaicked web tiles (with one spatial data source).

* All of this is not to discount the utility of your package because I think it is one of the few to offer access to these resources which makes it very unique! But the functionality needs to be explicit because, as is, someone coming in with the expectation that they will be able to query spatial data will be disappointed, while those looking for map tile data will likely pass your package by.

This comment actually took me by surprise originally! It had never even occurred to me that someone would see the name terrainr and associate it with the elevation endpoints (though of course it makes sense now). The name harkens back to the original goal of the package: creating terrain tiles in Unity. As discussed in the Unity vignette, "terrain" is used as a very precise name for what's being created (physically-explicit heightmap tiles created from planar-interlaced raw image files) which is used throughout the Unity UI. As mentioned in the earlier comments, I've attempted to make this goal of the package more explicit in the documentation and DESCRIPTION file, as well as more explicitly expressing that data is obtained as tiles.

As a quick side note, I want to point out that get_tiles has an argument resolution which lets you set pixel side length in meters.

* One **huge** opportunity for your package, that I think you already have worked out, is how to integrated these resources with ggplot as a base-map. It is a constant frustration of mine (and many I know) to get nice base maps for ggplot.

* Yes, `tmap` has some utilities for this and `ggmap` is out there (I have always struggled with it), there is room here for growth. If you could add a `geom_terrainr()` that played nicely with `ggplot`, I think that would be immediately useful to a many people.

Added geom_spatial_rgb (relevant commit) for displaying RGB tiles.

Code Review:

add_bbox_buffer.R

I've adopted your suggestion about how to shift these to sf. The older behavior is preserved in order to manipulate latitude/longitude data (but still uses units and sf as suggested). Relevant commit.

classes.R

get_bbox.R

get_bbox_centroid

These have been removed from all user-facing code. The classes still exist for communication between package functions, but are no longer the only accepted inputs for functions and are not returned by any function. sf or Raster objects can be used in their place across the board. Relevant commit.

get_tiles.R

* This function, even for the example, takes a while to run, it would be nice to have some messaging along the way to give confidence it is going if `progressr` is not installed. Consider using `httr::progress()` in `hit_national_map_api`. More on that below.

I personally am very persuaded by progressr's motto:

The developer is responsible for providing progress updates but it's only the end user who decides if, when, and how progress should be presented. No exceptions will be allowed.

Letting users control how and why they're updated is the key goal of using progressr, so I'm reluctant to add always-on progress bars (which would trigger three separate times per tile downloaded, I believe) via httr at the moment.

* Its nice to be able to provide a `tmpfile` prefix, but would be equally nice to be able to set where the file is downloaded to (other then the tmp directory). This will come up again below with the GDAL discussion.

Try setting output_prefix to any valid path on your system for this. get_tiles will append a string in the format servicename_xtile_ytile.tif to output_prefix for each tile downloaded.

* find a way to avoid the `3` in the beginning of your list name for 3DEP, maybe elevation3DEP?

I changed the behavior so that now get_tiles returns a list whose names match the names provided to service; this means that if you provide services = "elevation" the output list will now be named elevation. If a user specifies 3DEPElevation, I think it still makes sense to name that element of the list 3DEPElevation in order to have consistent behavior across endpoints; I do wish the official service name was something else! Relevant commit.

merge_rasters.R

I've adopted your suggested sf/gdal_warp code entirely, and cannot believe how much faster and cleaner it runs now. Thanks in particular for this suggestion! Relevant commit.

raster_to_raw_tiles.R

Use sf::gdal_utils("translate", ...) here instead of gdalUtils/gdalUtilities for the same reasons as above.

There's a comment in this file from about five months ago that I left as a warning to my future self:

https://github.com/mikemahoney218/terrainr/blob/851bcba8819e28084548edb84835bc71ed74bb03/R/raster_to_raw_tiles.R#L78

Unfortunately, this is still true. gdalUtilities is a simple wrapper around sf::gdal_utils (with command formatting more similar to command-line gdal), and so using sf::gdal_utils through the wrapper still causes my R session to hard crash for reasons I'm entirely unaware of. Unfortunately, that has left me dragging in gdalUtils for the time being.

It may be worth your time to look at this: https://github.com/ropensci/tiler

Unfortunately, I'm yet to get this package to work on my laptop due to an issue with Python package dependencies.

georeference

The given example does not run because the arguments are in the wrong order, also it doesn't seem to need alignment? The input and output are identical.

Serves me right for not naming my arguments! Addressed both issues here.

No need for TIFF package (I think) since raster reads tiffs already (raster will also read PNG and JPEG, just into RGBA bands)

I had to do a little bit of investigation into why this function works this way, myself! It turns out that while raster::brick is perfectly happy to read in any type of file, it only correctly detects the scale of the bands if the file is entirely read into memory (and assumes 255/8-bit images if not). Given that georeference_overlay has to handle images with unknown scales (for instance, one user of this package edits the tiles in Photoshop) and even the default case (tiles returned by the National Map and vector_to_overlay) is scaled [0, 1], this function has to load the images into memory before creating RasterBricks to avoid all outputs being either black or white squares. I haven't fully investigated if this is true with TIFF images as well (verified for JPEG and PNG), but I believe the current approach is the simplest way to handle varied file types.

utils.R

Check out the units package to handle all these unit conversions, but really, the unit given should match the CRS of the input data. Because of this, I might consider being more strict about the units you allow for a buffering call.

I've swapped over to units for this across the board (except in the case of radians -> degree conversions, where I still use my helper functions -- which are no longer exported -- to make the code a bit shorter/more legible). As someone working in a field (forestry) that still relies upon inches and chains as standard units, I'm reluctant to force people to SI standards.

Notes:

* Readme - you have an 'elev' hanging at the end of the 2nd 'plot' line

Doh! Fixed, thank you.

@sfoks
Copy link

sfoks commented Feb 18, 2021

Package Review


Provided Review Template

  • Briefly describe any working relationship you have (had) with the package authors.
    No relationship other than being asked to review this R package.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
    Installation was okay after updating my Mac to new R, new devtools, etc. I didn't have access to my windows machine this weekend, otherwise I'm sure the installation would have been easier!
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 10 hours

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Additional reviewer questions from guide

  • Does the code comply with general principles in the Mozilla reviewing guide?
    I believe so
  • Does the package comply with the rOpenSci packaging guide?
    yes
  • Are there improvements that could be made to the code style?
    code style is fine
  • Is there code duplication in the package that should be reduced?
    not that I can comment to
  • Are there user interface improvements that could be made?
    NA
  • Are there performance improvements that could be made?
    not that I can comment to
  • Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient? Does it use the principle of multiple points of entry i.e. takes into account the fact that any piece of documentation may be the first encounter the user has with the package and/or the tool/data it wraps?
    I believe the documentation is more than sufficient. I was able to navigate the vignettes in my first encounter, had no errors when running examples, and only problems with navigating the software Unity.
  • Were functions and arguments named to work together to form a common, logical programming API that is easy to read, and autocomplete?
    yes

Review Comments

  • This review is for terrainr up to commit 91a73a4.
  • I was using R version 4.0.3, Rstudio version 1.4.1103, and a macOS Big Sur 11.2.1 when I performed this review.
  • Within the unity_instructions.Rmd, I think merged_outputs <- merge_rasters(raw_tiles$3DEPElevation, tempfile(fileext = ".tif"), .$USGSNAIPPlus, tempfile(fileext = ".tif")) should be changed to: merged_outputs <- merge_rasters(raw_tiles$3DEPElevation, tempfile(fileext = ".tif"), raw_tiles$USGSNAIPPlus, tempfile(fileext = ".tif"))
  • In general, I received a lot of warnings with the unity_instructions vignette, several related to the merge_rasters function. ex: Warning 6: gdalbuildvrt does not support heterogeneous band numbers: expected 3, got 4. Skipping /var/folders/zd/0gcdnc8x21177wtw966dqktm0000gn/T//Rtmp1XicTF/file2176d7036b2.tif; however this didn't seem to influence the generation of output.
  • Other warnings were received upon performing raster_to_raw_tiles function: ex Warning 1: -srcwin 0 8194 4097 4097 falls partially outside raster extent. Going on however. I believe you explain that this is due to National Map API issues in trying to download a certain geographic extent but wanted to make sure you were aware of it. I also believe these warnings (also associated with the raster_to_raw_tiles function) are related to the same National Map API download issues or perhaps my instance of GDAL?
    Warning messages: 1: In CPL_gdaltranslate(source, destination, options, oo, ... : GDAL Message 1: for band 1, nodata value has been clamped to 0, the original value being out of range. 2: In CPL_gdaltranslate(source, destination, options, oo, ... : GDAL Message 1: for band 2, nodata value has been clamped to 0, the original value being out of range. 3: In CPL_gdaltranslate(source, destination, options, oo, ... : GDAL Message 1: for band 3, nodata value has been clamped to 0, the original value being out of range. 4: In CPL_gdaltranslate(source, destination, options, oo, ... : GDAL Message 1: for band 4, nodata value has been clamped to 0, the original value being out of range.
  • Give these warning examples listed above, the raw, xml, and png files exported to my directory correctly.
  • I downloaded Unity to test terrainr output, but could not get my tiles into the software! This is an issue with me never have used Unity, so definitely a user-error. I may try this again at a later date since it seems like such a cool way to visualize this data.
  • I think it would be great to present additional examples of data pulls and use in Unity, but of course this is not necessary for package publication. It appears there is a lot that this package can do and providing more examples of functionality to novice users like me would be awesome.
  • Overall I think this package is well put together, thanks for allowing me to review!

@ldecicco-USGS
Copy link

Thank you so much @sfoks and @mikejohnson51 !!!!

@mikemahoney218
Copy link
Member Author

mikemahoney218 commented Feb 18, 2021

Thank you so much @sfoks ! I've responded to your comments below.

* Within the unity_instructions.Rmd, I think  `merged_outputs <- merge_rasters(raw_tiles$`3DEPElevation`, tempfile(fileext = ".tif"), .$USGSNAIPPlus, tempfile(fileext = ".tif"))` should be changed to: `merged_outputs <- merge_rasters(raw_tiles$`3DEPElevation`, tempfile(fileext = ".tif"), raw_tiles$USGSNAIPPlus, tempfile(fileext = ".tif"))`

Fixed, thank you! Relevant commit.

* In general, I received a lot of warnings with the unity_instructions vignette, several related to the `merge_rasters` function. ex: `Warning 6: gdalbuildvrt does not support heterogeneous band numbers: expected 3, got 4. Skipping /var/folders/zd/0gcdnc8x21177wtw966dqktm0000gn/T//Rtmp1XicTF/file2176d7036b2.tif`; however this didn't seem to influence the generation of output.

This should be resolved thanks to changes made for Mike's review (relevant commit). I've validated that this "works on my machine" without warnings thanks to the new and improved merge_rasters and haven't seen any warnings in CI output yet.

This used to happen when the National Map returned some image tiles with alpha bands and some without, due to a limitation in the workflow we were using; however, whether or not a given tile will have an alpha band seems to change over time (so that running the same query a day later might resolve the issue). I think this resulted in a tile not getting merged into your full raster, resulting in the CPL_gdaltranslate error mentioned below; if I'm right, then I think the commit above solves both issues!

* Other warnings were received upon performing `raster_to_raw_tiles` function: ex `Warning 1: -srcwin 0 8194 4097 4097 falls partially outside raster extent. Going on however. ` I believe you explain that this is due to National Map API issues in trying to download a certain geographic extent but wanted to make sure you were aware of it. I also believe these warnings (also associated with the `raster_to_raw_tiles` function) are related to the same National Map API download issues or perhaps my instance of GDAL?
  `Warning messages: 1: In CPL_gdaltranslate(source, destination, options, oo,  ... : GDAL Message 1: for band 1, nodata value has been clamped to 0, the original value being out of range. 2: In CPL_gdaltranslate(source, destination, options, oo,  ... : GDAL Message 1: for band 2, nodata value has been clamped to 0, the original value being out of range. 3: In CPL_gdaltranslate(source, destination, options, oo,  ... : GDAL Message 1: for band 3, nodata value has been clamped to 0, the original value being out of range. 4: In CPL_gdaltranslate(source, destination, options, oo,  ... : GDAL Message 1: for band 4, nodata value has been clamped to 0, the original value being out of range.`

* Give these warning examples listed above, the raw, xml, and png files exported to my directory correctly.

Thank you for flagging!

Warning 1: -srcwin 0 8194 4097 4097 falls partially outside raster extent. Going on however:

You're exactly right, this warning is related to the National Map endpoint not exactly respecting the requested bounding boxes. I've made this explicit in the vignette (relevant commit), but to explain here as well:

Since Unity expects to import heightmaps as squares with side lengths that are powers of 2, this function will create tiles that are the right size for Unity; in order to make things easier for the user, those tiles will all be the same size (so users don't need to change sizes for each tile they import). That means that if the input raster isn't exactly divisible by side_length, this function will create tiles with no data past a certain point ("falls partially outside raster extent").

I've left this message as standard output (rather than an official R warning) because I believe it's useful for users to know to expect some no-data regions at the edges of their terrain tiles, but it's standard function behavior (so shouldn't interrupt users who have set options(warn=2) or similar). I'm happy to change this behavior (either provide a message from within terrainr, rather than messages from GDAL, or raise an actual warning) if you think something else makes more sense!

In CPL_gdaltranslate(source, destination, options, oo, ... : GDAL Message 1: for band 1, nodata value has been clamped to 0, the original value being out of range.:

I believe this is about the issue with mixed 3 & 4 band rasters as mentioned above, and should hopefully be fixed given the new merge_rasters code.

* I downloaded Unity to test terrainr output, but could not get my tiles into the software! This is an issue with me never have used Unity, so definitely a user-error. I may try this again at a later date since it seems like such a cool way to visualize this data.

If you're willing, I'd love to hear what went wrong! I would love to make the unity_instructions vignette as comprehensive as possible for future users.

* I think it would be great to present additional examples of data pulls and use in Unity, but of course this is not necessary for package publication. It appears there is a lot that this package can do and providing more examples of functionality to novice users like me would be awesome.

Thank you! I fully hope and intend to keep adding examples to the documentation moving forward.

* Overall I think this package is well put together, thanks for allowing me to review!

Thank you so much for reviewing!

@ldecicco-USGS
Copy link

@mikemahoney218 thanks for detailed response!

@sfoks and @mikejohnson51 Could you please indicate whether you are happy with the authors' response? Template https://devguide.ropensci.org/approval2template.html

@sfoks
Copy link

sfoks commented Feb 19, 2021

Reviewer Response

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

My earlier comment: I downloaded Unity to test terrainr output, but could not get my tiles into the software! This is an issue with me never have used Unity, so definitely a user-error. I may try this again at a later date since it seems like such a cool way to visualize this data.
Author response: If you're willing, I'd love to hear what went wrong! I would love to make the unity_instructions vignette as comprehensive as possible for future users.

  • I was lost at the import_raw step. I followed your instructions, matched the screenshot values exactly, and tried the "Windows" option and "Mac" option (since I was on a Mac when testing terrainr). I loaded the raw file in both cases apparently, but neither seemed produce anything that looked like Mt. Marcy (the terrain tile was still highlighted in orange and it still had checker-boarded fill). So I'm not sure if I am missing a crucial step here. A screen shot of what a successful, first terrain tile looks like might help users or me perhaps, but really these are issues with using Unity and ultimately have no influence on publishing. I could probably cruise YouTube videos to see what I am doing wrong here. Basically no response is needed here in terms of adjusting the vignette, just adding more details to the problems I had with Unity.
  • thanks for allowing me to review!

Estimated hours spent reviewing: 11 hours total

@mikejohnson51
Copy link

mikejohnson51 commented Feb 19, 2021

Reviewer Response

Kudos, great work @mikemahoney218 I am happy both as a reviewer and as a future user. The new ggplot tools seem great and all of the answers were well articulated with strong reasoning for keeping/rejecting changes.

Figuring out the remaining gdal issue would be nice and, if you can provide a reprex, I am happy to take a look. But that is not needed for package acceptance.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6

@ldecicco-USGS
Copy link

Approved! Thanks @mikemahoney218 for submitting and @sfoks and @mikejohnson51 for your reviews!

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname). If Appveyor does not pick up new commits after transfer, you might need to delete and re-create the Appveyor project. (Repo transfers are smoother with Travis CI and GitHub Actions)
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@mikemahoney218
Copy link
Member Author

Thank you so much all!

I'd like to include both of you as reviewers ("rev" role) in the DESCRIPTION; I see that @sfoks has specifically agreed to this, but would that be alright with you, @mikejohnson51 ?

@mikemahoney218
Copy link
Member Author

Also, @ldecicco-USGS , I received this error when trying to transfer the repo:
Screenshot from 2021-02-19 16-33-42

Do you have any idea what might be going wrong here? Thank you!

@ldecicco-USGS
Copy link

Ha sure enough... I think you should have an invitation now. Give it another try...and if THAT doesn't work let me know (this is always the step I manage to mess up).

@mikemahoney218
Copy link
Member Author

Second time's the charm! Transferred the repo over: https://github.com/ropensci/terrainr
Thank you!

@mikemahoney218
Copy link
Member Author

Alright, I believe I've done everything on that checklist now @ldecicco-USGS ! I haven't seen the docs build yet on docs.ropensci -- is there anything I need to do here other than wait for the repo to get picked up by CI?

I would love to submit a blog for the site! I'll take a look at the guide over the weekend and would love to talk with @stefaniebutland about timing.

Thanks again, everyone!

@mikejohnson51
Copy link

@mikemahoney218 thanks for the offer, that would great. Congrats on your package!

@stefaniebutland
Copy link
Member

Congratulations on passing peer review @mikemahoney218.

When you're ready, please submit a draft post by pull request, following instructions in https://blogguide.ropensci.org/. Then my colleague @steffilazerte will review it and suggest a publication date. Followup questions can move to Slack. Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants