Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hddtools package #73

Closed
13 tasks done
cvitolo opened this issue Sep 5, 2016 · 39 comments
Closed
13 tasks done

hddtools package #73

cvitolo opened this issue Sep 5, 2016 · 39 comments

Comments

@cvitolo
Copy link

cvitolo commented Sep 5, 2016

Summary

  • What does this package do? (explain in 50 words or less):

This R package is an open source project designed to facilitate non-programmatic access to a variety of online open data sources: the Global Runoff Data Center (GRDC), the Scottish Environment Protection Agency (SEPA), the Top-Down modelling Working Group (Data60UK and MOPEX), Met Office Hadley Centre Observation Data (HadUKP Data) and NASA's Tropical Rainfall Measuring Mission (TRMM).

  • Paste the full DESCRIPTION file inside a code block below:
Package: hddtools
Type: Package
Title: Hydrological Data Discovery Tools
Version: 0.3.0
Date: 2016-09-05
Authors@R: c(person("Claudia", "Vitolo", email = "[email protected]", role = c("aut", "cre")))
Maintainer: Claudia Vitolo <[email protected]>
URL: https://github.com/cvitolo/hddtools/
BugReports: https://github.com/cvitolo/hddtools/issues
Description: Facilitates discovery and handling of hydrological data, non-programmatic access to catalogues and databases.
Depends:
    R (>= 3.0.2)
Imports: zoo, sp, RCurl, XML, rnrfa, Hmisc, raster, stringr
Suggests: testthat
License: GPL-3
Repository: CRAN
RoxygenNote: 5.0.1
  • URL for the package (the development repository, not a stylized html page):

https://github.com/cvitolo/hddtools/

  • Who is the target audience?

Hydrologists, environmental scientists and practitioners.

  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
  • If this is a resubmission following rejection, please explain the change in circumstances:
  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@masalmon

@noamross
Copy link
Contributor

noamross commented Sep 6, 2016

Editor checks:

  • Fit: The package meets or criteria fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thank you for the submission, @cvitolo! Before we progress to finding reviewers, we would like the package to have a test suite with a set of tests that cover a good portion of the code. I see unit tests mentioned in your NEWS file - perhaps they're just not committed for some reason?


In addition, I ran goodpractice::gp(), which we are starting to do to flag some things. Here are the results, which it may be helpful to get started with before review:

── GP hddtools ──────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. At least some lines are not covered in this package.

    R/bboxSpatialPolygon.R:23:NA
    R/bboxSpatialPolygon.R:24:NA
    R/bboxSpatialPolygon.R:25:NA
    R/bboxSpatialPolygon.R:26:NA
    R/bboxSpatialPolygon.R:27:NA
    ... and 768 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.
  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to
    read your code for them if you use '<-'.

    R/GRDC.R:188:11

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/Data60UK.R:71:1
    R/Data60UK.R:78:1
    R/GRDC.R:194:1
    R/GRDC.R:196:1
    R/GRDC.R:213:1
    ... and 25 more lines

  ✖ avoid calling setwd(), it changes the global environment. If you need it, consider using on.exit() to restore the working
    directory.

    R/TRMM.R:55:3

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/getContent.R:23:9

  ✖ fix this R CMD check error/warning/note: Found the following hidden files and directories: .travis.yml These were most
    likely included in error. See section ‘Package structure’ in the ‘Writing R Extensions’ manual.  [Editor's note: this file should be in .Rbuildignore]
  ✖ avoid 'T' and 'F', as they are just variables which are set to the logicals 'TRUE' and 'FALSE' by default, but are not
    reserved words and hence can be overwritten by the user.  Hence, one should always use 'TRUE' and 'FALSE' for the
    logicals.

    R/GRDC.R:NA:NA
    R/HadDAILY.R:NA:NA

UPDATE: Here is the package coverage report, now that test are included:

> Sys.setenv("NOT_CRAN" = "true")
> covr::package_coverage()
hddtools Coverage: 65.86%
R/getContent.R: 0.00%
R/KGClimateClass.R: 19.18%
R/GRDC.R: 75.56%
R/TRMM.R: 79.67%
R/SEPA.R: 80.88%
R/MOPEX.R: 82.22%
R/Data60UK.R: 93.00%
R/bboxSpatialPolygon.R: 96.30%
R/HadDAILY.R: 100.00%

Reviewers: @mdsumner @ledell
Due date: 2016-09-28

@cvitolo
Copy link
Author

cvitolo commented Sep 6, 2016

Thanks @noamross! You are right, I forgot to remove the tests folder from the .gitignore file.

I have also amended the package based on the above suggestions.
I have two internal functions in the package: bboxSpatialPolygon() and getContent(). These are the only two functions for which I do not have unit tests.

Please let me know if there is any outstanding issue to resolve.

@mdsumner
Copy link

mdsumner commented Sep 6, 2016

@cvitolo just some unsolicited input here, I noticed your mention of bboxSpatialPolygon: I recently put "spex" package on CRAN which tries to fill this gap. An extra package depend is probably not welcome, but thought it might be helpful. I am constantly repeating the pattern of as(extent(x) , "SpatialPolygons") then putting on the CRS again etc. (Really, extent() should have CRS, and though the lower level sp::Spatial() does have it, we usually want an actual object and not just the abstraction).

@noamross
Copy link
Contributor

noamross commented Sep 6, 2016

Thanks, @cvitolo. Now seeking reviewers.

@noamross
Copy link
Contributor

noamross commented Sep 6, 2016

@cvitolo I see that you submitted this package to JOSS separately: openjournals/joss-reviews#56. This is fine but the "automatic submission" option we have up top envisions a process where the package goes to JOSS after our review (so that they can approve based on our reviews, removing an additional review step). If this makes sense to you, perhaps you should pause the review process over at JOSS?

@cvitolo
Copy link
Author

cvitolo commented Sep 7, 2016

Of course, I just let JOSS know I have submitted to ropensci as well.

@ropenscibot
Copy link

@mdsumner @ledell - hey there, it's been 16 days, please get your review in by Sep 27, thanks 😺 (ropensci-bot)

@mdsumner
Copy link

mdsumner commented Sep 26, 2016

Hello, sorry but it's going to take me a little longer. No more than a few
days I hope, apologies.

@ropenscibot
Copy link

@mdsumner @ledell - hey there, it's been 21 days, please get your review in soon, thanks 😺 (ropensci-bot)

@ledell
Copy link

ledell commented Sep 30, 2016

Working on my review now...

@ledell
Copy link

ledell commented Oct 5, 2016

@noamross @cvitolo Here is my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION
Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly staing problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4


Review Comments

hddtools, which stands for Hydrological Data Discovery Tools, is an R interface to several online hydrology dataset repositories. The package is useful and relevant particularly to hydrologists, but also to environmental scientists and practitioners in general.

This package simplifies the following processes for the user: download of a metadata catalogue, selection of information needed, formal request for dataset(s), de-compression, conversion, manual filtering and parsing.

The package also can provide offline modes in addition to online modes, which makes it efficient to use if the user regularly works with the same datasets (no need to re-download on each session when data licenses allow for redistribution).

hddtools fills a gap in R data tools for hydrologists and environmental scientists alike and would be a good contribution to the rOpenSci package ecosystem.

Package Metadata

Depends:
    R (>= 3.0.2)

Does the package actually not work with R 3.0.1 or less, or was this just the version that was used in development and/or testing?

Suggests: testthat

I'd add the rgdal package to at least Suggests and maybe Depends since the KGClimateClass() function depends on it.

I'd possibly consider moving the Imports to Depends for the packages that provide primary utility to the hddtools package.

License: GPL-3

Are you familiar with permissive vs non-permissive licencing? A majority of commercial businesses will stray away from GPL-licensed software, so you will have a potentially wider user base if you license with a permissive license such as MIT, BSD or Apache 2.0. Just something to consider as you license your package, however if it's mostly academics who will use this package, then it probably doesn't matter too much. If you want to keep it GPL'd, you may consider with GPL (>= 2) instead of GPL-3 because it will be more compatible with other packages.

Documentation

In the description of the package, you state, "This R package is an open source project designed to facilitate non-programmatic access to a variety of online open data sources relevant for hydrologists and, more in general, environmental scientists and practitioners."

When I hear "non-programmatic access", I think of something like a GUI or website. An R API would actually be the converse -- "programmatic access". I would consider changing the wording here.

Code style

Try to avoid using a dot in variable or function names, as in new.packages. I recommend the following edit to your README.md:

Change:

packs <- c('zoo', 'sp', 'RCurl', 'XML', 'rnrfa', 'Hmisc', 'raster', 
           'stringr', 'devtools')
new.packages <- packs[!(packs %in% installed.packages()[,'Package'])]
if(length(new.packages)) install.packages(new.packages)

To this:

packs <- c('zoo', 'sp', 'RCurl', 'XML', 'rnrfa', 'Hmisc', 'raster', 
           'stringr', 'devtools')
new_packs <- packs[!(packs %in% installed.packages()[,'Package'])]
if(length(new_packs)) install.packages(new_packs)

The convention for R code is to use spaces around all variable names and equals signs. For example:

This:

bbox <- list(lonMin=-10,latMin=48,lonMax=5,latMax=62)

Instead of this:

bbox <- list(lonMin = -10, latMin = 48, lonMax = 5, latMax = 62)
Grammar/Spelling

README.md:

  • "formal request for" -> "a formal request for"
  • "All those operation" -> "All those operations"
  • LaTeX style quotes are not needed in Markdown: Update quotes ``Long-Term Mean Monthly Discharges'' (in this section)
  • You are missing the name of the function in this sentence, "The hddtools provides a function, called , to download "

I think it would be nice to have a few more hyperlinks to the data sources, for example, you could add a link to the GDRC or the Data60UK initiative.

DESCRIPTION

  • ": Many governmental bodies" -> "Many governmental bodies"
  • "Especially when programmatic access to data resources is not allowed, downloading metadata catalogue, select the information needed, request datasets, de-compression, conversion, manual filtering and parsing can become rather tedious." -> "Especially when programmatic access to data resources is not allowed, downloading a metadata catalogue, selecting the information needed, requesting datasets, de-compression, conversion, manual filtering and parsing can become rather tedious."
  • "The package facilitate non programmatic access to various online data sources" -> "The package facilitates programmatic access to various online data sources"
  • "This package complements R's growing functionality in environmental web technologies to bridge the gap" -> "This package complements R's growing functionality in environmental web technologies by bridging the gap"
  • "data consumers and it is designed to be the starting building block" -> "data consumers. It is designed to be an initial building block"
  • "Facilitates discovery and handling of hydrological data, non-programmatic access to catalogues and databases." -> "Facilitates discovery and handling of hydrological data via programmatic access to catalogues and databases."

Functionality

There was a missing package when I tried to run the first example:

KGClimateClass(bbox,updatedBy='Peel')

Error:

Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer",  : 
  Cannot create RasterLayer object from this file; perhaps you need to install rgdal first

To remedy this, either add this package to the required/dependent packages in the README.md Installation section, or add a line to install the package right before using the function that depends on it. For example:

install.packages('rgdal')
KGClimateClass(bbox,updatedBy='Peel')

Found a possible bug:

> library(raster)
Loading required package: sp
> b <- brick('trmm_acc.tif')
> plot(b)
Error in graphics::par(old.par) : 
  invalid value specified for graphical parameter "pin"

Is there any way you can add the size of the datasets (assuming they stay static), to the docmentation for the data downloading functions? For example, this function:

x <- catalogueData60UK()

My plot for the following function looks quite different, I'm assuming that's because it's different data. I'd add a comment about how the plots will look different to the user, but that the following are examples of what the plots may look like.

y <- tsSEPA(hydroRefNumber = c(234253, 234174, 234305))
plot(y[[1]])

Also, these plot all on the same plot (4 graphs on one plot). Is that intentional?

> plot(y[[1]])
> plot(y[[2]])
> plot(y[[3]])
> x <- HadDAILY()
> plot(x$EWP)

R Documentation

Is bboxSpatialPolygon a user-level function or an internal function? It's not available to the user, apparently...

> library(hddtools)
> bbox <- list(lonMin=-180,latMin=-50,lonMax=+180,latMax=+50)
> bbSP <- bboxSpatialPolygon(bbox)
Error: could not find function "bboxSpatialPolygon"

You can add linkable URLS using the \url{} command. Your links in the R docs are not currently hyperlinked.

In the R docs for the catalogue functions, I'd save the output to a data.frame. For example:

# Retrieve the whole catalogue
x <- catalogueGRDC()

The only one that currently has this is:

x <- catalogueSEPA()

The getContent() function is not exported correctly:

>   url <- "ftp://hydrology.nws.noaa.gov/pub/gcip/mopex/US_Data/Us_438_Daily"
>   getContent(url)
Error: could not find function "getContent"

The example for tsData60UK() is simply commented out instead of using the "Example Not Run" command.

Sometimes you specify the argument name in the example and sometimes you don't. It might be best to fully specify the arguments in all the examples:

Specified:

x <- tsGRDC(stationID=1107700)

Not specified:

x <- tsMOPEX("14359000")

It would be good to note when a hydrometric reference number is a character string because some of the functions use numeric IDs and some use strings.

I'd change the title/name of "SEPAcatalogue" to "Data set: The SEPA Catalogue" to make sure that it's clear that it's a dataset. It might even be useful to call out any data sources as well using the term, "Data source: [name of catalogue]"

@noamross
Copy link
Contributor

noamross commented Oct 5, 2016

Thanks @ledell, for a thorough review! One quick thing - could you also look at the short paper (in inst/paper) for the JOSS section?

@sckott
Copy link
Contributor

sckott commented Oct 7, 2016

@mdsumner - hey there, it's been 31 days, please get your review in soon, thanks 😺 (ropensci-bot)

@cvitolo
Copy link
Author

cvitolo commented Oct 9, 2016

Thanks for your review @ledell! I'll start working on the changes as soon as both reviews are in.

@ropenscibot
Copy link

@mdsumner - hey there, it's been 36 days, please get your review in soon, thanks 😺 (ropensci-bot)

@mdsumner
Copy link

mdsumner commented Oct 13, 2016

Package Review

The package hddtools provides direct access to number of online data services for hydrological data. These data come in a variety of forms, and most include a catalogue of available data as an index for making further data requests. The package provides the data in easily useable forms, leveraging existing tools in R for mapping, spatial data handling, and time series handling and plotting.

I have made a number of detailed suggestions below, and I'm happy to follow up and help with any of those.

Finally, apologies for the delay in this review.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • [x ] A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • [x ] Installation instructions: for the development version of package and any non-standard dependencies in README
  • [x ] Vignette(s) demonstrating major functionality that runs successfully locally
  • [x ] Function Documentation: for all exported functions in R help
  • [x ] Examples for all exported functions in R Help that run successfully locally
  • [x ] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

I installed into a fresh R 3.3.1 on Windows, using the Github dev instructions. The devtools github install added rgdal during the process.

I had some issues with some functions, which I submitted as issues - but it may well be a problem specific to a local machine I have - it works on different machine.

ropensci/hddtools#2

ropensci/hddtools#3

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • [x ] A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • [x ] A statement of need clearly staing problems the software is designed to solve and its target audience.
  • [x ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

There's no list of authors here, but I haven't explored what is required in detail either.

Functionality

  • [x ] Installation: Installation succeeds as documented.
  • [x ] Functionality: Any functional claims of the software been confirmed.
  • [x ] Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • [x ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

I had some quirks with TRMM function download not working correctly on specific machines, I will try to follow it up: ropensci/hddtools#3 (because now it's working for me).

Code coverage is curretly at 55% so I recommend putting in more tests to increse this.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

I'd like to see the examples explore the catalogue data so there's more clarity about what data are available and how to use the filters.

Some catalogue help pages have a full listing of their contents (e.g. SEPA ) but others do not.

Suggestions

  • It would be handy to see links to all the providers of data (GRDC, MOPEX, Hadley etc) in the hddtools-package doc page
  • use raster::extent() rather than a raw list, could migrate to this default by providing an extent method for your raw lists
  • note that bbox is a function in sp, so you might want a different name
  • clarify timeExtent is really an axis of steps, not a range/extent - though I think it is only used as a range (??) so range(timeExtent) would do, the seq(,by=) part is not relevant I think
  • include checks for sense in bounding boxes, I inadvertently tried bb <- list(lonMin=-100,latMin=43,lonMax=-50,latMax=42) which doesn't complain but will never return results (this would be provided automatically by using raster::extent)
  • README example KGClimateClass: KGClimate-class problem hddtools#2
  • examples use x for different calls to catalogueGRDC, please name these objects differently for interactive comparison, and do this throughout examples
  • clarify in comments that filter is bbox plus "keep only those where "statistics" value is 1""
  • leaflet installation should be suggested before library(leaflet)
  • links to GRDC and KGClimate sites, I was looking for descriptions of the columns in catalogueGRDC() and/or a link to the GRDC resources that describe this
  • README: "and, more in general, environmental scientists and practitioners." remove the "more" ?
  • "``Long-Term Mean Monthly Discharges''" in README has funky quotations
  • I suggest not using generic (and repeated) variable names like "x" and "y", use specific names for the data - so an object is standalone from examples
  • use links to URLs in help pages
  • getContent is documented but not exported
  • provide example of how to discover possible values for stationID for tsGRDC, and/or link to catalogueGRDC function in help for stationID argument

E.g.

bb <- list(lonMin=-100,latMin=42,lonMax=-50,latMax=43)

cG <- catalogueGRDC(bbox = bb)

tsg <- tsGRDC(stationID = cG$grdc_no[2])
  • always create the bbox and timeExtent explicitly for each example, so their scope is narrow and easily runnable.
  • show that y can be plotted directly here
hydroRefNumber <- as.numeric(as.character(x$id[1])) # e.g. 55012
y <- tsData60UK(hydroRefNumber, plotOption=FALSE)
plot(y)

  • I ran this twice, expected the download to be cached? It downloads again
y <- tsGRDC(stationID='6122300', plotOption=TRUE)
trying URL 'ftp://ftp.bafg.de/pub/REFERATE/GRDC/ltdata/europe.zip'
  • this example doesn't link the name of the file to the read, trmm_acc.tif could be an output value of TRMM?
bbox <- 
TRMM(timeExtent = timeExtent, bbox = bbox)
library(raster)
b <- brick('trmm_acc.tif')
plot(b)
  • doc in TRMM about value is not complete "Data is loaded as rasterbrick, then converted to a multilayer Geotiff that can" also, it's really the opposite - "multilayer GeoTIFF is downloaded, loaded as a RasterBrick"

Suggest a better workflow is this, but it could go further and build the RasterBrick and return it. The raster object will record that it's linked to a file, so that seems reasonable.

trmmfile <- TRMM(timeExtent = timeExtent, bbox = bbox)
library(raster)
b <- brick(trmmfile)
  • I get this problem, but on one machine it worked fine - I'll try to follow up
 TRMM(timeExtent = timeExtent, bbox = bbox)
latMax of bbox is out of the maximum extent [-50,50], new latMax of bbox is modified as follows: latMax = 50
Beware this function was tested on a unix machine. If you are not using a unix machine your results could be wrong. Please report any problem to [email protected], thanks!
Error in .local(.Object, ...) : 
  Unable to open C:\Users\michae_sum\Documents\projects\hddtools\3B43.*.
No error

 Show Traceback

 Rerun with Debug
 Error in .rasterObjectFromFile(x, objecttype = "RasterBrick", ...) : 
  Cannot create a RasterLayer object from this file.

@noamross
Copy link
Contributor

Thanks for the review, @mdsumner. @cvitolo, let us know if you have any questions.

@cvitolo
Copy link
Author

cvitolo commented Oct 16, 2016

Thanks for the comments @ledell @mdsumner! I'll try and address them all in the next couple of weeks.

@sckott
Copy link
Contributor

sckott commented Nov 4, 2016

@mdsumner do you remember your estimated hours spent reviewing?

@mdsumner
Copy link

mdsumner commented Nov 5, 2016

4.5 hours

@sckott
Copy link
Contributor

sckott commented Nov 5, 2016

thanks much

@cvitolo
Copy link
Author

cvitolo commented Dec 22, 2016

Responses to reviewers' questions

Many thanks to both reviewers (@ledell and @mdsumner) for their extremely valuable suggestions. I realised there were some redundancies: README and vignettes contain the same examples. Therefore I decided to leave only installation and metadata in the README and left all the examples in the vignette.

I have addressed all comments and suggestions, but I have not solved open issues yet (see here). I hope to get this done by the end of the year.

Below are my answers to your suggestions.

REVIEWER 1: @ledell

Package Metadata

Does the package actually not work with R 3.0.1 or less, or was this just the version that was used in development and/or testing?

Thanks for pointing this out! That was just the version I used for development. However I have just tested the package on Windows with various R-Rtools versions and it seems that only stable-patched-oldrel pass all the tests. I have updated the Depends field in the DESCRIPTION file accordingly.

Suggests: testthat
I'd add the rgdal package to at least Suggests and maybe Depends since the KGClimateClass() function depends on it.
I'd possibly consider moving the Imports to Depends for the packages that provide primary utility to the hddtools package.

Thanks, rgdal is now in Depends. I cannot move the other packages to Depends because I get a warning saying 'Depends: includes the non-default packages: ‘zoo’ ‘sp’ ‘RCurl’ ‘XML’ ‘raster’. Adding so many packages to the search path is excessive and importing selectively is preferable.' For this reason I decided to include them all in Imports.

License: GPL-3
Are you familiar with permissive vs non-permissive licencing? A majority of commercial businesses will stray away from GPL-licensed software, so you will have a potentially wider user base if you license with a permissive license such as MIT, BSD or Apache 2.0. Just something to consider as you license your package, however if it's mostly academics who will use this package, then it probably doesn't matter too much. If you want to keep it GPL'd, you may consider with GPL (>= 2) instead of GPL-3 because it will be more compatible with other packages.

I see your point, using a permissive license would definitely widen the range of potential users but my package depends on the raster package which is under GPL-3. I thought I should apply a licence compatible with the most restrictive license of the dependencies.

Documentation
In the description of the package, you state, "This R package is an open source project designed to facilitate non-programmatic access to a variety of online open data sources relevant for hydrologists and, more in general, environmental scientists and practitioners."
When I hear "non-programmatic access", I think of something like a GUI or website. An R API would actually be the converse -- "programmatic access". I would consider changing the wording here.

Thanks, I have removed the expression 'non-programmatic' from the documentation.

Code style
Try to avoid using a dot in variable or function names, as in new.packages. I recommend the following edit to your README.md:
Change:
packs <- c('zoo', 'sp', 'RCurl', 'XML', 'rnrfa', 'Hmisc', 'raster',
'stringr', 'devtools')
new.packages <- packs[!(packs %in% installed.packages()[,'Package'])]
if(length(new.packages)) install.packages(new.packages)
To this:
packs <- c('zoo', 'sp', 'RCurl', 'XML', 'rnrfa', 'Hmisc', 'raster',
'stringr', 'devtools')
new_packs <- packs[!(packs %in% installed.packages()[,'Package'])]
if(length(new_packs)) install.packages(new_packs)

Thanks, this was amended.

The convention for R code is to use spaces around all variable names and equals signs. For example:
This:
bbox <- list(lonMin=-10,latMin=48,lonMax=5,latMax=62)
Instead of this:
bbox <- list(lonMin = -10, latMin = 48, lonMax = 5, latMax = 62)

Thanks, I have applied this convention throughout. Hope I did not miss anything.

Grammar/Spelling
README.md:
"formal request for" -> "a formal request for"
"All those operation" -> "All those operations"

Thanks, this was amended.

LaTeX style quotes are not needed in Markdown: Update quotes ``Long-Term Mean Monthly Discharges'' (in this section)

Thanks, this was amended.

You are missing the name of the function in this sentence, "The hddtools provides a function, called , to download "

Thanks, this was amended.

I think it would be nice to have a few more hyperlinks to the data sources, for example, you could add a link to the GDRC or the Data60UK initiative.

Thanks, I added hyperlinks for all the data providers.

DESCRIPTION
": Many governmental bodies" -> "Many governmental bodies"
"Especially when programmatic access to data resources is not allowed, downloading metadata catalogue, select the information needed, request datasets, de-compression, conversion, manual filtering and parsing can become rather tedious." -> "Especially when programmatic access to data resources is not allowed, downloading a metadata catalogue, selecting the information needed, requesting datasets, de-compression, conversion, manual filtering and parsing can become rather tedious."
"The package facilitate non programmatic access to various online data sources" -> "The package facilitates programmatic access to various online data sources"
"This package complements R's growing functionality in environmental web technologies to bridge the gap" -> "This package complements R's growing functionality in environmental web technologies by bridging the gap"
"data consumers and it is designed to be the starting building block" -> "data consumers. It is designed to be an initial building block"
"Facilitates discovery and handling of hydrological data, non-programmatic access to catalogues and databases." -> "Facilitates discovery and handling of hydrological data via programmatic access to catalogues and databases."

Thanks, these sentences were all amended.

Functionality
There was a missing package when I tried to run the first example:
KGClimateClass(bbox,updatedBy='Peel')
Error:
Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer", :
Cannot create RasterLayer object from this file; perhaps you need to install rgdal first
To remedy this, either add this package to the required/dependent packages in the README.md Installation section, or add a line to install the package right before using the function that depends on it. For example:
install.packages('rgdal')
KGClimateClass(bbox,updatedBy='Peel')

Thanks, rgdal is now in Depends.

Found a possible bug:
library(raster)
Loading required package: sp
b <- brick('trmm_acc.tif')
plot(b)
Error in graphics::par(old.par) :
invalid value specified for graphical parameter "pin"

I cannot reproduce your error here, have you generated the file trmm_acc.tiff using the TRMM() function before running the brick() command?
I have opened an issue here.

Is there any way you can add the size of the datasets (assuming they stay static), to the docmentation for the data downloading functions? For example, this function:
x <- catalogueData60UK()

Thanks for the suggestion, I have added the size of the dataset to the documentation.

My plot for the following function looks quite different, I'm assuming that's because it's different data. I'd add a comment about how the plots will look different to the user, but that the following are examples of what the plots may look like.
y <- tsSEPA(hydroRefNumber = c(234253, 234174, 234305))
plot(y[[1]])

I have added a comment.

Also, these plot all on the same plot (4 graphs on one plot). Is that intentional?
plot(y[[1]])
plot(y[[2]])
plot(y[[3]])

I have changed the plot to show the three time series in different colors.

R Documentation
Is bboxSpatialPolygon a user-level function or an internal function? It's not available to the user, apparently...
library(hddtools)
bbox <- list(lonMin=-180,latMin=-50,lonMax=+180,latMax=+50)
bbSP <- bboxSpatialPolygon(bbox)
Error: could not find function "bboxSpatialPolygon"

bboxSpatialPolygon is now exported and available to the user.

You can add linkable URLS using the \url{} command. Your links in the R docs are not currently hyperlinked.

Links in the R docs are now hyperlinked.

In the R docs for the catalogue functions, I'd save the output to a data.frame. For example:

Retrieve the whole catalogue

x <- catalogueGRDC()
The only one that currently has this is:
x <- catalogueSEPA()

This is now done.

The getContent() function is not exported correctly:
url <- "ftp://hydrology.nws.noaa.gov/pub/gcip/mopex/US_Data/Us_438_Daily"
getContent(url)
Error: could not find function "getContent"

getContent is now exported and available to the user.

The example for tsData60UK() is simply commented out instead of using the "Example Not Run" command.

Thanks, this is amended now.

Sometimes you specify the argument name in the example and sometimes you don't. It might be best to fully specify the arguments in all the examples:
Specified:
x <- tsGRDC(stationID=1107700)
Not specified:
x <- tsMOPEX("14359000")

The arguments in all the examples are now fully specified.

It would be good to note when a hydrometric reference number is a character string because some of the functions use numeric IDs and some use strings.

The hydrometric reference number is now always a string.

I'd change the title/name of "SEPAcatalogue" to "Data set: The SEPA Catalogue" to make sure that it's clear that it's a dataset.

The title of SEPAcatalogue is now Data set: The SEPA Catalogue

It might even be useful to call out any data sources as well using the term, "Data source: [name of catalogue]"

Thanks for the suggestion, the title of the various data sources is now Data source: [name of catalogue].

REVIEWER 2: @mdsumner

I'd like to see the examples explore the catalogue data so there's more clarity about what data are available and how to use the filters.

Thanks for the suggestion, I have now added more examples to the function documentation to clarify hw to use the filters.

Some catalogue help pages have a full listing of their contents (e.g. SEPA) but others do not.

I have now added information to all help pages.

Suggestions
It would be handy to see links to all the providers of data (GRDC, MOPEX, Hadley etc) in the hddtools-package doc page

Done, thanks for the suggestion.

use raster::extent() rather than a raw list, could migrate to this default by providing an extent method for your raw lists

Thanks, hddtools now uses raster::extent() rather than a raw list to define a bounding box.

note that bbox is a function in sp, so you might want a different name

Thanks, this is now called areaBox.

clarify timeExtent is really an axis of steps, not a range/extent - though I think it is only used as a range (??) so range(timeExtent) would do, the seq(,by=) part is not relevant I think

I agree range(timeExtent) could suffice in the case of tsData60UK. However, the sequence from start to end of the time range is needed for the TRMM function because files are stored by month/day. Therefore, for a matter of consistency, I always define the timeExtent as a sequence of dates.

include checks for sense in bounding boxes, I inadvertently tried bb <- list(lonMin=-100,latMin=43,lonMax=-50,latMax=42) which doesn't complain but will never return results (this would be provided automatically by using raster::extent)

Thanks, as mentioned above the bounding box is now defined using raster::extent.

README example KGClimateClass: ropensci/hddtools#2

I'll follow up on this in the issue (#2).

examples use x for different calls to catalogueGRDC, please name these objects differently for interactive comparison, and do this throughout examples

Thanks for spotting this, objects in the examples are now named differently.

clarify in comments that filter is bbox plus "keep only those where "statistics" value is 1

This is now done. Thanks!

leaflet installation should be suggested before library(leaflet)

This was added to the vignette.

links to GRDC and KGClimate sites, I was looking for descriptions of the columns in catalogueGRDC() and/or a link to the GRDC resources that describe this

I have added more information and a link to the help page of the GRDC catalogue.
The KGClimateClass help page contains links to the websites hosting the maps.

README: "and, more in general, environmental scientists and practitioners." remove the "more" ?

This is now done.

"``Long-Term Mean Monthly Discharges''" in README has funky quotations

Funky quotations are now removed.

I suggest not using generic (and repeated) variable names like "x" and "y", use specific names for the data - so an object is standalone from examples

I have changed variable names to something more meaningful.

use links to URLs in help pages

This is now done.

getContent is documented but not exported

This is intentional as it is only a function used internally that wouldn't be of much use on its own.

provide example of how to discover possible values for stationID for tsGRDC, and/or link to catalogueGRDC function in help for stationID argument
E.g.
bb <- list(lonMin=-100,latMin=42,lonMax=-50,latMax=43)
cG <- catalogueGRDC(bbox = bb)
tsg <- tsGRDC(stationID = cG$grdc_no[2])

This is now done.

always create the bbox and timeExtent explicitly for each example, so their scope is narrow and easily runnable.

This is now done.

show that y can be plotted directly here
hydroRefNumber <- as.numeric(as.character(x$id[1])) # e.g. 55012
y <- tsData60UK(hydroRefNumber, plotOption=FALSE)
plot(y)

This is now done.

I ran this twice, expected the download to be cached? It downloads again
y <- tsGRDC(stationID='6122300', plotOption=TRUE)
trying URL 'ftp://ftp.bafg.de/pub/REFERATE/GRDC/ltdata/europe.zip'

No, data is not cached.

this example doesn't link the name of the file to the read, trmm_acc.tif could be an output value of TRMM?
bbox <-
TRMM(timeExtent = timeExtent, bbox = bbox)
library(raster)
b <- brick('trmm_acc.tif')
plot(b)

This is now done.

doc in TRMM about value is not complete "Data is loaded as rasterbrick, then converted to a multilayer Geotiff that can" also, it's really the opposite - "multilayer GeoTIFF is downloaded, loaded as a RasterBrick"

This is now amended.

Suggest a better workflow is this, but it could go further and build the RasterBrick and return it. The raster object will record that it's linked to a file, so that seems reasonable.
trmmfile <- TRMM(timeExtent = timeExtent, bbox = bbox)
library(raster)
b <- brick(trmmfile)

This is now done.

I get this problem, but on one machine it worked fine - I'll try to follow up
TRMM(timeExtent = timeExtent, bbox = bbox)
latMax of bbox is out of the maximum extent [-50,50], new latMax of bbox is modified as follows: latMax = 50
Beware this function was tested on a unix machine. If you are not using a unix machine your results could be wrong. Please report any problem to [email protected], thanks!
Error in .local(.Object, ...) :
Unable to open C:\Users\michae_sum\Documents\projects\hddtools\3B43.*.
No error
Show Traceback
Rerun with Debug
Error in .rasterObjectFromFile(x, objecttype = "RasterBrick", ...) :
Cannot create a RasterLayer object from this file.

I'll follow up on this in the issue (#3).

@cvitolo
Copy link
Author

cvitolo commented Dec 23, 2016

I have now addressed the following issues:

  • KGClimate-class problem #2
  • problem with TRMM on Windows and Linux #3
  • Error in graphics::par(old.par) #4

Could you please check if the problems persist? Many thanks!

@cvitolo
Copy link
Author

cvitolo commented Jan 3, 2017

@sckott @ledell @mdsumner First of all happy new year!

I have done my best to address all the comments and issues. Please let me know how you want to proceed. Many thanks for your insights and comments, they greatly improved this package!

@noamross
Copy link
Contributor

noamross commented Jan 3, 2017

Thanks for the updates, @cvitolo! @ledell and @mdsumner, can you look at the updates to see if all your concerns have been addressed?

@mdsumner
Copy link

mdsumner commented Jan 9, 2017

I think it's fine, well done!

@ledell
Copy link

ledell commented Jan 11, 2017

Looks good!

@cvitolo
Copy link
Author

cvitolo commented Jan 12, 2017

Many thanks @ledell and @mdsumner, your suggestions greatly improved the package!

@noamross Is there anything else needed on my side? Maybe create a new release? Many thanks!

@noamross
Copy link
Contributor

I'm traveling but will do final editor's checks early next week and then will have instructions for staffing up!

@noamross
Copy link
Contributor

Er, wrapping up.

@cvitolo
Copy link
Author

cvitolo commented Jan 12, 2017

great, thanks!

@noamross
Copy link
Contributor

Approved! Remaining steps:

  • You need rmarkdown and knitr in Suggests
  • You need an entry in your DESCRIPTION file of: VignetteBuilder: knitr
  • I've added you to an hddtools team for the rOpenSci repository. Accept and transfer the repo (under "Settings") to rOpenSci, and then update any links for CI, coverage, and documentation.
  • Create a new release and provide the updated Zenodo DOI if needed. I'll see if we can re-start the JOSS process without the need to submit again.
  • Let me know if you are interested in doing a blog post about your package on the rOpenSci blog, either long-form (https://ropensci.org/blog/) or short (https://ropensci.org/tech-notes/)

@cvitolo
Copy link
Author

cvitolo commented Jan 16, 2017

Hi @noamross,

I have updated the DESCRIPTION file as you requested, transferred the repository to rOpenSci, created a new release and updated the links.

Unfortunately, the new repo ropensci/hddtools is not appearing in Zenodo and I cannot generate a new DOI. I suppose you need to allow Zenodo application to access the new repository. Would it be possible to do that please?

Also, I have a website in the docs folder, would you mind to give me access to the settings of this new repo so that I can set up the website to be rendered in github?

Happy to write a blog post about my package.

Many thanks!

Claudia

@noamross
Copy link
Contributor

noamross commented Jan 16, 2017

You should have admin access to the ropensci repository now. Can you tell me whether you are able to give Zenodo access?

@cvitolo
Copy link
Author

cvitolo commented Jan 16, 2017

@noamross I do not see the 'settings' anymore. Therefore I cannot give zenodo access.

@cvitolo
Copy link
Author

cvitolo commented Jan 16, 2017

See screenshot here

@noamross
Copy link
Contributor

Ah, I think I gave access to onboarding instead of hddtools as admin! Give a refresh?

@cvitolo
Copy link
Author

cvitolo commented Jan 16, 2017

I can see the settings now, thanks!

@cvitolo
Copy link
Author

cvitolo commented Jan 16, 2017

@noamross the new Zenodo DOI is 10.5281/zenodo.247842

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants