hddtools package #73
Comments
Editor checks:
Editor comments
Thank you for the submission, @cvitolo! Before we progress to finding reviewers, we would like the package to have a test suite with a set of tests that cover a good portion of the code. I see unit tests mentioned in your NEWS file - perhaps they're just not committed for some reason? In addition, I ran
UPDATE: Here is the package coverage report, now that tests are included:
|
Thanks @noamross! You are right, I forgot to remove the tests folder from the .gitignore file. I have also amended the package based on the above suggestions. Please let me know if there is any outstanding issue to resolve. |
@cvitolo just some unsolicited input here, I noticed your mention of bboxSpatialPolygon: I recently put the "spex" package on CRAN, which tries to fill this gap. An extra package dependency is probably not welcome, but I thought it might be helpful. I am constantly repeating the pattern of as(extent(x), "SpatialPolygons") then putting on the CRS again etc. (Really, extent() should have CRS, and though the lower level sp::Spatial() does have it, we usually want an actual object and not just the abstraction). |
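The pattern described in the comment above can be sketched roughly as follows. This is a minimal illustration, assuming the raster and sp packages are installed; `extent_as_polygon` is a hypothetical helper name used only here and is not a function from hddtools or spex.

```r
library(raster)  # attaches sp as well

# Hypothetical helper: convert the extent of `x` into a SpatialPolygons
# object and copy the CRS of `x` onto it, since the extent alone has none.
extent_as_polygon <- function(x) {
  p <- as(extent(x), "SpatialPolygons")
  crs(p) <- crs(x)
  p
}

# Small empty raster covering an example lon/lat bounding box
r <- raster(nrows = 10, ncols = 10,
            xmn = -10, xmx = 5, ymn = 48, ymx = 62,
            crs = "+proj=longlat +datum=WGS84")
p <- extent_as_polygon(r)
```

Wrapping the two steps in one function keeps the CRS from being dropped by accident, which is the repetition the comment complains about.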
Thanks, @cvitolo. Now seeking reviewers. |
@cvitolo I see that you submitted this package to JOSS separately: openjournals/joss-reviews#56. This is fine but the "automatic submission" option we have up top envisions a process where the package goes to JOSS after our review (so that they can approve based on our reviews, removing an additional review step). If this makes sense to you, perhaps you should pause the review process over at JOSS? |
Of course, I just let JOSS know I have submitted to ropensci as well. |
Hello, sorry but it's going to take me a little longer. No more than a few |
Working on my review now... |
@noamross @cvitolo Here is my review:
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 4
Review Comments
hddtools, which stands for Hydrological Data Discovery Tools, is an R interface to several online hydrology dataset repositories. The package is useful and relevant particularly to hydrologists, but also to environmental scientists and practitioners in general. This package simplifies the following processes for the user: download of a metadata catalogue, selection of information needed, formal request for dataset(s), de-compression, conversion, manual filtering and parsing. The package also can provide offline modes in addition to online modes, which makes it efficient to use if the user regularly works with the same datasets (no need to re-download on each session when data licenses allow for redistribution). hddtools fills a gap in R data tools for hydrologists and environmental scientists alike and would be a good contribution to the rOpenSci package ecosystem.
Package Metadata
Does the package actually not work with R 3.0.1 or less, or was this just the version that was used in development and/or testing?
I'd add the rgdal package to at least Suggests and maybe Depends since the I'd possibly consider moving the Imports to Depends for the packages that provide primary utility to the hddtools package.
Are you familiar with permissive vs non-permissive licencing? A majority of commercial businesses will stray away from GPL-licensed software, so you will have a potentially wider user base if you license with a permissive license such as MIT, BSD or Apache 2.0. Just something to consider as you license your package, however if it's mostly academics who will use this package, then it probably doesn't matter too much. If you want to keep it GPL'd, you may consider going with GPL (>= 2) instead of GPL-3 because it will be more compatible with other packages.
Documentation
In the description of the package, you state, "This R package is an open source project designed to facilitate non-programmatic access to a variety of online open data sources relevant for hydrologists and, more in general, environmental scientists and practitioners." When I hear "non-programmatic access", I think of something like a GUI or website. An R API would actually be the converse -- "programmatic access". I would consider changing the wording here.
Code style
Try to avoid using a dot in variable or function names, as in
Change:
packs <- c('zoo', 'sp', 'RCurl', 'XML', 'rnrfa', 'Hmisc', 'raster',
'stringr', 'devtools')
new.packages <- packs[!(packs %in% installed.packages()[,'Package'])]
if(length(new.packages)) install.packages(new.packages)
To this:
packs <- c('zoo', 'sp', 'RCurl', 'XML', 'rnrfa', 'Hmisc', 'raster',
'stringr', 'devtools')
new_packs <- packs[!(packs %in% installed.packages()[,'Package'])]
if(length(new_packs)) install.packages(new_packs)
The convention for R code is to use spaces around all variable names and equals signs. For example, this:
bbox <- list(lonMin = -10, latMin = 48, lonMax = 5, latMax = 62)
Instead of this:
bbox <- list(lonMin=-10,latMin=48,lonMax=5,latMax=62)
Grammar/Spelling
I think it would be nice to have a few more hyperlinks to the data sources, for example, you could add a link to the GDRC or the Data60UK initiative.
Functionality
There was a missing package when I tried to run the first example:
KGClimateClass(bbox,updatedBy='Peel')
Error:
To remedy this, either add this package to the required/dependent packages in the README.md Installation section, or add a line to install the package right before using the function that depends on it. For example:
install.packages('rgdal')
KGClimateClass(bbox,updatedBy='Peel')
Found a possible bug:
> library(raster)
Loading required package: sp
> b <- brick('trmm_acc.tif')
> plot(b)
Error in graphics::par(old.par) :
invalid value specified for graphical parameter "pin"
Is there any way you can add the size of the datasets (assuming they stay static) to the documentation for the data downloading functions? For example, this function:
x <- catalogueData60UK()
My plot for the following function looks quite different; I'm assuming that's because it's different data. I'd add a comment telling the user that their plots will look different, and that the following are examples of what the plots may look like.
y <- tsSEPA(hydroRefNumber = c(234253, 234174, 234305))
plot(y[[1]])
Also, these all plot on the same plot (4 graphs on one plot). Is that intentional?
> plot(y[[1]])
> plot(y[[2]])
> plot(y[[3]])
> x <- HadDAILY()
> plot(x$EWP)
R Documentation
Is
You can add linkable URLs using the
In the R docs for the catalogue functions, I'd save the output to a data.frame. For example:
# Retrieve the whole catalogue
x <- catalogueGRDC()
The only one that currently has this is:
x <- catalogueSEPA()
The
> url <- "ftp://hydrology.nws.noaa.gov/pub/gcip/mopex/US_Data/Us_438_Daily"
> getContent(url)
Error: could not find function "getContent"
The example for
Sometimes you specify the argument name in the example and sometimes you don't. It might be best to fully specify the arguments in all the examples:
Specified:
x <- tsGRDC(stationID=1107700)
Not specified:
x <- tsMOPEX("14359000")
It would be good to note when a hydrometric reference number is a character string because some of the functions use numeric IDs and some use strings. I'd change the title/name of "SEPAcatalogue" to "Data set: The SEPA Catalogue" to make sure that it's clear that it's a dataset. It might even be useful to call out any data sources as well using the term, "Data source: [name of catalogue]" |
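The point above about reference numbers being strings in some functions and numbers in others can be illustrated with a small base R sketch. `normalise_id` is a hypothetical helper, not part of hddtools; it only shows one way to turn IDs into character strings without picking up scientific notation, which `as.character()` can introduce for round numeric values.

```r
# Hypothetical helper (not part of hddtools): return station IDs as
# character strings, whether they were supplied as numbers or as strings.
normalise_id <- function(id) {
  if (is.numeric(id)) {
    # scientific = FALSE avoids results such as "1e+05" for round numbers
    format(id, scientific = FALSE, trim = TRUE)
  } else {
    as.character(id)
  }
}

ids_numeric <- normalise_id(c(234253, 234174, 234305))
ids_string  <- normalise_id("14359000")
ids_round   <- normalise_id(100000)
```

Leading zeros cannot be recovered once an ID has been stored as a number, so keeping IDs as strings from the start is likely the safer choice.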
Thanks @ledell, for a thorough review! One quick thing - could you also look at the short paper (in |
@mdsumner - hey there, it's been 31 days, please get your review in soon, thanks 😺 (ropensci-bot) |
Thanks for your review @ledell! I'll start working on the changes as soon as both reviews are in. |
@mdsumner - hey there, it's been 36 days, please get your review in soon, thanks 😺 (ropensci-bot) |
Package Review
The package I have made a number of detailed suggestions below, and I'm happy to follow up and help with any of those. Finally, apologies for the delay in this review.
Documentation
The package includes all the following forms of documentation:
I installed into a fresh R 3.3.1 on Windows, using the GitHub dev instructions. The devtools GitHub install added rgdal during the process. I had some issues with some functions, which I submitted as issues, but it may well be a problem specific to a local machine I have, since it works on a different machine.
There's no list of authors here, but I haven't explored what is required in detail either.
Functionality
I had some quirks with the TRMM function download not working correctly on specific machines; I will try to follow it up: ropensci/hddtools#3 (because now it's working for me). Code coverage is currently at 55%, so I recommend putting in more tests to increase this.
Final approval (post-review)
Estimated hours spent reviewing:
Review Comments
I'd like to see the examples explore the catalogue data so there's more clarity about what data are available and how to use the filters. Some catalogue help pages have a full listing of their contents (e.g. SEPA) but others do not.
Suggestions
E.g. bb <- list(lonMin=-100,latMin=42,lonMax=-50,latMax=43)
cG <- catalogueGRDC(bbox = bb)
tsg <- tsGRDC(stationID = cG$grdc_no[2])
Suggest a better workflow is this, but it could go further and build the RasterBrick and return it. The raster object will record that it's linked to a file, so that seems reasonable.
|
@mdsumner do you remember your estimated hours spent reviewing? |
4.5 hours |
thanks much |
Responses to reviewers' questions
Many thanks to both reviewers (@ledell and @mdsumner) for their extremely valuable suggestions. I realised there were some redundancies: README and vignettes contain the same examples. Therefore I decided to leave only installation and metadata in the README and left all the examples in the vignette. I have addressed all comments and suggestions, but I have not solved open issues yet (see here). I hope to get this done by the end of the year. Below are my answers to your suggestions.
REVIEWER 1: @ledell
Package Metadata
Thanks for pointing this out! That was just the version I used for development. However I have just tested the package on Windows with various R-Rtools versions and it seems that only stable-patched-oldrel pass all the tests. I have updated the Depends field in the DESCRIPTION file accordingly.
Thanks, rgdal is now in Depends. I cannot move the other packages to Depends because I get a warning saying 'Depends: includes the non-default packages: ‘zoo’ ‘sp’ ‘RCurl’ ‘XML’ ‘raster’. Adding so many packages to the search path is excessive and importing selectively is preferable.' For this reason I decided to include them all in Imports.
I see your point, using a permissive license would definitely widen the range of potential users but my package depends on the raster package which is under GPL-3. I thought I should apply a licence compatible with the most restrictive license of the dependencies.
Thanks, I have removed the expression 'non-programmatic' from the documentation.
Thanks, this was amended.
Thanks, I have applied this convention throughout. Hope I did not miss anything.
Thanks, this was amended.
Thanks, this was amended.
Thanks, this was amended.
Thanks, I added hyperlinks for all the data providers.
Thanks, these sentences were all amended.
Thanks, rgdal is now in Depends.
I cannot reproduce your error here, have you generated the file trmm_acc.tiff using the TRMM() function before running the brick() command?
Thanks for the suggestion, I have added the size of the dataset to the documentation.
I have added a comment.
I have changed the plot to show the three time series in different colors.
bboxSpatialPolygon is now exported and available to the user.
Links in the R docs are now hyperlinked.
This is now done.
getContent is now exported and available to the user.
Thanks, this is amended now.
The arguments in all the examples are now fully specified.
The hydrometric reference number is now always a string.
The title of SEPAcatalogue is now Data set: The SEPA Catalogue
Thanks for the suggestion, the title of the various data sources is now Data source: [name of catalogue].
REVIEWER 2: @mdsumner
Thanks for the suggestion, I have now added more examples to the function documentation to clarify how to use the filters.
I have now added information to all help pages.
Done, thanks for the suggestion.
Thanks, hddtools now uses
Thanks, this is now called
I agree range(timeExtent) could suffice in the case of
Thanks, as mentioned above the bounding box is now defined using
I'll follow up on this in the issue (#2).
Thanks for spotting this, objects in the examples are now named differently.
This is now done. Thanks!
This was added to the vignette.
I have added more information and a link to the help page of the GRDC catalogue.
This is now done.
Funky quotations are now removed.
I have changed variable names to something more meaningful.
This is now done.
This is intentional as it is only a function used internally that wouldn't be of much use on its own.
This is now done.
This is now done.
This is now done.
No, data is not cached.
This is now done.
This is now amended.
This is now done.
I'll follow up on this in the issue (#3). |
I think it's fine, well done! |
Looks good! |
I'm traveling but will do final editor's checks early next week and then will have instructions for staffing up! |
Er, wrapping up. |
great, thanks! |
Approved! Remaining steps:
|
Hi @noamross, I have updated the DESCRIPTION file as you requested, transferred the repository to rOpenSci, created a new release and updated the links. Unfortunately, the new repo ropensci/hddtools is not appearing in Zenodo and I cannot generate a new DOI. I suppose you need to allow the Zenodo application to access the new repository. Would it be possible to do that please? Also, I have a website in the docs folder; would you mind giving me access to the settings of this new repo so that I can set up the website to be rendered on GitHub? Happy to write a blog post about my package. Many thanks! Claudia |
You should have admin access to the ropensci repository now. Can you tell me whether you are able to give Zenodo access? |
@noamross I do not see the 'settings' anymore. Therefore I cannot give zenodo access. |
See screenshot here |
Ah, I think I gave access to onboarding instead of hddtools as admin! Give a refresh? |
I can see the settings now, thanks! |
@noamross the new Zenodo DOI is 10.5281/zenodo.247842 |
Summary
This R package is an open source project designed to facilitate non-programmatic access to a variety of online open data sources: the Global Runoff Data Center (GRDC), the Scottish Environment Protection Agency (SEPA), the Top-Down modelling Working Group (Data60UK and MOPEX), Met Office Hadley Centre Observation Data (HadUKP Data) and NASA's Tropical Rainfall Measuring Mission (TRMM).
https://github.com/cvitolo/hddtools/
Hydrologists, environmental scientists and practitioners.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md with a high-level description in the package root or in inst/.
http://dx.doi.org/10.5281/zenodo.61570
Detail
R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
@masalmon