Discuss colocated compute

Permian-Global-Research · Aug 16, 2024 · 55ad0d9
1 parent 0608e3f
commit 55ad0d9

Showing 3 changed files with 31 additions and 15 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -34,6 +34,8 @@ The goal of rsi is to address several **r**epeated **s**patial **i**nfelicities,
 + A method for downloading STAC data -- excuse me, **r**etriving **S**TAC **i**nformation -- from any STAC server, with additional helpers for downloading Landsat, Sentinel-1, and Sentinel-2 data from free and public STAC servers providing **r**apid **s**atellite **i**magery, 
 + A **r**aster **s**tack **i**ntegration method for combining multiple rasters containing distinct data sets into a single raster stack.
 
+The functions in rsi are designed around letting you use the tools you're familiar with to process raster data using compute that you control -- whether that means grabbing imagery with your laptop to add some context to a map, or grabbing tranches of data to a virtual server hosted near your data provider for lightning fast downloads. The outputs from rsi functions are standard objects -- usually the file paths of raster files saved to your hard drive -- meaning it's easy to incorporate rsi into broader spatial data processing workflows.
+
 ## Installation
 
 You can install rsi via:

diff --git a/README.md b/README.md
@@ -8,8 +8,8 @@
 [![R-CMD-check](https://github.com/Permian-Global-Research/rsi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Permian-Global-Research/rsi/actions/workflows/R-CMD-check.yaml)
 [![Codecov test
 coverage](https://codecov.io/gh/Permian-Global-Research/rsi/branch/main/graph/badge.svg)](https://app.codecov.io/gh/Permian-Global-Research/rsi?branch=main)
-[![License: Apache
-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/license/apache-2-0)
+[![License:
+Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/license/apache-2-0)
 [![Lifecycle:
 maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#maturing)
 [![Project Status: Active – The project has reached a stable, usable
@@ -27,17 +27,29 @@ The goal of rsi is to address several **r**epeated **s**patial
 and help avoid **r**epetitive **s**tress **i**njuries. Specifically, rsi
 provides:
 
-- An interface to the **R**some – excuse me, [*Awesome* Spectral Indices
-  project](https://github.com/awesome-spectral-indices/awesome-spectral-indices),
-  providing the list of indices directly in R as a friendly tibble,
-- A method for efficiently *calculating* those awesome spectral indices
-  using local rasters, enabling **r**apid **s**pectral **i**nference,
-- A method for downloading STAC data – excuse me, **r**etriving **S**TAC
-  **i**nformation – from any STAC server, with additional helpers for
-  downloading Landsat, Sentinel-1, and Sentinel-2 data from free and
-  public STAC servers providing **r**apid **s**atellite **i**magery,
-- A **r**aster **s**tack **i**ntegration method for combining multiple
-  rasters containing distinct data sets into a single raster stack.
+  - An interface to the **R**some – excuse me, [*Awesome* Spectral
+    Indices
+    project](https://github.com/awesome-spectral-indices/awesome-spectral-indices),
+    providing the list of indices directly in R as a friendly tibble,
+  - A method for efficiently *calculating* those awesome spectral
+    indices using local rasters, enabling **r**apid **s**pectral
+    **i**nference,
+  - A method for downloading STAC data – excuse me, **r**etriving
+    **S**TAC **i**nformation – from any STAC server, with additional
+    helpers for downloading Landsat, Sentinel-1, and Sentinel-2 data
+    from free and public STAC servers providing **r**apid **s**atellite
+    **i**magery,
+  - A **r**aster **s**tack **i**ntegration method for combining multiple
+    rasters containing distinct data sets into a single raster stack.
+
+The functions in rsi are designed around letting you use the tools
+you’re familiar with to process raster data using compute that you
+control – whether that means grabbing imagery with your laptop to add
+some context to a map, or grabbing tranches of data to a virtual server
+hosted near your data provider for lightning fast downloads. The outputs
+from rsi functions are standard objects – usually the file paths of
+raster files saved to your hard drive – meaning it’s easy to incorporate
+rsi into broader spatial data processing workflows.
 
 ## Installation
 
@@ -201,7 +213,7 @@ other multi-band rasters from various data sources.
 
 ## Contributing
 
-We love contributions! See our [contribution
+We love contributions\! See our [contribution
 guide](https://github.com/Permian-Global-Research/rsi/blob/main/.github/CONTRIBUTING.md)
 for pointers on how to make your contribution as easy to accept as
 possible – in particular, consider [opening an

diff --git a/vignettes/articles/Downloading-data-from-STAC-APIs-using-rsi.Rmd b/vignettes/articles/Downloading-data-from-STAC-APIs-using-rsi.Rmd
@@ -355,4 +355,6 @@ If we know how to express our desired query in CQL2, we can write arbitrarily co
 
 rsi is not nearly the only package aiming to help R users take advantage of STAC APIs and build cloud-native geospatial workflows. As shown above, rsi is fundamentally built on top of the excellent rstac package, which I think is a fantastic tool for interactively exploring STAC APIs as well as building and executing queries. My hope is that rsi can provide a useful layer of abstraction over rstac for efficiently downloading assets and performing some of the most common rescaling, masking, and compositing tasks involved in standard data processing workflows.
 
-There are also several other packages which also implement workflows for efficiently accessing and processing data from STAC endpoints, among them the [gdalcubes](https://gdalcubes.github.io/) and [sits](https://e-sensing.github.io/sitsbook/) packages. A core difference between rsi and these packages is that rsi does not have a data model: rsi is focused entirely on finding the bits of data you want from remote endpoints, and getting those bits on your local machine for you to process with your normal spatial data tooling. There are no new classes in rsi (other than the band mapping objects), and the outputs of functions are local rasters. This is an approach that fits better in my head than the more abstract delayed computations in some other packages; at the same time, it's possible that this approach can be less efficient, downloading more data at finer resolutions than is actually needed for a given task. As a result, users need to make sure they're only requesting data they actually need.
+There are also several other packages which also implement workflows for efficiently accessing and processing data from STAC endpoints, among them the [gdalcubes](https://gdalcubes.github.io/) and [sits](https://e-sensing.github.io/sitsbook/) packages. A core difference between rsi and these packages is that rsi does not have a data model: rsi is focused entirely on finding the bits of data you want from remote endpoints, and getting those bits on your local machine for you to process with your normal spatial data tooling. There are no new classes in rsi (other than the band mapping objects), and the outputs of functions are local rasters. This is an approach that fits better in my head than the more abstract delayed computations in some other packages; at the same time, it's possible that this approach can be less efficient, downloading more data at finer resolutions than is actually needed for a given task. 
+
+To make data-downloading functions run faster, make sure you're only requesting the data you actually need -- often restricting the number of assets to be downloaded from each item, specifying a more precise bounding box, or using a narrower date range can cut the amount of data to be downloaded significantly. If there's no way around downloading a large amount of data, consider if running your code on a virtual (cloud) server that's physically closer to your data makes sense for your use case. Many data providers publish information on where their collections are hosted (with cloud providers like Microsoft naming the specific region it's in), and using servers that are closer to the data can result in notably faster download speeds.
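
As a rough illustration of the advice in that added paragraph, the sketch below keeps a request small by using a tight area of interest, a one-month date range, and only three assets. It assumes the `get_stac_data()` arguments shown in the package README and the Planetary Computer Landsat collection; the coordinates, dates, and asset names are illustrative values, not part of this commit.

```r
library(rsi)
library(sf)

# Illustrative area of interest: a 1 km buffer around a single point,
# which keeps the bounding box (and so the download) small.
aoi <- st_sfc(st_point(c(-74.912131, 44.080410)), crs = 4326)
aoi <- st_buffer(st_transform(aoi, 5070), 1000)

landsat_image <- get_stac_data(
  aoi,
  # A narrow date range limits the number of items returned.
  start_date = "2022-06-01",
  end_date = "2022-06-30",
  pixel_x_size = 30,
  pixel_y_size = 30,
  # Request only the assets needed, not every band in each item.
  asset_names = c("red", "blue", "green"),
  stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
  collection = "landsat-c2-l2",
  mask_band = "qa_pixel",
  mask_function = landsat_mask_function,
  output_filename = tempfile(fileext = ".tif"),
  item_filter_function = landsat_platform_filter,
  platforms = c("landsat-9", "landsat-8")
)

# The result is the path to a raster file on disk, ready for terra or stars.
landsat_image
```

Widening any of the three inputs (buffer distance, date range, or asset list) increases the amount of data transferred, so those are the first things to tighten before considering a server hosted closer to the data.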