Commit
Discuss colocated compute
mikemahoney218 committed Aug 16, 2024
1 parent 0608e3f commit 55ad0d9
Showing 3 changed files with 31 additions and 15 deletions.
2 changes: 2 additions & 0 deletions README.Rmd
@@ -34,6 +34,8 @@ The goal of rsi is to address several **r**epeated **s**patial **i**nfelicities,
+ A method for downloading STAC data -- excuse me, **r**etrieving **S**TAC **i**nformation -- from any STAC server, with additional helpers for downloading Landsat, Sentinel-1, and Sentinel-2 data from free and public STAC servers providing **r**apid **s**atellite **i**magery,
+ A **r**aster **s**tack **i**ntegration method for combining multiple rasters containing distinct data sets into a single raster stack.

The functions in rsi are designed around letting you use the tools you're familiar with to process raster data using compute that you control -- whether that means grabbing imagery with your laptop to add some context to a map, or grabbing tranches of data to a virtual server hosted near your data provider for lightning-fast downloads. The outputs from rsi functions are standard objects -- usually the file paths of raster files saved to your hard drive -- meaning it's easy to incorporate rsi into broader spatial data processing workflows.

## Installation

You can install rsi via:
40 changes: 26 additions & 14 deletions README.md
@@ -8,8 +8,8 @@
[![R-CMD-check](https://github.com/Permian-Global-Research/rsi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Permian-Global-Research/rsi/actions/workflows/R-CMD-check.yaml)
[![Codecov test
coverage](https://codecov.io/gh/Permian-Global-Research/rsi/branch/main/graph/badge.svg)](https://app.codecov.io/gh/Permian-Global-Research/rsi?branch=main)
[![License: Apache
2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/license/apache-2-0)
[![License:
Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/license/apache-2-0)
[![Lifecycle:
maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#maturing)
[![Project Status: Active – The project has reached a stable, usable
@@ -27,17 +27,29 @@ The goal of rsi is to address several **r**epeated **s**patial
and help avoid **r**epetitive **s**tress **i**njuries. Specifically, rsi
provides:

- An interface to the **R**some – excuse me, [*Awesome* Spectral Indices
project](https://github.com/awesome-spectral-indices/awesome-spectral-indices),
providing the list of indices directly in R as a friendly tibble,
- A method for efficiently *calculating* those awesome spectral indices
using local rasters, enabling **r**apid **s**pectral **i**nference,
- A method for downloading STAC data – excuse me, **r**etrieving **S**TAC
**i**nformation – from any STAC server, with additional helpers for
downloading Landsat, Sentinel-1, and Sentinel-2 data from free and
public STAC servers providing **r**apid **s**atellite **i**magery,
- A **r**aster **s**tack **i**ntegration method for combining multiple
rasters containing distinct data sets into a single raster stack.
- An interface to the **R**some – excuse me, [*Awesome* Spectral
Indices
project](https://github.com/awesome-spectral-indices/awesome-spectral-indices),
providing the list of indices directly in R as a friendly tibble,
- A method for efficiently *calculating* those awesome spectral
indices using local rasters, enabling **r**apid **s**pectral
**i**nference,
- A method for downloading STAC data – excuse me, **r**etrieving
**S**TAC **i**nformation – from any STAC server, with additional
helpers for downloading Landsat, Sentinel-1, and Sentinel-2 data
from free and public STAC servers providing **r**apid **s**atellite
**i**magery,
- A **r**aster **s**tack **i**ntegration method for combining multiple
rasters containing distinct data sets into a single raster stack.

The functions in rsi are designed around letting you use the tools
you’re familiar with to process raster data using compute that you
control – whether that means grabbing imagery with your laptop to add
some context to a map, or grabbing tranches of data to a virtual server
hosted near your data provider for lightning-fast downloads. The outputs
from rsi functions are standard objects – usually the file paths of
raster files saved to your hard drive – meaning it’s easy to incorporate
rsi into broader spatial data processing workflows.

## Installation

@@ -201,7 +213,7 @@ other multi-band rasters from various data sources.

## Contributing

We love contributions! See our [contribution
We love contributions\! See our [contribution
guide](https://github.com/Permian-Global-Research/rsi/blob/main/.github/CONTRIBUTING.md)
for pointers on how to make your contribution as easy to accept as
possible – in particular, consider [opening an
@@ -355,4 +355,6 @@ If we know how to express our desired query in CQL2, we can write arbitrarily co

rsi is far from the only package aiming to help R users take advantage of STAC APIs and build cloud-native geospatial workflows. As shown above, rsi is fundamentally built on top of the excellent rstac package, which I think is a fantastic tool for interactively exploring STAC APIs as well as building and executing queries. My hope is that rsi can provide a useful layer of abstraction over rstac for efficiently downloading assets and performing some of the most common rescaling, masking, and compositing tasks involved in standard data processing workflows.

Several other packages also implement workflows for efficiently accessing and processing data from STAC endpoints, among them the [gdalcubes](https://gdalcubes.github.io/) and [sits](https://e-sensing.github.io/sitsbook/) packages. A core difference between rsi and these packages is that rsi does not have a data model: rsi is focused entirely on finding the bits of data you want from remote endpoints, and getting those bits on your local machine for you to process with your normal spatial data tooling. There are no new classes in rsi (other than the band mapping objects), and the outputs of functions are local rasters. This is an approach that fits better in my head than the more abstract delayed computations in some other packages; at the same time, it's possible that this approach can be less efficient, downloading more data at finer resolutions than is actually needed for a given task. As a result, users need to make sure they're only requesting data they actually need.
Several other packages also implement workflows for efficiently accessing and processing data from STAC endpoints, among them the [gdalcubes](https://gdalcubes.github.io/) and [sits](https://e-sensing.github.io/sitsbook/) packages. A core difference between rsi and these packages is that rsi does not have a data model: rsi is focused entirely on finding the bits of data you want from remote endpoints, and getting those bits on your local machine for you to process with your normal spatial data tooling. There are no new classes in rsi (other than the band mapping objects), and the outputs of functions are local rasters. This is an approach that fits better in my head than the more abstract delayed computations in some other packages; at the same time, it's possible that this approach can be less efficient, downloading more data at finer resolutions than is actually needed for a given task.

To make data-downloading functions run faster, make sure you're only requesting the data you actually need -- restricting the number of assets downloaded from each item, specifying a more precise bounding box, or using a narrower date range can often cut the amount of data to be downloaded significantly. If there's no way around downloading a large amount of data, consider whether running your code on a virtual (cloud) server that's physically closer to your data makes sense for your use case. Many data providers publish information on where their collections are hosted (with cloud providers like Microsoft naming the specific region each collection is in), and using servers that are closer to the data can result in notably faster download speeds.
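
As a rough illustration of how quickly those savings add up, the sketch below estimates the uncompressed size of a request from its extent, pixel size, number of assets, and number of items. The helper `estimate_download_mb()` is hypothetical and not part of rsi, and the 2 bytes per pixel default is an assumption (roughly a 16-bit band); real downloads are usually compressed and will differ, so treat the numbers as relative, not absolute.

```r
# Hypothetical helper, not part of rsi: a back-of-the-envelope estimate of
# the uncompressed size (in MB) of a raster request.
# Assumes every asset shares one pixel size and stores 2 bytes per pixel.
estimate_download_mb <- function(width_m, height_m, pixel_size_m,
                                 n_assets, n_items, bytes_per_pixel = 2) {
  n_cols <- ceiling(width_m / pixel_size_m)
  n_rows <- ceiling(height_m / pixel_size_m)
  n_cols * n_rows * n_assets * n_items * bytes_per_pixel / 1024^2
}

# A broad request: 50 km x 50 km box, 12 assets per item, 30 items, 10 m pixels
broad <- estimate_download_mb(50000, 50000, 10, n_assets = 12, n_items = 30)

# A narrow request: 10 km x 10 km box, 2 assets per item, 6 items, 10 m pixels
narrow <- estimate_download_mb(10000, 10000, 10, n_assets = 2, n_items = 6)

# How many times more data the broad request asks for
broad / narrow
```

Under these assumptions, the bounding box, asset count, and date range (via the number of items) each scale the total multiplicatively, which is why trimming all three at once tends to matter far more than trimming any one of them.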
