sign function for Copernicus Data Space STAC #9

Open
mateuszrydzik opened this issue Nov 26, 2023 · 5 comments
Labels
feature a feature request or enhancement

@mateuszrydzik

I tried to access data from Copernicus Data Space Ecosystem STAC using get_stac_data(), but it ends up returning 401 responses:

copernicus <- get_stac_data(
  aoi,
  start_date = "2023-01-01",
  end_date = "2023-01-31",
  stac_source = "https://catalogue.dataspace.copernicus.eu/stac/",
  collection = "SENTINEL-2",
  output_filename = tempfile(fileext = ".tif"),
  query_function = query_planetary_computer
)

If I understand correctly, both query_planetary_computer() and sign_planetary_computer() are based on rstac, which provides sign functions only for Planetary Computer and Brazil Data Cube. Are you planning to add sign functions for other STAC APIs as part of the rsi library, or will that depend on rstac development?

@mikemahoney218
Collaborator

I'd recommend opening an issue on rstac, which would be a more natural place for this function to live. If they don't respond or don't want to add a signing function, then maybe it could live in rsi, but I think rstac would make more sense.

As for getting that to work with rsi: the actual sign_planetary_computer() function is extremely straightforward, and basically just handles automatically using your PC credentials if they exist:

> rsi::sign_planetary_computer
function(items, subscription_key = Sys.getenv("rsi_pc_key")) {
  if (subscription_key == "") {
    rstac::items_sign(items, rstac::sign_planetary_computer())
  } else {
    rstac::items_sign(
      items,
      rstac::sign_planetary_computer(
        headers = c("Ocp-Apim-Subscription-Key" = subscription_key)
      )
    )
  }
}

And query_planetary_computer() is even simpler -- the main reason it exists is that some versions of AWS' STAC endpoints require post_request() rather than get_request(), and it feels nicer to be able to name the data source rather than needing to just know what HTTP method you need:

> rsi::query_planetary_computer
function(q, subscription_key = Sys.getenv("rsi_pc_key")) {
  rstac::get_request(q)
}

So if you know how to sign these items, and you know what HTTP method they need, it should be pretty straightforward to make rsi work with this endpoint.
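A minimal sketch of what such a pair of custom functions could look like, modeled on the two rsi functions printed above. The names query_with_post and sign_passthrough are hypothetical, and it is an assumption that get_stac_data() accepts a sign_function argument alongside query_function; check the rsi documentation for the signatures your installed version expects.

```r
# Hypothetical query function for an endpoint that needs POST rather than GET.
# `q` is the rstac query object; extra arguments are accepted and ignored.
query_with_post <- function(q, ...) {
  rstac::post_request(q)
}

# Hypothetical signing function for an endpoint whose asset URLs need no
# signing: it returns the items unchanged.
sign_passthrough <- function(items, ...) {
  items
}
```

These would then be supplied in place of query_planetary_computer and sign_planetary_computer when calling get_stac_data().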

Can you share the link to the STAC API you're using here? I'll confess I get pretty mixed up with all the different ESA endpoints.

@mateuszrydzik
Author

Thanks for the reply.

I checked rstac::get_request() and found that you can pass in httr::add_headers() with the required tokens (e.g. get_request(add_headers("x-api-key" = "MY-TOKEN"))). I will test whether I can get any use out of it. If not, as you recommended, I will move this issue to rstac.

As for the Dataspace API, here is the link for the documentation with some examples. https://documentation.dataspace.copernicus.eu/APIs/STAC.html

The main catalog can be accessed with this link https://catalogue.dataspace.copernicus.eu/stac/
As an example, Sentinel-2 items are stored in https://catalogue.dataspace.copernicus.eu/stac/collections/SENTINEL-2/items

@mikemahoney218
Collaborator

Thank you! I'm not saying I'll add support for the Dataspace API soon, but I think it would make sense for there to at least be support in sentinel2_band_mapping for that endpoint, and will do that at some point (with a soft target of "before the first CRAN release in early 2024").

@mikemahoney218 mikemahoney218 added this to the 0.1.0 milestone Jan 2, 2024
@mikemahoney218
Collaborator

Ah, I remember the issue now: the Dataspace STAC API returns assets that link to its OData service, which would require a different approach to downloading than the other STAC APIs that rsi currently works with.

The core issues are:

  1. the API requires passing tokens as headers, rather than signing URLs;
  2. data is returned as zip files;
  3. users are limited to four downloads at a time;
  4. the API is very slow;
  5. links to assets time out for non-obvious reasons;
  6. items are composed of a preview image and a zipped tile of (presumably) all other relevant data.

Dealing with issue 1 should be possible; GDAL can use a file of key: value headers when downloading via curl.
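A rough sketch of the header-file approach, assuming GDAL's GDAL_HTTP_HEADER_FILE configuration option and assuming that setting it as an environment variable is picked up by the GDAL build in use. The helper name write_gdal_header_file is hypothetical.

```r
# Write a "key: value" header file and point GDAL at it.
# `token` is assumed to be a valid access token for the endpoint.
write_gdal_header_file <- function(token, path = tempfile(fileext = ".txt")) {
  writeLines(paste("Authorization: Bearer", token), path)
  # GDAL reads configuration options from environment variables as well.
  Sys.setenv(GDAL_HTTP_HEADER_FILE = path)
  invisible(path)
}
```

Calling write_gdal_header_file(Sys.getenv("rsi_cdse_key")) before any GDAL-based download would then attach the token to each HTTP request GDAL makes.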

Dealing with issue 2 might be possible by using /vsizip/ when downloading from the Copernicus API. I'm still waiting for my trial download to finish to inspect what's actually in the downloaded file.
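For illustration, a chained GDAL virtual file system path could be built along these lines. The helper vsizip_path is hypothetical, the curly-brace form is used because the asset URLs may not end in .zip, and whether the Copernicus endpoint supports the range requests that /vsicurl/ relies on is untested here.

```r
# Build a GDAL path that reads a file inside a remote zip archive.
# `url` is the archive URL; `inner` is an optional path inside the archive.
vsizip_path <- function(url, inner = "") {
  suffix <- if (nzchar(inner)) paste0("/", inner) else ""
  paste0("/vsizip/{/vsicurl/", url, "}", suffix)
}

vsizip_path("https://example.com/archive", "a/b.jp2")
```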

Dealing with issue 3 gets tricky, and will require not using (or at least limiting) parallelism when downloading from this API.
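One simple way to cap concurrency, sketched in base R: split the asset URLs into batches of at most four and process one batch at a time. The helper batch_urls is hypothetical and is not how rsi currently schedules downloads.

```r
# Split a vector of URLs into batches of at most `max_concurrent` elements.
batch_urls <- function(urls, max_concurrent = 4L) {
  split(urls, ceiling(seq_along(urls) / max_concurrent))
}

batches <- batch_urls(paste0("https://example.com/asset/", 1:10))
lengths(batches)
```

Each batch could then be downloaded in parallel, with the next batch starting only after the previous one finishes.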

Issue 4 seems intractable 😆

Issue 5 might be due to trying a URL, getting a 401, and then trying again; it might also be due to a time-out. The first of these is easy to deal with (don't retry failed downloads); the second would be harder (it would need a way to re-query the API once downloads started failing).

Issue 6 just makes this endpoint unappealing, since users can't filter their downloads to only the relevant bands. Maybe if the vsizip trick works, this is something that can be controlled via the -b flag to gdalwarp?

All this said: I think this would take a bigger rewrite than I had expected, and wouldn't be super useful due to how slow the API is and the fact it returns zipped versions of entire tiles. I'm going to move this off the 0.1.0 milestone but leave the issue open, in case the API changes to use a more... normal way of sharing assets, or someone finds an easy way to work with this API in the same way as other STAC APIs.

Pure rstac example of downloading from this API (assuming the rsi_cdse_key envvar is an access token):

nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
ashe <- nc[1, ] |> 
  sf::st_transform(4326) |> 
  sf::st_bbox()

items <- rstac::stac("https://catalogue.dataspace.copernicus.eu/stac") |> 
  rstac::stac_search(
    collections = "SENTINEL-2",
    datetime = "2021-01-01/2021-12-31",
    bbox = ashe
  ) |> 
  rstac::get_request()
  
items$features <- items$features[1]

items |> 
  rstac::assets_download(
    config = httr::add_headers(
      Authorization = paste("Bearer", Sys.getenv("rsi_cdse_key"))
    )
  )

@mikemahoney218 mikemahoney218 removed this from the 0.1.0 milestone Jan 3, 2024
@mikemahoney218 mikemahoney218 added the feature a feature request or enhancement label Jan 3, 2024
@mikemahoney218
Collaborator

A bit more context now that my download finished -- it seems like it took about 40 minutes on my residential connection to download just under 1GB of data.

I'm seeing now that GDAL's Sentinel-2 driver understands how to process these zip files directly, so it might not actually be that painful to rework the download method. Doing band name reassignments (and providing a friendly method for selecting specific bands) might be trickier.

@mikemahoney218 mikemahoney218 added this to the future milestone Jan 10, 2024