S3 support for a file which is not on active storage #94

bnlawrence · 2023-06-16T15:43:04Z

We need to include support within PyActiveStorage for the situation where the client has requested ActiveStorage but the remote server does not support it. In that situation we should fail over to calculating the operations ourselves. To do that, our version of reduce_chunk needs to grab the necessary blocks and do the operations itself, as it currently does for POSIX.
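As a rough illustration of what "grab the necessary blocks and do the operations itself" could look like for S3, here is a minimal sketch. It assumes the chunk is stored uncompressed with no filters, that its byte offset and size are already known from the HDF5 chunk index, and that the S3 read is hidden behind a `read_range` callable (in practice this might be a ranged GET). The names `reduce_chunk_locally` and `read_range` are hypothetical and are not PyActiveStorage's actual API.

```python
from typing import Callable

import numpy as np

# Hypothetical type: read_range(offset, size) returns the raw bytes of one
# chunk, e.g. via an S3 ranged GET. Injected so the sketch runs without S3.
ReadRange = Callable[[int, int], bytes]


def reduce_chunk_locally(read_range: ReadRange, offset: int, size: int,
                         dtype: str, shape: tuple, method: Callable):
    """Fetch one chunk's bytes and apply the reduction on the client.

    Assumes the chunk is stored uncompressed with no filters; a real
    implementation would decompress and undo filters before decoding.
    """
    raw = read_range(offset, size)
    if len(raw) != size:
        raise IOError(f"short read: expected {size} bytes, got {len(raw)}")
    # Decode the raw bytes into an array with the chunk's dtype and shape.
    data = np.frombuffer(raw, dtype=dtype).reshape(shape)
    return method(data)
```

For example, with an in-memory `bytes` object `blob` standing in for the S3 object, `reduce_chunk_locally(lambda o, n: blob[o:o + n], 16, 96, "<f8", (3, 4), np.sum)` would sum a 3 by 4 float64 chunk stored at byte offset 16.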

In the long term we would hope that netcdf4-python would do this transparently, but for the moment we need to use h5netcdf to do it.


markgoddard · 2023-06-19T10:03:22Z

This is a similar scenario to when S3 active storage is broken or too busy to handle the request.

Should activestorage.s3.reduce_chunk handle these cases transparently, or raise an error that is handled by Active, which then passes the request to activestorage.storage.reduce_chunk? I lean towards the former approach, keeping all S3 interaction within the s3 module. In that case it would make sense to extract some of the NumPy operations into a common module shared by the storage and s3 modules.
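A minimal sketch of the first option (transparent fallback inside the s3 module, with the NumPy reductions pulled into a shared helper) might look like this. `apply_reduction`, `ActiveStorageUnavailable`, `s3_reduce_chunk`, `request_active` and `fetch_chunk` are all hypothetical names; the two callables stand in for the real S3 active-storage request and the plain S3 data fetch.

```python
import numpy as np

# Hypothetical shared module content: the NumPy reductions that both the
# POSIX (storage) path and the S3 fallback path could call.
REDUCTIONS = {"sum": np.sum, "min": np.min, "max": np.max, "mean": np.mean}


def apply_reduction(data: np.ndarray, operation: str):
    """Apply a named reduction to an array of chunk data."""
    return REDUCTIONS[operation](data)


class ActiveStorageUnavailable(Exception):
    """Hypothetical error: server lacks active storage, is busy, or is broken."""


def s3_reduce_chunk(request_active, fetch_chunk, operation: str):
    """Try the server-side reduction first; on failure, compute locally.

    request_active(operation) returns the server's result or raises
    ActiveStorageUnavailable; fetch_chunk() returns the chunk as an ndarray.
    """
    try:
        return request_active(operation)
    except ActiveStorageUnavailable:
        # Transparent fallback: the caller never sees the failure.
        return apply_reduction(fetch_chunk(), operation)
```

The trade-off is that each call pays for a failed active request before falling back, which is the latency concern raised in the next comment.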

bnlawrence · 2023-06-22T11:40:10Z

I think we need to handle S3 independently of S3 active storage. There are going to be many use cases where the Dask workflow has determined that all the data must be brought back to the client, whether or not active storage is present.
We think the error needs to propagate up to PyActiveStorage, so that it can avoid making unnecessary repeated requests, each of which would add extra latency to every block.

Context: each computational chunk in Dask has its own PyActiveStorage instance, and these instances are likely to be making requests in parallel. So once a computational chunk sees a problem it should give up using active storage, even though other chunks may still be working fine.
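A minimal sketch of the propagate-the-error approach, assuming one reducer object per Dask computational chunk: the S3 layer raises, the object catches it once, flips a per-instance flag, and fetches data for every remaining block without retrying active storage. State is deliberately not shared between instances, so other chunks can keep using active storage if it works for them. `ChunkReducer`, `ActiveStorageUnavailable`, `request_active` and `fetch_block` are hypothetical names, not PyActiveStorage's API.

```python
import numpy as np


class ActiveStorageUnavailable(Exception):
    """Hypothetical error raised by the S3 layer when active storage fails."""


class ChunkReducer:
    """One instance per Dask computational chunk.

    After the first ActiveStorageUnavailable, this instance stops sending
    active requests and fetches the data for all remaining blocks instead,
    so later blocks don't each pay the latency of a failing request.
    """

    def __init__(self, request_active, fetch_block):
        self._request_active = request_active  # (block_id, op) -> result
        self._fetch_block = fetch_block        # block_id -> np.ndarray
        self.use_active = True

    def reduce_block(self, block_id, operation: str):
        if self.use_active:
            try:
                return self._request_active(block_id, operation)
            except ActiveStorageUnavailable:
                # Give up on active storage for this instance only.
                self.use_active = False
        data = self._fetch_block(block_id)
        return getattr(np, operation)(data)
```

With a failing server, only the first block triggers an active request; the rest go straight to the local path.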

bnlawrence assigned bnlawrence and valeriupredoi Jun 16, 2023
