Move gdalinfo call to executor #948

Open
EmileSonneveld opened this issue Nov 25, 2024 · 1 comment

@EmileSonneveld

Currently, gdalinfo is called on output assets in the driver. For GTiff output on S3, the assets were written on an executor and must be downloaded again to the driver.
With a FUSE mount this happens implicitly; with direct S3 access it happens explicitly here:

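# Driver side: fetch the asset from S3 to a local path so gdalinfo can read it
if not abs_asset_path.exists() and asset_href.startswith("s3://"):
    try:
        abs_asset_path.parent.mkdir(parents=True, exist_ok=True)
        with open(abs_asset_path, "wb") as f:
            for chunk in stream_s3_binary_file_contents(asset_href):
                f.write(chunk)
    # (except clause not shown in this excerpt)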

Running gdalinfo on the executor and passing the resulting info on to the driver would avoid this extra download.
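
A minimal sketch of the executor-side half, assuming a Python executor with the GDAL command-line tools installed (the thread suggests the real call would live in the Scala code instead). `gdalinfo_json` and `describe_written_assets` are hypothetical names, and the gdalinfo runner is injectable so the logic can be exercised without GDAL.

```python
import json
import subprocess
from typing import Callable, Dict


def gdalinfo_json(path: str) -> dict:
    """Run `gdalinfo -json` on a local file and parse its JSON output."""
    result = subprocess.run(
        ["gdalinfo", "-json", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def describe_written_assets(
    written: Dict[str, str],
    run_gdalinfo: Callable[[str], dict] = gdalinfo_json,
) -> Dict[str, dict]:
    """Hypothetical executor-side step.

    `written` maps each asset's final href (e.g. an s3:// URL) to the local
    path the executor wrote before uploading. gdalinfo runs on that local
    copy, so the driver never has to download the file again.
    """
    return {href: run_gdalinfo(local_path) for href, local_path in written.items()}
```

With this shape, only the small metadata dictionaries travel back to the driver (for example as the result of a Spark `collect()`), rather than the GeoTIFF files themselves.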

This might avoid OOM errors like #809,
and would have avoided the log deadlock in #906.

cc @jdries

@jdries

jdries commented Nov 25, 2024

This requires the Scala code to make the gdalinfo call, and also a way to pass the resulting metadata back to the driver.
This could perhaps be achieved by already assembling the STAC JSON files in the executors.
