ValueError: Upload has finished. While uploading chunk by chunk #301

Open
devjunhong opened this issue Nov 14, 2024 · 1 comment

devjunhong commented Nov 14, 2024

Hi oittaa,

Thanks for your continued effort on this project.

Describe the bug

I tried to read a large file in chunks and upload it to gcp-storage-emulator.

The upload fails with a ValueError, although the same script works fine against the real Google Cloud Storage.

import os
from pathlib import Path
from typing import Generator

from google.cloud import storage

CHUNK_SIZE = 1 * 1024 * 1024  # 1 MB


def chunk_file(file_full_path: str) -> Generator[bytes, None, None]:
    file = Path(file_full_path)
    with file.open("rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    test_bucket_name = "test-bucket"
    filepath = "train.jsonl"
    blob_name = "train.jsonl"

    os.environ["STORAGE_EMULATOR_HOST"] = "http://gcs:9023"

    client = storage.Client()
    bucket = client.bucket(test_bucket_name)
    blob = bucket.blob(blob_name)

    with blob.open("wb", chunk_size=CHUNK_SIZE) as blob_writer:
        for piece in chunk_file(filepath):
            blob_writer.write(piece)

    for b in bucket.list_blobs():
        print(b.name)

Traceback (most recent call last):
  File "/app/main.py", line 33, in <module>
    blob_writer.write(piece)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 357, in write
    self._upload_chunks_from_buffer(num_chunks)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 417, in _upload_chunks_from_buffer
    upload.transmit_next_chunk(transport, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py", line 503, in transmit_next_chunk
    method, url, payload, headers = self._prepare_request()
                                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/_upload.py", line 611, in _prepare_request
    raise ValueError("Upload has finished.")
ValueError: Upload has finished.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/main.py", line 31, in <module>
    with blob.open("wb", chunk_size=CHUNK_SIZE) as blob_writer:
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 437, in close
    self._upload_chunks_from_buffer(1)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 417, in _upload_chunks_from_buffer
    upload.transmit_next_chunk(transport, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py", line 503, in transmit_next_chunk
    method, url, payload, headers = self._prepare_request()
                                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/_upload.py", line 611, in _prepare_request
    raise ValueError("Upload has finished.")
ValueError: Upload has finished.
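
For context, a hedged reading of the traceback: `_prepare_request` in google-resumable-media raises "Upload has finished." once the upload object considers itself finished, which presumably happens when a chunk request is answered with a final success status instead of the 308 "resume incomplete" status. Whether the emulator actually does this has not been verified here. The sketch below only illustrates the `Content-Range` values a resumable upload of initially unknown size is expected to send under the GCS resumable upload protocol. `content_range_headers` is a hypothetical helper, not a library function, and it simplifies what the client library really does.

```python
from typing import Iterator

CHUNK_SIZE = 1 * 1024 * 1024  # 1 MB, same value as in the script above


def content_range_headers(total_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield a Content-Range value for each chunk of a resumable upload.

    Hypothetical helper for illustration only. Intermediate chunks use "*"
    as the total because the stream length is not known yet; only the last
    chunk states the total size, which is what marks the upload as complete.
    """
    if total_size == 0:
        # An empty upload is finalized with a range that carries no bytes.
        yield "bytes */0"
        return
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        is_last = end == total_size - 1
        total = str(total_size) if is_last else "*"
        yield f"bytes {start}-{end}/{total}"
        start = end + 1
```

If this reading is right, a server should answer every range ending in `/*` with 308, and only the range that names the total with 200 or 201.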

To Reproduce

Reproducing the error requires a sample file that has to be downloaded first, so I wrapped the script in a docker-compose setup. Hopefully this helps to reproduce the issue:

https://github.com/devjunhong/large-file-issue
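
As a possible alternative to downloading the sample file, a synthetic file could be generated locally. This is only a sketch under the assumption that any file larger than `CHUNK_SIZE` triggers the error, which the issue does not confirm. `write_sample_jsonl` and the record contents are hypothetical.

```python
import json
from pathlib import Path


def write_sample_jsonl(path: str, min_bytes: int) -> int:
    """Write JSON lines to `path` until at least `min_bytes` are written.

    Hypothetical helper: the records are ASCII-only, so the character count
    equals the byte count. Returns the number of bytes written.
    """
    record = json.dumps({"text": "x" * 1000}) + "\n"
    written = 0
    # newline="\n" keeps line endings identical on every platform.
    with Path(path).open("w", encoding="utf-8", newline="\n") as f:
        while written < min_bytes:
            f.write(record)
            written += len(record)
    return written
```

For example, `write_sample_jsonl("train.jsonl", 3 * 1024 * 1024)` would produce a file of a little over 3 MB, which spans several 1 MB chunks.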

Expected behavior

It should finish uploading without an error.

System (please complete the following information)

  • OS version: macOS 15.1
  • Python version: 3.11.10
  • gcp-storage-emulator version: v2024.08.03

devjunhong commented Nov 14, 2024

Fortunately, there is a workaround for this issue: if you set the chunk size to be slightly larger than the file size, the upload completes without an error.

For example:

CHUNK_SIZE = 1 * 1024 * 1024 # 1 MB <- if this is bigger than the file size, it is okay
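
A minimal sketch of how the workaround could be automated, assuming the documented requirement that `chunk_size` is a multiple of 256 KiB. `chunk_size_above` and `chunk_size_for_file` are hypothetical helpers, not part of google-cloud-storage.

```python
import os

GCS_CHUNK_MULTIPLE = 256 * 1024  # chunk_size has to be a multiple of 256 KiB


def chunk_size_above(file_size: int) -> int:
    """Return the smallest multiple of 256 KiB strictly greater than file_size."""
    return (file_size // GCS_CHUNK_MULTIPLE + 1) * GCS_CHUNK_MULTIPLE


def chunk_size_for_file(path: str) -> int:
    """Hypothetical convenience wrapper: derive the chunk size from a file on disk."""
    return chunk_size_above(os.path.getsize(path))
```

It could then be passed as `blob.open("wb", chunk_size=chunk_size_for_file(filepath))`. Note that this buffers the whole file before sending, so it gives up the memory benefit of chunking and is probably only reasonable for tests against the emulator.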
