Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect length of file-like object? #1490

Closed
2 tasks done
skshetry opened this issue Mar 1, 2021 · 4 comments · Fixed by #1619
Closed
2 tasks done

detect length of file-like object? #1490

skshetry opened this issue Mar 1, 2021 · 4 comments · Fixed by #1619
Labels
enhancement New feature or request
Milestone

Comments

@skshetry
Copy link
Contributor

skshetry commented Mar 1, 2021

Checklist

  • There are no similar issues or pull requests for this yet.
  • I discussed this idea on the community chat and feedback is positive.

Is your feature related to a problem? Please describe.

I'm using httpx to upload files to a WebDAV server, which uses PUT. The problem here is that not a lot of server has chunked uploading support for PUT.

When uploading a file-like object, request tries to detect file size to send it in Content-Length header using super_len(). But httpx uses Transfer-Encoding: chunked. So, the files end up empty or the server throws a 411 Length Required response.

The difference in behaviour with respect to requests can be demonstrated as follows:

>>> import requests, httpx
>>> from io import BytesIO

>>> requests.put("https://httpbin.org/anything", data=BytesIO(b"foobar")).request.headers
 {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '6'}

>>> httpx.put("https://httpbin.org/anything", data=BytesIO(b"foobar")).request.headers
Headers({'host': 'httpbin.org', 'accept': '*/*', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'user-agent': 'python-httpx/0.16.1', 'transfer-encoding': 'chunked'})

Describe the solution you would like.

It'd be great to have the same behaviour as requests here, i.e. trying to detect the size and then falling back to chunked encoding.

Describe alternatives you considered

As a workaround, I can use peek_filelike_length and a custom header Content-Length which is respected, that works well for me.

Additional context

We had a long discussion in iterative/dvc#4796 regarding it. nextcloud-snap that I use locally for testing, does not have support this either due to a bug in Apache (or, some modules in it): nextcloud-snap/nextcloud-snap#365.

And, Sabredav (owncloud/nextcloud uses them internally) wiki has a good insight on this (though the article is >6 years old):

Finder uses Transfer-Encoding: Chunked in PUT request bodies. This is a little used HTTP feature, and therefore not implemented in a bunch of web servers. The only server I've seen so far that handles this reasonably well is Apache + mod_php. Nginx and Lighttpd respond with 411 Length Required, which is completely ignored by Finder. This was seen on nginx 0.7.63. It was recently reported that a development release (1.3.8) no longer had this issue.

When using this with Apache + FastCGI PHP completely drops the request body, so it will seem as if the PUT request was successful, but the file will end up empty.

Note that the recent servers do support chunked encoding, but we still find a lot of old servers or are using certain modules that are buggy wrt chunked encoding.


I understand that it's the server that is not http/1.1 compliant. And, if httpx does not want to support, I'm okay with the workaround as well. But I'd happy to make a PR if we decide to do so. Thanks. 🙂

@tomchristie
Copy link
Member

Okay, so it looks to me like there's a more sensible behaviour that we could be using for binary data, where we attempt to peek the file(like) length, in the same way that we currently do for multipart data.

Currently we're always treating binary data as having an unknown length, and using chunked encoding for it...

httpx/httpx/_content.py

Lines 77 to 103 in 24a55d7

def encode_content(
content: Union[str, bytes, ByteStream]
) -> Tuple[Dict[str, str], ByteStream]:
if isinstance(content, (str, bytes)):
body = content.encode("utf-8") if isinstance(content, str) else content
content_length = str(len(body))
headers = {"Content-Length": content_length} if body else {}
stream = PlainByteStream(body)
return headers, stream
elif isinstance(content, (Iterable, AsyncIterable)):
headers = {"Transfer-Encoding": "chunked"}
# Generators should be wrapped in GeneratorStream/AsyncGeneratorStream
# which will raise `StreamConsumed` if the stream is accessed more
# than once. (Eg. Following HTTP 307 or HTTP 308 redirects.)
if inspect.isgenerator(content):
generator_stream = GeneratorStream(content) # type: ignore
return headers, generator_stream
if inspect.isasyncgen(content):
agenerator_stream = AsyncGeneratorStream(content) # type: ignore
return headers, agenerator_stream
# Other iterables may be passed through as-is.
return headers, content # type: ignore
raise TypeError(f"Unexpected type for 'content', {type(content)!r}")

We probably ought to instead be attempting to use our peek_filelike_length function, and only using chunked encoding when we're unable to obtain the expected length...

def peek_filelike_length(stream: typing.IO) -> int:

@skshetry
Copy link
Contributor Author

skshetry commented Mar 11, 2021

@tomchristie, thanks. I'll try to come up with a PR. I just wanted to hear if that was a good idea that httpx want to have. 🙂

@tomchristie
Copy link
Member

Okay, I think there's two distinctly separate PRs for us to tackle here...

Firstly, a bit of refactoring so that peek_filelike_length returns None when it can't obtain a length instead of raising an exception. Referenced in the multipart implementation, here...

httpx/httpx/_multipart.py

Lines 94 to 100 in 24a55d7

try:
file_length = peek_filelike_length(self.file)
except OSError:
# As a last resort, read file and cache contents for later.
assert not hasattr(self, "_data")
self._data = to_bytes(self.file.read())
file_length = len(self._data)

Then once that pull request is done, we can follow up with a pull request using peek_filelike_length inside encode_content.

One other thing we'll want to be careful about here is what behaviour we end up with if files opened in text mode are used. There's a note in requests about disallowing these in the future. (Which makes sense for us)

@skshetry
Copy link
Contributor Author

skshetry commented May 5, 2021

Thanks, @tomchristie for the fix. I have been meaning to contribute, but life got in the way. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants