detect length of file-like object? #1490

skshetry · 2021-03-01T08:38:50Z

Checklist

There are no similar issues or pull requests for this yet.
I discussed this idea on the community chat and feedback is positive.

Is your feature related to a problem? Please describe.

I'm using httpx to upload files to a WebDAV server, which uses PUT. The problem here is that not a lot of server has chunked uploading support for PUT.

When uploading a file-like object, request tries to detect file size to send it in Content-Length header using super_len(). But httpx uses Transfer-Encoding: chunked. So, the files end up empty or the server throws a 411 Length Required response.

The difference in behaviour with respect to requests can be demonstrated as follows:

>>> import requests, httpx
>>> from io import BytesIO

>>> requests.put("https://httpbin.org/anything", data=BytesIO(b"foobar")).request.headers
 {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '6'}

>>> httpx.put("https://httpbin.org/anything", data=BytesIO(b"foobar")).request.headers
Headers({'host': 'httpbin.org', 'accept': '*/*', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'user-agent': 'python-httpx/0.16.1', 'transfer-encoding': 'chunked'})

Describe the solution you would like.

It'd be great to have the same behaviour as requests here, i.e. trying to detect the size and then falling back to chunked encoding.

Describe alternatives you considered

As a workaround, I can use peek_filelike_length and a custom header Content-Length which is respected, that works well for me.

Additional context

We had a long discussion in iterative/dvc#4796 regarding it. nextcloud-snap that I use locally for testing, does not have support this either due to a bug in Apache (or, some modules in it): nextcloud-snap/nextcloud-snap#365.

And, Sabredav (owncloud/nextcloud uses them internally) wiki has a good insight on this (though the article is >6 years old):

Finder uses Transfer-Encoding: Chunked in PUT request bodies. This is a little used HTTP feature, and therefore not implemented in a bunch of web servers. The only server I've seen so far that handles this reasonably well is Apache + mod_php. Nginx and Lighttpd respond with 411 Length Required, which is completely ignored by Finder. This was seen on nginx 0.7.63. It was recently reported that a development release (1.3.8) no longer had this issue.

When using this with Apache + FastCGI PHP completely drops the request body, so it will seem as if the PUT request was successful, but the file will end up empty.

Note that the recent servers do support chunked encoding, but we still find a lot of old servers or are using certain modules that are buggy wrt chunked encoding.

I understand that it's the server that is not http/1.1 compliant. And, if httpx does not want to support, I'm okay with the workaround as well. But I'd happy to make a PR if we decide to do so. Thanks. 🙂

The text was updated successfully, but these errors were encountered:

tomchristie · 2021-03-11T11:06:44Z

Okay, so it looks to me like there's a more sensible behaviour that we could be using for binary data, where we attempt to peek the file(like) length, in the same way that we currently do for multipart data.

Currently we're always treating binary data as having an unknown length, and using chunked encoding for it...

httpx/httpx/_content.py

Lines 77 to 103 in 24a55d7

    
           def encode_content( 
        
               content: Union[str, bytes, ByteStream] 
        
           ) -> Tuple[Dict[str, str], ByteStream]: 
        
               if isinstance(content, (str, bytes)): 
        
                   body = content.encode("utf-8") if isinstance(content, str) else content 
        
                   content_length = str(len(body)) 
        
                   headers = {"Content-Length": content_length} if body else {} 
        
                   stream = PlainByteStream(body) 
        
                   return headers, stream 
        
               elif isinstance(content, (Iterable, AsyncIterable)): 
        
                   headers = {"Transfer-Encoding": "chunked"} 
        
                   # Generators should be wrapped in GeneratorStream/AsyncGeneratorStream 
        
                   # which will raise `StreamConsumed` if the stream is accessed more 
        
                   # than once. (Eg. Following HTTP 307 or HTTP 308 redirects.) 
        
                   if inspect.isgenerator(content): 
        
                       generator_stream = GeneratorStream(content)  # type: ignore 
        
                       return headers, generator_stream 
        
                   if inspect.isasyncgen(content): 
        
                       agenerator_stream = AsyncGeneratorStream(content)  # type: ignore 
        
                       return headers, agenerator_stream 
        
                   # Other iterables may be passed through as-is. 
        
                   return headers, content  # type: ignore 
        
               raise TypeError(f"Unexpected type for 'content', {type(content)!r}")

We probably ought to instead be attempting to use our peek_filelike_length function, and only using chunked encoding when we're unable to obtain the expected length...

httpx/httpx/_utils.py

Line 347 in 24a55d7

def peek_filelike_length(stream: typing.IO) -> int:

skshetry · 2021-03-11T11:27:34Z

@tomchristie, thanks. I'll try to come up with a PR. I just wanted to hear if that was a good idea that httpx want to have. 🙂

tomchristie · 2021-03-11T11:36:42Z

Okay, I think there's two distinctly separate PRs for us to tackle here...

Firstly, a bit of refactoring so that peek_filelike_length returns None when it can't obtain a length instead of raising an exception. Referenced in the multipart implementation, here...

httpx/httpx/_multipart.py

Lines 94 to 100 in 24a55d7

    
           try: 
        
               file_length = peek_filelike_length(self.file) 
        
           except OSError: 
        
               # As a last resort, read file and cache contents for later. 
        
               assert not hasattr(self, "_data") 
        
               self._data = to_bytes(self.file.read()) 
        
               file_length = len(self._data)

Then once that pull request is done, we can follow up with a pull request using peek_filelike_length inside encode_content.

One other thing we'll want to be careful about here is what behaviour we end up with if files opened in text mode are used. There's a note in requests about disallowing these in the future. (Which makes sense for us)

skshetry · 2021-05-05T16:14:22Z

Thanks, @tomchristie for the fix. I have been meaning to contribute, but life got in the way. 🙂

tomchristie added this to the v1.0 milestone Apr 29, 2021

tomchristie added the enhancement New feature or request label Apr 29, 2021

tomchristie mentioned this issue Apr 30, 2021

Prefer Content-Length over Transfer-Encoding: chunked for content=<file-like> cases. #1619

Merged

tomchristie closed this as completed in #1619 Apr 30, 2021

skshetry mentioned this issue May 6, 2021

Tracking for content-length issue with httpx skshetry/webdav4#11

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

detect length of file-like object? #1490

detect length of file-like object? #1490

skshetry commented Mar 1, 2021

tomchristie commented Mar 11, 2021

skshetry commented Mar 11, 2021 •

edited

Loading

tomchristie commented Mar 11, 2021

skshetry commented May 5, 2021

detect length of file-like object? #1490

detect length of file-like object? #1490

Comments

skshetry commented Mar 1, 2021

Checklist

Is your feature related to a problem? Please describe.

Describe the solution you would like.

Describe alternatives you considered

Additional context

tomchristie commented Mar 11, 2021

skshetry commented Mar 11, 2021 • edited Loading

tomchristie commented Mar 11, 2021

skshetry commented May 5, 2021

skshetry commented Mar 11, 2021 •

edited

Loading