-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable request body streaming with an IO object #409
Conversation
68c2f43
to
c4798f5
Compare
lib/http/request/writer.rb
Outdated
@request_header << "#{Headers::CONTENT_LENGTH}: 0" | ||
elsif @body.is_a?(Enumerable) && CHUNKED != @headers[Headers::TRANSFER_ENCODING] | ||
raise(RequestError, "invalid transfer encoding") | ||
unless @headers[Headers::CONTENT_LENGTH] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better to use guard here:
return if @headers[Headers::CONTENT_LENGTH]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, updated in abe1b08.
lib/http/request/writer.rb
Outdated
if @body.is_a?(String) | ||
@request_header << "#{Headers::CONTENT_LENGTH}: #{@body.bytesize}" | ||
elsif @body.respond_to?(:read) | ||
@request_header << "#{Headers::CONTENT_LENGTH}: #{@body.size}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are checking respond_to? :read
we should also check here respond_to? size
, no? So I believe the best will be:
if @body.respond_to?(:bytesize)
@request_header << "#{Headers::CONTENT_LENGTH}: #{@body.bytesize}"
elsif @body.respond_to?(:size) && @body.respond_to?(:read)
@request_header << "#{Headers::CONTENT_LENGTH}: #{@body.size}"
elsif @body.nil?
@request_header << "#{Headers::CONTENT_LENGTH}: 0"
elsif ...
end
@httprb/core please take a look if you see any gotchas here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I wasn't checking here whether the IO object responds to size
was because I wanted that it's required for the IO object to respond to size
if Content-Length
wasn't passed in, because for non-chunked requests Content-Length
header is required (I would document this requirement in the wiki).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean you are checking for #read
method, but using #size
then which might be not available:
reader, writer = IO.pipe
reader.respond_to? :read # => true
reader.respond_to? :size # => false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I got what you mean. I guess then we should raise an exception:
elsif @body.respond_to? :read
raise "IO object must respond to #size" unless @body.respond_to? :size
# ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, when IO.pipe
is passed in and Content-Length
is not explicitly passed in, then the request will fail with a NoMethodError
because IO.pipe
doesn't respond to #size
. This can be solved by passing Content-Length
explicitly:
reader, writer = IO.pipe
writer.write("content")
writer.close
HTTP.headers("Content-Length" => 7).post("http://example.org", body: reader)
If the exception isn't raised when Content-Length
is absent and the IO doesn't respond to #size
, the user could accidentally send the request without a Content-Length
, which isn't a valid HTTP request (unless it's a chunked request).
The error is currently just a NoMethodError
, which probably doesn't communicate well to the user, would it help if I improved the error message? Or are you in disagreement with this behaviour in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your last comment only after I posted the previous one, great, I'll update the error message then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in e53c71f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Let's wait for input (if any) of @httprb/core and then I'll fix rubocop and merge! |
http://ruby-doc.org/core-2.0.0/IO.html I'm curious why this wasn't working with the existing |
@tarcieri While First, HTTP.rb's Second, Third, by relying only on
|
We definitely need to improve work with IO bodies... so that it will allow streaming chunked data of undetermined length (e.g. stream of JSON documents) |
Hmm, isn't that what we already have with the |
@janko-m I guess so: require "http"
require "yaml"
rd, wd = IO.pipe
writer = Thread.new(wd) do |io|
Thread.stop
puts "READER: Start feeding data"
%w(a b c).each do |char|
io.write(%({"#{char}":#{rand}}\n))
sleep 1
end
puts "READER: All data was feeded"
io.close
end
reader = Thread.new(rd) do |io|
client = HTTP.headers({ "Transfer-Encoding" => "chunked" })
puts "WRITER: start post request"
response = client.post("http://httpbin.org/post", :body => io).flush
puts "WRITER: post request finished"
puts "== RESPONSE ==\n#{response.parse.to_yaml}"
end
sleep 0.1 until "sleep" == writer.status
writer.wakeup
reader.join |
Ignore me! It's httpbin is not understanding chunked requests. Tried that with netcat - all correct:
|
I think Looks like the Could bodies with a |
IMHO it worth to create |
You're right, the current code wouldn't work with an IO object which isn't
You mean, that if a body doesn't respond to Enumerator.new do |yielder|
loop do
data = @io.read(Connection::BUFFER_SIZE, buffer ||= String.new)
break if data.nil?
yielder << data
end
end
I'm sold 💰 |
I extracted body-related behaviour into an During this refactoring, I implemented @britishtea's suggestion of allowing I also modified tests to match all of the content that was written to the socket, not just prefixes and suffixes, so that I have greater confidence that I didn't break anything. I also tested streaming to the socket with both Enumerable and non-Enumerable IOs, to verify that Enumerable IOs are treated as IOs, not Enumerables. I'm not sure whether this is a breaking change, because this will change the chunked request body for Enumerable IOs, but then again it shouldn't matter in what format is the chunked request body, as long as all of the content is sent. I see now that I've triggered quite a few Rubocop offences due to my style, I will let you correct or ignore offences which suit your preference. Investigating the JRuby failure now. |
I opened a bug report for the JRuby failure: jruby/jruby#4583. Normally this bug would be harmless, but we are reading chunks using a buffer (to avoid creating a new string for each chunk), so each retrieved chunk will override the next chunk. This means that if we request the first chunk, and JRuby starts executing the code after the first I think at this point it probably makes more sense to work around this by not using a buffer, even at the cost of more string objects being allocated. This wouldn't introduce a performance regression, since this is a new feature. |
lib/http/request/body.rb
Outdated
chunk = String.new # rubocop:disable Style/EmptyLiteral | ||
yield chunk while @body.read(BUFFER_SIZE, chunk) | ||
loop do | ||
data = @body.read(BUFFER_SIZE) or break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd prefer something like:
while (data = @body.read(BUFFER_SIZE))
yield data
end
It better communicates intent, IMO, despite assigning in a branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tarcieri I would also prefer that, but Ruby prints a warning when you attempt to use an assignment in conditionals (Ruby thinks that you wanted to use ==
). It happens even when warning aren't enabled, just when running ruby -e 'nil if a = 1'
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try the exact code I gave (with parens). It does not print a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is wonderful, I'll update the code then 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tarcieri Updated.
RuboCop ProTip(TM): |
By changing all components to use an IO object as a base, we can implement a common IO interface for all components, which delegates to the underlying IO object. This enables streaming multipart data into the request body, avoiding loading the whole multipart data into memory when File parts are backed by File objects. See httprb/http#409 for the new streaming API.
This is a squashed commit of [PR#12][] with some tiny cleanups applied on top of that. [PR#12]: #12 Allow any IO object in FormData::File ------------------------------------- Previously we allowed only File and StringIO objects as an input to `FormData::File`, but we can generalize that to any IO object that responds to `#read` and `#size` (which includes `Tempfile`, `ActionDispatch::Http::UploadedFile` etc). Open File for given path in binary mode --------------------------------------- That way different operating systems won't attempt to convert newline characters to their internal representation, instead the file content will always be retrieved byte-for-byte as is. Officially support Pathname in FormData::File.new ------------------------------------------------- Previously Pathname was implicitly supported, though extracting filename wasn't working. With the recent refactoring this stopped working, so we make the Pathname support explicit. Make all components into IO objects ----------------------------------- By changing all components to use an IO object as a base, we can implement a common IO interface for all components, which delegates to the underlying IO object. This enables streaming multipart data into the request body, avoiding loading the whole multipart data into memory when File parts are backed by File objects. See httprb/http#409 for the new streaming API. Make CompositeIO convert strings to StringIOs --------------------------------------------- By delegating handling strings to CompositeIO we can remove a lot of the StringIO.new clutter when instantiating CompositeIO objects. Use a buffer when reading IO files in CompositeIO ------------------------------------------------- This way we're not creating a new string for each chunk read, instead each chunk will be read into an existing string object (a "buffer"), replacing any previous content.
I updated I was thinking whether we need some integration tests between This PR should be complete now. |
Please rebase this branch and I'll be happy to merge it down |
This change adds the ability to stream content of an IO object (an object that responds to #read) into the request body. This is convenient when you want to send large amounts of data as the request body, but you don't want to load all of it into memory. HTTP.post("http://example.org/upload", body: File.open("video.mp4")) HTTP.post("http://example.org/upload", body: StringIO.new("file data")) It was already possible to send the request body in chunks by passing an Enumeable to :body, but that's only defined for "Transfer-Encoding: chunked" requests. This will also allow us to extend form_data.rb to stream the multipart-encoded request, instead having to load the whole body into memory before writing it to the socket.
Before creating chunked requests was only possible with Enumerable body. If "Transfer-Encoding: chunked" was set but the body wasn't Enumerable, request would be calculated as if it was a regular request, which meant inclusion of "Content-Length" header and writing request body as is. If body was an Enumerable but "Transfer-Encoding: chunked" wasn't set, an error would be raised. This commit enables sending chunked requests with any body type, by abstracting how the request body is read into an #each method. If "Transfer-Encoding: chunked" is detected, the yielded chunks are encoded accordingly before writing, otherwise the chunks are written as is. This allows making a streaming chunked request using an IO object, where file would be written in chunks of 16 KB, which wasn't previously possible.
As described in jruby/jruby#4583, JRuby starts executing code after the first `yield` even though we requested only the first element, resulting in the first chunk being overriden with the second chunk before it was even returned. We work around this by not using a buffer string, therefore each retrieved chunk is a new string, so even if JRuby immediately retrieves the second chunk, it won't affect the first chunk.
3cd6931
to
bb4479f
Compare
@ixti Done! |
Awesome! Thanks! |
pkgsrc changes: - sort DEPENDS Upstream changes (from CHANGES.md): ## 3.0.0 (2017-10-01) * Drop support of Ruby `2.0` and Ruby `2.1`. ([@ixti]) * [#410](httprb/http#410) Infer `Host` header upon redirects. ([@janko-m]) * [#409](httprb/http#409) Enables request body streaming on any IO object. ([@janko-m]) * [#413](httprb/http#413), [#414](httprb/http#414) Fix encoding of body chunks. ([@janko-m]) * [#368](httprb/http#368), [#357](httprb/http#357) Fix timeout issue. ([@HoneyryderChuck])
This change adds the ability to stream content of an IO object (an object that responds to #read) into the request body. This is convenient when you want to send large amounts of data as the request body, but you don't want to load all of it into memory.
It was already possible to send the request body in chunks by passing an Enumerable to
:body
, but that can (and should) only be used for making "Transfer-Encoding: chunked" requests.This will also allow us to extend form_data.rb to stream the multipart-encoded request, instead having to load the whole body into memory before writing it to the socket.