response.body is ASCII-8BIT when Content-Type is text/xml; charset=utf-8 #139
Comments
Just encountered the same issue. Any ideas?
The workaround I used is: `response.body.force_encoding('utf-8')`. Yahuda has a dissertation about the problem here.
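To illustrate the workaround without a live request, the sketch below simulates a body tagged as ASCII-8BIT using `String#b`. The sample string is made up for illustration; a real one would come from `response.body`. Note that `force_encoding` only relabels the bytes and does not transcode them, so it is appropriate only when the bytes really are UTF-8.

```ruby
# Simulate a response body whose bytes are UTF-8 but tagged as binary.
# The sample XML is hypothetical; a real body would come from response.body.
body = "<name>caf\u00E9</name>".b

body.encoding          # Encoding::ASCII_8BIT
body.valid_encoding?   # true, since any byte sequence is valid binary

# Relabel the same bytes as UTF-8. No bytes are changed.
fixed = body.dup.force_encoding(Encoding::UTF_8)

fixed.encoding         # Encoding::UTF_8
fixed.valid_encoding?  # true only because the bytes are well-formed UTF-8
```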
I'm pretty sure Faraday just passes on the response body from the underlying adapter. I'm not sure I want to raise errors or perform lossy conversions of the data in Faraday. That can be done in a custom middleware if you really need it.
Fair enough. If the problem is elsewhere, as it appears, I guess it will be cleaned up in time. It's not a show stopper for me.
Closing because it's not a bug with Faraday.
I'm not sure the underlying adapter - at least net/http - does any encoding transformation. You can set Ruby's `Encoding.default_external` to something like 'US-ASCII', then hit an endpoint with Content-Type = '...; charset=utf-8'. net/http will parse the charset string and make it available, but does nothing to the encoding of the body string. Maybe net/http should be responsible for that, but if it isn't, the ParseJson middleware (for example) can blow up.
Did some more research on this - some of the underlying adapters handle the Content-Type charset, some don't: EM-HTTP-Request does. [commit]. I guess the nicest thing to do would be to perhaps offer an optional middleware for adapters that don't try, but, yeah, I'd agree, this probably shouldn't be Faraday's responsibility.
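A minimal sketch of what such an optional step might do, written in plain Ruby so it does not depend on any Faraday API. The helper name `apply_charset` and its regex are assumptions for illustration, not part of Faraday or net/http; in practice something like it could be called from a custom response middleware.

```ruby
# Hypothetical helper: tag a body with the charset named in Content-Type.
# It relabels the bytes only, and returns the body untouched when the
# charset is missing or is a label Ruby does not recognize.
def apply_charset(body, content_type)
  charset = content_type.to_s[/;\s*charset\s*=\s*"?([^\s;"]+)/i, 1]
  return body unless charset

  encoding = begin
    Encoding.find(charset)
  rescue ArgumentError
    nil # unknown charset label, keep the body as received
  end
  return body unless encoding

  body.dup.force_encoding(encoding)
end

# Made-up sample body, tagged as binary the way an adapter might return it.
raw = "<name>caf\u00E9</name>".b
tagged = apply_charset(raw, "text/xml; charset=utf-8")
tagged.encoding # Encoding::UTF_8
```

Relabeling rather than transcoding keeps the step lossless, which is in line with the concern above about lossy conversions.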
- The `force_encoding` and `unzip` options in WebsiteAgent are moved to WebRequestConcern so other users of the concern such as RssAgent can benefit from them.
- WebRequestConcern detects a charset specified in the Content-Type header to decode the content properly, and if it is missing the content is assumed to be encoded in UTF-8 unless it has a binary MIME type. Not all Faraday adapters handle character encodings, and Faraday passes through what is returned from the backend, so we need to do this on our own. (cf. lostisland/faraday#139)
- WebRequestConcern now converts text contents to UTF-8, so agents can handle non-UTF-8 data without having to deal with encodings themselves. Previously, WebsiteAgent in "json"/"text" modes and RssAgent would suffer from encoding errors when dealing with non-UTF-8 contents. WebsiteAgent in "html"/"xml" modes did not have this problem because Nokogiri would always return results in UTF-8 independent of the input encoding. This should fix #608.
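The approach described in this commit message can be sketched roughly as follows. This is not the WebRequestConcern code: the method name `to_utf8`, the crude binary MIME type check, and the choice to replace undecodable bytes are all assumptions made for illustration.

```ruby
# Hypothetical sketch, not the actual WebRequestConcern implementation.
def to_utf8(body, content_type)
  # Assumption: a simple prefix check stands in for "binary MIME type".
  binary_type = %r{\A(?:image|audio|video)/|\Aapplication/octet-stream\z}i
  mime = content_type.to_s.split(";").first.to_s.strip
  return body if mime.match?(binary_type) # leave binary payloads alone

  charset = content_type.to_s[/;\s*charset\s*=\s*"?([^\s;"]+)/i, 1]
  source = begin
    charset ? Encoding.find(charset) : Encoding::UTF_8
  rescue ArgumentError
    Encoding::UTF_8 # unknown label: fall back to the UTF-8 assumption
  end

  # Relabel with the source charset, then transcode to UTF-8.
  # scrub handles the case where the source is already UTF-8, because
  # encode may leave a same-encoding string untouched.
  body.dup.force_encoding(source)
      .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
      .scrub
end

# Made-up sample: "café" encoded as ISO-8859-1, tagged as binary.
latin1 = "caf\xE9".b
converted = to_utf8(latin1, "text/plain; charset=iso-8859-1")
converted.encoding # Encoding::UTF_8
```

Unlike a bare `force_encoding`, this step is lossy when the declared charset is wrong, so it fits an application layer better than the HTTP client itself.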
@chrismo you're my hero. Thanks for doing that research!
This has been implemented for Net HTTP adapter: lostisland/faraday-net_http#6
First time using faraday, so I might be doing things incorrectly, but the `response.body` encoding in the following is ASCII-8BIT:

In 1.9.2 this causes REXML to throw an `Encoding::CompatibilityError`.

I couldn't find a way to force faraday to provide `response.body` in UTF-8.

What is the preferred solution to this?
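
The error class mentioned above can be reproduced without Faraday or REXML. The sketch below uses made-up strings: combining a binary-tagged string that contains non-ASCII bytes with a UTF-8 string that also contains non-ASCII characters raises `Encoding::CompatibilityError`, whereas ASCII-only data mixes without complaint.

```ruby
binary = "caf\u00E9".b   # bytes of UTF-8 text, tagged ASCII-8BIT
utf8   = "na\u00EFve"    # ordinary UTF-8 string with a non-ASCII character

error = nil
begin
  binary + utf8          # incompatible: both sides contain non-ASCII data
rescue Encoding::CompatibilityError => e
  error = e
end

error.class              # Encoding::CompatibilityError

# ASCII-only binary data is compatible, which is why the problem can
# stay hidden until a response contains non-ASCII characters.
ok = "abc".b + utf8
ok.encoding              # Encoding::UTF_8
```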