replace cgi.FieldStorage by multipart #1094
…the request body is accessed twice; in those cases, the second access will give an empty result
…ded` (required by some tests)
Looks good from quickly scanning through the code.
@d-maurer First: thanks for having worked on this! But when I use a checkout of Zope in the Plone 6 core development buildout, I get test failures due to this change. See plone/buildout.coredev#844 and mostly this comment by @jensens. With your merged PR we get this error in the tests of three packages:
@ale-rt suggests to update Zope to change the signature of Then the question is what we should use as default value for
This test makes me wonder if the following part of your fix is correct:
Why is
In other words, I have prepared a PR. Belatedly, I thought I would try to upload a non-ascii file in an actual browser and a running Plone Site, instead of in a 12 year old test that may or may not make sense. I created a file like this:
This is on Mac with Firefox.
So here we do actually get a filename as input which is latin-1. So your In other words, I am not sure what to make of this, and which part of the code fixes the filename after
Without this change, some tests (and possibly live code) in Plone fail because they do not pass the new required `charset` parameter. See plone/buildout.coredev#844 And see discussion that I just started here: #1094 (comment) Also fixes an unclosed file in a test.
Maurits van Rees wrote at 2023-3-7 15:52 -0800:
...
This test makes me wonder if the following part of your fix is correct:
```
class FileUpload:
    ...
    def __init__(self, aFieldStorage, charset):
        ...
        self.filename = aFieldStorage.filename.encode("latin-1").decode(charset)
```
Why is `latin-1` hardcoded here?
The data usually used as filename of a `FileUpload` comes from
an HTTP header and HTTP/1.1 uses `latin-1` as `charset` for its headers
(even though modern versions deprecate the use of non-ASCII characters).
Note that "latin-1" here does not stand for a "true" charset:
it is a technical device to decode a byte sequence into an `str`
and postpone the real decoding until the "true" charset is known.
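The "technical device" role of `latin-1` can be illustrated with a short sketch. It only assumes standard Python codec behaviour; the filename and the variable names are hypothetical and not taken from the Zope code:

```python
# latin-1 maps each of the 256 byte values to exactly one code point,
# so decoding never fails and encoding restores the original bytes.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
assert as_text.encode("latin-1") == all_bytes

# Hypothetical header bytes: a filename that was encoded with UTF-8.
raw = "randøm.txt".encode("utf-8")

# Transport form: a str, but not yet the real filename.
placeholder = raw.decode("latin-1")

# Real decoding, postponed until the "true" charset is known.
real = placeholder.encode("latin-1").decode("utf-8")
```

The intermediate `placeholder` string looks garbled when printed, which is expected: it only carries the bytes until the real charset is known.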
I made file upload tests with `firefox`.
Apparently, it encodes the filename with the charset associated with
the form to get a sequence of bytes. It puts this byte sequence
into the HTTP request's `Content-Disposition` header (for the upload).
Thus, logically, the header value is the byte sequence corresponding
to the encoding of the filename with the form's charset.
`FileUpload` does not know about the form's charset; it must
be told about it -- which is what the new `charset` parameter does.
I would not mind making `charset` optional.
If it is used from `HTTPRequest`, `charset` will be provided.
Thus, from `HTTPRequest`'s point of view, the default value could
be anything.
Note, however, that `FileUpload` assumes that `aFieldStorage`
has values which come from an HTTP request, especially
that its `filename` has a (transparently encoded) byte sequence
as value which gives the correct filename once recoded with
the provided `charset`.
Maybe, this will force you to rework the tests anyway.
The `plone.formwidget.namedfile` test fails because of this when I would change the FileUpload class to use utf-8 as charset. The test plus your code can be simplified like this in pure Python:
```
>>> filename = b'rand\xc3\xb8m.txt'.decode('utf8')
>>> print(filename)
randøm.txt
>>> def handle_filename(filename, charset):
...     return filename.encode('latin-1').decode(charset)
...
# Good:
>>> handle_filename(filename, 'latin-1')
'randøm.txt'
# Bad:
>>> handle_filename(filename, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in handle_filename
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 4: invalid start byte
```
The test does not reflect the typical browser situation:
The browser has a unicode filename; it must encode it
with the form's charset and put the result into an HTTP `Content-Disposition`
header which by the HTTP/1.1 spec uses `latin-1` as charset.
Thus, your test should start with a unicode filename and encode
this with the target charset -- leading to a sequence of bytes.
You can decode any sequence of bytes with `latin-1` to get
its standard representation as `str`.
`handle_filename` assumes that the process above has been applied
and recomputes the original unicode filename based on this assumption.
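A test built along these lines might look like the following sketch. `handle_filename` is the simplified function from the example above; `as_header_value` is a hypothetical helper that only simulates what a browser is assumed to put into the header, and is not part of Zope or Plone:

```python
def handle_filename(filename, charset):
    """Recode a latin-1 transport string into the real filename."""
    return filename.encode("latin-1").decode(charset)


def as_header_value(filename, charset):
    """Hypothetical helper: simulate the value a browser sends.

    The unicode filename is encoded with the form's charset, and the
    resulting bytes are represented as a str via latin-1.
    """
    return filename.encode(charset).decode("latin-1")


original = "randøm.txt"
for charset in ("utf-8", "latin-1"):
    wire = as_header_value(original, charset)
    # The round trip restores the original unicode filename.
    assert handle_filename(wire, charset) == original
```

With the input prepared this way, `utf-8` no longer raises `UnicodeDecodeError`, because the string passed to `handle_filename` really is a latin-1 view of UTF-8 bytes.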
This seems to be what browsers actually use, or how the filename is at this point in the zope publishing machinery. See zopefoundation/Zope#1094 (comment)
…1101) FileUpload: use default encoding as charset when nothing is passed. Without this change, some tests (and possibly live code) in Plone fail because they do not pass the new required `charset` parameter. See plone/buildout.coredev#844 And see discussion that I just started here: #1094 (comment) Also fixes an unclosed file in a test.
Branch: refs/heads/master
Date: 2023-03-08T11:42:20+01:00
Author: Maurits van Rees (mauritsvanrees) <[email protected]>
Commit: plone/plone.formwidget.namedfile@26c5285
Use latin-1 decoded filename in tests with FileUpload. This seems to be what browsers actually use, or how the filename is at this point in the zope publishing machinery. See zopefoundation/Zope#1094 (comment)
Files changed:
A news/1094.bugfix
M plone/formwidget/namedfile/converter.py
M plone/formwidget/namedfile/widget.rst
Repository: plone.formwidget.namedfile
Branch: refs/heads/master
Date: 2023-03-10T21:31:14+01:00
Author: Maurits van Rees (mauritsvanrees) <[email protected]>
Commit: plone/plone.formwidget.namedfile@ddbeeec
Merge pull request #60 from plone/maurits-widget-tests-latin-1 Use latin-1 decoded filename in tests with FileUpload.
Files changed:
A news/1094.bugfix
M plone/formwidget/namedfile/converter.py
M plone/formwidget/namedfile/widget.rst
This PR replaces `cgi.FieldStorage` by `multipart`, avoiding the `cgi` module deprecated by Python 3.11.
In addition:
- … `:bytes` modifier work as expected
- … `binary` attribute

The PR uses the `surrogateescape` encoding error handler for its encoding handling (see PEP 383). If form data in an unexpected encoding is processed, surrogates may reach application code and cause delayed problems because the default strict encoding error handler cannot process them. Potentially, a check for surrogates should be implemented to report such problems early.
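As a rough illustration of the delayed problem and of what an early check could look like, here is a minimal sketch. `has_surrogates` is a hypothetical helper, not something this PR provides, and the sample bytes are made up:

```python
def has_surrogates(text):
    """Return True if text contains surrogate code points (U+D800..U+DFFF)."""
    return any("\ud800" <= ch <= "\udfff" for ch in text)


# Hypothetical form data: latin-1 bytes wrongly assumed to be UTF-8.
data = b"rand\xf8m.txt"

# surrogateescape does not fail; the undecodable byte becomes a lone surrogate.
decoded = data.decode("utf-8", "surrogateescape")

# The problem only surfaces later, when strict encoding is attempted.
try:
    decoded.encode("utf-8")
    strict_encoding_failed = False
except UnicodeEncodeError:
    strict_encoding_failed = True

# An early check could report the bad input right after decoding instead.
needs_report = has_surrogates(decoded)
```

Encoding again with `surrogateescape` restores the original bytes, so the data is not lost; the check only makes the mismatch visible earlier.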