Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace cgi.FieldStorage by multipart #1094

Merged
merged 7 commits into from
Jan 19, 2023
Merged

replace cgi.FieldStorage by multipart #1094

merged 7 commits into from
Jan 19, 2023

Conversation

d-maurer
Copy link
Contributor

This PR replaces cgi.FieldStorage by multipart, avoiding the cgi module deprecated by Python 3.11.

In addition:

  • fixes the encoding handling and lets the :bytes modifier work as expected
  • marks binary converters with a true binary attribute

The PR uses the surrogateescape encoding error handler for its encoding handling (see PEP 383). If form data in an unexpected encoding is processed, surrogates may reach application code and cause delayed problems because the default strict encoding error handler cannot process them. Potentially, a check for surrogates should be implemented to report such problems early.

@d-maurer d-maurer marked this pull request as ready for review January 17, 2023 14:32
@d-maurer d-maurer requested review from dataflake and icemac January 17, 2023 14:32
Copy link
Member

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from quickly scanning though the code.

src/ZPublisher/Converters.py Outdated Show resolved Hide resolved
@mauritsvanrees
Copy link
Member

@d-maurer Fist: thanks for having worked on this!

But when I use a checkout of Zope in the Plone 6 core development buildout, I get test failures due to this change. See plone/buildout.coredev#844 and mostly this comment by @jensens. With your merged PR we get this error in the tests of three packages:

FileUpload.__init__() missing 1 required positional argument: 'charset'

@ale-rt suggests to update Zope to change the signature of FileUpload and make charset an optional keyword argument. That fixes most of the test failures. I think this makes sense, and is an easy fix.

Then the question is what we should use as default value for charset. I see two options:

  1. charset="latin-1". This is what Alessandro proposes, and it fixes all Plone tests.
  2. charset=default_encoding. This is defined earlier on in HTTPRequest.py as utf-8 (with a remark that it might be overridden in config). With this, some tests in plone.formwidget.namedfile still fail. That starts here with a test that explicitly uses a non-ascii filename. This test was added 12 years ago and it would be sad if this now fails. ;-)

This test makes me wonder if the following part of your fix is correct:

class FileUpload:
...
    def __init__(self, aFieldStorage, charset):
...
        self.filename = aFieldStorage.filename.encode("latin-1").decode(charset)

Why is latin-1 hardcoded here? The plone.formwidget.namedfile test fails because of this when I would change the FileUpload class to use utf-8 as charset. The test plus your code can be simplified like this in pure Python:

>>> filename = b'rand\xc3\xb8m.txt'.decode('utf8')
>>> print(filename)
randøm.txt
>>> def handle_filename(filename, charset):
...     return filename.encode('latin-1').decode(charset)
... 
# Good:
>>> handle_filename(filename, 'latin-1')
'randøm.txt'
# Bad:
>>> handle_filename(filename, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in handle_filename
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 4: invalid start byte

In other words, charset='latin-1' seems the only safe option. But then I wonder if the encoding and decoding is still needed.

I have prepared a PR.

Belatedly, I thought I would try to upload a non-ascii file in an actual browser and a running Plone Site, instead of in a 12 year old test that may or may not make sense. I created a file like this:

echo "randøm.txt" > randøm.txt

This is on Mac with Firefox.
Uploading this file went fine, both with charset latin-1 and utf-8. So that is good.
And actually when I go in with a pdb, I get this:

(Pdb++) aFieldStorage.filename
'randøm.txt'
(Pdb++) aFieldStorage.filename.encode("latin-1")
b'rand\xc3\xb8m.txt'
(Pdb++) aFieldStorage.filename.encode("latin-1").decode("utf-8")
'randøm.txt'
(Pdb++) aFieldStorage.filename.encode("latin-1").decode("latin-1")
'randøm.txt'

So here we do actually get a filename as input which is latin-1. So your encode("latin-1") works fine here.
And no matter whether we then decode to utf-8 or latin-1, the end result when I look in Plone is fine: the id of the content item is normalised to random.txt (done by Plone I think), the title is randøm.txt, and when you download it, you again get a file randøm.txt.

In other words, I am not sure what to make of this, and which part of the code fixes the filename after FileUpload.__init__ has finished. Maybe another part of your PR does that.
Anyway, I still think adding a default value for the charset parameter is helpful. I am happy to get advice on what its value should be.

mauritsvanrees added a commit that referenced this pull request Mar 7, 2023
Without this change, some tests (and possibly live code) in Plone fail because they do not pass the new required `charset` parameter.
See plone/buildout.coredev#844
And see discussion that I just started here:
#1094 (comment)

Also fixes an unclosed file in a test.
@d-maurer
Copy link
Contributor Author

d-maurer commented Mar 8, 2023 via email

mauritsvanrees added a commit that referenced this pull request Mar 8, 2023
Without this change, some tests (and possibly live code) in Plone fail because they do not pass the new required `charset` parameter.
See plone/buildout.coredev#844
And see discussion that I just started here:
#1094 (comment)

Also fixes an unclosed file in a test.
mauritsvanrees added a commit to plone/plone.formwidget.namedfile that referenced this pull request Mar 8, 2023
This seems to be what browsers actually use, or how the filename is at this point in the zope publishing machinery.
See zopefoundation/Zope#1094 (comment)
mauritsvanrees added a commit that referenced this pull request Mar 9, 2023
…1101)

FileUpload: use default encoding as charset when nothing is passed.

Without this change, some tests (and possibly live code) in Plone fail because they do not pass the new required `charset` parameter.
See plone/buildout.coredev#844
And see discussion that I just started here:
#1094 (comment)

Also fixes an unclosed file in a test.
mister-roboto pushed a commit to plone/buildout.coredev that referenced this pull request Mar 10, 2023
Branch: refs/heads/master
Date: 2023-03-08T11:42:20+01:00
Author: Maurits van Rees (mauritsvanrees) <[email protected]>
Commit: plone/plone.formwidget.namedfile@26c5285

Use latin-1 decoded filename in tests with FileUpload.

This seems to be what browsers actually use, or how the filename is at this point in the zope publishing machinery.
See zopefoundation/Zope#1094 (comment)

Files changed:
A news/1094.bugfix
M plone/formwidget/namedfile/converter.py
M plone/formwidget/namedfile/widget.rst
Repository: plone.formwidget.namedfile

Branch: refs/heads/master
Date: 2023-03-10T21:31:14+01:00
Author: Maurits van Rees (mauritsvanrees) <[email protected]>
Commit: plone/plone.formwidget.namedfile@ddbeeec

Merge pull request #60 from plone/maurits-widget-tests-latin-1

Use latin-1 decoded filename in tests with FileUpload.

Files changed:
A news/1094.bugfix
M plone/formwidget/namedfile/converter.py
M plone/formwidget/namedfile/widget.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants