Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified QuotedStringTokenizer #9729

Merged
merged 24 commits into from
May 18, 2023

Conversation

gregw
Copy link
Contributor

@gregw gregw commented May 4, 2023

Now implements a simple subset of quoted-string from RFC9110

gregw added 4 commits May 4, 2023 10:22
Now implements a simple subset of `quoted-string` from RFC9110
Now implements a simple subset of `quoted-string` from RFC9110
@gregw gregw marked this pull request as ready for review May 6, 2023 09:21
@gregw
Copy link
Contributor Author

gregw commented May 6, 2023

This could be a little bit controversial as it is used in a lot of places and I've just removed the ability to quote anything special. i.e. there is now no support of "\uab12" style unicode escapes, nor "\t" tabs etc. Also there is no leniency for browsers that do not quote "".

There was no spec for any of those. We now just implement RFC9110 quoted-string.
It was also silly that we were lenient on a browser that sent "\Some\File\Name", but if they sent "\some\file\name" we would substitute the "\n".

thoughts?

@@ -677,7 +674,7 @@ public void testPartWithBackSlashInFileName() throws Exception

String contents = """
--AaB03x\r
content-disposition: form-data; name="stuff"; filename="Taken on Aug 22 \\ 2012.jpg"\r
content-disposition: form-data; name="stuff"; filename="Taken on Aug 22 \\\\ 2012.jpg"\r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird. Why there are now 2 backslashes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we used to forgive browsers for not properly escaping "" in filenames. Now we are going to insist they quote them correctly.

Could be a bad idea. @joakime can you do some tests on windows to see how "" is treated by current browsers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted. But there is no good solution. See comments below.

@sbordet
Copy link
Contributor

sbordet commented May 8, 2023

@gregw FTR maybe create another class with stricter behavior and keep the old one around.
And then have a compliance mode in HttpConfiguration?

@gregw
Copy link
Contributor Author

gregw commented May 8, 2023

After feedback from Simone, I'm going to:

  • get rid of static API
  • get rid of passing the string in constructor
  • Make the construction API about how to configure only
  • Make the API work with iterators rather than enumerations (welcome to 2000!)
  • Extract an interface
  • Write new implementation, but keep old as an option.

@joakime Can you gather some stats from users about filenames passed to multi-part. Specifically do any still have non escaped "" in the filename? Or do any actually use the '\t' or '\n' style escapes.

Putting this back to draft for now.

@gregw gregw marked this pull request as draft May 8, 2023 08:07
gregw added 5 commits May 8, 2023 14:06
…lify-QuotedStringTokenizer

# Conflicts:
#	jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpField.java
…lify-QuotedStringTokenizer

# Conflicts:
#	jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpField.java
@gregw gregw requested a review from sbordet May 11, 2023 11:07
@gregw
Copy link
Contributor Author

gregw commented May 11, 2023

@joakime @sbordet this is still in draft mode, but can I get a pre-review. It is a huge PR now and I'd like to get a sense of your feedback before doing too much more work.

@gregw
Copy link
Contributor Author

gregw commented May 12, 2023

@joakime the tests that you added for multipart file uploads answers the question "do browser correctly quote ''" as we have tests for msie and edge that expect to handle

Content-Disposition: form-data; name="file"; filename="C:\Users\joakim\Pictures\jetty-avatar-256.png"

So I think we do need a mode that doesn't treat '' as an escape.

Do you you know how such browsers escape characters if they don't treat '' specially? Specifically, what do they do if there is a " in the filename?

@gregw
Copy link
Contributor Author

gregw commented May 12, 2023

I think we may need to implement rfc8187 which replaces rfc5987 which is suggest as per
https://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http

We will still need a mode that correctly handles the none escaped '' in filenames, but we should look for filename* for the filename we actually use.

@gregw
Copy link
Contributor Author

gregw commented May 13, 2023

@joakime can you gather the raw browser-capture data (like you did for jetty-core/jetty-http/src/test/resources/multipart) for multi part requests that have filenames containing non ASCII characters (eg ) and perhaps also with a " character in the name (if the FS supports it).

I'm very interested to see if RFC8187 is widely supported.

@gregw
Copy link
Contributor Author

gregw commented May 13, 2023

Just doing a test with Chrome Version 113.0.5672.63 (Official Build) (64-bit)

A file called t€st.txt is uploaded with:

Content-Disposition: form-data; name="userfile1"; filename="t�st.txt"

I can't tell what character that is instead of the euro. It pastes into characters app as \u0000 and into intellij as \u0080, which is the unicode control character???
[edit: interestingly github renders that character as the standard replacement character! click edit to see the real character]

A file called t"est.txt is uploaded with:

Content-Disposition: form-data; name="userfile1"; filename="t%22st.txt"

So no RFC 8187 here... but also no usage of slosh escaping rather % escaping.

Except the very same browser uploads t%st.txt as just

Content-Disposition: form-data; name="userfile1"; filename="t%st.txt"

So they are not consistent! No idea what we should do?
I guess if we implemented RFC8187, then at least if we ever saw a filename*= parameter, then we'd know for sure how it was encoded.

@gregw gregw marked this pull request as ready for review May 13, 2023 12:33
@joakime
Copy link
Contributor

joakime commented May 13, 2023

@joakime can you gather the raw browser-capture data (like you did for jetty-core/jetty-http/src/test/resources/multipart) for multi part requests that have filenames containing non ASCII characters (eg ) and perhaps also with a " character in the name (if the FS supports it).

I'm very interested to see if RFC8187 is widely supported.

I can, but it'll have to wait till I get back.
I'll also capture the 3 examples in your follow-up comment from multiple os/browser combos.

@gregw
Copy link
Contributor Author

gregw commented May 16, 2023

@sbordet I've pushed some javadoc

@gregw gregw merged commit 068a60a into jetty-12.0.x May 18, 2023
@gregw gregw deleted the jetty-12-simplify-QuotedStringTokenizer branch May 18, 2023 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants