Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing url(…) with special characters such as ; or {} #14879

Merged
merged 6 commits into from
Nov 6, 2024

Conversation

RobinMalfait
Copy link
Member

This PR fixes an issue where some special characters (with an actual meaning CSS) were used inside of the url(…) function, would result in incorrectly parsed CSS.

For example, when we encounter a {, then we would start a new "block" for nesting purposes. If we encounter an }, then the block would end. If we encounter a ;, then that would be the end of a declaration.

All of that is true, unless we are in a url(…) function. In that case, we should ignore all of those characters and treat them as part of the URL.

This is only an issue because:

  1. We are allowed to use these characters in URLs.
  2. We can write an url inside url(…) without quotes. With quotes, this would not be an issue.

@RobinMalfait RobinMalfait requested a review from a team as a code owner November 5, 2024 16:45
@RobinMalfait RobinMalfait force-pushed the fix/parsing-url-with-semicolon branch from c37dd4e to 0018d99 Compare November 5, 2024 16:46
@@ -42,6 +42,7 @@ export function parse(input: string) {

let buffer = ''
let closingBracketStack = ''
let parenStack = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the closingBracketStack for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this is a number but should be a string if for some reason we can't use closingBracketStack

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I did originally, but you would run into issues relatively quickly. When you are parsing CSS like this:

.foo {
  background: url();
}

The moment you reach the ; character, then the closingBracketStack looks like this: {( where { is coming from the selector .foo {, and ( is coming from the url( function. Meaning that if the last character is ( we know that we are not top-level anymore, and we can safely ignore the ;.

However, if we are parsing CSS like this:

.foo {
  background: url(https://example-image-search.org?q={query;limit=5}&ids=[1,2,3]);
}

If we now reach the ;, then the closingBracketStack looks like this: {({, where the first {, comes from the selector .foo {, the ( comes from the url( function and the last { comes from the query string.

In this case we can't just look at the last character in the closingBracketStack, but have to check if any ( exists to know that we can safely ignore the ;. This means that our O(1) lookup turns into an O(n) lookup. Sure, it's a small n, but still not ideal.

A second issue is that we validate that the closingBracketStack is empty at the end of the parsing. Which ensures that every opening bracket has a closing bracket and thus properly balanced.

However, brackets inside the url, don't need to be balanced. This is valid CSS:

.foo {
  background: url(https://example-image-search.org?curlies=}});
}
image

As you can see, the request is actually made with the curlies in the query param.

If we use the closingBracketStack, in this case it would look like this:

  1. { — from the selector .foo {
  2. {( — from the start of the declaration value url(
  3. {(} — from curlies=}, this would also error because improperly balanced. But let's assume it
    didn't error.
  4. {(}} — from curlies=}}.
  5. {(}}) — from closing the url), but the closing bracket is not balanced and in a strange
    state.
  6. {(}})} — from closing the .foo { block }.

Because we don't care about balancing curlies, and because we don't want to use O(n) instead of O(1) lookups, I moved it to a separate variable.

@thecrypticace, I picked a number because this value is only concerned with parentheses, and not with any other type of brackets. Since we don't care about the other types of brackets sandwiched between the parentheses (e.g.: {(})), a number is a bit nicer to work with because we can just increment/decrement the value.

Eventually we do want the url(…) to be closed though, so properly closing the parentheses is still important, but this basically means that the number should be 0 at the end, which we already account for.

If we don't like this solution, and still prefer to use the closingBracketStack, we could also apply this rule instead:

If we see (, ignore every other special character like ; or } until we see an ), because that is essentially exactly what we are trying to achieve. If we then encounter an { or }, we don't even touch the closingBracketStack at all.

It does mean that we have to work with strings instead of numbers a bit more, but that's fine I think? It does reduce the amount of concepts in the parser which is a benefit.

Diff would then look like this:

diff --git a/packages/tailwindcss/src/css-parser.ts b/packages/tailwindcss/src/css-parser.ts
index b91011e0c..3b6e283e5 100644
--- a/packages/tailwindcss/src/css-parser.ts
+++ b/packages/tailwindcss/src/css-parser.ts
@@ -42,7 +42,6 @@ export function parse(input: string) {
 
   let buffer = ''
   let closingBracketStack = ''
-  let parenStack = 0
 
   let peekChar
 
@@ -332,7 +331,10 @@ export function parse(input: string) {
     // }
     // ```
     //
-    else if (currentChar === SEMICOLON && parenStack <= 0) {
+    else if (
+      currentChar === SEMICOLON &&
+      closingBracketStack[closingBracketStack.length - 1] !== ')'
+    ) {
       let declaration = parseDeclaration(buffer)
       if (parent) {
         parent.nodes.push(declaration)
@@ -344,7 +346,10 @@ export function parse(input: string) {
     }
 
     // Start of a block.
-    else if (currentChar === OPEN_CURLY && parenStack <= 0) {
+    else if (
+      currentChar === OPEN_CURLY &&
+      closingBracketStack[closingBracketStack.length - 1] !== ')'
+    ) {
       closingBracketStack += '}'
 
       // At this point `buffer` should resemble a selector or an at-rule.
@@ -369,7 +374,10 @@ export function parse(input: string) {
     }
 
     // End of a block.
-    else if (currentChar === CLOSE_CURLY && parenStack <= 0) {
+    else if (
+      currentChar === CLOSE_CURLY &&
+      closingBracketStack[closingBracketStack.length - 1] !== ')'
+    ) {
       if (closingBracketStack === '') {
         throw new Error('Missing opening {')
       }
@@ -459,17 +467,17 @@ export function parse(input: string) {
 
     // `(`
     else if (currentChar === OPEN_PAREN) {
-      parenStack += 1
+      closingBracketStack += ')'
       buffer += '('
     }
 
     // `)`
     else if (currentChar === CLOSE_PAREN) {
-      if (parenStack <= 0) {
+      if (closingBracketStack[closingBracketStack.length - 1] !== ')') {
         throw new Error('Missing opening (')
       }
 
-      parenStack -= 1
+      closingBracketStack = closingBracketStack.slice(0, -1)
       buffer += ')'
     }

Note: I know the closingBracketStack actually holds the inverse of the actual brackets, but it's a bit easier to reason about it this way.

Copy link
Contributor

@thecrypticace thecrypticace Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I SWEAR when I read parenStack += … I saw a (. My brain is fried. Carry on…

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, no worries!

Implemented that last change here: 9177c68

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed info. I do think the diff you ended up and the concept of "(" will not cause any other stack to be created until it's closed again seems good and it feels like the diff is lighter this way. Awesome!

When parsing the `url(…)`, then quotes to create a string are not
required. If you use special characters inside the `url(…)` such as `;`
then we should not end the declaration.

Likewise, when the `url(…)` contains `{` or `}`, then we don't want to
create a new block (for nesting) either.
Some characters, such as `{`, `}`, and `;` are allowed inside CSS
functions, e.g.: `url(https://example?q={query;limit=5})`.

Notice that the example is not in quotes, and this is a valid URL. If we
don't handle these characters when inside of `(` and `)`, it means that
the `{` would start a nested block, `}` would end the block, and `;`
would end the current declaration.

Instead, if we are in parens, we do allow the values.
We still apply the same rules where we don't really care about special
characters such as `{`, `}` or `;` when we are inside of `(` and `)`.

We can safely look at the last character in the `closingBracketStack`
because when we are in parentheses, we don't update the
`closingBracketStack` until we see a closing `)`.
@RobinMalfait RobinMalfait force-pushed the fix/parsing-url-with-semicolon branch from 0018d99 to 9177c68 Compare November 5, 2024 21:48
CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Philipp Spiess <[email protected]>
@RobinMalfait RobinMalfait merged commit 4e16410 into next Nov 6, 2024
1 check passed
@RobinMalfait RobinMalfait deleted the fix/parsing-url-with-semicolon branch November 6, 2024 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants