You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The reason will be displayed to describe this comment to others. Learn more.
Which versions of ansi-regex are impacted by this vulnerability? Several scanners are flagging all versions of this library other than 5.0.1 and 6.0.1. I suspect some of the older versions aren't impacted, because the pattern variable is different (just spot checked I think v2.1.1 and v3).
The reason will be displayed to describe this comment to others. Learn more.
Not sure because the security researchers are not doing their due diligence and are instead farming bounty money on e.g. huntr.dev, targeting the most used repositories for clout.
I'm really not sure what to do about this and it's eating a lot of time away. The vulnerability itself is not major - unless you're allowing long AND unsanitized user input to hit the API directly, the vulnerability doesn't affect you.
I have no idea which versions it actually affects because they didn't test it correctly.
I then plugged that unescaped version back in to determine whether there were any problems with the pattern itself, and see that there is an incomplete/unclosed (?: which should probably be removed:
The next (?: ... | ... )? group does contain an alternation, and is qualified by its ? quantifier. Immediately after this optional group, it looks for \u0007 at the end.
Within both alternations, the character groups [-a-zA-Z\d\/#&.:=?%@~_] can be shortened: [a-zA-Z0-9_] or [a-zA-Z\d_] can be replaced with \w, so: [-\w\/#&.:=?%@~].
So, within the first alternation, it matches a semicolon ; followed by one or more of those character-group characters. Within the second alternation, it first looks for letters or numbers, followed by any number of semicolon ; followed by zero or more of those character-group characters. I'm not sure whether that would want to be a + one-or-more, since that would allow "abc;;;;;;;;;" to match there instead of requiring characters between those semicolons.
If you look at this example, I've changed the outer non-capturing (?: ... )? to a capturing group just for illustrative purposes so you can see how the semicolon is matched in one case but not in the other due to this zero or more quantifiers. You can see this when the semicolon is added to the text in this next example.
If both ;[-\w\/#&.:=?%@~] character groups are quantified with zero-or-more *, then subsequent semicolons ;; will match, while if they are quantified with one-or-more +, the match fails due to subsequent semicolons ;;.
The reason will be displayed to describe this comment to others. Learn more.
Would someone be able to help me understand the exact formatting going on here?
@OnlineCop can you first help me understand what exactly is your question here? Are you trying to understand what the code is doing? Are you proposing a readability improvement? Or performance improvement? Sorry, you said a lot of things but I couldn't even understand what is your goal, roughly...
I assume that the \ backslashes are escaped because this is within a string, meaning \u001B must be represented as \\u001B ?
The reason will be displayed to describe this comment to others. Learn more.
In the original, lines 2-5 were looking for:
either \u001B (ESC) or \u009B (¢ cent sign) followed by 0 or more of these characters: []()#;?
After that, it was looking for either:
2. any number of letters, numbers, semi-colons, or special characters that came before \u0007 (BEL)
- There were a lot of 0-or-more groups, all of which was also within its own (?:...)? 0-or-1 (optional) group, before that BEL.
- It was definitely something that could cause ReDoS.
3. or an optional group of digits and semi-colons, followed by a single character in the range [\dA-PR-TZcf-ntqry=><~].
- This optional group also had potential for a ReDoS.
In the revised version, lines 2-5 were looking for the same characters as 1. above, but increased the variety of characters allowed before \u0007 and still has a lot of potential for ReDoS (possibly more than before).
I don't understand what exactly is supposed to be matched here. Both BEFORE and AFTER versions have a high probability to cause catastrophic backtracking. Are there any tests that should match either of the BEFORE or AFTER regexes?
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which versions of ansi-regex are impacted by this vulnerability? Several scanners are flagging all versions of this library other than 5.0.1 and 6.0.1. I suspect some of the older versions aren't impacted, because the pattern variable is different (just spot checked I think v2.1.1 and v3).
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure because the security researchers are not doing their due diligence and are instead farming bounty money on e.g. huntr.dev, targeting the most used repositories for clout.
I'm really not sure what to do about this and it's eating a lot of time away. The vulnerability itself is not major - unless you're allowing long AND unsanitized user input to hit the API directly, the vulnerability doesn't affect you.
I have no idea which versions it actually affects because they didn't test it correctly.
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed 4.1.0 and 3.0.0 as affected testing using the provided reproducer. 2.1.1 does not reproduce the issue.
3.0.0 is the first affected, as that's the first version that includes 69bebf6 that the problematic part of the regex.
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @thoger. 🙂
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thoger Does 5.0.0 or above, are not affected this? Can you confirm on the same?
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arungpro
5.0.0
is affected,5.0.1
is not. Likewise,6.0.0
is affected,6.0.1
is not.8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would someone be able to help me understand the exact formatting going on here?
I assume that the
\
backslashes are escaped because this is within a string, meaning\u001B
must be represented as\\u001B
?I copied the "proposed" escaped original into regex101 and removed the escape characters to produce this:
[\u001B\u009B][[\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\d\/#&.:=?%@~_]+)*|[a-zA-Z\d]+(?:;[-a-zA-Z\d\/#&.:=?%@~_]*)*)?\u0007)
I then plugged that unescaped version back in to determine whether there were any problems with the pattern itself, and see that there is an incomplete/unclosed
(?:
which should probably be removed:[\u001B\u009B][[\]()#;?]*(?:(?:;[-a-zA-Z\d\/#&.:=?%@~_]+)*|[a-zA-Z\d]+(?:;[-a-zA-Z\d\/#&.:=?%@~_]*)*)?\u0007
The next outer
(?: ... )
group appears to have no alternations or quantifiers, so can probably also be removed:(?:(?:;[-a-zA-Z\d\/#&.:=?%@~_]+)*|[a-zA-Z\d]+(?:;[-a-zA-Z\d\/#&.:=?%@~_]*)*)?\u0007)
The next
(?: ... | ... )?
group does contain an alternation, and is qualified by its?
quantifier. Immediately after this optional group, it looks for\u0007
at the end.Within both alternations, the character groups
[-a-zA-Z\d\/#&.:=?%@~_]
can be shortened:[a-zA-Z0-9_]
or[a-zA-Z\d_]
can be replaced with\w
, so:[-\w\/#&.:=?%@~]
.That would look like this:
[\u001B\u009B][[\]()#;?]*(?:(?:;[-\w\/#&.:=?%@~]+)*|[a-zA-Z\d]+(?:;[-\w\/#&.:=?%@~]*)*)?\u0007
So, within the first alternation, it matches a semicolon
;
followed by one or more of those character-group characters. Within the second alternation, it first looks for letters or numbers, followed by any number of semicolon;
followed by zero or more of those character-group characters. I'm not sure whether that would want to be a+
one-or-more, since that would allow "abc;;;;;;;;;" to match there instead of requiring characters between those semicolons.If you look at this example, I've changed the outer non-capturing
(?: ... )?
to a capturing group just for illustrative purposes so you can see how the semicolon is matched in one case but not in the other due to this zero or more quantifiers. You can see this when the semicolon is added to the text in this next example.If both
;[-\w\/#&.:=?%@~]
character groups are quantified with zero-or-more*
, then subsequent semicolons;;
will match, while if they are quantified with one-or-more+
, the match fails due to subsequent semicolons;;
.8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ansi-regex v3 and v4 should now include the vulnerability fixes:
3.0.1
in 419250f and c57d4c24.1.1
in 75a657d8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither 3.0.1, nor 4.1.1 are listed on https://www.npmjs.com/package/ansi-regex (under Versions), btw.
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You sure?
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OnlineCop can you first help me understand what exactly is your question here? Are you trying to understand what the code is doing? Are you proposing a readability improvement? Or performance improvement? Sorry, you said a lot of things but I couldn't even understand what is your goal, roughly...
Yes, exactly.
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original, lines 2-5 were looking for:
\u001B
(ESC) or\u009B
(¢ cent sign) followed by 0 or more of these characters:[]()#;?
After that, it was looking for either:
2. any number of letters, numbers, semi-colons, or special characters that came before
\u0007
(BEL)- There were a lot of 0-or-more groups, all of which was also within its own
(?:...)?
0-or-1 (optional) group, before that BEL.- It was definitely something that could cause ReDoS.
3. or an optional group of digits and semi-colons, followed by a single character in the range
[\dA-PR-TZcf-ntqry=><~]
.- This optional group also had potential for a ReDoS.
In the revised version, lines 2-5 were looking for the same characters as 1. above, but increased the variety of characters allowed before
\u0007
and still has a lot of potential for ReDoS (possibly more than before).I don't understand what exactly is supposed to be matched here. Both BEFORE and AFTER versions have a high probability to cause catastrophic backtracking. Are there any tests that should match either of the BEFORE or AFTER regexes?
8d1d7cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to create a reproduction case that exploits exponential time complexity and file an issue. Otherwise I'm not sure what you're trying to say.