Support Unicode RegExp property escapes #32214
Comments
It is probably more complex.
Both of these should work. Another possibility would be to change all the …
regexpu is a transpiler, not a polyfill. It only translates regular expression literals, which seems strictly better than blocking this feature on …
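To make the transpiler-versus-polyfill distinction concrete: a transpiler rewrites the pattern source at build time, so it can only handle patterns it can see, i.e. literals. The sketch below is a toy with a hypothetical `expandPropertyEscapes` function. It knows only two properties (`ASCII` and `ASCII_Hex_Digit`, hand-written ranges rather than real Unicode data tables) and only handles escapes outside character classes. It is not how regexpu itself is implemented.

```typescript
// Hypothetical, hand-written table; a real transpiler would generate
// these ranges from the Unicode Character Database.
const PROPERTY_RANGES: Record<string, string> = {
  ASCII: "\\x00-\\x7F",
  ASCII_Hex_Digit: "0-9A-Fa-f",
};

// Rewrite top-level \p{Name} / \P{Name} escapes in a pattern source into
// equivalent character classes. Assumed limitations for brevity: escapes
// inside [...], escaped backslashes such as \\p{...}, and Name=Value forms
// (e.g. Script=Greek) are not handled.
function expandPropertyEscapes(source: string): string {
  return source.replace(/\\([pP])\{(\w+)\}/g, (match: string, kind: string, name: string) => {
    const ranges = PROPERTY_RANGES[name];
    if (ranges === undefined) {
      throw new SyntaxError(`Unsupported property escape: ${match}`);
    }
    return kind === "p" ? `[${ranges}]` : `[^${ranges}]`;
  });
}

// Usage: the rewritten pattern no longer needs engine support for \p{...}.
// Keep the "u" flag so that [^...] matches whole code points, not surrogates.
const original = "^\\p{ASCII_Hex_Digit}+\\P{ASCII}$";
const expanded = expandPropertyEscapes(original);
console.log(expanded); // ^[0-9A-Fa-f]+[^\x00-\x7F]$
```

This only works because the pattern text is known ahead of time; a pattern assembled at runtime (for example from user input passed to `new RegExp`) cannot be rewritten this way.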
Hello, is there any update? This is a very important feature to support (Babel already supports it). I'm relying on RegExp Unicode property escapes to simplify some normalization and CJKV character-extraction functions. For example:

```js
const normalizedWord = word.trim().toLowerCase().normalize('NFD')

// Strip diacritics
// normalizedWord.replace(/[\u0300-\u036f]/gu, '')
// normalizedWord.replace(/[\^`\xA8\xAF\xB4\xB7\xB8\u02B0-\u034E\u0350-\u0357\u035D-\u0362\u0374\u0375\u037A\u0384\u0385\u0483-\u0487\u0559\u0591-\u05A1\u05A3-\u05BD\u05BF\u05C1\u05C2\u05C4\u064B-\u0652\u0657\u0658\u06DF\u06E0\u06E5\u06E6\u06EA-\u06EC\u0730-\u074A\u07A6-\u07B0\u07EB-\u07F5\u0818\u0819\u0898-\u089F\u08C9-\u08D2\u08E3-\u08FE\u093C\u094D\u0951-\u0954\u0971\u09BC\u09CD\u0A3C\u0A4D\u0ABC\u0ACD\u0AFD-\u0AFF\u0B3C\u0B4D\u0B55\u0BCD\u0C3C\u0C4D\u0CBC\u0CCD\u0D3B\u0D3C\u0D4D\u0DCA\u0E47-\u0E4C\u0E4E\u0EBA\u0EC8-\u0ECC\u0F18\u0F19\u0F35\u0F37\u0F39\u0F3E\u0F3F\u0F82-\u0F84\u0F86\u0F87\u0FC6\u1037\u1039\u103A\u1063\u1064\u1069-\u106D\u1087-\u108D\u108F\u109A\u109B\u135D-\u135F\u1714\u1715\u17C9-\u17D3\u17DD\u1939-\u193B\u1A75-\u1A7C\u1A7F\u1AB0-\u1ABE\u1AC1-\u1ACB\u1B34\u1B44\u1B6B-\u1B73\u1BAA\u1BAB\u1C36\u1C37\u1C78-\u1C7D\u1CD0-\u1CE8\u1CED\u1CF4\u1CF7-\u1CF9\u1D2C-\u1D6A\u1DC4-\u1DCF\u1DF5-\u1DFF\u1FBD\u1FBF-\u1FC1\u1FCD-\u1FCF\u1FDD-\u1FDF\u1FED-\u1FEF\u1FFD\u1FFE\u2CEF-\u2CF1\u2E2F\u302A-\u302F\u3099-\u309C\u30FC\uA66F\uA67C\uA67D\uA67F\uA69C\uA69D\uA6F0\uA6F1\uA700-\uA721\uA788-\uA78A\uA7F8\uA7F9\uA8C4\uA8E0-\uA8F1\uA92B-\uA92E\uA953\uA9B3\uA9C0\uA9E5\uAA7B-\uAA7D\uAABF-\uAAC2\uAAF6\uAB5B-\uAB5F\uAB69-\uAB6B\uABEC\uABED\uFB1E\uFE20-\uFE2F\uFF3E\uFF40\uFF70\uFF9E\uFF9F\uFFE3\u{102E0}\u{10780}-\u{10785}\u{10787}-\u{107B0}\u{107B2}-\u{107BA}\u{10AE5}\u{10AE6}\u{10D22}-\u{10D27}\u{10F46}-\u{10F50}\u{10F82}-\u{10F85}\u{11046}\u{11070}\u{110B9}\u{110BA}\u{11133}\u{11134}\u{11173}\u{111C0}\u{111CA}-\u{111CC}\u{11235}\u{11236}\u{112E9}\u{112EA}\u{1133C}\u{1134D}\u{11366}-\u{1136C}\u{11370}-\u{11374}\u{11442}\u{11446}\u{114C2}\u{114C3}\u{115BF}\u{115C0}\u{1163F}\u{116B6}\u{116B7}\u{1172B}\u{11839}\u{1183A}\u{1193D}\u{1193E}\u{11943}\u{119E0}\u{11A34}\u{11A47}\u{11A99}\u{11C3F}\u{11D42}\u{11D44}\u{11D45}\u{11D97}\u{16AF0}-\u{16AF4}\u{16B30}-\u{16B36}\u{16F8F}-\u{16F9F}\u{16FF0}\u{16FF1}\u{1AFF0}-\u{1AFF3}\u{1AFF5}-\u{1AFFB}\u{1AFFD}\u{1AFFE}\u{1CF00}-\u{1CF2D}\u{1CF30}-\u{1CF46}\u{1D167}-\u{1D169}\u{1D16D}-\u{1D172}\u{1D17B}-\u{1D182}\u{1D185}-\u{1D18B}\u{1D1AA}-\u{1D1AD}\u{1E130}-\u{1E136}\u{1E2AE}\u{1E2EC}-\u{1E2EF}\u{1E8D0}-\u{1E8D6}\u{1E944}-\u{1E946}\u{1E948}-\u{1E94A}]/gu, '')
normalizedWord.replace(/\p{Diacritic}/gu, '')

// Extract ideographic (CJKV) characters
// normalizedWord.match(/[\u3006\u3007\u3021-\u3029\u3038-\u303A\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFA6D\uFA70-\uFAD9\u{16FE4}\u{17000}-\u{187F7}\u{18800}-\u{18CD5}\u{18D00}-\u{18D08}\u{1B170}-\u{1B2FB}\u{20000}-\u{2A6DF}\u{2A700}-\u{2B738}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}\u{30000}-\u{3134A}]/gu)
normalizedWord.match(/\p{Ideographic}/gu)

// My thanks to this online tool: https://mothereff.in/regexpu
```

It was very hard to dig through the Unicode documentation and work out accurate ranges for properties such as "Diacritic", "Ideographic", etc. Most popular browsers have supported this feature since ES2018. However, I'm currently stuck with an outdated V8 engine embedded in a native Android/iOS game runtime. Upgrading that embedded V8 engine and integrating it into a third-party game engine is not an option given the very tight deadline and my limited skills (it would be interesting to try, but better left for later). I'm hoping that TypeScript will support transpiling this super awesome feature to lower targets such as ES3, ES5, and ES2015 through ES2017, so that many developers can easily benefit from it.
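Until a downlevel transform exists, one possible workaround for runtimes like the embedded V8 described above is to detect support at runtime. A regex literal such as `/\p{Diacritic}/u` is a syntax error on engines without property escapes, which breaks the whole script at parse time, so this sketch builds the pattern with the `RegExp` constructor inside `try`/`catch`. The helper names `supportsPropertyEscapes` and `stripDiacritics` are hypothetical. The fallback range `[\u0300-\u036f]` (the Combining Diacritical Marks block) is only an approximation of `\p{Diacritic}`, as the much longer expansion in the comment shows.

```typescript
// Returns true if the running engine accepts Unicode property escapes.
// Built via the RegExp constructor so that older engines throw a
// catchable SyntaxError (unknown "u" flag or invalid \p escape)
// instead of failing to parse the whole file.
function supportsPropertyEscapes(): boolean {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch {
    return false;
  }
}

// Use the precise property escape when available; otherwise fall back to
// U+0300-U+036F, which covers common Latin accents but not every
// Diacritic code point.
const DIACRITICS: RegExp = supportsPropertyEscapes()
  ? new RegExp("\\p{Diacritic}", "gu")
  : /[\u0300-\u036f]/g;

// Mirrors the normalization in the comment: trim, lowercase, decompose
// (NFD) so accents become separate code points, then remove them.
function stripDiacritics(word: string): string {
  return word.trim().toLowerCase().normalize("NFD").replace(DIACRITICS, "");
}

console.log(stripDiacritics("  Crème Brûlée ")); // creme brulee
```

Detecting once and caching the chosen `RegExp` keeps the check out of hot paths. The trade-off is that results can differ slightly between engines that take the fallback branch and those that do not.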
TypeScript currently doesn't transpile Unicode property escapes (of the form `\p{ID_Start}` or `\P{ASCII}`) in regular expressions. It would be great if it did!
https://www.typescriptlang.org/play/?target=1#code/MYewdgzgLgBATgUwOYIB4wLwwPQB0AOA3gMrBwCW+UGA4oggNYC+2ArgNwBQokIANggB0fEEgAUiFKkFQE0MQHJAA8AKAlKvZA
Search Terms
regexp, regular expression, Unicode, property escapes, ES2018
Suggestion
Support transpiling Unicode property escapes in regular expressions. Examples:
Use Cases
One particular use case is matching identifier characters in JavaScript parsers. This is currently commonly implemented as a large script-generated regular expression pattern (like in Esprima) or as a magical-looking list of code point ranges (like in TypeScript itself). However, it would be much simpler to use property escapes.
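As an illustration of that use case, here is a sketch of an identifier-name check written with property escapes. It follows the ECMAScript IdentifierName grammar only approximately. The first character may be ID_Start, `$` or `_`, and later characters may be ID_Continue, `$`, ZWNJ (U+200C) or ZWJ (U+200D). It deliberately ignores `\u` escape sequences and reserved words. `isIdentifierName` is a hypothetical name, not Esprima's or TypeScript's API.

```typescript
// First character: ID_Start, "$" or "_"; remaining characters: ID_Continue,
// "$", "_", ZWNJ (U+200C) or ZWJ (U+200D). The "u" flag is required for
// \p{...} and makes the pattern operate on code points, not UTF-16 units.
const IDENTIFIER_NAME = /^[\p{ID_Start}$_][\p{ID_Continue}$_\u200C\u200D]*$/u;

function isIdentifierName(text: string): boolean {
  return IDENTIFIER_NAME.test(text);
}

console.log(isIdentifierName("caf\u00e9")); // true
console.log(isIdentifierName("2fast"));     // false
```

Compared with a generated list of code point ranges, the pattern stays a single readable line, and updating to a newer Unicode version is left to the engine (or to whatever transpiler expands it).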
Checklist
My suggestion meets these guidelines: