Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stricter utf8 library validation #1

Merged
merged 2 commits into from
Nov 8, 2023
Merged

Stricter utf8 library validation #1

merged 2 commits into from
Nov 8, 2023

Conversation

zeux
Copy link
Contributor

@zeux zeux commented Oct 30, 2023

@Dekkonot
Copy link
Contributor

Because this technically changes behavior, due to Hyrum's law we might break some code that really wants to encode surrogates in UTF-8 despite the fact that only a couple functions support this.

I imagine nobody is encoding them deliberately since it's bad behavior. If they are, I'd love to hear from them and talk about what lead to this choice.

@zeux
Copy link
Contributor Author

zeux commented Oct 31, 2023

Also noting that I've confirmed through ~3B iterations of fuzzing an updated version of the UTF8 decoder and comparing its output with https://bjoern.hoehrmann.de/utf-8/decoder/dfa/ (which as far as I know is widely used in production), that missing surrogate validation was the only omission and after this the results match, which should also make it easier to change the UTF-8 implementation in the future if we need to.

@zeux zeux merged commit c12c6cd into master Nov 8, 2023
@zeux zeux deleted the zeux-utf8-strict branch November 8, 2023 20:03
alexmccord added a commit to luau-lang/luau that referenced this pull request Nov 10, 2023
# What's changed?

- Record the location of properties for table types (closes #802)
- Implement stricter UTF-8 validations as per the RFC
(luau-lang/rfcs#1)
- Implement `buffer` as a new type in both the old and new solvers.
- Changed errors produced by some `buffer` builtins to be a bit more
generic to avoid platform-dependent error messages.
- Fixed a bug where `Unifier` would copy some persistent types, tripping
some internal assertions.
- Type checking rules on relational operators is now a little bit more
lax.
- Improve dead code elimination for some `if` statements with complex
always-false conditions

## New type solver

- Dataflow analysis now generates phi nodes on exit of branches.
- Dataflow analysis avoids producing a new definition for locals or
properties that are not owned by that loop.
- If a function parameter has been constrained to `never`, report errors
at all uses of that parameter within that function.
- Switch to using the new `Luau::Set` to replace `std::unordered_set` to
alleviate some poor allocation characteristics which was negatively
affecting overall performance.
- Subtyping can now report many failing reasons instead of just the
first one that we happened to find during the test.
- Subtyping now also report reasons for type pack mismatches.
- When visiting `if` statements or expressions, the resulting context
are the common terms in both branches.

## Native codegen

- Implement support for `buffer` builtins to its IR for x64 and A64.
- Optimized `table.insert` by not inserting a table barrier if it is
fastcalled with a constant.

## Internal Contributors

Co-authored-by: Aaron Weiss <[email protected]>
Co-authored-by: Alexander McCord <[email protected]>
Co-authored-by: Andy Friesen <[email protected]>
Co-authored-by: Arseny Kapoulkine <[email protected]>
Co-authored-by: Aviral Goel <[email protected]>
Co-authored-by: Lily Brown <[email protected]>
Co-authored-by: Vyacheslav Egorov <[email protected]>
RomanKhafizianov added a commit to RomanKhafizianov/luau that referenced this pull request Nov 27, 2023
# What's changed?

- Record the location of properties for table types (closes #802)
- Implement stricter UTF-8 validations as per the RFC
(luau-lang/rfcs#1)
- Implement `buffer` as a new type in both the old and new solvers.
- Changed errors produced by some `buffer` builtins to be a bit more
generic to avoid platform-dependent error messages.
- Fixed a bug where `Unifier` would copy some persistent types, tripping
some internal assertions.
- Type checking rules on relational operators is now a little bit more
lax.
- Improve dead code elimination for some `if` statements with complex
always-false conditions

## New type solver

- Dataflow analysis now generates phi nodes on exit of branches.
- Dataflow analysis avoids producing a new definition for locals or
properties that are not owned by that loop.
- If a function parameter has been constrained to `never`, report errors
at all uses of that parameter within that function.
- Switch to using the new `Luau::Set` to replace `std::unordered_set` to
alleviate some poor allocation characteristics which was negatively
affecting overall performance.
- Subtyping can now report many failing reasons instead of just the
first one that we happened to find during the test.
- Subtyping now also report reasons for type pack mismatches.
- When visiting `if` statements or expressions, the resulting context
are the common terms in both branches.

## Native codegen

- Implement support for `buffer` builtins to its IR for x64 and A64.
- Optimized `table.insert` by not inserting a table barrier if it is
fastcalled with a constant.

## Internal Contributors

Co-authored-by: Aaron Weiss <[email protected]>
Co-authored-by: Alexander McCord <[email protected]>
Co-authored-by: Andy Friesen <[email protected]>
Co-authored-by: Arseny Kapoulkine <[email protected]>
Co-authored-by: Aviral Goel <[email protected]>
Co-authored-by: Lily Brown <[email protected]>
Co-authored-by: Vyacheslav Egorov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants