Features/HC improvements for zk-regex Noir support #75

Open: wants to merge 10 commits into main

@ewynx commented Nov 26, 2024

This PR was previously opened here.

Description

This PR implements the gen_substrs feature, ^ support, and $ support, plus general bug fixes for Noir support.

This branch has been tested to the same extent as the circom implementation.
All circom tests from the original zk-regex lib have been added to the test suite, and all of them pass with the added features and bug fixes.

  • ^ support is realized by prefixing the input array with 255. This is the same approach as in circom.
  • $ support is realized by adding an additional accepting state, to which the previous accepting state transitions on any character. This new state is then added to the set of accepting states. If $ is at the end of the regex, this extra transition is not added, so inputs that continue after $ are rejected. This solution increases the lookup table size by 255 rows.
  • gen_substrs lets us extract substrings alongside the regex check. This works in both the decomposed and the raw setting.
    • substrings are extracted based on transition information in the DFA
    • the return type per substring is BoundedVec<Field,N>, because we don't know the exact length beforehand
    • the overall return type of the regex_match function with substring extraction is Vec<BoundedVec<Field,N>>, because the total number of substrings is not always known beforehand
    • a substring is added to the output only if it is part of a valid regex match (similar to the consecutive check in circom)
    • the implementation strategy for $ is also needed to extract exactly the right substring (otherwise extraction would continue until the end of the input)
    • we set gen_substrs as the default in the raw setting (this is a change outside of the Noir code, but it seemed to make sense)
  • fix: introduction of a "reset" in the state transition. If the next state is 0, we also need to consider a possible transition from state 0 on the current character. Example: regex ab and input aab. The first a moves the DFA into state 1. The second a has no transition from state 1, so it moves into state 0, and the DFA would then stay there (b has no transition from state 0). With the fix, the second a can move into state 1 again, so the following b reaches the accepting state.
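
To make the reset fix concrete, here is a minimal Rust sketch (not the PR's Noir code) of the ab / aab example. The table layout and the names `dfa_ab`, `step`, and `matches_ab` are hypothetical. A missing table entry stands for "no transition", and matching stops as soon as the accepting state is reached, which is a simplification.

```rust
use std::collections::HashMap;

/// Hypothetical transition table for the regex `ab`: 0 --a--> 1 --b--> 2.
/// A missing entry means "no transition", i.e. the matcher resets to state 0.
fn dfa_ab() -> HashMap<(usize, u8), usize> {
    HashMap::from([((0, b'a'), 1), ((1, b'b'), 2)])
}

/// One matcher step. Without the fix, a failed transition simply lands in
/// state 0. With the fix, the current byte is also retried from state 0.
fn step(t: &HashMap<(usize, u8), usize>, state: usize, byte: u8, reset_fix: bool) -> usize {
    match t.get(&(state, byte)) {
        Some(&next) => next,
        None if reset_fix => *t.get(&(0, byte)).unwrap_or(&0),
        None => 0,
    }
}

/// Returns true once the accepting state 2 is reached anywhere in the input.
fn matches_ab(input: &[u8], reset_fix: bool) -> bool {
    let t = dfa_ab();
    let mut state: usize = 0;
    for &b in input {
        state = step(&t, state, b, reset_fix);
        if state == 2 {
            return true;
        }
    }
    false
}

fn main() {
    println!("without fix: {}", matches_ab(b"aab", false)); // false
    println!("with fix:    {}", matches_ab(b"aab", true)); // true
}
```

Without the fix, the second a lands in state 0 and the trailing b has nowhere to go; with it, that a is re-read from state 0 and the match succeeds.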

Note: multiple accepting states arising directly from the regex are not supported, same as in the circom implementation (see the README comment of the original lib here).
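
The $ strategy from the feature list can be illustrated with a similar Rust sketch, again with hypothetical names (`build`, `accepts`) and the regex ab. It assumes acceptance is decided on the final state after the whole input is consumed, and it assumes the extra accepting state loops to itself on every byte, which the PR description does not state. It adds edges for all 256 byte values rather than reproducing the 255-row figure above.

```rust
use std::collections::{HashMap, HashSet};

type Table = HashMap<(usize, u8), usize>;

/// Hypothetical table for the regex `ab`: 0 --a--> 1 --b--> 2, accepting {2}.
/// Without a trailing `$`, state 2 gets an edge to an extra accepting state 3
/// on every byte value; state 3 is assumed to loop to itself so that any
/// amount of trailing input is accepted. With `$` at the end, no extra edges.
fn build(dollar_at_end: bool) -> (Table, HashSet<usize>) {
    let mut t: Table = HashMap::from([((0, b'a'), 1), ((1, b'b'), 2)]);
    let mut accepting: HashSet<usize> = HashSet::from([2]);
    if !dollar_at_end {
        for byte in 0..=u8::MAX {
            t.insert((2, byte), 3);
            t.insert((3, byte), 3); // assumption: self-loop on the extra state
        }
        accepting.insert(3);
    }
    (t, accepting)
}

/// Accepts iff the state reached after the whole input is accepting.
/// A missing transition sends the matcher back to state 0.
fn accepts(input: &[u8], dollar_at_end: bool) -> bool {
    let (t, accepting) = build(dollar_at_end);
    let mut state: usize = 0;
    for &b in input {
        state = *t.get(&(state, b)).unwrap_or(&0);
    }
    accepting.contains(&state)
}

fn main() {
    println!("ab$ on \"abx\": {}", accepts(b"abx", true)); // false
    println!("ab  on \"abx\": {}", accepts(b"abx", false)); // true
}
```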

This replaces the previously opened PRs noir-lang#2 and noir-lang#1 (although the steps for manual verification described there are still valid).

Additional Context

The test suite is built specifically for the Noir zk-regex library. From a database of regexes and sample inputs, it generates the required Noir code, creates the desired tests, and runs them. The database has been filled with the equivalents of the circom tests. Additionally, there are 2 hardcoded test projects for the circom tests with more complex circuits (combining multiple templates).

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

olehmisar and others added 9 commits September 19, 2024 14:13
…aw setting.

The substrings are returned as BoundedVec since we don't know their exact length upfront, but we know they're not longer than N.
To support both settings (decomposed and raw) we have to use `substring_ranges` instead of `substring_boundaries`.
…gex and input. This fix makes sure this is supported.

Changes:
- regex_match returns a Vec of substrings instead of an array with known length
- for each state where substrings have to be extracted, add the byte either to a new substring or to an already started one

Note that substr_count is used to extract the correct "current" substring from the Vec. This is a workaround: the first implementation used `pop`, but that gave an error.
For caret anchor: mark the beginning of the input byte array with 255, which makes the check for the caret anchor (^) work. Note that ^ is only taken into consideration in the decomposed mode.
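
A minimal Rust sketch of the caret trick, using a hypothetical table for ^ab (`caret_ab` and `matches_caret_ab` are illustrative names, not part of the library). The caret becomes an ordinary transition on the marker byte 255 out of state 0, so once the input is prefixed it can only be taken at the very start. This relies on 255 not occurring in the real input; the byte 0xFF never appears in valid UTF-8 text.

```rust
use std::collections::HashMap;

/// Marker byte prepended to every input, as described in the commit message.
const CARET_MARKER: u8 = 255;

/// Hypothetical table for `^ab`: the caret is an ordinary transition on the
/// marker byte out of the start state, 0 --255--> 1 --a--> 2 --b--> 3.
fn caret_ab() -> HashMap<(usize, u8), usize> {
    HashMap::from([((0, CARET_MARKER), 1), ((1, b'a'), 2), ((2, b'b'), 3)])
}

/// True once the accepting state 3 is reached; a missing transition resets to 0.
fn matches_caret_ab(input: &[u8]) -> bool {
    let t = caret_ab();
    // Prefix the input with the marker so the `^` edge can only be taken first
    // (assuming 255 never occurs in the real input).
    let mut padded = vec![CARET_MARKER];
    padded.extend_from_slice(input);
    let mut state: usize = 0;
    for &b in &padded {
        state = *t.get(&(state, b)).unwrap_or(&0);
        if state == 3 {
            return true;
        }
    }
    false
}

fn main() {
    println!("{}", matches_caret_ab(b"abc")); // true
    println!("{}", matches_caret_ab(b"xab")); // false
}
```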
…states reachable from state 0.

Substrings only get saved when they are part of a path that doesn't reset.
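
The substring-extraction commits above can be sketched in Rust as follows. Everything here is a hypothetical illustration, not the Noir implementation: `Bounded<N>` stands in for `BoundedVec<Field, N>`, the `ranges` set plays the role of `substring_ranges` (transitions whose byte belongs to a substring), the regex is x=[0-9]+;, and a "reset" is modeled as a missing transition that falls back to state 0, as in the reset fix. Bytes go into a new or the current substring via `substr_count`, and a path's substrings are kept only once it reaches the accepting state.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for Noir's `BoundedVec<Field, N>`: holds at most N bytes.
#[derive(Debug)]
struct Bounded<const N: usize> {
    storage: [u8; N],
    len: usize,
}

impl<const N: usize> Bounded<N> {
    fn new() -> Self {
        Self { storage: [0; N], len: 0 }
    }

    fn push(&mut self, byte: u8) {
        assert!(self.len < N, "substring longer than N");
        self.storage[self.len] = byte;
        self.len += 1;
    }

    fn as_slice(&self) -> &[u8] {
        &self.storage[..self.len]
    }
}

/// Hypothetical DFA for `x=[0-9]+;`: 0 -x-> 1 -=-> 2 -digit-> 3 -digit-> 3 -;-> 4 (accepting).
fn table() -> HashMap<(usize, u8), usize> {
    let mut t: HashMap<(usize, u8), usize> =
        HashMap::from([((0, b'x'), 1), ((1, b'='), 2), ((3, b';'), 4)]);
    for d in b'0'..=b'9' {
        t.insert((2, d), 3);
        t.insert((3, d), 3);
    }
    t
}

/// Extracts the digit substrings of every complete `x=<digits>;` match.
fn extract<const N: usize>(input: &[u8]) -> Vec<Bounded<N>> {
    let t = table();
    let accepting: HashSet<usize> = HashSet::from([4]);
    // Transitions whose byte belongs to a substring (the role of `substring_ranges`).
    let ranges: HashSet<(usize, usize)> = HashSet::from([(2, 3), (3, 3)]);

    let mut output = Vec::new(); // substrings from paths that reached acceptance
    let mut pending: Vec<Bounded<N>> = Vec::new(); // substrings on the current path
    let mut substr_count = 0; // substrings started on the current path
    let mut in_substring = false;
    let mut state: usize = 0;

    for &b in input {
        let next = match t.get(&(state, b)) {
            Some(&n) => n,
            None => {
                // Reset: drop this path's substrings, then retry the byte from state 0.
                pending.clear();
                substr_count = 0;
                in_substring = false;
                state = 0;
                *t.get(&(0, b)).unwrap_or(&0)
            }
        };
        if ranges.contains(&(state, next)) {
            if !in_substring {
                // Start a new substring on this path.
                pending.push(Bounded::new());
                substr_count += 1;
                in_substring = true;
            }
            // Append to the current substring by index rather than pop/push.
            pending[substr_count - 1].push(b);
        } else {
            in_substring = false;
        }
        state = next;
        if accepting.contains(&state) {
            // The path is valid: keep its substrings.
            output.append(&mut pending);
            substr_count = 0;
        }
    }
    output
}

fn main() {
    for s in extract::<8>(b"x=12x=345;") {
        println!("{}", String::from_utf8_lossy(s.as_slice())); // prints 345
    }
}
```

Here the digits 12 are dropped because their path resets at the second x before reaching ;, while 345 belongs to a completed match.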

@jzaki left a comment

Updating the readme to mention the state of Noir support (previously "coming soon").
Also to check that the Theory section is the same for Noir.

@Divide-By-0 (Member) commented Dec 4, 2024

Some notes:

  • Did we ever get the sparse array code compiling in Noir? Without that, lookups are infeasibly slow, right?
  • We had a number of commits added after your fork 3 months ago, including critical audit fixes like 7754156. Are these represented in the Noir code yet?
  • We added forwards/backwards regexes for "from" field parsing. I'll attach a doc, but this seems to be needed for robust, non-injectable "from" field parsing, and it can't be done with just a single forwards pass :/

@ewynx (Author) commented Dec 5, 2024

Pinging @olehmisar as well; it seems there is no separate PR for his initial work on Noir support
(and I think he tried to work with the sparse array lib).

@olehmisar commented Dec 5, 2024

@Divide-By-0

  1. Yes, without lookup arrays it's very expensive. Check out this comment on "Noir support via alternative design", noir-lang/zk-regex#7 (comment). I am not aware of sparse_array library compiling fine in comptime blocks last time I checked (1 month ago).
  2. Noir code was not based on circom, so circom audit fixes are probably not relevant
  3. -

@ewynx (Author) commented Dec 5, 2024

"Updating the readme to mention the state of Noir support (previously "coming soon"). Also to check that the Theory section is the same for Noir."

@jzaki: Adjusted the Noir status. Regarding the Theory section, Noir support was aligned with the circom implementation by testing it against exactly the same cases.

@Savio-Sou commented Dec 9, 2024

"I am not aware of sparse_array library compiling fine in comptime blocks last time I checked (1 month ago)"

@olehmisar, would you be able to create a Noir issue for it? Happy to help follow up with the team.

UPDATE: This should be fixed as of noir-lang/noir#6514; do give Noir ≥0.39.0 a spin and see if it helps.
