Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Noir support via alternative design #7

Closed
wants to merge 2 commits into from

Conversation

ewynx
Copy link

@ewynx ewynx commented Sep 24, 2024

Description

Problem*

The initial implementation for Noir support used a very sparse lookup table which didn't seem to have any optimizations. Some experimentation showed that using a large if/else for the state transition has far better gatecount (2-3x less).

Other options:

  • The lookup table being [(Field, Field, Field); 74] and determining the next state by iterating over the array -> better performance than the sparse lookup table but not as good as the giant if else
  • Using a HashMap -> extremely expensive

Summary*

This PR implements

  • the state transitions from the DFA as a large if-else statement
  • consecutive charcodes are grouped together for optimization (for example for [a-z] it checks whether input character is within the range a-z)
  • caret anchor support (ˆ)
  • end anchor support ($)

A previous issue was also accepting (example) alexalex for regex alex$, because it wouldn't go into the correct state upon encountering the second a. This works well in this implementation.

Note that multiple accepting states are not supported, which is in line with zk-regex circom impl.

Additional Context

This can be tested for both raw & decomposed setting. This test suite can be used.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

olehmisar and others added 2 commits September 19, 2024 14:13
…s decided based on current state and input character. Consecutive charcodes are grouped together for optimization. This impl also correctly accepts inputs that show the regex twice (if that is allowed by the regex).

Includes:
- caret anchor support (ˆ)
- end anchor support ($)

Multiple accepting states are not supported, which is in line with zk-regex circom impl.
@olehmisar
Copy link

(disclaimer: I did not test your code)

how big of an input string did you test this on?

I implemented a giant if-else approach as well when I implemented the first version of it here olehmisar@b306052.

It generated code like this:

Open

it was using 90k constrains for a 1024 length input string.

The thing is that with if-else approach, the amount of constraints grows linearly with the length of input string.

With a lookup table, the setup cost is fixed and then you have a very cheap for loop that iterates over the input string.

@olehmisar
Copy link

I tested your code. It generated 79090 constraints for m[10]-[ab]+ regex for 1024 length input string. The sparse lookup table approach generates ~9k constraints for the same regex and input length.

@ewynx
Copy link
Author

ewynx commented Sep 30, 2024

Hi @olehmisar I see the difference. I did the initial testing that I reported to TG with small inputs and didn't realize this, closing this draft PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants