-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC to show performance improvements of not copying token #1561
base: main
Are you sure you want to change the base?
Conversation
main...davisp:datafusion-sqlparser-rs:pd/experiment/less-token-cloning tl;dr - Roughly 28% speedup on the microbenchmarks compared to main. 18% of that by tweaking a few of the main token methods in the last commit. I ended up poking at this today and have managed to get about a 28% improvement on the benchmarks over On a whim, I also poked at optimizing the Token::make_word method that does a binary search over all keywords. That appears to have saved about 400ms (of 1.2s total originally attributed to it). I skimmed the docs for phf but I'm not certain what this project's appetite is for new dependencies. It currently seems quite lean so I didn't pursue that just yet. Lastly, the biggest single commit was the latest that updates a few of the token handling methods themselves to avoid clones. This was the biggest jump at 18% improvement for a total just around 28% (based on my very non-scientific measurements). Also, one thing I did different than you @alamb, I noted CI flagged your EOF_TOKEN, so I just made mine a static since it doesn't have any initialization. All in all, this approach certainly seems feasible as well as actually productive. It turns out the hardest part is accounting for error reporting with the |
I agree we try to keep the dependency set down to a minimum. \ I wonder if we could automatically generate a jumptable with some trickery (maybe we could make a custom match block keyed on individual letters)L Something like match word[0] {
'a' | 'A' => match word[1] {
'b'| 'B' = > //match the third letter here We could then make a script, etc to generate this jumptable from the list of keywords 🤔 |
This sounds really neat @davisp Would you be willing to start making some smaller PRs with parts of your findings? |
This PR was hoping to show that the approach described in this ticket would be effective
Token
s as much #1558While I had this all loaded into my head, I wanted to bash out a PR / POC, but ran out of time.
This PR shows how we could avoid copying tokens
Goals;
peek_token_ref()
and similar functionsparse_prefix
, etc. as shown below)fixed-flamegraph
It will take some restructring of
parse_prefix
to avoid the use of the copying APIs, but I absolutely think it is possible