
Fast OnDemand parsing for Neoverse #94

Merged: 3 commits merged on Nov 7, 2024


@emcastillo (Contributor) commented Sep 11, 2024

This PR uses the same approach as x86 for OnDemand parsing on ARM.
On an NVIDIA Grace CPU this results in a ~5x speedup for the twitter benchmark and ~2.6x for citm_catalog.

We use simdjson's simd8x64 type to obtain a 64-bit mask, which lets us operate on 64 characters at a time. Although building the bitmask is expensive and requires several NEON instructions, every subsequent bit operation on the mask then covers 64 characters. If we instead used the shrn instruction, each mask operation would cover only 16 characters.
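A minimal sketch of the 64-characters-per-mask idea, assuming a simple byte-equality search (for example, locating `"` in a 64-byte block). The helper names `eq_mask64`, `set_bit_positions` and `fill_block` are hypothetical; the PR itself builds the mask through simdjson's simd8x64 type, whose ARM64 bitmask conversion uses a similar AND-with-bit-weights plus pairwise-add (`vpaddq_u8`) reduction. On non-AArch64 targets the sketch falls back to a scalar loop that produces the same mask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: bit i of the result is set iff p[i] == c, for the 64
// bytes starting at p. Sketch only; the PR builds this mask via simd8x64.
static uint64_t eq_mask64(const uint8_t* p, uint8_t c) {
#if defined(__aarch64__)
  // Four 16-byte NEON compares; each result byte is 0xFF on a match, else 0x00.
  const uint8x16_t needle = vdupq_n_u8(c);
  const uint8x16_t m0 = vceqq_u8(vld1q_u8(p), needle);
  const uint8x16_t m1 = vceqq_u8(vld1q_u8(p + 16), needle);
  const uint8x16_t m2 = vceqq_u8(vld1q_u8(p + 32), needle);
  const uint8x16_t m3 = vceqq_u8(vld1q_u8(p + 48), needle);
  // Give byte j the weight 1 << (j % 8), then fold with pairwise adds until
  // each byte of the low 64 bits holds the match bits of 8 input bytes.
  static const uint8_t kWeights[16] = {1, 2, 4, 8, 16, 32, 64, 128,
                                       1, 2, 4, 8, 16, 32, 64, 128};
  const uint8x16_t w = vld1q_u8(kWeights);
  uint8x16_t s0 = vpaddq_u8(vandq_u8(m0, w), vandq_u8(m1, w));
  const uint8x16_t s1 = vpaddq_u8(vandq_u8(m2, w), vandq_u8(m3, w));
  s0 = vpaddq_u8(s0, s1);
  s0 = vpaddq_u8(s0, s0);
  return vgetq_lane_u64(vreinterpretq_u64_u8(s0), 0);
#else
  // Portable scalar fallback that produces the same mask.
  uint64_t mask = 0;
  for (int i = 0; i < 64; ++i) {
    if (p[i] == c) mask |= uint64_t{1} << i;
  }
  return mask;
#endif
}

// Hypothetical helper: write the index of every set bit of mask into out
// (lowest first) and return the count. One mask covers 64 characters.
static int set_bit_positions(uint64_t mask, int* out) {
  int n = 0;
  while (mask != 0) {
    out[n++] = __builtin_ctzll(mask);  // GCC/Clang builtin: count trailing zeros
    mask &= mask - 1;                  // clear the lowest set bit
  }
  return n;
}

// Hypothetical helper: copy s into a 64-byte block, padding with spaces
// (insignificant whitespace in JSON).
static void fill_block(uint8_t* block, const char* s) {
  const std::size_t len = std::strlen(s);
  assert(len <= 64);
  std::memset(block, ' ', 64);
  std::memcpy(block, s, len);
}
```

For a block holding `{"id":1,"text":"hi"}`, `set_bit_positions(eq_mask64(block, '"'), pos)` yields the quote positions 1, 4, 8, 13, 15 and 18. For contrast, the shrn alternative narrows a single 16-byte comparison with `vshrn_n_u16(..., 4)` into a 64-bit value carrying 4 bits per character, so each trailing-zero step then advances over only 16 characters.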

This patch also applies this approach to the SVE code path, but implements it with NEON instructions. According to the Neoverse V2 optimization guide, the SVE comparison has a latency of 4 cycles and a throughput of 1 instruction per cycle, whereas the NEON comparison has a latency of 2 cycles and a throughput of 4 instructions per cycle.
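A rough back-of-the-envelope reading of those figures, as a sketch under stated assumptions: both NEON and SVE operate on 128-bit vectors on Neoverse V2, so classifying a 64-byte block needs n = 4 independent compares per character class either way; t is the compare throughput per cycle and L its latency; all other instructions are ignored. The idealized time until all four compare results are available is then:

```latex
\begin{aligned}
T &\approx \left(\left\lceil \tfrac{n}{t} \right\rceil - 1\right) + L, \qquad n = 4\\
T_{\text{NEON}} &\approx \left(\left\lceil \tfrac{4}{4} \right\rceil - 1\right) + 2 = 2\ \text{cycles}\\
T_{\text{SVE}} &\approx \left(\left\lceil \tfrac{4}{1} \right\rceil - 1\right) + 4 = 7\ \text{cycles}
\end{aligned}
```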

Benchmark results (build/benchmark/bench --benchmark_filter=SonicOnDema):

Master branch:

Benchmark                                   Time             CPU   Iterations UserCounters...
twitter/SonicOnDemand_Normal           111149 ns       111152 ns         6297 bytes_per_second=2.21522Gi/s Normal
citm_catalog/SonicOnDemand_Fronter      33629 ns        33630 ns        20804 bytes_per_second=47.8316Gi/s Fronter
twitter/SonicOnDemand_NotFound         111161 ns       111165 ns         6298 bytes_per_second=2.21496Gi/s NotFound

This PR:

Benchmark                                   Time             CPU   Iterations UserCounters...
twitter/SonicOnDemand_Normal            22625 ns        22624 ns        30718 bytes_per_second=10.8832Gi/s Normal
citm_catalog/SonicOnDemand_Fronter      12861 ns        12862 ns        54399 bytes_per_second=125.067Gi/s Fronter
twitter/SonicOnDemand_NotFound          22423 ns        22422 ns        31349 bytes_per_second=10.9814Gi/s NotFound
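The speedups can be read directly off the Time column of the two result sets (master time divided by this PR's time); the bytes_per_second ratios agree to within rounding:

```latex
\begin{aligned}
\text{twitter, Normal:}\quad & 111149 / 22625 \approx 4.91\\
\text{twitter, NotFound:}\quad & 111161 / 22423 \approx 4.96\\
\text{citm\_catalog, Fronter:}\quad & 33629 / 12861 \approx 2.61
\end{aligned}
```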

This PR is contributed by NVIDIA

@liuq19 (Collaborator) commented Oct 15, 2024

@emcastillo Thanks! The code needs to be formatted.

@emcastillo (Contributor, Author)

@liuq19 Sorry for the delay. I pushed some formatting changes.
I ran the files through clang-format; I hope that is enough.

@liuq19 merged commit 91f84fc into bytedance:master on Nov 7, 2024
23 checks passed