Improve .chars().count() #37888
Use a simpler loop to count the `char` of a string: count the number of non-continuation bytes. Use `count += <conditional>` which the compiler understands well and can apply loop optimizations to.
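As a rough illustration of the idea described above (not the actual code from this PR), a minimal sketch could look like the following. The function name `count_chars` is hypothetical, and the sketch uses a safe slice iterator rather than whatever pointer-level loop the standard library uses internally. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`.

```rust
/// Hypothetical helper (name chosen for illustration only): counts the
/// `char`s in a UTF-8 string by counting its non-continuation bytes.
fn count_chars(s: &str) -> usize {
    let mut count = 0usize;
    for &byte in s.as_bytes() {
        // A continuation byte has its top two bits equal to `10`.
        // Every other byte starts a new code point, so it adds 1.
        // The comparison is cast to 0 or 1 and added unconditionally,
        // which is the `count += <conditional>` shape from the description.
        count += ((byte & 0xC0) != 0x80) as usize;
    }
    count
}
```

For any valid `&str`, this should agree with `s.chars().count()`, since every code point contributes exactly one non-continuation byte.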
r? @brson (rust_highfive has picked a reviewer for you, use r? to override)
@bors: r+ Nice wins!
📌 Commit 5a3aa2f has been approved by
Improve .chars().count()

Use a simpler loop to count the `char` of a string: count the number of non-continuation bytes. Use `count += <conditional>` which the compiler understands well and can apply loop optimizations to.

benchmark descriptions and results for two configurations:

- ascii: ascii text
- cy: cyrillic text
- jp: japanese text
- words ascii: counting each split_whitespace item from the ascii text
- words jp: counting each split_whitespace item from the jp text

```
x86-64 rustc -Copt-level=3

 name               orig_ ns/iter      cmov_ ns/iter      diff ns/iter   diff %
 count_ascii        1,453 (1755 MB/s)  1,398 (1824 MB/s)           -55   -3.79%
 count_cy           5,990 (856 MB/s)   2,545 (2016 MB/s)        -3,445  -57.51%
 count_jp           3,075 (1169 MB/s)  1,772 (2029 MB/s)        -1,303  -42.37%
 count_words_ascii  4,157 (521 MB/s)   1,797 (1205 MB/s)        -2,360  -56.77%
 count_words_jp     3,337 (1071 MB/s)  1,772 (2018 MB/s)        -1,565  -46.90%

x86-64 rustc -Ctarget-feature=+avx -Copt-level=3

 name               orig_ ns/iter      cmov_ ns/iter      diff ns/iter   diff %
 count_ascii        1,444 (1766 MB/s)  763 (3343 MB/s)            -681  -47.16%
 count_cy           5,871 (874 MB/s)   1,527 (3360 MB/s)        -4,344  -73.99%
 count_jp           2,874 (1251 MB/s)  1,073 (3351 MB/s)        -1,801  -62.67%
 count_words_ascii  4,131 (524 MB/s)   1,871 (1157 MB/s)        -2,260  -54.71%
 count_words_jp     3,253 (1099 MB/s)  1,331 (2686 MB/s)        -1,922  -59.08%
```

I briefly explored a more involved blocked algorithm (looking at 8 or more bytes at a time), but the code in this PR was always winning `count_words_ascii` in particular (counting many small strings); this solution is an improvement without tradeoffs.
I'm curious – bytecount is much faster than anything else at counting bytes, and should be adaptable to this situation (count bytes lower than 128) without perf loss.
Go ahead and experiment. My comment was
I'm leaving the door open to such improvements, but I suggest looking out for the small-input case as well.
Oh by the way @llogiq did you see this comment? I wanted to tell you, due to possible application in bytecount, that it can be beneficial (it was to me) to use this kind of raw pointer solution instead of computing separate slice parts up front. (Edit: Oh I now see why you couldn't possibly see that comment).
By the way, it's not counting just bytes lower than 128, but any (non-)continuation byte.
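To make that distinction concrete, here is a small sketch comparing the two predicates. Both function names are hypothetical and neither is taken from this PR or from bytecount. Counting bytes below 128 only finds ASCII characters, while counting non-continuation bytes also catches the leading byte of every multi-byte sequence.

```rust
/// Hypothetical helper: counts only ASCII bytes (values below 128).
fn count_ascii_bytes(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b < 128).count()
}

/// Hypothetical helper: counts bytes that are not UTF-8 continuation
/// bytes, i.e. bytes whose top two bits are not `10`.
fn count_non_continuation(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}
```

For the input `"aé"` (bytes `0x61 0xC3 0xA9`), the first function sees one byte below 128, while the second sees two non-continuation bytes, matching the two `char`s in the string.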