Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement FilterFn for ChunkedArray #794

Merged
merged 14 commits into from
Sep 15, 2024
Merged

Conversation

a10y
Copy link
Contributor

@a10y a10y commented Sep 11, 2024

No description provided.

@a10y
Copy link
Contributor Author

a10y commented Sep 11, 2024

Heh, turns out the naive implementation is 3x slower than develop, pretty much entirely due to find_chunk_idx slowness

@a10y
Copy link
Contributor Author

a10y commented Sep 11, 2024

Alright so currently what I'm seeing is for Q15 vortex-compressed-file:

  • develop: ~112ms
  • This PR, using ChunkedArray::find_chunk_idx: ~700ms, most of that in find_chunk_idx
  • The latest commit that uses a binary search on pre-materialized chunk indices: ~106ms

The new flame graph is showing ~6% less time spent in filter, which is what we'd expect roughly just going off the numbers

@robert3005
Copy link
Member

Right, this can go lower if all the children also implement filter. Good to know that we need to unpack values

@a10y
Copy link
Contributor Author

a10y commented Sep 12, 2024

Most of the children do implement FilterFn, the exception is extension array

EDIT: about half do. The missing ones here are FoR and Extension

@robert3005
Copy link
Member

FoR has it. Notable omission is Bitpacked which could repack itself and it likely would be faster. Also sparse is missing which likely leads to some extra allocation

@robert3005
Copy link
Member

fwiw we should make them in separate prs

@a10y a10y added the benchmark Run benchmarks on this branch label Sep 12, 2024
@a10y a10y marked this pull request as ready for review September 12, 2024 12:30
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 12, 2024
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vortex benchmarks

Benchmark suite Current: d5ecb80 Previous: 750b9c8 Ratio
tpch_q1/vortex-in-memory-no-pushdown 467629937 ns/iter (± 3750524) 482302101 ns/iter (± 5129904) 0.97
tpch_q1/vortex-in-memory-pushdown 515046187 ns/iter (± 1745901) 527534072 ns/iter (± 3463267) 0.98
tpch_q1/arrow 454002116 ns/iter (± 2291129) 451801735 ns/iter (± 3758415) 1.00
tpch_q1/parquet 665259754 ns/iter (± 3313569) 661718171 ns/iter (± 2674937) 1.01
tpch_q1/vortex-file-compressed 610874701 ns/iter (± 2392209) 621200118 ns/iter (± 6749288) 0.98
tpch_q1/vortex-file-uncompressed 633865011 ns/iter (± 2741863) 592370294 ns/iter (± 8018622) 1.07
tpch_q2/vortex-in-memory-no-pushdown 144574917 ns/iter (± 990792) 160802521 ns/iter (± 4678476) 0.90
tpch_q2/vortex-in-memory-pushdown 145267883 ns/iter (± 876640) 159124621 ns/iter (± 3511202) 0.91
tpch_q2/arrow 122184510 ns/iter (± 572539) 127992668 ns/iter (± 614847) 0.95
tpch_q2/parquet 158407129 ns/iter (± 845449) 167523912 ns/iter (± 976972) 0.95
tpch_q2/vortex-file-compressed 154158990 ns/iter (± 731391) 170749370 ns/iter (± 1095064) 0.90
tpch_q2/vortex-file-uncompressed 164798709 ns/iter (± 1537024) 176352506 ns/iter (± 1861263) 0.93
tpch_q3/vortex-in-memory-no-pushdown 154082645 ns/iter (± 945209) 164894661 ns/iter (± 1688574) 0.93
tpch_q3/vortex-in-memory-pushdown 182599172 ns/iter (± 2140747) 204285819 ns/iter (± 1733504) 0.89
tpch_q3/arrow 148805278 ns/iter (± 1268268) 163147591 ns/iter (± 2200349) 0.91
tpch_q3/parquet 339122776 ns/iter (± 2983826) 364056632 ns/iter (± 3161647) 0.93
tpch_q3/vortex-file-compressed 600109967 ns/iter (± 2475613) 238624885 ns/iter (± 3477685) 2.51
tpch_q3/vortex-file-uncompressed 346678665 ns/iter (± 4154641) 264574306 ns/iter (± 4886537) 1.31
tpch_q4/vortex-in-memory-no-pushdown 116713076 ns/iter (± 680431) 119477803 ns/iter (± 1823670) 0.98
tpch_q4/vortex-in-memory-pushdown 140401608 ns/iter (± 1067465) 148794718 ns/iter (± 4839612) 0.94
tpch_q4/arrow 112121350 ns/iter (± 969314) 126580600 ns/iter (± 3893440) 0.89
tpch_q4/parquet 223556527 ns/iter (± 2134392) 233726536 ns/iter (± 2398398) 0.96
tpch_q4/vortex-file-compressed 608122542 ns/iter (± 1411424) 214291423 ns/iter (± 4481945) 2.84
tpch_q4/vortex-file-uncompressed 301939005 ns/iter (± 3284469) 223841277 ns/iter (± 4788043) 1.35
tpch_q5/vortex-in-memory-no-pushdown 300395241 ns/iter (± 2339398) 331680262 ns/iter (± 3094149) 0.91
tpch_q5/vortex-in-memory-pushdown 314484406 ns/iter (± 2458080) 352998734 ns/iter (± 3928054) 0.89
tpch_q5/arrow 296284099 ns/iter (± 4279535) 322701287 ns/iter (± 4014088) 0.92
tpch_q5/parquet 456248488 ns/iter (± 3726847) 491143044 ns/iter (± 5029280) 0.93
tpch_q5/vortex-file-compressed 346751667 ns/iter (± 4316017) 373255236 ns/iter (± 8149263) 0.93
tpch_q5/vortex-file-uncompressed 354653588 ns/iter (± 6344729) 370011144 ns/iter (± 7597998) 0.96
tpch_q6/vortex-in-memory-no-pushdown 41564617 ns/iter (± 702737) 45597608 ns/iter (± 395126) 0.91
tpch_q6/vortex-in-memory-pushdown 91211372 ns/iter (± 453285) 94817690 ns/iter (± 1631882) 0.96
tpch_q6/arrow 37046230 ns/iter (± 687907) 39935601 ns/iter (± 153591) 0.93
tpch_q6/parquet 154464040 ns/iter (± 1078498) 158515552 ns/iter (± 1076605) 0.97
tpch_q6/vortex-file-compressed 77106628 ns/iter (± 349019) 82131752 ns/iter (± 447633) 0.94
tpch_q6/vortex-file-uncompressed 166363477 ns/iter (± 1183907) 164012224 ns/iter (± 1392620) 1.01
tpch_q7/vortex-in-memory-no-pushdown 570770557 ns/iter (± 5628418) 628570919 ns/iter (± 13993824) 0.91
tpch_q7/vortex-in-memory-pushdown 618735527 ns/iter (± 6725917) 685014994 ns/iter (± 4076828) 0.90
tpch_q7/arrow 555973167 ns/iter (± 6869244) 591575674 ns/iter (± 9570461) 0.94
tpch_q7/parquet 733884143 ns/iter (± 7867692) 766331481 ns/iter (± 7275189) 0.96
tpch_q7/vortex-file-compressed 856993945 ns/iter (± 4929339) 664591921 ns/iter (± 7273788) 1.29
tpch_q7/vortex-file-uncompressed 740677757 ns/iter (± 5450486) 736981022 ns/iter (± 11454205) 1.01
tpch_q8/vortex-in-memory-no-pushdown 210734148 ns/iter (± 1364876) 216322787 ns/iter (± 1450325) 0.97
tpch_q8/vortex-in-memory-pushdown 222583044 ns/iter (± 896335) 225993479 ns/iter (± 1636984) 0.98
tpch_q8/arrow 210921272 ns/iter (± 691187) 213002888 ns/iter (± 798766) 0.99
tpch_q8/parquet 481543044 ns/iter (± 2300449) 480810962 ns/iter (± 5440907) 1.00
tpch_q8/vortex-file-compressed 269483461 ns/iter (± 4783378) 250163486 ns/iter (± 533009) 1.08
tpch_q8/vortex-file-uncompressed 292301158 ns/iter (± 9043288) 289075003 ns/iter (± 2810625) 1.01
tpch_q9/vortex-in-memory-no-pushdown 405813821 ns/iter (± 2341096) 411252124 ns/iter (± 5191902) 0.99
tpch_q9/vortex-in-memory-pushdown 406254784 ns/iter (± 2090194) 402404864 ns/iter (± 10300688) 1.01
tpch_q9/arrow 401193441 ns/iter (± 3486194) 407465021 ns/iter (± 4936156) 0.98
tpch_q9/parquet 702975198 ns/iter (± 3508374) 690287931 ns/iter (± 2228853) 1.02
tpch_q9/vortex-file-compressed 454991563 ns/iter (± 4574292) 448813755 ns/iter (± 3147241) 1.01
tpch_q9/vortex-file-uncompressed 487958941 ns/iter (± 3139067) 493338488 ns/iter (± 3770312) 0.99
tpch_q10/vortex-in-memory-no-pushdown 240534797 ns/iter (± 1106459) 237008689 ns/iter (± 999451) 1.01
tpch_q10/vortex-in-memory-pushdown 263164189 ns/iter (± 1948364) 269284156 ns/iter (± 2766466) 0.98
tpch_q10/arrow 232776848 ns/iter (± 1296696) 232630226 ns/iter (± 1125035) 1.00
tpch_q10/parquet 491261591 ns/iter (± 3634032) 491838979 ns/iter (± 2172643) 1.00
tpch_q10/vortex-file-compressed 599147762 ns/iter (± 2913480) 438008551 ns/iter (± 1560277) 1.37
tpch_q10/vortex-file-uncompressed 397099274 ns/iter (± 3897509) 342085663 ns/iter (± 1049070) 1.16
tpch_q11/vortex-in-memory-no-pushdown 220454180 ns/iter (± 2158905) 228213202 ns/iter (± 1929879) 0.97
tpch_q11/vortex-in-memory-pushdown 218691415 ns/iter (± 2299972) 227285629 ns/iter (± 1260022) 0.96
tpch_q11/arrow 176616687 ns/iter (± 1047656) 181955973 ns/iter (± 481860) 0.97
tpch_q11/parquet 190341853 ns/iter (± 1991299) 190665473 ns/iter (± 1014961) 1.00
tpch_q11/vortex-file-compressed 232418289 ns/iter (± 1398293) 234354109 ns/iter (± 2669234) 0.99
tpch_q11/vortex-file-uncompressed 238199675 ns/iter (± 2755331) 236021473 ns/iter (± 2076840) 1.01
tpch_q12/vortex-in-memory-no-pushdown 178888983 ns/iter (± 564729) 181725183 ns/iter (± 665937) 0.98
tpch_q12/vortex-in-memory-pushdown 257220480 ns/iter (± 1246588) 245639824 ns/iter (± 1117857) 1.05
tpch_q12/arrow 168685105 ns/iter (± 435972) 167848416 ns/iter (± 951860) 1.00
tpch_q12/parquet 358077917 ns/iter (± 2317907) 355743248 ns/iter (± 652283) 1.01
tpch_q12/vortex-file-compressed 577060066 ns/iter (± 2253366) 583693695 ns/iter (± 3524490) 0.99
tpch_q12/vortex-file-uncompressed 353753839 ns/iter (± 1158424) 345717904 ns/iter (± 982840) 1.02
tpch_q13/vortex-in-memory-no-pushdown 233378060 ns/iter (± 6792292) 212026022 ns/iter (± 2139385) 1.10
tpch_q13/vortex-in-memory-pushdown 225897062 ns/iter (± 2687703) 215199549 ns/iter (± 2801269) 1.05
tpch_q13/arrow 220008054 ns/iter (± 3476872) 216288457 ns/iter (± 7582246) 1.02
tpch_q13/parquet 365108165 ns/iter (± 8761788) 339023424 ns/iter (± 4982544) 1.08
tpch_q13/vortex-file-compressed 262370823 ns/iter (± 5771851) 242532188 ns/iter (± 3105220) 1.08
tpch_q13/vortex-file-uncompressed 253289040 ns/iter (± 5878713) 230206578 ns/iter (± 4731127) 1.10
tpch_q14/vortex-in-memory-no-pushdown 39976992 ns/iter (± 402331) 40066809 ns/iter (± 123691) 1.00
tpch_q14/vortex-in-memory-pushdown 87308864 ns/iter (± 536472) 88371331 ns/iter (± 993349) 0.99
tpch_q14/arrow 41047857 ns/iter (± 171059) 39916154 ns/iter (± 413480) 1.03
tpch_q14/parquet 222397328 ns/iter (± 984220) 219644722 ns/iter (± 1478739) 1.01
tpch_q14/vortex-file-compressed 90954221 ns/iter (± 1205267) 84406219 ns/iter (± 549154) 1.08
tpch_q14/vortex-file-uncompressed 145316171 ns/iter (± 3272870) 132056672 ns/iter (± 2526723) 1.10
tpch_q15/vortex-in-memory-no-pushdown 71163237 ns/iter (± 568725) 70240367 ns/iter (± 1211358) 1.01
tpch_q15/vortex-in-memory-pushdown 125755999 ns/iter (± 1421365) 120877627 ns/iter (± 662063) 1.04
tpch_q15/arrow 67457581 ns/iter (± 563588) 65814738 ns/iter (± 745743) 1.02
tpch_q15/parquet 295049250 ns/iter (± 2104210) 292964357 ns/iter (± 4176947) 1.01
tpch_q15/vortex-file-compressed 160178907 ns/iter (± 1035165) 149166262 ns/iter (± 3739034) 1.07
tpch_q15/vortex-file-uncompressed 280301533 ns/iter (± 3911470) 245576987 ns/iter (± 4228385) 1.14
tpch_q16/vortex-in-memory-no-pushdown 120676960 ns/iter (± 607001) 121362328 ns/iter (± 501981) 0.99
tpch_q16/vortex-in-memory-pushdown 125649150 ns/iter (± 652981) 125642905 ns/iter (± 739682) 1.00
tpch_q16/arrow 105605122 ns/iter (± 1175945) 106264398 ns/iter (± 1435624) 0.99
tpch_q16/parquet 122493799 ns/iter (± 590708) 120515396 ns/iter (± 330045) 1.02
tpch_q16/vortex-file-compressed 137353235 ns/iter (± 669637) 135440640 ns/iter (± 2489773) 1.01
tpch_q16/vortex-file-uncompressed 136512392 ns/iter (± 797010) 133236531 ns/iter (± 804638) 1.02
tpch_q17/vortex-in-memory-no-pushdown 680071206 ns/iter (± 5626853) 633750402 ns/iter (± 21999972) 1.07
tpch_q17/vortex-in-memory-pushdown 699765651 ns/iter (± 16552381) 644886261 ns/iter (± 28816494) 1.09
tpch_q17/arrow 614163957 ns/iter (± 15290848) 604794412 ns/iter (± 19325686) 1.02
tpch_q17/parquet 600504141 ns/iter (± 3375464) 591494702 ns/iter (± 5886351) 1.02
tpch_q17/vortex-file-compressed 616490641 ns/iter (± 6109550) 618751453 ns/iter (± 5094786) 1.00
tpch_q17/vortex-file-uncompressed 678439856 ns/iter (± 5153733) 662045625 ns/iter (± 5462241) 1.02
tpch_q18/vortex-in-memory-no-pushdown 1102704467 ns/iter (± 10488831) 1107670912 ns/iter (± 28124407) 1.00
tpch_q18/vortex-in-memory-pushdown 1102693920 ns/iter (± 18019955) 1146350113 ns/iter (± 25935429) 0.96
tpch_q18/arrow 1091548880 ns/iter (± 11287668) 1088592199 ns/iter (± 13463439) 1.00
tpch_q18/parquet 1269701350 ns/iter (± 9244969) 1295921800 ns/iter (± 29563581) 0.98
tpch_q18/vortex-file-compressed 1137231576 ns/iter (± 14509774) 1104461804 ns/iter (± 12804403) 1.03
tpch_q18/vortex-file-uncompressed 1199433179 ns/iter (± 13902541) 1142531705 ns/iter (± 9323161) 1.05
tpch_q19/vortex-in-memory-no-pushdown 172419237 ns/iter (± 558277) 173585771 ns/iter (± 474023) 0.99
tpch_q19/vortex-in-memory-pushdown 255865266 ns/iter (± 777800) 263934184 ns/iter (± 1443569) 0.97
tpch_q19/arrow 158947649 ns/iter (± 563373) 158169603 ns/iter (± 605941) 1.00
tpch_q19/parquet 482869625 ns/iter (± 2182275) 485808293 ns/iter (± 3098680) 0.99
tpch_q19/vortex-file-compressed 739251560 ns/iter (± 3891957) 748242129 ns/iter (± 1861891) 0.99
tpch_q19/vortex-file-uncompressed 361335402 ns/iter (± 1566759) 370691529 ns/iter (± 2848285) 0.97
tpch_q20/vortex-in-memory-no-pushdown 276953400 ns/iter (± 4328394) 271361202 ns/iter (± 2598718) 1.02
tpch_q20/vortex-in-memory-pushdown 296899338 ns/iter (± 2968668) 298142045 ns/iter (± 3138390) 1.00
tpch_q20/arrow 253871397 ns/iter (± 4922124) 268988087 ns/iter (± 5230411) 0.94
tpch_q20/parquet 368063779 ns/iter (± 5920070) 376959670 ns/iter (± 2535342) 0.98
tpch_q20/vortex-file-compressed 337625170 ns/iter (± 3942064) 331887989 ns/iter (± 5057692) 1.02
tpch_q20/vortex-file-uncompressed 419819929 ns/iter (± 2448535) 387430930 ns/iter (± 3744062) 1.08
tpch_q21/vortex-in-memory-no-pushdown 881470571 ns/iter (± 5570260) 860034433 ns/iter (± 6315368) 1.02
tpch_q21/vortex-in-memory-pushdown 912642083 ns/iter (± 12158266) 894832538 ns/iter (± 2774021) 1.02
tpch_q21/arrow 873175820 ns/iter (± 4749117) 860412313 ns/iter (± 4835728) 1.01
tpch_q21/parquet 1033653879 ns/iter (± 10326375) 1031986215 ns/iter (± 15406311) 1.00
tpch_q21/vortex-file-compressed 1930001242 ns/iter (± 9281258) 1055005493 ns/iter (± 8739734) 1.83
tpch_q21/vortex-file-uncompressed 1316949047 ns/iter (± 3918408) 1091486875 ns/iter (± 12444901) 1.21
tpch_q22/vortex-in-memory-no-pushdown 96238393 ns/iter (± 774157) 96963365 ns/iter (± 1134408) 0.99
tpch_q22/vortex-in-memory-pushdown 96618668 ns/iter (± 1183404) 97059184 ns/iter (± 835576) 1.00
tpch_q22/arrow 66826555 ns/iter (± 2434487) 66658773 ns/iter (± 251660) 1.00
tpch_q22/parquet 96074157 ns/iter (± 5283274) 95223939 ns/iter (± 748371) 1.01
tpch_q22/vortex-file-compressed 103282488 ns/iter (± 460786) 101715042 ns/iter (± 409685) 1.02
tpch_q22/vortex-file-uncompressed 110932476 ns/iter (± 603983) 109602568 ns/iter (± 584164) 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@robert3005
Copy link
Member

@a10y you have a bug somewhere

Failed to get min_index from SparseArray: index 0 out of bounds from 0 to 0
Backtrace:
disabled backtrace


assert!(array.patches().is_some());

let patch_indices = SparseArray::try_from(array.patches().unwrap())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should just return SparseArray directly from BPA::patches. I know we always return Array, but turning it into a SparseArray is pretty much the first thing we always do with it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, I think we should handle non sparse patches. I will work on that

@a10y a10y added benchmark Run benchmarks on this branch labels Sep 12, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 12, 2024
Comment on lines +84 to +86
if SparseArray::try_from(parray)?.indices().is_empty() {
vortex_bail!("cannot construct BitPackedArray using patches without indices");
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We depend pretty deeply on this being a SparseArray so I figured it should be checked in the constructor, along with the non-empty indices when present thing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now, we will want to change it eventually but then we can find all those places

@a10y a10y added the benchmark Run benchmarks on this branch label Sep 12, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 12, 2024
@a10y a10y added the benchmark Run benchmarks on this branch label Sep 13, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 13, 2024
@a10y
Copy link
Contributor Author

a10y commented Sep 13, 2024

Alright benchmark comment is updated. Most queries stayed roughly same, the following got slower, some by a considerable margin:

  • Q1
  • Q3
  • Q4
  • Q7
  • Q10
  • Q15
  • Q16
  • Q21
  • Q22

Will do some profiling investigation in the AM

@a10y
Copy link
Contributor Author

a10y commented Sep 13, 2024

Alright, the issue was with filter_slices, it was creating so many slices that the overhead of slicing was just eating everything else up, especially when it came time to re-canonicalize before passing the RecordBatches back to DF.

Now, filter_slices will instead look at all of the slices and determine which chunks must be sliced, passed through, or dropped completely. I'll kick off a full bench run but for Q16 this saves us ~20% VS what we had on develop.

@a10y a10y added the benchmark Run benchmarks on this branch label Sep 13, 2024
@a10y a10y added benchmark Run benchmarks on this branch and removed benchmark Run benchmarks on this branch labels Sep 13, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 13, 2024
@@ -34,10 +34,6 @@ pub(crate) fn try_canonicalize_chunks(
validity: Validity,
dtype: &DType,
) -> VortexResult<Canonical> {
if chunks.is_empty() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a leftover from a time before dtype was able to be passed into this method, we had needed to examine the first chunk to get the DType but that's no longer the case

@a10y
Copy link
Contributor Author

a10y commented Sep 13, 2024

Ok benchmark run just completed, looks like the 20ms speed up i was seeing locally for Q16 was maybe just test jitter, in the PR run it's roughly same as develop. Most queries are 0-10% faster. A number are considerably slower, this time due to slow take implementation. May try and implement the accelerated search_sorted here as well

@robert3005
Copy link
Member

@a10y we should just merge this and make follow up prs. There's a lot of noise in the runs so comparing to latest develop runs is a bit difficult. Anyway I will review this and we can make another pr?

@a10y
Copy link
Contributor Author

a10y commented Sep 13, 2024

Alright I'm ok with that if you are. We have the issues for tracking already. I just hate to see graphs go up :P

Comment on lines 206 to 214
fn find_chunk_idx(idx: usize, chunk_ends: &[u64]) -> (usize, usize) {
let chunk_id = chunk_ends
.binary_search(&(idx as u64))
.map(|i| if i == chunk_ends.len() { i - 1 } else { i })
.unwrap_or_else(|end| end - 1);
let chunk_offset = idx - (chunk_ends[chunk_id] as usize);

(chunk_id, chunk_offset)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our search_sorted method is defined on primitive arrays and last I checked it wasn't any different (could be different now since 1.81 changed binary_search)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's that the chunk_ends are usually compressed, so decompressing them for every call to find_chunk_idx adds up, along with the scalar/dtype overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. you can do chunk_ends.search_sorted(&idx, SearchSortedSide::Right)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yea, that is probably better than what I am doing. Lemme change

Comment on lines 129 to 132
#[allow(clippy::needless_range_loop)]
for chunk in (start_chunk + 1)..end_chunk {
chunk_filters[chunk] = ChunkFilter::All;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing that the clippy suggestion does is avoid bounds checking but it would only show up on large arrays

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that you can do &mut vector[start...end], which I think is both clear and avoids bounds check

Copy link
Member

@robert3005 robert3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I spent some much time looking at weird arrays I know this has a weird behaviour with empty chunks but we can handle it in a followup. Needs adjustment to find_chunk_idx like I did in a previous pr

@robert3005
Copy link
Member

robert3005 commented Sep 15, 2024

I think I meant something else, you can avoid the scalar/array overhead but still use the search sorted logic, i.e. keep find_chunk_idx in chunked array but you can create a mirror of that function over &[u64]. In the current variant you end up with scalar overhead.

The code I linked was to show that our searching logic is implemented for primitive array slices as well as arrays and you need it to handle empty chunks.

@a10y
Copy link
Contributor Author

a10y commented Sep 15, 2024

like that?

@robert3005 robert3005 merged commit 03e03ce into develop Sep 15, 2024
5 checks passed
@robert3005 robert3005 deleted the aduffy/more-filterfn branch September 15, 2024 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants