-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement FilterFn for ChunkedArray #794
Conversation
Heh, turns out the naive implementation is 3x slower than develop, pretty much entirely due to |
Alright so currently what I'm seeing is for Q15 vortex-compressed-file:
The new flame graph is showing ~6% less time spent in filter, which is what we'd expect roughly just going off the numbers |
Right, this can go lower if all the children also implement filter. Good to know that we need to unpack values |
EDIT: about half do. The missing ones here are FoR and Extension |
FoR has it. Notable omission is Bitpacked which could repack itself and it likely would be faster. Also sparse is missing which likely leads to some extra allocation |
fwiw we should make them in separate prs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vortex benchmarks
Benchmark suite | Current: d5ecb80 | Previous: 750b9c8 | Ratio |
---|---|---|---|
tpch_q1/vortex-in-memory-no-pushdown |
467629937 ns/iter (± 3750524 ) |
482302101 ns/iter (± 5129904 ) |
0.97 |
tpch_q1/vortex-in-memory-pushdown |
515046187 ns/iter (± 1745901 ) |
527534072 ns/iter (± 3463267 ) |
0.98 |
tpch_q1/arrow |
454002116 ns/iter (± 2291129 ) |
451801735 ns/iter (± 3758415 ) |
1.00 |
tpch_q1/parquet |
665259754 ns/iter (± 3313569 ) |
661718171 ns/iter (± 2674937 ) |
1.01 |
tpch_q1/vortex-file-compressed |
610874701 ns/iter (± 2392209 ) |
621200118 ns/iter (± 6749288 ) |
0.98 |
tpch_q1/vortex-file-uncompressed |
633865011 ns/iter (± 2741863 ) |
592370294 ns/iter (± 8018622 ) |
1.07 |
tpch_q2/vortex-in-memory-no-pushdown |
144574917 ns/iter (± 990792 ) |
160802521 ns/iter (± 4678476 ) |
0.90 |
tpch_q2/vortex-in-memory-pushdown |
145267883 ns/iter (± 876640 ) |
159124621 ns/iter (± 3511202 ) |
0.91 |
tpch_q2/arrow |
122184510 ns/iter (± 572539 ) |
127992668 ns/iter (± 614847 ) |
0.95 |
tpch_q2/parquet |
158407129 ns/iter (± 845449 ) |
167523912 ns/iter (± 976972 ) |
0.95 |
tpch_q2/vortex-file-compressed |
154158990 ns/iter (± 731391 ) |
170749370 ns/iter (± 1095064 ) |
0.90 |
tpch_q2/vortex-file-uncompressed |
164798709 ns/iter (± 1537024 ) |
176352506 ns/iter (± 1861263 ) |
0.93 |
tpch_q3/vortex-in-memory-no-pushdown |
154082645 ns/iter (± 945209 ) |
164894661 ns/iter (± 1688574 ) |
0.93 |
tpch_q3/vortex-in-memory-pushdown |
182599172 ns/iter (± 2140747 ) |
204285819 ns/iter (± 1733504 ) |
0.89 |
tpch_q3/arrow |
148805278 ns/iter (± 1268268 ) |
163147591 ns/iter (± 2200349 ) |
0.91 |
tpch_q3/parquet |
339122776 ns/iter (± 2983826 ) |
364056632 ns/iter (± 3161647 ) |
0.93 |
tpch_q3/vortex-file-compressed |
600109967 ns/iter (± 2475613 ) |
238624885 ns/iter (± 3477685 ) |
2.51 |
tpch_q3/vortex-file-uncompressed |
346678665 ns/iter (± 4154641 ) |
264574306 ns/iter (± 4886537 ) |
1.31 |
tpch_q4/vortex-in-memory-no-pushdown |
116713076 ns/iter (± 680431 ) |
119477803 ns/iter (± 1823670 ) |
0.98 |
tpch_q4/vortex-in-memory-pushdown |
140401608 ns/iter (± 1067465 ) |
148794718 ns/iter (± 4839612 ) |
0.94 |
tpch_q4/arrow |
112121350 ns/iter (± 969314 ) |
126580600 ns/iter (± 3893440 ) |
0.89 |
tpch_q4/parquet |
223556527 ns/iter (± 2134392 ) |
233726536 ns/iter (± 2398398 ) |
0.96 |
tpch_q4/vortex-file-compressed |
608122542 ns/iter (± 1411424 ) |
214291423 ns/iter (± 4481945 ) |
2.84 |
tpch_q4/vortex-file-uncompressed |
301939005 ns/iter (± 3284469 ) |
223841277 ns/iter (± 4788043 ) |
1.35 |
tpch_q5/vortex-in-memory-no-pushdown |
300395241 ns/iter (± 2339398 ) |
331680262 ns/iter (± 3094149 ) |
0.91 |
tpch_q5/vortex-in-memory-pushdown |
314484406 ns/iter (± 2458080 ) |
352998734 ns/iter (± 3928054 ) |
0.89 |
tpch_q5/arrow |
296284099 ns/iter (± 4279535 ) |
322701287 ns/iter (± 4014088 ) |
0.92 |
tpch_q5/parquet |
456248488 ns/iter (± 3726847 ) |
491143044 ns/iter (± 5029280 ) |
0.93 |
tpch_q5/vortex-file-compressed |
346751667 ns/iter (± 4316017 ) |
373255236 ns/iter (± 8149263 ) |
0.93 |
tpch_q5/vortex-file-uncompressed |
354653588 ns/iter (± 6344729 ) |
370011144 ns/iter (± 7597998 ) |
0.96 |
tpch_q6/vortex-in-memory-no-pushdown |
41564617 ns/iter (± 702737 ) |
45597608 ns/iter (± 395126 ) |
0.91 |
tpch_q6/vortex-in-memory-pushdown |
91211372 ns/iter (± 453285 ) |
94817690 ns/iter (± 1631882 ) |
0.96 |
tpch_q6/arrow |
37046230 ns/iter (± 687907 ) |
39935601 ns/iter (± 153591 ) |
0.93 |
tpch_q6/parquet |
154464040 ns/iter (± 1078498 ) |
158515552 ns/iter (± 1076605 ) |
0.97 |
tpch_q6/vortex-file-compressed |
77106628 ns/iter (± 349019 ) |
82131752 ns/iter (± 447633 ) |
0.94 |
tpch_q6/vortex-file-uncompressed |
166363477 ns/iter (± 1183907 ) |
164012224 ns/iter (± 1392620 ) |
1.01 |
tpch_q7/vortex-in-memory-no-pushdown |
570770557 ns/iter (± 5628418 ) |
628570919 ns/iter (± 13993824 ) |
0.91 |
tpch_q7/vortex-in-memory-pushdown |
618735527 ns/iter (± 6725917 ) |
685014994 ns/iter (± 4076828 ) |
0.90 |
tpch_q7/arrow |
555973167 ns/iter (± 6869244 ) |
591575674 ns/iter (± 9570461 ) |
0.94 |
tpch_q7/parquet |
733884143 ns/iter (± 7867692 ) |
766331481 ns/iter (± 7275189 ) |
0.96 |
tpch_q7/vortex-file-compressed |
856993945 ns/iter (± 4929339 ) |
664591921 ns/iter (± 7273788 ) |
1.29 |
tpch_q7/vortex-file-uncompressed |
740677757 ns/iter (± 5450486 ) |
736981022 ns/iter (± 11454205 ) |
1.01 |
tpch_q8/vortex-in-memory-no-pushdown |
210734148 ns/iter (± 1364876 ) |
216322787 ns/iter (± 1450325 ) |
0.97 |
tpch_q8/vortex-in-memory-pushdown |
222583044 ns/iter (± 896335 ) |
225993479 ns/iter (± 1636984 ) |
0.98 |
tpch_q8/arrow |
210921272 ns/iter (± 691187 ) |
213002888 ns/iter (± 798766 ) |
0.99 |
tpch_q8/parquet |
481543044 ns/iter (± 2300449 ) |
480810962 ns/iter (± 5440907 ) |
1.00 |
tpch_q8/vortex-file-compressed |
269483461 ns/iter (± 4783378 ) |
250163486 ns/iter (± 533009 ) |
1.08 |
tpch_q8/vortex-file-uncompressed |
292301158 ns/iter (± 9043288 ) |
289075003 ns/iter (± 2810625 ) |
1.01 |
tpch_q9/vortex-in-memory-no-pushdown |
405813821 ns/iter (± 2341096 ) |
411252124 ns/iter (± 5191902 ) |
0.99 |
tpch_q9/vortex-in-memory-pushdown |
406254784 ns/iter (± 2090194 ) |
402404864 ns/iter (± 10300688 ) |
1.01 |
tpch_q9/arrow |
401193441 ns/iter (± 3486194 ) |
407465021 ns/iter (± 4936156 ) |
0.98 |
tpch_q9/parquet |
702975198 ns/iter (± 3508374 ) |
690287931 ns/iter (± 2228853 ) |
1.02 |
tpch_q9/vortex-file-compressed |
454991563 ns/iter (± 4574292 ) |
448813755 ns/iter (± 3147241 ) |
1.01 |
tpch_q9/vortex-file-uncompressed |
487958941 ns/iter (± 3139067 ) |
493338488 ns/iter (± 3770312 ) |
0.99 |
tpch_q10/vortex-in-memory-no-pushdown |
240534797 ns/iter (± 1106459 ) |
237008689 ns/iter (± 999451 ) |
1.01 |
tpch_q10/vortex-in-memory-pushdown |
263164189 ns/iter (± 1948364 ) |
269284156 ns/iter (± 2766466 ) |
0.98 |
tpch_q10/arrow |
232776848 ns/iter (± 1296696 ) |
232630226 ns/iter (± 1125035 ) |
1.00 |
tpch_q10/parquet |
491261591 ns/iter (± 3634032 ) |
491838979 ns/iter (± 2172643 ) |
1.00 |
tpch_q10/vortex-file-compressed |
599147762 ns/iter (± 2913480 ) |
438008551 ns/iter (± 1560277 ) |
1.37 |
tpch_q10/vortex-file-uncompressed |
397099274 ns/iter (± 3897509 ) |
342085663 ns/iter (± 1049070 ) |
1.16 |
tpch_q11/vortex-in-memory-no-pushdown |
220454180 ns/iter (± 2158905 ) |
228213202 ns/iter (± 1929879 ) |
0.97 |
tpch_q11/vortex-in-memory-pushdown |
218691415 ns/iter (± 2299972 ) |
227285629 ns/iter (± 1260022 ) |
0.96 |
tpch_q11/arrow |
176616687 ns/iter (± 1047656 ) |
181955973 ns/iter (± 481860 ) |
0.97 |
tpch_q11/parquet |
190341853 ns/iter (± 1991299 ) |
190665473 ns/iter (± 1014961 ) |
1.00 |
tpch_q11/vortex-file-compressed |
232418289 ns/iter (± 1398293 ) |
234354109 ns/iter (± 2669234 ) |
0.99 |
tpch_q11/vortex-file-uncompressed |
238199675 ns/iter (± 2755331 ) |
236021473 ns/iter (± 2076840 ) |
1.01 |
tpch_q12/vortex-in-memory-no-pushdown |
178888983 ns/iter (± 564729 ) |
181725183 ns/iter (± 665937 ) |
0.98 |
tpch_q12/vortex-in-memory-pushdown |
257220480 ns/iter (± 1246588 ) |
245639824 ns/iter (± 1117857 ) |
1.05 |
tpch_q12/arrow |
168685105 ns/iter (± 435972 ) |
167848416 ns/iter (± 951860 ) |
1.00 |
tpch_q12/parquet |
358077917 ns/iter (± 2317907 ) |
355743248 ns/iter (± 652283 ) |
1.01 |
tpch_q12/vortex-file-compressed |
577060066 ns/iter (± 2253366 ) |
583693695 ns/iter (± 3524490 ) |
0.99 |
tpch_q12/vortex-file-uncompressed |
353753839 ns/iter (± 1158424 ) |
345717904 ns/iter (± 982840 ) |
1.02 |
tpch_q13/vortex-in-memory-no-pushdown |
233378060 ns/iter (± 6792292 ) |
212026022 ns/iter (± 2139385 ) |
1.10 |
tpch_q13/vortex-in-memory-pushdown |
225897062 ns/iter (± 2687703 ) |
215199549 ns/iter (± 2801269 ) |
1.05 |
tpch_q13/arrow |
220008054 ns/iter (± 3476872 ) |
216288457 ns/iter (± 7582246 ) |
1.02 |
tpch_q13/parquet |
365108165 ns/iter (± 8761788 ) |
339023424 ns/iter (± 4982544 ) |
1.08 |
tpch_q13/vortex-file-compressed |
262370823 ns/iter (± 5771851 ) |
242532188 ns/iter (± 3105220 ) |
1.08 |
tpch_q13/vortex-file-uncompressed |
253289040 ns/iter (± 5878713 ) |
230206578 ns/iter (± 4731127 ) |
1.10 |
tpch_q14/vortex-in-memory-no-pushdown |
39976992 ns/iter (± 402331 ) |
40066809 ns/iter (± 123691 ) |
1.00 |
tpch_q14/vortex-in-memory-pushdown |
87308864 ns/iter (± 536472 ) |
88371331 ns/iter (± 993349 ) |
0.99 |
tpch_q14/arrow |
41047857 ns/iter (± 171059 ) |
39916154 ns/iter (± 413480 ) |
1.03 |
tpch_q14/parquet |
222397328 ns/iter (± 984220 ) |
219644722 ns/iter (± 1478739 ) |
1.01 |
tpch_q14/vortex-file-compressed |
90954221 ns/iter (± 1205267 ) |
84406219 ns/iter (± 549154 ) |
1.08 |
tpch_q14/vortex-file-uncompressed |
145316171 ns/iter (± 3272870 ) |
132056672 ns/iter (± 2526723 ) |
1.10 |
tpch_q15/vortex-in-memory-no-pushdown |
71163237 ns/iter (± 568725 ) |
70240367 ns/iter (± 1211358 ) |
1.01 |
tpch_q15/vortex-in-memory-pushdown |
125755999 ns/iter (± 1421365 ) |
120877627 ns/iter (± 662063 ) |
1.04 |
tpch_q15/arrow |
67457581 ns/iter (± 563588 ) |
65814738 ns/iter (± 745743 ) |
1.02 |
tpch_q15/parquet |
295049250 ns/iter (± 2104210 ) |
292964357 ns/iter (± 4176947 ) |
1.01 |
tpch_q15/vortex-file-compressed |
160178907 ns/iter (± 1035165 ) |
149166262 ns/iter (± 3739034 ) |
1.07 |
tpch_q15/vortex-file-uncompressed |
280301533 ns/iter (± 3911470 ) |
245576987 ns/iter (± 4228385 ) |
1.14 |
tpch_q16/vortex-in-memory-no-pushdown |
120676960 ns/iter (± 607001 ) |
121362328 ns/iter (± 501981 ) |
0.99 |
tpch_q16/vortex-in-memory-pushdown |
125649150 ns/iter (± 652981 ) |
125642905 ns/iter (± 739682 ) |
1.00 |
tpch_q16/arrow |
105605122 ns/iter (± 1175945 ) |
106264398 ns/iter (± 1435624 ) |
0.99 |
tpch_q16/parquet |
122493799 ns/iter (± 590708 ) |
120515396 ns/iter (± 330045 ) |
1.02 |
tpch_q16/vortex-file-compressed |
137353235 ns/iter (± 669637 ) |
135440640 ns/iter (± 2489773 ) |
1.01 |
tpch_q16/vortex-file-uncompressed |
136512392 ns/iter (± 797010 ) |
133236531 ns/iter (± 804638 ) |
1.02 |
tpch_q17/vortex-in-memory-no-pushdown |
680071206 ns/iter (± 5626853 ) |
633750402 ns/iter (± 21999972 ) |
1.07 |
tpch_q17/vortex-in-memory-pushdown |
699765651 ns/iter (± 16552381 ) |
644886261 ns/iter (± 28816494 ) |
1.09 |
tpch_q17/arrow |
614163957 ns/iter (± 15290848 ) |
604794412 ns/iter (± 19325686 ) |
1.02 |
tpch_q17/parquet |
600504141 ns/iter (± 3375464 ) |
591494702 ns/iter (± 5886351 ) |
1.02 |
tpch_q17/vortex-file-compressed |
616490641 ns/iter (± 6109550 ) |
618751453 ns/iter (± 5094786 ) |
1.00 |
tpch_q17/vortex-file-uncompressed |
678439856 ns/iter (± 5153733 ) |
662045625 ns/iter (± 5462241 ) |
1.02 |
tpch_q18/vortex-in-memory-no-pushdown |
1102704467 ns/iter (± 10488831 ) |
1107670912 ns/iter (± 28124407 ) |
1.00 |
tpch_q18/vortex-in-memory-pushdown |
1102693920 ns/iter (± 18019955 ) |
1146350113 ns/iter (± 25935429 ) |
0.96 |
tpch_q18/arrow |
1091548880 ns/iter (± 11287668 ) |
1088592199 ns/iter (± 13463439 ) |
1.00 |
tpch_q18/parquet |
1269701350 ns/iter (± 9244969 ) |
1295921800 ns/iter (± 29563581 ) |
0.98 |
tpch_q18/vortex-file-compressed |
1137231576 ns/iter (± 14509774 ) |
1104461804 ns/iter (± 12804403 ) |
1.03 |
tpch_q18/vortex-file-uncompressed |
1199433179 ns/iter (± 13902541 ) |
1142531705 ns/iter (± 9323161 ) |
1.05 |
tpch_q19/vortex-in-memory-no-pushdown |
172419237 ns/iter (± 558277 ) |
173585771 ns/iter (± 474023 ) |
0.99 |
tpch_q19/vortex-in-memory-pushdown |
255865266 ns/iter (± 777800 ) |
263934184 ns/iter (± 1443569 ) |
0.97 |
tpch_q19/arrow |
158947649 ns/iter (± 563373 ) |
158169603 ns/iter (± 605941 ) |
1.00 |
tpch_q19/parquet |
482869625 ns/iter (± 2182275 ) |
485808293 ns/iter (± 3098680 ) |
0.99 |
tpch_q19/vortex-file-compressed |
739251560 ns/iter (± 3891957 ) |
748242129 ns/iter (± 1861891 ) |
0.99 |
tpch_q19/vortex-file-uncompressed |
361335402 ns/iter (± 1566759 ) |
370691529 ns/iter (± 2848285 ) |
0.97 |
tpch_q20/vortex-in-memory-no-pushdown |
276953400 ns/iter (± 4328394 ) |
271361202 ns/iter (± 2598718 ) |
1.02 |
tpch_q20/vortex-in-memory-pushdown |
296899338 ns/iter (± 2968668 ) |
298142045 ns/iter (± 3138390 ) |
1.00 |
tpch_q20/arrow |
253871397 ns/iter (± 4922124 ) |
268988087 ns/iter (± 5230411 ) |
0.94 |
tpch_q20/parquet |
368063779 ns/iter (± 5920070 ) |
376959670 ns/iter (± 2535342 ) |
0.98 |
tpch_q20/vortex-file-compressed |
337625170 ns/iter (± 3942064 ) |
331887989 ns/iter (± 5057692 ) |
1.02 |
tpch_q20/vortex-file-uncompressed |
419819929 ns/iter (± 2448535 ) |
387430930 ns/iter (± 3744062 ) |
1.08 |
tpch_q21/vortex-in-memory-no-pushdown |
881470571 ns/iter (± 5570260 ) |
860034433 ns/iter (± 6315368 ) |
1.02 |
tpch_q21/vortex-in-memory-pushdown |
912642083 ns/iter (± 12158266 ) |
894832538 ns/iter (± 2774021 ) |
1.02 |
tpch_q21/arrow |
873175820 ns/iter (± 4749117 ) |
860412313 ns/iter (± 4835728 ) |
1.01 |
tpch_q21/parquet |
1033653879 ns/iter (± 10326375 ) |
1031986215 ns/iter (± 15406311 ) |
1.00 |
tpch_q21/vortex-file-compressed |
1930001242 ns/iter (± 9281258 ) |
1055005493 ns/iter (± 8739734 ) |
1.83 |
tpch_q21/vortex-file-uncompressed |
1316949047 ns/iter (± 3918408 ) |
1091486875 ns/iter (± 12444901 ) |
1.21 |
tpch_q22/vortex-in-memory-no-pushdown |
96238393 ns/iter (± 774157 ) |
96963365 ns/iter (± 1134408 ) |
0.99 |
tpch_q22/vortex-in-memory-pushdown |
96618668 ns/iter (± 1183404 ) |
97059184 ns/iter (± 835576 ) |
1.00 |
tpch_q22/arrow |
66826555 ns/iter (± 2434487 ) |
66658773 ns/iter (± 251660 ) |
1.00 |
tpch_q22/parquet |
96074157 ns/iter (± 5283274 ) |
95223939 ns/iter (± 748371 ) |
1.01 |
tpch_q22/vortex-file-compressed |
103282488 ns/iter (± 460786 ) |
101715042 ns/iter (± 409685 ) |
1.02 |
tpch_q22/vortex-file-uncompressed |
110932476 ns/iter (± 603983 ) |
109602568 ns/iter (± 584164 ) |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
@a10y you have a bug somewhere
|
|
||
assert!(array.patches().is_some()); | ||
|
||
let patch_indices = SparseArray::try_from(array.patches().unwrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should just return SparseArray
directly from BPA::patches. I know we always return Array, but turning it into a SparseArray is pretty much the first thing we always do with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, I think we should handle non sparse patches. I will work on that
if SparseArray::try_from(parray)?.indices().is_empty() { | ||
vortex_bail!("cannot construct BitPackedArray using patches without indices"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We depend pretty deeply on this being a SparseArray so I figured it should be checked in the constructor, along with the non-empty indices when present thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for now, we will want to change it eventually but then we can find all those places
Alright benchmark comment is updated. Most queries stayed roughly same, the following got slower, some by a considerable margin:
Will do some profiling investigation in the AM |
Alright, the issue was with Now, |
@@ -34,10 +34,6 @@ pub(crate) fn try_canonicalize_chunks( | |||
validity: Validity, | |||
dtype: &DType, | |||
) -> VortexResult<Canonical> { | |||
if chunks.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a leftover from a time before dtype
was able to be passed into this method, we had needed to examine the first chunk to get the DType but that's no longer the case
Ok benchmark run just completed, looks like the 20ms speed up i was seeing locally for Q16 was maybe just test jitter, in the PR run it's roughly same as develop. Most queries are 0-10% faster. A number are considerably slower, this time due to slow take implementation. May try and implement the accelerated search_sorted here as well |
@a10y we should just merge this and make follow up prs. There's a lot of noise in the runs so comparing to latest develop runs is a bit difficult. Anyway I will review this and we can make another pr? |
Alright I'm ok with that if you are. We have the issues for tracking already. I just hate to see graphs go up :P |
fn find_chunk_idx(idx: usize, chunk_ends: &[u64]) -> (usize, usize) { | ||
let chunk_id = chunk_ends | ||
.binary_search(&(idx as u64)) | ||
.map(|i| if i == chunk_ends.len() { i - 1 } else { i }) | ||
.unwrap_or_else(|end| end - 1); | ||
let chunk_offset = idx - (chunk_ends[chunk_id] as usize); | ||
|
||
(chunk_id, chunk_offset) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our search_sorted method is defined on primitive arrays and last I checked it wasn't any different (could be different now since 1.81 changed binary_search)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's that the chunk_ends are usually compressed, so decompressing them for every call to find_chunk_idx adds up, along with the scalar/dtype overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that you can do this https://github.com/spiraldb/vortex/blob/develop/vortex-array/src/compute/search_sorted.rs#L276-L279
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. you can do chunk_ends.search_sorted(&idx, SearchSortedSide::Right)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/spiraldb/vortex/blob/develop/vortex-array/src/array/chunked/mod.rs#L98-L101 is the right thing to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yea, that is probably better than what I am doing. Lemme change
#[allow(clippy::needless_range_loop)] | ||
for chunk in (start_chunk + 1)..end_chunk { | ||
chunk_filters[chunk] = ChunkFilter::All; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing that the clippy suggestion does is avoid bounds checking but it would only show up on large arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot that you can do &mut vector[start...end]
, which I think is both clear and avoids bounds check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I spent some much time looking at weird arrays I know this has a weird behaviour with empty chunks but we can handle it in a followup. Needs adjustment to find_chunk_idx like I did in a previous pr
I think I meant something else, you can avoid the scalar/array overhead but still use the search sorted logic, i.e. keep find_chunk_idx in chunked array but you can create a mirror of that function over &[u64]. In the current variant you end up with scalar overhead. The code I linked was to show that our searching logic is implemented for primitive array slices as well as arrays and you need it to handle empty chunks. |
like that? |
No description provided.