[Perf]: Planning time increases linearly with number of stripes #188
Comments
Hi @leoyvens, thank you for the report. Lowering the stripe maximum row count increases the number of stripes in the table, and thus the amount of related metadata. Part of planning is reading the table's stripe metadata to calculate the cost of reading columnar tables, and that is where the difference comes from. Although it looks significant (11ms vs. 142ms), both times are still in the milliseconds. We'll discuss whether this is something we should prioritize, and will mark this accordingly.
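For context, a rough way to see how fast the stripe metadata grows, assuming a Citus-style `columnar.stripe` catalog (the catalog name and columns here are an assumption and may differ in this extension):

```sql
-- Rough illustration, assuming a Citus-style columnar.stripe metadata table.
-- Each row here is one stripe the planner's cost estimation has to read about.
SELECT storage_id,
       count(*)       AS stripe_count,
       sum(row_count) AS total_rows
FROM columnar.stripe
GROUP BY storage_id
ORDER BY stripe_count DESC;
```

At the reported 1 billion rows, dropping the stripe row limit from 150k to 10k grows this from roughly 7k to 100k stripes per table, which matches the roughly linear growth in planning time.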
Instead of reading the metadata table directly, which seems unusually expensive for a planning step, perhaps the stripe count and row count could be kept as statistics, that is, updated only on VACUUM or ANALYZE.
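A sketch of what that could look like, borrowing the pattern PostgreSQL core already uses: the planner reads cached estimates from `pg_class`, refreshed by VACUUM/ANALYZE, rather than scanning anything at plan time (`my_table` is a placeholder name):

```sql
-- How core PostgreSQL handles the analogous problem: the planner uses
-- cached estimates in pg_class instead of reading the table at plan time.
SELECT relname,
       reltuples::bigint AS estimated_rows,
       relpages          AS estimated_pages
FROM pg_class
WHERE relname = 'my_table';

ANALYZE my_table;  -- refreshes the cached estimates
```

A per-table stripe count kept the same way would make planning cost independent of the number of stripes, at the price of slightly stale estimates between VACUUM/ANALYZE runs.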
So we can understand your use case better, what are you trying to accomplish by making the maximum stripe size smaller? Did doing so have the desired effect (aside from the increased planning time)?
I am looking to improve the performance of index scans. As you can also see in the OP, the smaller stripe size resulted in a much faster execution time. In my measurements I saw an average speed-up of 5x in index scans when using a 10k stripe size instead of the default 150k.

Taking a step back: if I'm fetching a single column of a single row using a btree index, there is no fundamental reason why columnar should perform worse than a heap table. At default configurations, heap performs better because its page size is much smaller than the default columnar stripe size, but that's a tunable parameter, not a fundamental limitation of columnar. In fact, according to the benchmarks I ran, if this issue and the column overfetching issue discussed in #187 are addressed, columnar index scans at a 10k stripe size should become competitive with heap index scans.
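For reproduction purposes, this is roughly how a 10k-stripe table can be set up, assuming a Citus-style `alter_columnar_table_set()` UDF (the exact API in this extension may differ; `events` is a placeholder table name):

```sql
-- Sketch under the assumption of a Citus-style columnar API.
CREATE TABLE events (vid bigint, payload text) USING columnar;

-- Lower the stripe row limit from the 150k default to 10k.
SELECT alter_columnar_table_set('events', stripe_row_limit => 10000);

CREATE INDEX events_vid_idx ON events USING btree (vid);
```

The setting only applies to stripes written afterwards, so pre-existing data would need to be rewritten (e.g. reloaded) to pick it up.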
What's wrong?
I noticed this because I have two tables with the same data, but one was built with the default 150k rows per stripe and the other with 10k rows per stripe. The table has about 1 billion rows. Both tables have a BTree index on `vid`.

This is an `explain analyze` for the table with 150k rows per stripe:

And now for the table with 10k rows per stripe:
Notice how the query planning time increased in proportion to the increase in the number of stripes, to an unreasonable 140ms. And the plan isn't even a columnar scan. This can be worked around to some extent by using prepared statements to take advantage of plan caching, but it is still a performance issue.
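The plan-caching workaround looks like this: after a few executions PostgreSQL can switch to a cached generic plan and skip the repeated planning step (table and statement names are placeholders):

```sql
-- Amortize the expensive planning across executions with a prepared statement.
PREPARE get_by_vid (bigint) AS
  SELECT * FROM events_10k WHERE vid = $1;

EXECUTE get_by_vid(42);  -- planned on early uses, then a generic plan may be reused
-- SET plan_cache_mode = force_generic_plan;  -- forces the cached generic plan
```

This helps within a session, but it does not remove the underlying per-plan stripe metadata read.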