Support convert_to_state for AVG accumulator #11734

Merged
merged 7 commits into from
Aug 12, 2024

Conversation

alamb
Contributor

@alamb alamb commented Jul 30, 2024

Note: There are ~20 lines of code in this PR; the rest is documentation and tests

Which issue does this PR close?

Rationale for this change

To take advantage of the benefits of #11627 a new method must be implemented for each GroupsAccumulator.

At least one ClickBench query (the one in #6937) uses AVG so let's implement that

What changes are included in this PR?

  1. Implement convert_to_state for AVG accumulator
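As a rough illustration of what "convert to state" means for AVG, here is a minimal sketch using plain vectors instead of Arrow arrays. The `AvgState` type and this function signature are assumptions made for the example and are not DataFusion's actual API. Each input row becomes its own intermediate state (a count of 1 and a sum equal to the value), with no grouping done; rows that are null or filtered out get a null state.

```rust
/// Per-row intermediate AVG state: one (count, sum) pair for every input row.
/// Hypothetical type for illustration only; the real accumulator works on
/// Arrow arrays rather than Vecs.
#[derive(Debug, PartialEq)]
pub struct AvgState {
    pub counts: Vec<Option<u64>>,
    pub sums: Vec<Option<f64>>,
}

/// Convert raw input values directly into per-row AVG state, skipping the
/// grouping step entirely. A row that is null, or that the optional filter
/// does not select, yields a null count and a null sum.
pub fn convert_to_state(values: &[Option<f64>], filter: Option<&[bool]>) -> AvgState {
    let mut counts = Vec::with_capacity(values.len());
    let mut sums = Vec::with_capacity(values.len());
    for (i, value) in values.iter().enumerate() {
        // With no filter, every row is selected.
        let selected = filter.map_or(true, |f| f[i]);
        match value {
            Some(v) if selected => {
                counts.push(Some(1));
                sums.push(Some(*v));
            }
            _ => {
                counts.push(None);
                sums.push(None);
            }
        }
    }
    AvgState { counts, sums }
}
```

The final aggregation stage can then merge these per-row states exactly as it would merge states produced by a partial aggregation.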

Are these changes tested?

Yes with new unit tests

Performance benchmarks:

Clickbench on the whole looks better

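┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_support_avg ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 28    │ 14589.57ms │        15650.15ms │  1.07x slower │
│ QQuery 31    │  1648.84ms │         1351.42ms │ +1.22x faster │
│ QQuery 32    │  7674.18ms │         4546.37ms │ +1.69x faster │
└──────────────┴────────────┴───────────────────┴───────────────┘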
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)           │ 88225.75ms │
│ Total Time (alamb_support_avg)   │ 85177.63ms │
│ Average Time (main_base)         │  2051.76ms │
│ Average Time (alamb_support_avg) │  1980.88ms │
│ Queries Faster                   │          2 │
│ Queries Slower                   │          1 │
│ Queries with No Change           │         40 │
└──────────────────────────────────┴────────────┘

Interestingly my benchmark shows Q31 and Q32 get faster (they both have avg):

SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits WHERE "SearchPhrase" <> '' GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;

But Q28 gets slower

SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

Details

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_support_avg ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.65ms │            0.66ms │     no change │
│ QQuery 1     │    68.39ms │           69.21ms │     no change │
│ QQuery 2     │   123.19ms │          124.32ms │     no change │
│ QQuery 3     │   130.87ms │          129.52ms │     no change │
│ QQuery 4     │   975.14ms │          956.34ms │     no change │
│ QQuery 5     │  1073.94ms │         1051.31ms │     no change │
│ QQuery 6     │    65.20ms │           65.15ms │     no change │
│ QQuery 7     │    72.73ms │           74.63ms │     no change │
│ QQuery 8     │  1442.03ms │         1426.67ms │     no change │
│ QQuery 9     │  1359.96ms │         1341.42ms │     no change │
│ QQuery 10    │   453.35ms │          451.60ms │     no change │
│ QQuery 11    │   491.08ms │          487.39ms │     no change │
│ QQuery 12    │  1174.58ms │         1163.25ms │     no change │
│ QQuery 13    │  2175.10ms │         2105.75ms │     no change │
│ QQuery 14    │  1613.52ms │         1591.44ms │     no change │
│ QQuery 15    │  1100.45ms │         1090.39ms │     no change │
│ QQuery 16    │  2887.87ms │         2882.07ms │     no change │
│ QQuery 17    │  2821.88ms │         2801.39ms │     no change │
│ QQuery 18    │  5627.08ms │         5535.29ms │     no change │
│ QQuery 19    │   118.97ms │          118.65ms │     no change │
│ QQuery 20    │  1686.34ms │         1653.84ms │     no change │
│ QQuery 21    │  1998.87ms │         2017.09ms │     no change │
│ QQuery 22    │  4835.67ms │         4795.01ms │     no change │
│ QQuery 23    │ 11438.22ms │        11139.93ms │     no change │
│ QQuery 24    │   750.23ms │          754.42ms │     no change │
│ QQuery 25    │   671.80ms │          671.75ms │     no change │
│ QQuery 26    │   827.26ms │          834.02ms │     no change │
│ QQuery 27    │  2530.97ms │         2520.82ms │     no change │
│ QQuery 28    │ 14589.57ms │        15650.15ms │  1.07x slower │
│ QQuery 29    │   573.58ms │          562.79ms │     no change │
│ QQuery 30    │  1299.57ms │         1296.16ms │     no change │
│ QQuery 31    │  1648.84ms │         1351.42ms │ +1.22x faster │
│ QQuery 32    │  7674.18ms │         4546.37ms │ +1.69x faster │
│ QQuery 33    │  5072.36ms │         5086.50ms │     no change │
│ QQuery 34    │  5030.77ms │         5034.60ms │     no change │
│ QQuery 35    │  1854.58ms │         1829.27ms │     no change │
│ QQuery 36    │   320.08ms │          314.68ms │     no change │
│ QQuery 37    │   218.66ms │          218.22ms │     no change │
│ QQuery 38    │   189.78ms │          196.34ms │     no change │
│ QQuery 39    │   977.91ms │          977.56ms │     no change │
│ QQuery 40    │    85.17ms │           85.73ms │     no change │
│ QQuery 41    │    79.63ms │           78.92ms │     no change │
│ QQuery 42    │    95.74ms │           95.59ms │     no change │
└──────────────┴────────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)           │ 88225.75ms │
│ Total Time (alamb_support_avg)   │ 85177.63ms │
│ Average Time (main_base)         │  2051.76ms │
│ Average Time (alamb_support_avg) │  1980.88ms │
│ Queries Faster                   │          2 │
│ Queries Slower                   │          1 │
│ Queries with No Change           │         40 │
└──────────────────────────────────┴────────────┘

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_support_avg ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  247.69ms │          229.35ms │ +1.08x faster │
│ QQuery 2     │  126.01ms │          126.86ms │     no change │
│ QQuery 3     │  125.02ms │          129.29ms │     no change │
│ QQuery 4     │   94.59ms │           88.65ms │ +1.07x faster │
│ QQuery 5     │  172.20ms │          175.13ms │     no change │
│ QQuery 6     │   59.18ms │           59.55ms │     no change │
│ QQuery 7     │  204.27ms │          208.79ms │     no change │
│ QQuery 8     │  156.08ms │          163.42ms │     no change │
│ QQuery 9     │  253.29ms │          254.54ms │     no change │
│ QQuery 10    │  227.42ms │          232.69ms │     no change │
│ QQuery 11    │   99.08ms │           98.47ms │     no change │
│ QQuery 12    │  146.89ms │          138.62ms │ +1.06x faster │
│ QQuery 13    │  290.02ms │          291.07ms │     no change │
│ QQuery 14    │   82.66ms │           81.43ms │     no change │
│ QQuery 15    │  116.98ms │          135.97ms │  1.16x slower │
│ QQuery 16    │   88.83ms │           88.72ms │     no change │
│ QQuery 17    │  232.60ms │          218.48ms │ +1.06x faster │
│ QQuery 18    │  330.29ms │          330.22ms │     no change │
│ QQuery 19    │  149.17ms │          161.32ms │  1.08x slower │
│ QQuery 20    │  139.35ms │          137.98ms │     no change │
│ QQuery 21    │  280.84ms │          263.72ms │ +1.06x faster │
│ QQuery 22    │   65.21ms │           65.46ms │     no change │
└──────────────┴───────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)           │ 3687.65ms │
│ Total Time (alamb_support_avg)   │ 3679.71ms │
│ Average Time (main_base)         │  167.62ms │
│ Average Time (alamb_support_avg) │  167.26ms │
│ Queries Faster                   │         5 │
│ Queries Slower                   │         2 │
│ Queries with No Change           │        15 │
└──────────────────────────────────┴───────────┘

Are there any user-facing changes?

Faster performance

@alamb alamb marked this pull request as draft July 30, 2024 21:48
@github-actions github-actions bot added documentation Improvements or additions to documentation logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Jul 30, 2024
@alamb alamb force-pushed the alamb/support_avg branch from ee5ac1c to a8b5a05 Compare July 31, 2024 11:29
@alamb alamb marked this pull request as ready for review July 31, 2024 11:37
@alamb alamb marked this pull request as draft July 31, 2024 11:37
@alamb alamb force-pushed the alamb/support_avg branch from a8b5a05 to 7020dcf Compare July 31, 2024 12:09
@alamb
Contributor Author

alamb commented Jul 31, 2024

There is something strange going on with AVG in this query -- it is giving different answers when convert to state is enabled vs not. Maybe it is due to float rounding, but I am not confident

@korowa
Contributor

korowa commented Jul 31, 2024

> There is something strange going on with AVG in this query -- it is giving different answers when convert to state is enabled vs not. Maybe it is due to float rounding, but I am not confident

Kind of expected -- result set is sorted by COUNT(*) which equals 2 for only 4 records, and 1 for all other records, so this ordering may be considered as nondeterministic.

I've got different results after two consecutive runs even on main branch (without any skipped aggregation).

@alamb
Contributor Author

alamb commented Jul 31, 2024

I debugged this a bit more and I think the issue may be that AVG uses Float64 internally to accumulate the sum, and since this column has giant integers that can't fit into a Float64 precisely, the order of operations affects the final output. I think I will update the test to use a column other than this giant Int64 column
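The order sensitivity described above can be reproduced with a small sketch. The helper name `sum_as_f64` and the chosen values are assumptions for illustration; the example only relies on the fact that an f64 cannot represent every integer above 2^53, so adding 1 to such a value is rounded away.

```rust
/// 2^53: above this magnitude an f64 can no longer represent every integer.
pub const TWO_POW_53: i64 = 1 << 53;

/// Sum integers the way a Float64-based accumulator would: cast each value
/// to f64 first, then add them in the order given.
pub fn sum_as_f64(values: &[i64]) -> f64 {
    values.iter().map(|&v| v as f64).sum()
}
```

Summing `[TWO_POW_53, 1, 1]` loses both small addends to rounding, while summing `[1, 1, TWO_POW_53]` adds the two 1s first and keeps them, so the two orders return different totals for the same set of values. Skipping partial aggregation changes the order in which values are combined, which is enough to change an AVG over very large integers.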

@korowa
Contributor

korowa commented Jul 31, 2024

Oh, the AVG values themselves, got it.

@alamb alamb force-pushed the alamb/support_avg branch from 7020dcf to 42daa94 Compare August 5, 2024 13:41
@github-actions github-actions bot removed documentation Improvements or additions to documentation logical-expr Logical plan and expressions labels Aug 5, 2024
@alamb alamb marked this pull request as ready for review August 5, 2024 13:47
@alamb alamb force-pushed the alamb/support_avg branch from 42daa94 to b60e1aa Compare August 5, 2024 14:58
@alamb alamb marked this pull request as draft August 5, 2024 15:00
@alamb alamb force-pushed the alamb/support_avg branch from b60e1aa to f05c4cd Compare August 5, 2024 16:11
@alamb
Contributor Author

alamb commented Aug 5, 2024

This PR is failing as shown below when running the TPCH benchmarks. I think there may be a bug related to types in the intermediates. I will keep debugging

Query 19 iteration 3 took 161.5 ms and returned 1 rows
Query 19 iteration 4 took 171.7 ms and returned 1 rows
Query 19 avg time: 164.29 ms
Error: External(ArrowError(InvalidArgumentError("column types must match schema types, expected Decimal128(25, 2) but found Decimal128(38, 10) at column index 2"), None))

Update: turns out it is q18:

(venv) andrewlamb@Andrews-MBP-2:~/Software/datafusion/benchmarks/data/tpch_sf1$ /Users/andrewlamb/Software/datafusion/datafusion-cli/target/debug/datafusion-cli -f ../../queries/q18.sql

Update: filed real issue here #11832

@alamb alamb force-pushed the alamb/support_avg branch from f05c4cd to e3eb80f Compare August 5, 2024 22:06
@alamb alamb force-pushed the alamb/support_avg branch from 0c4df2e to f3bedc0 Compare August 6, 2024 10:58
@alamb alamb marked this pull request as ready for review August 6, 2024 12:17
@korowa
Contributor

korowa commented Aug 8, 2024

LGTM, thank you @alamb.

Regarding the q28 slowdown from the PR description -- I suppose it's not a stable slowdown, just the result of a single benchmark run (since the regexp over the Referer field in the query doesn't seem to produce high enough cardinality to skip partial aggregation)?
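To make the cardinality argument concrete, here is a hedged sketch of the kind of check that decides whether partial aggregation is skipped. The `SkipProbe` type, its fields, and the thresholds used in the usage note are hypothetical and are not taken from this PR; the idea is only that skipping happens when the ratio of distinct groups to input rows is high, since partial aggregation then reduces very little.

```rust
/// Hypothetical probe that tracks how well partial aggregation is reducing
/// its input. All names and thresholds here are illustrative assumptions.
pub struct SkipProbe {
    rows_seen: usize,
    groups_seen: usize,
    min_rows: usize,
    ratio_threshold: f64,
}

impl SkipProbe {
    pub fn new(min_rows: usize, ratio_threshold: f64) -> Self {
        Self { rows_seen: 0, groups_seen: 0, min_rows, ratio_threshold }
    }

    /// Record one more batch: `rows` new input rows, and the total number of
    /// distinct groups observed so far.
    pub fn update(&mut self, rows: usize, total_groups: usize) {
        self.rows_seen += rows;
        self.groups_seen = total_groups;
    }

    /// Skip partial aggregation only after enough rows have been observed
    /// and nearly every row has produced its own group.
    pub fn should_skip(&self) -> bool {
        self.rows_seen >= self.min_rows
            && (self.groups_seen as f64 / self.rows_seen as f64) >= self.ratio_threshold
    }
}
```

With a threshold of 0.8, a probe that has seen 1000 rows and 10 groups keeps aggregating, while one that has seen 1000 rows and 950 groups would skip. A query whose GROUP BY key has low cardinality never reaches the threshold, so `convert_to_state` would not be exercised for it.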

@alamb
Contributor Author

alamb commented Aug 8, 2024

> LGTM, thank you @alamb.
>
> Regarding the q28 slowdown from the PR description -- I suppose it's not a stable slowdown, just the result of a single benchmark run (since the regexp over the Referer field in the query doesn't seem to produce high enough cardinality to skip partial aggregation)?

That is my understanding too. I also have high hopes that the StringView work will make that query in particular faster as well

Contributor

@2010YOUY01 2010YOUY01 left a comment

This looks great, thank you

/// │false│ │ │NULL │ │NULL │
/// │false│ │true │ │true │
/// └─────┘ └─────┘ └─────┘
/// array opt_filter output nulls
Contributor

Looks like output nulls has a typo; should it be false; true; false; false; false?

Contributor Author

Yes you are correct -- thank you for catching that. I fixed it in 149406b
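The corrected output column follows from a simple rule, sketched here with plain slices. The function name, the boolean representation of validity, and the first three input rows used in the usage note are assumptions for illustration (the diagram excerpt above only shows part of the input): an output slot is valid only when the input value is valid and the filter entry is exactly true.

```rust
/// Combine an input validity mask with an optional boolean filter.
/// An output slot is valid (`true`) only when the input value is valid AND
/// the filter entry is `Some(true)`; a NULL or `false` filter entry hides it.
pub fn filtered_validity(valid: &[bool], opt_filter: Option<&[Option<bool>]>) -> Vec<bool> {
    match opt_filter {
        // No filter: validity passes through unchanged.
        None => valid.to_vec(),
        Some(filter) => valid
            .iter()
            .zip(filter.iter())
            .map(|(&is_valid, &keep)| is_valid && keep == Some(true))
            .collect(),
    }
}
```

For an input validity of `[true, true, true, false, false]` and a filter of `[NULL, true, false, NULL, true]`, this yields `[false, true, false, false, false]`, which matches the values suggested in the review comment.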

Contributor

@comphead comphead left a comment

lgtm thanks @alamb

@@ -267,6 +282,20 @@ FROM aggregate_test_100_null GROUP BY c2 ORDER BY c2;
4 11 14
5 8 7

# Test avg for tinyint / float
Contributor

just wondering why the test is only for tinyint / floats?

Contributor Author

The idea was that the AVG accumulator is already tested elsewhere -- this test is only to exercise the partial aggregate skipping logic

@alamb
Contributor Author

alamb commented Aug 12, 2024

Thanks @comphead -- we are making progress here slowly -- but I am pretty stoked to see it 🚀

@alamb alamb merged commit 00ef820 into apache:main Aug 12, 2024
24 checks passed
@Dandandan
Contributor

Nice work 🎉

@alamb alamb deleted the alamb/support_avg branch August 14, 2024 19:54
@alamb
Contributor Author

alamb commented Aug 14, 2024

I am very excited to get the StringView work (#11752) done and enabled -- and then rerun the clickbench benchmarks again for DataFusion. 🚀

@Rachelint is also working on some good stuff.

Labels
functions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance of Avg aggregate: implement convert_to_state
5 participants