Reported DataFusion performance problem #9148
Comments
Ran this on my M3 Mac and it finished in 144ms:
andrewlamb@Andrews-MacBook-Pro:~/Downloads$ ./rust_playground
just before df -> 1
reading df -> 6
df aggregation -> 144
results -> Ok([RecordBatch { schema: Schema { fields: [Field { name: "SUM(?table?.trip_time)", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int64>
[
20227776240,
]], row_count: 1 }])
andrewlamb@Andrews-MacBook-Pro:~/Downloads$
When I ran the debug build, it took more like 2 seconds:
andrewlamb@Andrews-MacBook-Pro:~/Downloads$ ./rust_playground.debug
just before df -> 6
reading df -> 17
df aggregation -> 1822
results -> Ok([RecordBatch { schema: Schema { fields: [Field { name: "SUM(?table?.trip_time)", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int64>
[
20227776240,
]], row_count: 1 }])
So I wonder if the reporter simply didn't run with a release build.
I am going to try this on a linux/less powerful machine.
No, it was a simple 'cargo run' with no parameters given. OK, so this was the reason.
Ah, got it -- I think you need to run
Thanks again for the report @mispp
Closing this one down as I think we have found the root cause
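For anyone hitting the same gap between debug and release timings, here is a minimal sketch of how a timing harness could print which build profile it was compiled with next to each measurement. It uses only the standard library. The helper names `build_profile` and `timed` are hypothetical, and the integer sum is a stand-in workload, not the DataFusion query from the report. It also assumes the default Cargo profiles, where `debug_assertions` is on for a plain `cargo run` and off for a release build.

```rust
use std::time::{Duration, Instant};

/// Returns "debug" when compiled with debug assertions enabled (the default
/// for Cargo's dev profile) and "release" otherwise. A custom profile can
/// change this setting, so treat the label as a hint.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Runs `work` once and returns its result together with the elapsed
/// wall-clock time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = work();
    (value, start.elapsed())
}

fn main() {
    // Stand-in workload (hypothetical): sum a range of integers instead of
    // aggregating a Parquet column.
    let (total, elapsed) = timed(|| (0..10_000_000u64).sum::<u64>());
    println!("build profile -> {}", build_profile());
    println!("aggregation -> {} ms (sum = {})", elapsed.as_millis(), total);
}
```

Printing the profile alongside the numbers makes it obvious in a pasted log whether a slow result came from an unoptimized build.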
This is consistent on my more limited Linux machine too:
Describe the bug
Reported in Discord by @mispp: https://discord.com/channels/885562378132000778/1166447479609376850/1204163621433639003
The task here is to reproduce the reported performance and see whether anything is wrong or could be improved.
To Reproduce
Original report: https://gist.github.com/mispp/229fdad7d70c8ab974a8f72f4bdfc43c
DataSet: https://d37ci6vzurychx.cloudfront.net/trip-data/fhvhv_tripdata_2023-01.parquet
Cargo.toml
Program:
Expected behavior
No response
Additional context
No response