Describe the bug
The join order chosen for TPC-H query 17 is poor, which makes DataFusion take much longer to execute the query.
To Reproduce
Create Data:
cd arrow-datafusion/benchmarks
./bench.sh data tpch10
Run query with datafusion-cli:
cd arrow-datafusion/benchmarks/data/tpch_sf10
datafusion-cli -c "select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate;"
Here is the query:
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where o_orderkey in (
select l_orderkey
from lineitem
group by l_orderkey
having sum(l_quantity) > 300
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate;
Expected behavior
The input order of HashJoin(1) in the plan below should be swapped.
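To illustrate why the input order of a hash join matters, here is a minimal Python sketch of a generic inner hash join. It is not DataFusion's implementation; the function `hash_join` and the toy rows are hypothetical and only show that the same join result can be produced while holding very different numbers of rows in the hash table, depending on which input is used as the build side.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Generic inner equi-join (illustrative only, not DataFusion code).

    Returns the joined rows and the number of rows held in the hash table.
    """
    # Build phase: every row of the build input is kept in memory.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    held = sum(len(bucket) for bucket in table.values())

    # Probe phase: the other input is streamed against the table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined, held


# Toy data, made up for illustration: 3 orders and 12 line items.
orders = [{"o_orderkey": k} for k in range(3)]
lineitem = [{"l_orderkey": k % 3, "l_quantity": 10 + k} for k in range(12)]

# Small input as the build side.
rows_a, held_a = hash_join(orders, lineitem, "o_orderkey", "l_orderkey")
# Large input as the build side (the order this report asks to avoid).
rows_b, held_b = hash_join(lineitem, orders, "l_orderkey", "o_orderkey")

print(len(rows_a), held_a)
print(len(rows_b), held_b)
```

Both calls return the same number of joined rows, but the second one keeps the larger input in the hash table. At scale factor 10 the size gap between join inputs is far larger than in this toy example, which is why swapping the inputs is expected to help.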