Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LEFT JOIN has poor/inconsistent performance #219

Open
muntdan opened this issue Dec 28, 2023 · 5 comments
Open

LEFT JOIN has poor/inconsistent performance #219

muntdan opened this issue Dec 28, 2023 · 5 comments
Labels
enhancement New feature or request needs repro This is missing a way to reproduce the issue.

Comments

@muntdan
Copy link

muntdan commented Dec 28, 2023

What's wrong?

I have 2 CTE and than a UNION ALL.
WITH CTE1 as (SELECT * from Table1 Left Join Table2 where targetid in (1,2,3)), CTE2 as (SELECT col1 from CTE1, left join lateral (select col2 from table2 where targetid =cte1.targetid) on true where point is null ) --this one has 0 ROWS SELECT * FROM CTE1 WHERE point IS NOT NULL UNION ALL SELECT * FROM CTE2

all selects are < 1 second and without UNION ALL separately they work ok.
When UNION ALL is applied a CTE scan operation makes a Index scan on Table2 and query takes > 6 seconds

@muntdan muntdan added the bug Something isn't working label Dec 28, 2023
@wuputah wuputah changed the title [Bug]: CTE and UNION ALL dont work as expected UNION ALL is not optimized Jan 9, 2024
@wuputah wuputah added enhancement New feature or request and removed bug Something isn't working labels Jan 9, 2024
@wuputah
Copy link
Member

wuputah commented Jan 9, 2024

Re-categorizing as a possible performance enhancement, though this sounds like a planner issue? I think you could force a table scan using pg_hint_plan, so perhaps there is a workaround for you.

Please let us know if the query results were inaccurate, in which case it's certainly a bug we want to investigate.

@muntdan
Copy link
Author

muntdan commented Jan 22, 2024

Strange enough this now takes 6 seconds no matter with or without the union so it might be from the join inside the first CTE which separately take ~1 sec but when joined they take 6 seconds

@muntdan muntdan changed the title UNION ALL is not optimized LEFT JOIN has poor/incosistent performance Jan 22, 2024
@wuputah wuputah added the needs repro This is missing a way to reproduce the issue. label Jan 26, 2024
@wuputah
Copy link
Member

wuputah commented Jan 26, 2024

Join performance is highly dependent on a lot of factors; there's nothing specific enough here for us to look into. If you can provide an example/test case we could investigate further.

@muntdan
Copy link
Author

muntdan commented Feb 14, 2024

Here are the scripts to reproduce:

CREATE TABLE ten (id int, tenant text, targetid int);
INSERT INTO ten VALUES (1, 'a', 101),(2, 'a', 102),(3, 'b', 201),(4, 'b', 202),
(5, 'c', 301),(6, 'c', 302),(7, 'd', 401),(8, 'd', 402),(9, 'e', 501),(10, 'e', 502);

CREATE TABLE tp (id int, targetid int, tenant text, datetime timestamp without time zone, speed int,
				CONSTRAINT tp_pkey PRIMARY KEY (id, tenant, datetime) ) PARTITION BY LIST (tenant);
DO $$
DECLARE t text; d timestamp;
BEGIN 
FOR t IN SELECT distinct tenant FROM ten LOOP
	    raise notice 'tp_ten%', t;
        EXECUTE 'CREATE TABLE '|| 'tp_ten' || t || ' PARTITION OF tp FOR VALUES IN  ('''|| t ||''') PARTITION BY RANGE(datetime);';
		FOR d IN SELECT * FROM  generate_series('2023-12-01 00:00:00'::timestamp, '2023-12-31 23:59:59'::timestamp, '1 hour') LOOP
			raise notice 'day_hour:%', TO_CHAR( d, 'DD_HH24' );
			EXECUTE 'CREATE TABLE '|| 'tp_ten' || t || '_' || TO_CHAR( d, 'DD_HH24' ) || ' PARTITION OF ' 
				|| 'tp_ten' || t || ' FOR VALUES FROM (''' || d || ''') TO ('''|| d + INTERVAL '1 hour' ||''') USING columnar;';
		END LOOP;
    END LOOP;
END $$;
		 
INSERT INTO tp 
SELECT rnd.id, ten.targetid,ten.tenant, rnd.datetime, rnd.speed from
(SELECT row_number() over () as id,
generate_series('2023-12-01 00:00:00', '2023-12-31 23:59:59', '1 sec'::interval)::timestamp as datetime,
floor(random() * 100 + 1)::int as speed,
floor(random() * 10 + 1)::int as random) rnd inner join ten on rnd.random=ten.id;
select count(*) from tp --2.678.400
CREATE TABLE ev (id int, targetid int, tenant text, datetime timestamp without time zone, code text,
				CONSTRAINT ev_pkey PRIMARY KEY (id, tenant, datetime) ) PARTITION BY LIST (tenant);
DO $$
DECLARE t text; d timestamp;
BEGIN 
FOR t IN SELECT distinct tenant FROM ten LOOP
	    raise notice 'ev_ten%', t;
        EXECUTE 'CREATE TABLE '|| 'ev_ten' || t || ' PARTITION OF ev FOR VALUES IN  ('''|| t ||''') PARTITION BY RANGE(datetime);';
		FOR d IN SELECT * FROM  generate_series('2023-12-01 00:00:00'::timestamp, '2023-12-31 23:59:59'::timestamp, '1 hour') LOOP
			raise notice 'day_hour:%', TO_CHAR( d, 'DD_HH24' );
			EXECUTE 'CREATE TABLE '|| 'ev_ten' || t || '_' || TO_CHAR( d, 'DD_HH24' ) || ' PARTITION OF ' 
				|| 'ev_ten' || t || ' FOR VALUES FROM (''' || d || ''') TO ('''|| d + INTERVAL '1 hour' ||''') USING columnar;';
		END LOOP;
    END LOOP;
END $$;				
	
INSERT INTO ev 
SELECT row_number() over () as id, t.targetid, t.tenant, t.datetime + floor(random() * 5)::int * interval '1 seconds' ,
(array['start','stop','break','engineon','engineoff'])[floor(random() * 5 + 1)] as code
FROM ( SELECT *, row_number() OVER(ORDER BY id ASC) AS row FROM tp) t
WHERE t.row % 5 = 0;
select count(*) from ev --535.680

--1.4sec:
SELECT tp.targetid, tp.datetime, tp.speed  FROM tp WHERE tp.tenant = 'a' AND tp.datetime between '2023-12-01 00:00:00' AND '2023-12-31 23:59:59' AND tp.targetid IN (102)
--1 sec:
SELECT ev.targetid, ev.datetime, ev.code FROM ev WHERE ev.tenant = 'a' AND ev.datetime between '2023-12-01 00:00:00' AND '2023-12-31 23:59:59' AND ev.targetid IN (102)

--16sec:
   SELECT ev.targetid, ev.datetime, ev.code, tp.speed 
   FROM ev  LEFT JOIN  tp on  ev.targetid = tp.targetid AND ev.datetime = tp.datetime  AND ev.tenant = tp.tenant    
	WHERE ev.tenant = 'a' AND ev.datetime between '2023-12-01 00:00:00' AND '2023-12-31 23:59:59' AND ev.targetid IN (102)
	AND tp.tenant = 'a' AND tp.datetime between '2023-12-01 00:00:00' AND '2023-12-31 23:59:59' AND tp.targetid IN (102)

@muntdan
Copy link
Author

muntdan commented Feb 23, 2024

Do you need any more information from me ?

@muntdan muntdan changed the title LEFT JOIN has poor/incosistent performance LEFT JOIN has poor/inconsistent performance Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs repro This is missing a way to reproduce the issue.
Projects
None yet
Development

No branches or pull requests

2 participants