Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Natural Join Column Qualification Conflict #13774

Open
Lordworms opened this issue Dec 13, 2024 · 3 comments
Open

Natural Join Column Qualification Conflict #13774

Lordworms opened this issue Dec 13, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@Lordworms
Copy link
Contributor

Describe the bug

Bug where NATURAL JOIN rejects source table column qualifiers in ORDER BY clause.

To Reproduce

(base) ➜  ~ echo '"id","name","stuff"
1,"aaa",2
2,"bbb",1
3,"cc",1
4,"d",2' > test1.csv
(base) ➜  ~ datafusion-cli
DataFusion CLI v43.0.0
> CREATE EXTERNAL TABLE server
STORED AS CSV
LOCATION 'test1.csv';
0 row(s) fetched.
Elapsed 0.037 seconds.

> create table client(id int, name string, stuff int);
0 row(s) fetched.
Elapsed 0.007 seconds.

> select * from server;
+----+------+-------+
| id | name | stuff |
+----+------+-------+
| 1  | aaa  | 2     |
| 2  | bbb  | 1     |
| 3  | cc   | 1     |
| 4  | d    | 2     |
+----+------+-------+
4 row(s) fetched.
Elapsed 0.021 seconds.

> create table client(id int, name string, stuff int);
Execution error: Table 'client' already exists
> SELECT * FROM server NATURAL JOIN client AS client ORDER BY server.id;
Schema error: No field named server.id. Valid fields are client.id, client.name, client.stuff.

Expected behavior

DataFusion CLI v41.0.0
> CREATE EXTERNAL TABLE server(id int, name string, stuff int)
STORED AS CSV
LOCATION 'test1.csv';
0 row(s) fetched. 
Elapsed 0.005 seconds.

> create table client(id int, name string, stuff int);
0 row(s) fetched. 
Elapsed 0.005 seconds.

> SELECT * FROM server NATURAL JOIN client AS client ORDER BY server.id;
+----+------+-------+
| id | name | stuff |
+----+------+-------+
+----+------+-------+
0 row(s) fetched. 
Elapsed 0.020 seconds.

Additional context

No response

@Lordworms Lordworms added the bug Something isn't working label Dec 13, 2024
@Lordworms
Copy link
Contributor Author

take

@findepi
Copy link
Member

findepi commented Dec 14, 2024

> SELECT * FROM server NATURAL JOIN client AS client ORDER BY server.id;
Schema error: No field named server.id. Valid fields are client.id, client.name, client.stuff.

This is actually correct per SQL spec. NATURAL JOIN is a shorthand for USING and when using USING, the join condition columns are deduplicated and lose original qualification.

In DataFusion dialect of SQL we probably want to support this though, because PostgreSQL supports it

postgres=# create table client(id int, name varchar, stuff int);
CREATE TABLE
postgres=# create table server(id int, name varchar, stuff int);
CREATE TABLE
postgres=# SELECT * FROM server NATURAL JOIN client AS client ORDER BY server.id;
 id | name | stuff
----+------+-------
(0 rows)

... while still deduplicating columns on the output (id and other join columns appear only once)

@Omega359
Copy link
Contributor

This seems like a perfect example of a join query that should be in the slt tests (#13470)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants