Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join operation with na_matches="na" returns invalid query for Redshift #1521

Open
krystian8207 opened this issue Jun 25, 2024 · 0 comments · May be fixed by #1522
Open

Join operation with na_matches="na" returns invalid query for Redshift #1521

krystian8207 opened this issue Jun 25, 2024 · 0 comments · May be fixed by #1522

Comments

@krystian8207
Copy link

Defining joins for Redshift (e.g. using left_join) with na_matches="na" argument returns query that includes IS NOT DISTINCT FROM statement which is not supported by the driver.

db <- DBI::dbConnect(
  RPostgres::Redshift(),
  dbname   = Sys.getenv("REDSHIFT_DBNAME"),
  host     = Sys.getenv("REDSHIFT_HOST"),
  port     = Sys.getenv("REDSHIFT_PORT"),
  user     = Sys.getenv("REDSHIFT_USERNAME"),
  password = Sys.getenv("REDSHIFT_PASSWORD")
)
my_tbl <- dplyr::tbl(db, "mytbl")
join_query <- left_join(my_tbl, my_tbl, na_matches = "na")

join_query |> dplyr::show_query()
#> <SQL>
#>   SELECT "mytbl_LHS".*
#>   FROM "mytbl" AS "mytbl_LHS"
#> LEFT JOIN "mytbl" AS "mytbl_RHS"
#> ON (
#>   "mytbl_LHS"."id" IS NOT DISTINCT FROM "mytbl_RHS"."id" 
#> )

Trying to collect the data responds with the following error:

dplyr::collect(join_query)
#> Error in `dplyr::collect()`:
#> ! Failed to collect lazy table.
#> Caused by error:
#> ! Failed to prepare query: ERROR:  syntax error at or near "DISTINCT"
#> LINE 5:     "mytbl_LHS"."id" IS NOT DISTINCT FROM "mytbl...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant