Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Think about splitting dbWriteTable() #74

Closed
krlmlr opened this issue Jan 15, 2016 · 14 comments
Closed

Think about splitting dbWriteTable() #74

krlmlr opened this issue Jan 15, 2016 · 14 comments

Comments

@krlmlr
Copy link
Member

krlmlr commented Jan 15, 2016

Perhaps define two new generics: dbCreateTable() and dbInsert(). The latter could support additional options, as suggested in #68 (comment).

dbWriteTable() would simply call those two new generics.

CC @beanumber

@krlmlr
Copy link
Member Author

krlmlr commented Sep 25, 2017

We'll also need dbClearTable() to support overwrite = TRUE.

@krlmlr
Copy link
Member Author

krlmlr commented Sep 25, 2017

Also, dbInsertIntoTable() should return a data frame with the newly created row identifiers. The column name(s) of that data frame should match actual column names in the table.

Postgres: RETURNING ID or perhaps an API call

MySQL: https://stackoverflow.com/q/7501464/946850

SQLite: https://stackoverflow.com/q/5867404/946850

@hadley
Copy link
Member

hadley commented Oct 24, 2017

Is there anything I can do to accelerate this?

@nbenn
Copy link
Member

nbenn commented Oct 24, 2017

Also facing issues with the limited framework for DDL statements offered by DBI, I started working on sqlr. The underlying problem is not resolved by adding INSERT/REPLACE and IGNORE switches to dbWriteTable() (as I did here). The user may require more control over data types (#199), foreign keys, unique constraints, table partitions, etc.

DBI might not the place for such added features. I believe there to be merit in having a separate package, focused on building SQL statements (#72). This could then in turn be used in DBI/DB back ends, as well as for packages such as dbplyr.

Dplyr already provides the translation of vector expressions and functions from R to SQL. But its scope is limited to the implementation of dplyr verbs (#170). We therefore need a tool which covers a more complete range of SQL, perhaps by combining the SQL capabilities of dbplyr and with something like I started with sqlr. I'd be happy to help out any way I can.

@krlmlr
Copy link
Member Author

krlmlr commented Oct 25, 2017

I planned to tackle the remaining DBI issues after RPostgres, maybe early December? I can shift priorities, though.

sqlr already seems way more powerful than the DBI interface will ever be. We should discuss if it's worthwhile to define a new DBI interface at all, or if we should leave DDL issues to sqlr. It would be great to have a grammar for generating SQL statements and queries. Eventually, dbplyr and also might become a user of sqlr.

@hadley
Copy link
Member

hadley commented Oct 25, 2017

From a very brief reading of the sqlr readme, I have two main concerns:

  • I think it's a bad idea to start from MySQL/MariaDB because they are not particularly standards compliant. I'd be happy to start from PostgresSQL, which generally adheres closely to the SQL spec.

  • More importantly, I think the scope may be too broad - SQL generation in dplyr is complex, and it only handles SELECT queries (CREATE TABLE and friends are gradually being eliminated in favour of DBI methods). SQL varies considerably from database to database and the SQL spec itself is huge, so I think sqlr need some strong constraints in order to be successful. One place to start (which would be particularly helpful for DBI) would be to focus on DDL.

I also don't like the way that the default connection is defined/established, but I think that's fairly easy to fix (although it does have broader implications for the API)

@nbenn
Copy link
Member

nbenn commented Oct 25, 2017

Thanks for the feedback. I appreciate it.

You make a good point regarding scope. I will focus on DDL statements.

@hadley how do you feel about making the basic generation of SQL independent of dbplyr? As of now, both DBI (though only rudimentarily) and dbplyr offer an sql classes, methods for quoting/escaping, etc. dbplyr goes a step further by offering translation of operators, functions, expressions, etc. from R to SQL. This could be useful beyond mapping dplyr verbs to database back ends.

@hadley
Copy link
Member

hadley commented Oct 25, 2017

SQL generation is the primary job of dbplyr, so I'm not sure what would remain. And generally, I don't have much time to spend on dbplyr in the near future, so I can't make any major changes.

@krlmlr
Copy link
Member Author

krlmlr commented Dec 18, 2017

Need support for the field.types argument, too.

@krlmlr
Copy link
Member Author

krlmlr commented Apr 24, 2018

  1. Add dbCreateTable() that by default calls sqlCreateTable() and then dbExecute()
    • field.types argument is not needed, current interface of sqlCreateTable() is sufficient
  2. Add dbInsertIntoTable() that by default uses INSERT INTO ... (...) VALUES (?, ?, ...) (need to override for RPostgres to use $1) in a parametrized query
    • Uses column names from supplied data frame
    • Creates SQL query, calls dbExecute(con, query, param = ...) . Tight coupling seems necessary because params must be either named or unnamed, depending on the type of placeholder
  3. Add default implementation for dbWriteTable()
    • Dropping tables is realized via dbRemoveTable()
  4. Further options like UPSERT (now available for SQLite) will be provided in new generics

@krlmlr
Copy link
Member Author

krlmlr commented Apr 24, 2018

Just checked, both dbCreateTable() and dbInsertIntoTable() aren't used by any CRAN package.

krlmlr added a commit that referenced this issue Apr 24, 2018
- New `dbCreateTable()` that by default calls `sqlCreateTable()` and then `dbExecute()` (#74).
krlmlr added a commit that referenced this issue Apr 24, 2018
- New `dbAppendTable()` that by default calls `sqlAppendTableTemplate()` and then `dbExecute()` with a `param` argument (#74).
@krlmlr
Copy link
Member Author

krlmlr commented Apr 24, 2018

Using dbAppendTable() (the original name in the Rdbi package, and consistent with sqlAppendTable()), also not used. Using logic in sqlAppendTableTemplate() .

@krlmlr
Copy link
Member Author

krlmlr commented Apr 24, 2018

Using the new generics in our three backends instead of a default implementation due to subtle differences.

krlmlr added a commit to r-dbi/RPostgres that referenced this issue Apr 25, 2018
- The `field.types` argument to `dbWriteTable()` now must be named.
- Using `dbCreateTable()` and `dbAppendTable()` internally (r-dbi/DBI#74).
krlmlr added a commit to r-dbi/RMariaDB that referenced this issue Apr 25, 2018
- Using `dbCreateTable()` and `dbAppendTable()` internally (r-dbi/DBI#74).
krlmlr added a commit to r-dbi/RSQLite that referenced this issue Apr 25, 2018
- The `field.types` argument to `dbWriteTable()` no longer takes precedence when defining the order of the columns in the new table.
- Using `dbCreateTable()` and `dbAppendTable()` internally (r-dbi/DBI#74).
@krlmlr
Copy link
Member Author

krlmlr commented Apr 25, 2018

Done.

@krlmlr krlmlr closed this as completed Apr 25, 2018
@ghost ghost removed the ready label Apr 25, 2018
@github-actions github-actions bot locked and limited conversation to collaborators Oct 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants