sql_database.with_resources(*) reflects unnecessary table schemas #2121
Comments
@jeff-skoldberg-gmds I do not understand what kind of problem you have with passing a list of tables upfront. I assume you know them? The problem you describe comes from the way Python executes code: it will first create
How can we make your code simpler? If you know all your table names in advance, you can pass them to the
The other option is passing your own
Another option that comes to mind is to add an option that fully skips reflection; in that case we would infer everything from the data, and you would have to provide a custom SELECT statement to extract
Hi @rudolfix, thanks for your questions here. Let me see if I can organize my thoughts.
why do I not want to pass a list of tables
I like the idea of defining resources up front in config, including which column is the cursor column, the write disposition, etc. Passing a list of tables is working for me, but I would like the flexibility to pass resources.
dlt version
1.4.0
Describe the problem
Context in this slack thread.
Summary
This code does schema reflection for every table and column in the database.schema, even if only one table is passed in.
However, this code does exactly what a user would expect and reflects the schema only for the necessary columns:
bug or feature?
There is a debate in the Slack thread about whether this is a bug. But I would ask these questions:
If the answers to these questions are negative, I feel like we have to categorize this as a bug.
Personally, I can't imagine a single user would want dlt to reflect the schema for random columns they are not using.
impact to my code
Using the table_names= kwarg limits me with certain features: I am no longer defining specific resources, only table names. Therefore I have to apply hints after the source is created instead of providing them up front in config. (Maybe I'm doing something wrong, but that has been my experience as a non-expert here...)
Expected behavior
I would expect that only tables and columns requested by the user undergo schema reflection.
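To make the difference between the observed and the expected behavior concrete, here is a small toy model in plain Python. It is not dlt's implementation: `CATALOG`, `reflect`, and `ToySource` are hypothetical names invented for illustration. It only shows the call-order effect discussed in the comments: a source that reflects eagerly at construction has already touched every table by the time `with_resources` narrows the selection, while passing `table_names` up front limits reflection to the requested tables.

```python
# Toy model only: these names are hypothetical and do not come from dlt.
# A fake catalog standing in for a database schema: table name -> columns.
CATALOG = {
    "orders": ["id", "updated_at"],
    "customers": ["id", "name"],
    "audit_log": ["id", "payload"],
}

reflected = []  # records every table whose schema was "reflected"


def reflect(table):
    """Pretend to reflect a table schema and record that it happened."""
    reflected.append(table)
    return CATALOG[table]


class ToySource:
    def __init__(self, table_names=None):
        names = table_names if table_names is not None else list(CATALOG)
        # Reflection happens eagerly, while the source is being constructed.
        self.resources = {name: reflect(name) for name in names}

    def with_resources(self, *names):
        # Selection only filters resources that already exist; nothing is
        # reflected here, and nothing reflected earlier is undone.
        selected = ToySource.__new__(ToySource)
        selected.resources = {name: self.resources[name] for name in names}
        return selected


def tables_reflected_by(build):
    """Run a builder callable and return the tables it caused to be reflected."""
    reflected.clear()
    build()
    return list(reflected)


# Pattern 1: build the full source, then select one resource.
eager = tables_reflected_by(lambda: ToySource().with_resources("orders"))
# Pattern 2: name the table up front.
narrow = tables_reflected_by(lambda: ToySource(table_names=["orders"]))

print(eager)   # ['orders', 'customers', 'audit_log']
print(narrow)  # ['orders']
```

In this model the only way to avoid the extra work in the first pattern would be to defer reflection until a resource is actually selected or read, which is a design choice for the library rather than something the caller can change.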
Steps to reproduce
Run this:
I think this is the same behavior for all sql_database types, but I am using DB2 if that matters at all.
Operating system
Windows
Runtime environment
Local
Python version
3.11
dlt data source
sql_database
dlt destination
DuckDB, Snowflake
Other deployment details
No response
Additional information
The workaround is to pass a list of tables, then do something like this:
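The original snippet did not survive on this page. As a rough sketch of the general shape only, and not the author's code, the helper below applies hints per table after the source exists. It assumes the source exposes `source.resources[name].apply_hints(...)`, as dlt sources do; `apply_table_hints`, `FakeResource`, `FakeSource`, and the table and column names are hypothetical stand-ins so the sketch runs without dlt or a database.

```python
from typing import Any, Dict, Mapping


def apply_table_hints(source: Any, hints_by_table: Mapping[str, Mapping[str, Any]]) -> None:
    """Apply per-table hints to an already-created source, one resource at a time."""
    for table_name, hints in hints_by_table.items():
        # Assumption: resources are addressable by name and expose apply_hints().
        source.resources[table_name].apply_hints(**hints)


# Hypothetical stand-ins so the sketch runs without dlt or a database.
class FakeResource:
    def __init__(self) -> None:
        self.hints: Dict[str, Any] = {}

    def apply_hints(self, **hints: Any) -> None:
        # Store the hints so the effect of the call can be inspected.
        self.hints.update(hints)


class FakeSource:
    def __init__(self, table_names) -> None:
        self.resources = {name: FakeResource() for name in table_names}


source = FakeSource(["orders", "customers"])
apply_table_hints(
    source,
    {"orders": {"write_disposition": "merge", "primary_key": "id"}},
)
print(source.resources["orders"].hints)
```

With a real source created from a list of tables, the same mapping could be kept in config and passed to the helper right after the source is built, which gets close to defining the hints up front.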