fix: allow dispatching items to nested tables when specified parent_table_name equals resource.table_name #2106

Open
wants to merge 3 commits into
base: devel
joscha
Contributor

@joscha joscha commented Nov 28, 2024

Description

I am trying to dispatch items from one resource to different tables, but connect them via a _dlt_parent_id reference.

Expected

Two tables.

my_table:

| _dlt_id | id |
| --- | --- |
| xxx | 1 |

other_table:

| _dlt_id | _dlt_parent_id | id |
| --- | --- | --- |
| yyy | xxx | 10 |

However, the test case in this PR instead raises an exception:

E           dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage extract when processing package 1732801491.314115 with exception:
E
E           <class 'dlt.common.schema.exceptions.TablePropertiesConflictException'>
E           In schema: pipe_90aea10afd1f0c9f22f9154408b787ba: Cannot merge partial tables into table `my_other_table` due to property `resource` with different values: "my_table" != "my_table"

Where the error gets thrown, there is a comment:

# this should not really happen

:-D

And as you can see from the error message, the assertion is not quite right:

with different values: "my_table" != "my_table"

A few ideas:

  1. Am I holding it wrong? If yes, how can we prevent other people from doing the same?
  2. Should the error only be raised if the specified parent_table_name is ACTUALLY different from the table_name of the enclosing resource?

My feeling is 2., but I am not 100% sure. The test passes with my proposed change.
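To make idea 2 concrete, here is a minimal sketch of the condition I have in mind. The function name `is_real_parent_conflict` and its parameters are hypothetical and only illustrate the rule; this is not dlt's actual code and not necessarily how the change in this PR is written.

```python
from typing import Optional


def is_real_parent_conflict(
    resource_table_name: str, parent_table_name: Optional[str]
) -> bool:
    """Hypothetical predicate, not part of dlt.

    Returns True only when a parent table hint is given and it names a
    table other than the one the enclosing resource writes to.
    """
    return (
        parent_table_name is not None
        and parent_table_name != resource_table_name
    )


# The case from this PR: the hint points at the resource's own table.
print(is_real_parent_conflict("my_table", "my_table"))   # False
# A hint that names some other table would still be treated as a conflict.
print(is_real_parent_conflict("my_table", "unrelated"))  # True
# No parent hint at all: nothing to conflict with.
print(is_real_parent_conflict("my_table", None))         # False
```

Under this rule the `"my_table" != "my_table"` situation would never reach the exception, while a genuinely different parent would still be rejected.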

You can run the test via:

poetry run pytest -s tests/pipeline/test_pipeline.py -k test_mark_parent_table

There is also a commented-out chunk of code that does something similar and raises the same exception:

# def compare_tables(tab_a: TTableSchema, tab_b: TTableSchema) -> bool:
#     try:
#         table_name = tab_a["name"]
#         if table_name != tab_b["name"]:
#             raise TablePropertiesConflictException(table_name, "name", table_name, tab_b["name"])
#         diff_table = diff_tables(tab_a, tab_b, ignore_table_name=False)
#         # columns cannot differ
#         return len(diff_table["columns"]) == 0
#     except SchemaException:
#         return False


On a side note, I just noticed that with this test case there is no _dlt_parent_id in my_other_table. The loaded data looks like this:

 {'my_other_table': [{'id': 10, '_dlt_load_id': '1732802504.14759', '_dlt_id': 'IUDf2Ch1OievNg'}]}

I am unsure where to pass the hint to create the parent-child relationship.
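For reference, this is roughly the check I would want a test to make once the relationship exists. It is a plain-Python sketch that does not use dlt: the helper name `rows_missing_parent_link` is hypothetical, and the `xxx` / `yyy` ids are the placeholders from the expected tables in the description, not real values.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def rows_missing_parent_link(child_rows: List[Row], parent_rows: List[Row]) -> List[Row]:
    """Return child rows whose _dlt_parent_id is absent or matches no parent _dlt_id."""
    parent_ids = {row["_dlt_id"] for row in parent_rows}
    return [row for row in child_rows if row.get("_dlt_parent_id") not in parent_ids]


# Parent row; the _dlt_id is a placeholder value.
parents = [{"_dlt_id": "xxx", "id": 1}]

# Row as currently produced by the test case (no _dlt_parent_id).
observed = [{"id": 10, "_dlt_load_id": "1732802504.14759", "_dlt_id": "IUDf2Ch1OievNg"}]

# Row as it would look with the expected relationship in place.
expected = [{"_dlt_id": "yyy", "_dlt_parent_id": "xxx", "id": 10}]

# The observed row is reported because it carries no parent reference.
print(rows_missing_parent_link(observed, parents))
# The expected row links to an existing parent, so nothing is reported.
print(rows_missing_parent_link(expected, parents))
```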

netlify bot commented Nov 28, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 6fa2e06
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/675726100dc0aa00086c9bab
😎 Deploy Preview https://deploy-preview-2106--dlt-hub-docs.netlify.app
@joscha force-pushed the joscha/mark-parent-table branch from 88d8192 to b8b5ecd on November 28, 2024 13:56
@joscha changed the title from "test: add test with hint for parent table" to "fix: allow dispatching items to nested tables when specified parent_table_name equals resource.table_name" on Nov 28, 2024
Contributor Author

joscha commented Dec 11, 2024

@sh-rp or @burnash can I get your 👀 on this, please? I think this could be a simple fix to accept: tests are green and there are no changes to current behavior.
