destination-snowflake: truncate large records #45431
Conversation
Makes sense:

1. Remove the primary key fields, totaling their size; add them to the final data.
2. Sort the remaining fields by size, ascending.
3. Keep adding them to the final data until we hit the limit, then start nulling fields and tracking changes (see the sketch after this list).
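A minimal sketch of this flow, assuming the record arrives as a map of field name to serialized value and that sizes are measured as UTF-8 byte lengths. The names (`truncateRecord`, `TruncationResult`, `ROW_SIZE_LIMIT_BYTES`) are illustrative, not the connector's actual API:

```kotlin
// Illustrative sketch of the truncation flow described above; not connector code.
const val ROW_SIZE_LIMIT_BYTES = 16 * 1_024 * 1_024

data class TruncationResult(
    val fields: Map<String, String?>,   // field name -> serialized value (null if dropped)
    val nulledFields: List<String>,     // fields that were nulled to fit under the limit
)

fun truncateRecord(
    record: Map<String, String>,
    primaryKeys: Set<String>,
    limitBytes: Int = ROW_SIZE_LIMIT_BYTES,
): TruncationResult {
    val result = linkedMapOf<String, String?>()
    val nulled = mutableListOf<String>()

    // 1. Keep every primary-key field and count its size toward the budget.
    var usedBytes = 0
    for ((name, value) in record) {
        if (name in primaryKeys) {
            result[name] = value
            usedBytes += value.toByteArray(Charsets.UTF_8).size
        }
    }

    // 2. Sort the remaining fields by serialized size, ascending.
    val remaining = record
        .filterKeys { it !in primaryKeys }
        .entries
        .sortedBy { it.value.toByteArray(Charsets.UTF_8).size }

    // 3. Add fields until the limit is hit, then null the rest and record the change.
    for ((name, value) in remaining) {
        val size = value.toByteArray(Charsets.UTF_8).size
        if (usedBytes + size <= limitBytes) {
            result[name] = value
            usedBytes += size
        } else {
            result[name] = null
            nulled += name
        }
    }
    return TruncationResult(result, nulled)
}
```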
Questions:
- Where does this limit come from (`16 * 1_024 * 1_024`)? Does SF enforce that the string size of all the fields in a row be less than this, or is this to enforce serialization size in the raw table?
- Is this a hard limit? If so, what happens if the PK fields are too big (obviously unlikely)? Also, if this is for serialization size in the raw table, won't that size be larger due to the format? Do the extra meta fields come into play? Could the meta itself blow up if there were lots of fields past the limit?
- This leaves the fields sorted by size; I assume that's okay? Nothing downstream is dependent on field order (including tests)?
(declining this review, unless @stephane-airbyte you want a third set of eyes here)
There's a weird 16MB limit on the VARIANT type. It's weird because it's not exact: they say it's compressed size (so the raw size could be bigger), but it could also be smaller because of overhead...

If the sum of the PK fields is bigger than 16MB, we just try to insert the record anyway, and we'll fail then. Dying seemed like the only right thing to do short of losing data.

Yeah, tests look at the JSON field names; they don't care about order. Neither do our T+D queries.
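A hedged sketch of the oversize-PK fallthrough described above, reusing `truncateRecord` and `TruncationResult` from the earlier sketch; `pkBytes` and `maybeTruncate` are hypothetical helpers, not connector code:

```kotlin
// Illustrative only: if the PK fields alone exceed the limit, truncation cannot help,
// so the record is passed through untouched and the INSERT is left to fail in Snowflake
// rather than silently dropping key data.
fun pkBytes(record: Map<String, String>, primaryKeys: Set<String>): Int =
    record.filterKeys { it in primaryKeys }
        .values
        .sumOf { it.toByteArray(Charsets.UTF_8).size }

fun maybeTruncate(
    record: Map<String, String>,
    primaryKeys: Set<String>,
    limitBytes: Int = 16 * 1_024 * 1_024,
): TruncationResult =
    if (pkBytes(record, primaryKeys) > limitBytes) {
        // PKs alone blow the budget: keep the record intact and let the insert fail loudly.
        TruncationResult(record.toMap(), emptyList())
    } else {
        truncateRecord(record, primaryKeys, limitBytes)
    }
```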
What
This change implements a large record truncation mechanism for the Snowflake destination connector to handle records exceeding Snowflake's 16MB row size limit.
How
If a record's serialized size exceeds Snowflake's 16MB limit, the primary-key fields are always kept; the remaining fields are sorted by size, and the largest fields are nulled until the record fits. Each nulled field is recorded in the record's metadata.
User Impact
Users can now sync large records to Snowflake without encountering errors due to row size limitations. Fields may be truncated to fit within the 16MB limit, but primary keys are always preserved. Metadata is added to indicate which fields were affected.
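To make the metadata behavior concrete, here is an illustrative (not actual) shape for recording which fields were nulled. The real connector attaches this information through Airbyte's record-meta changes; the class and enum names below are placeholders:

```kotlin
// Placeholder types showing how nulled-field information from truncation could be
// represented; the connector's actual metadata format may differ.
enum class Change { NULLED }
enum class Reason { DESTINATION_RECORD_SIZE_LIMITATION }

data class MetaChange(val field: String, val change: Change, val reason: Reason)

fun changesFor(nulledFields: List<String>): List<MetaChange> =
    nulledFields.map { MetaChange(it, Change.NULLED, Reason.DESTINATION_RECORD_SIZE_LIMITATION) }

fun main() {
    // e.g. two oversized fields were nulled while fitting the record under 16MB
    println(changesFor(listOf("large_blob", "raw_html")))
}
```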