destination-s3: don't reuse names of existing objects #45143

stephane-airbyte · 2024-09-04T21:32:35Z

Instead of counting the existing files and numbering new files from that count, we now start numbering at 0 and skip any name that is already taken, so files that were already present are never overwritten.

The problem showed up in overwrite syncs. Sync 1 would create files 1, 2, 3. Sync 2 would notice there are 3 files, create files 4, 5, 6, and delete 1, 2, 3 at the end of the sync. Sync 3 and later would again see 3 files, overwrite 4, 5, 6, and then delete them because they were present before the sync started, leaving us with no files.
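
The failure mode described above can be reproduced with a small model. This is only a sketch under stated assumptions: the bucket is modelled as a set of integer file ids, and `overwriteSyncWithCountBasedNames` is a hypothetical helper written for illustration, not code from the connector.

```kotlin
// Hypothetical model of the old count-based naming; not the connector's actual code.
fun overwriteSyncWithCountBasedNames(bucket: MutableSet<Int>, filesToWrite: Int) {
    // Snapshot of what was in the bucket before this sync started.
    val presentBeforeSync = bucket.toSet()
    // Old behaviour: start numbering right after the *count* of existing files.
    val firstId = presentBeforeSync.size + 1
    for (id in firstId until firstId + filesToWrite) {
        bucket.add(id) // reuses the name if a file with that id already exists
    }
    // Overwrite mode: at the end, delete everything that existed before the sync.
    bucket.removeAll(presentBeforeSync)
}

fun main() {
    val bucket = mutableSetOf<Int>()
    overwriteSyncWithCountBasedNames(bucket, 3)
    println(bucket) // [1, 2, 3]
    overwriteSyncWithCountBasedNames(bucket, 3)
    println(bucket) // [4, 5, 6]
    overwriteSyncWithCountBasedNames(bucket, 3)
    println(bucket) // []
}
```

In the third sync the new files reuse ids 4, 5, 6, so the end-of-sync cleanup removes the files that were just written.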

fixes #6417

gisripa · 2024-09-05T15:47:55Z

...stinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/S3StorageOperations.kt

+        objectNameByPrefix.computeIfAbsent(
+            objectPath,
+        ) {
+            var objectList: Set<String> = setOf()


You can probably use this method here, right?

gisripa

lgtm

stephane-airbyte · 2024-09-05T19:42:28Z

/publish-java-cdk

🕑 https://github.com/airbytehq/airbyte/actions/runs/10727098973
✅ Successfully published Java CDK version=0.44.21!

johnny-schmidt

Makes sense:

when writing a new file
- try to get a partId for the prefix
  - start at 0
  - first time for the prefix: build a map of prefix -> { all existing full names with prefix }
  - while there's a conflict: increment and try again
- (no need to store the object you just made, because all subsequent calls should just increment any time there's a conflict)

So:

Sync 1: write(0, 1, 2)
Sync 2: write(3, 4, 5, 6) delete(0, 1, 2)
Sync 3: write(0, 1, 2, 7, 8)

etc
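
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code in `S3StorageOperations.kt`: `PartIdAllocator`, `listExisting`, and `nextObjectName` are hypothetical names, and the listing of existing objects is passed in as a plain function rather than a real S3 call.

```kotlin
// Hypothetical sketch; names are illustrative and not taken from the PR.
class PartIdAllocator(private val listExisting: (String) -> Set<String>) {
    // prefix -> full names that already existed the first time the prefix was seen
    private val existingByPrefix = mutableMapOf<String, Set<String>>()
    // prefix -> next part id to try
    private val nextIdByPrefix = mutableMapOf<String, Int>()

    fun nextObjectName(prefix: String): String {
        // First call for a prefix: snapshot the names already present.
        val existing = existingByPrefix.getOrPut(prefix) { listExisting(prefix) }
        // Start at 0, or just after the last id handed out for this prefix.
        var id = nextIdByPrefix.getOrDefault(prefix, 0)
        // While there is a conflict with a pre-existing name, increment and retry.
        while ("$prefix$id" in existing) id++
        // Newly created names are not stored; the counter alone moves past them.
        nextIdByPrefix[prefix] = id + 1
        return "$prefix$id"
    }
}

fun main() {
    // State at the start of sync 3 in the example above: parts 3..6 exist.
    val existing = setOf("data/3", "data/4", "data/5", "data/6")
    val allocator = PartIdAllocator { existing }
    val names = List(5) { allocator.nextObjectName("data/") }
    println(names) // [data/0, data/1, data/2, data/7, data/8]
}
```

With parts 3 to 6 already present, five writes land on 0, 1, 2, 7, 8, matching the "Sync 3" line above.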

octavia-squidington-iii added the CDK (Connector Development Kit) label Sep 4, 2024

stephane-airbyte force-pushed the stephane/09-04-destination-s3_don_t_reuse_names_of_existing_objects branch 5 times, most recently from 97d6945 to 1d1843c Compare September 5, 2024 00:33

octavia-squidington-iii added the area/connectors, area/documentation, and connectors/destination/s3 labels Sep 5, 2024

vercel bot deployed to Preview September 5, 2024 00:38 View deployment

stephane-airbyte force-pushed the stephane/09-04-destination-s3_don_t_reuse_names_of_existing_objects branch 2 times, most recently from 640fbf4 to 0e56f8d Compare September 5, 2024 00:53

vercel bot deployed to Preview September 5, 2024 01:02 View deployment

stephane-airbyte requested review from gisripa and johnny-schmidt September 5, 2024 01:24

stephane-airbyte marked this pull request as ready for review September 5, 2024 01:24

stephane-airbyte requested review from a team as code owners September 5, 2024 01:24

gisripa reviewed Sep 5, 2024

stephane-airbyte force-pushed the stephane/09-04-destination-s3_don_t_reuse_names_of_existing_objects branch from 0e56f8d to f812f4f Compare September 5, 2024 18:10

vercel bot deployed to Preview September 5, 2024 18:15 View deployment

stephane-airbyte requested a review from gisripa September 5, 2024 19:37

gisripa approved these changes Sep 5, 2024

destination-s3: don't reuse names of existing objects

00c93c1

stephane-airbyte force-pushed the stephane/09-04-destination-s3_don_t_reuse_names_of_existing_objects branch from f812f4f to 00c93c1 Compare September 5, 2024 21:20

vercel bot deployed to Preview September 5, 2024 21:25 View deployment

stephane-airbyte merged commit 6730a3b into master Sep 5, 2024
34 checks passed

stephane-airbyte deleted the stephane/09-04-destination-s3_don_t_reuse_names_of_existing_objects branch September 5, 2024 21:38

johnny-schmidt approved these changes Sep 5, 2024

RocketerJames mentioned this pull request Sep 13, 2024

[destination-s3] Deferred deletion incorrectly deleting newly created files #45086

Closed

1 task
