🐛 Source Shopify: Migrate from REST > GraphQL BULK Operations where possible, fixed STATE collisions for sub-streams #32345

Merged
bazarnov merged 132 commits into master from baz/source-shopify/fix-substream-state-filterring
Feb 26, 2024

Conversation

bazarnov
Collaborator

@bazarnov bazarnov commented Nov 9, 2023

What

Resolving:

On-calls:

Regular issues:

How

  • freeze the STATE for Sub-streams to avoid STATE regression while reading the stream
  • updated the filter_records_newer_than_state method to use the cached_substream_state captured when the sync starts
  • updated the config exception to use AirbyteTracedException for config errors
  • added GraphQL BULK Operations to speed up streams such as:
    • metafield_collections
    • metafield_customers
    • metafield_draft_orders
    • metafield_locations
    • metafield_orders
    • metafield_product_images
    • metafield_product_variants
    • collections
    • discount_codes
    • fulfillment_orders
    • inventory_items
    • inventory_levels
    • customer_address
    • added new transactions_graphql (duplicated transactions stream with reduced schema)
  • re-organized the code structure:
    • moved all stream implementations from source.py > streams.streams.py
    • moved all base-class implementations from source.py > streams.base_streams.py
  • added a new optional input configuration field, GraphQL BULK Date Range in Days, to control the date range of BULK slices on demand.
  • refactored the following streams to read records from their direct parent instead of fetching the data again with separate requests (another performance boost):
    • added IncrementalShopifyNestedSubstream, which these streams now inherit from:
      • Product Images
      • Product Variants
      • Order Refunds
      • Fulfillments
  • added concurrency checks that hold off creating a new job while another job is already running.
  • adjusted existing unit_tests to reflect these changes
  • covered new implementations with unit_tests
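
To illustrate the STATE freeze from the first two items, here is a minimal sketch. It is not the connector's code: the class is a hypothetical stand-in, only the names filter_records_newer_than_state and cached_substream_state come from the list above, and it assumes the cursor values are ISO-8601 strings that compare correctly as text.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping, Optional


class SubstreamSketch:
    """Hypothetical stand-in for a sub-stream; not the connector's real class."""

    cursor_field = "updated_at"

    def __init__(self, state: Optional[Mapping[str, Any]] = None) -> None:
        # Live state: keeps advancing while records are read.
        self.state: Dict[str, Any] = dict(state or {})
        # Frozen snapshot, captured once when the sync starts.
        self.cached_substream_state: Dict[str, Any] = dict(self.state)

    def filter_records_newer_than_state(
        self, records: Iterable[Mapping[str, Any]]
    ) -> Iterator[Mapping[str, Any]]:
        # Compare against the frozen snapshot, never against the live state.
        threshold = self.cached_substream_state.get(self.cursor_field)
        for record in records:
            value = record.get(self.cursor_field)
            if threshold is not None and value is not None and value < threshold:
                continue  # older than the state the sync started with
            current = self.state.get(self.cursor_field)
            if value is not None and (current is None or value > current):
                self.state[self.cursor_field] = value  # advance the live state only
            yield record


stream = SubstreamSketch({"updated_at": "2024-01-10"})
records = [
    {"id": 1, "updated_at": "2024-01-20"},
    {"id": 2, "updated_at": "2024-01-12"},  # arrives after a newer record
    {"id": 3, "updated_at": "2024-01-05"},  # older than the starting state
]
emitted = list(stream.filter_records_newer_than_state(records))
```

In this sketch record 2 is still emitted, although record 1 has already advanced the live state past it. Filtering against the live state instead would drop it, which is the kind of regression the freeze is meant to avoid.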
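
The GraphQL BULK Date Range in Days option can be pictured as splitting the sync window into fixed-size date slices, one BULK job per slice. The sketch below is only an assumption about how such slicing could work: the function name, the half-open slice boundaries and the handling of the last, shorter slice are all hypothetical and not taken from the connector.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def bulk_date_slices(start: date, end: date, range_days: int) -> Iterator[Tuple[date, date]]:
    """Yield (slice_start, slice_end) pairs covering [start, end) in steps of range_days."""
    if range_days < 1:
        raise ValueError("range_days must be at least 1")
    step = timedelta(days=range_days)
    slice_start = start
    while slice_start < end:
        # The last slice is cut short so it never runs past `end`.
        slice_end = min(slice_start + step, end)
        yield slice_start, slice_end
        slice_start = slice_end


slices = list(bulk_date_slices(date(2024, 1, 1), date(2024, 1, 10), 4))
```

A smaller range means more, lighter jobs; a larger range means fewer, heavier ones.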

🔴 User Impact 🔴

This PR introduces breaking changes for:

  • Fulfillments stream, by changing the cursor from id to updated_at - the stream requires reset
  • Product Images stream, by changing the cursor from id to updated_at - the stream requires reset
  • Product Variants stream, by changing the cursor from id to updated_at - the stream requires reset
  • Order Refunds stream, by changing schema.refund_line_items.line_item.properties to an array of strings instead of an object with properties.
  • Collections stream, a new scope must be added for all connections that use the API_PASSWORD auth option:

The other changes are not breaking; they are optimizations and performance improvements.
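
One way to see why the cursor change from id to updated_at calls for a reset: a state saved under the old cursor holds no value for the new one, so there is nothing to resume from. The sketch below is illustrative only; the state shape, the sample value and the helper name are hypothetical and not taken from the connector.

```python
from typing import Any, Mapping, Optional


def resume_value(saved_state: Mapping[str, Any], cursor_field: str) -> Optional[Any]:
    """Return the checkpoint stored for cursor_field, or None when the state predates it."""
    return saved_state.get(cursor_field)


# Hypothetical state written by a sync that still used `id` as the cursor.
old_state = {"id": 4501}

old_checkpoint = resume_value(old_state, "id")          # usable under the old cursor
new_checkpoint = resume_value(old_state, "updated_at")  # nothing to resume from
```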

P.S.: The community post about migrating OrderRisks > GQL: https://community.shopify.com/c/graphql-basics-and/get-the-orderrisks-full-data-using-graphql-not-rest-api/m-p/2353776/thread-id/12174

The Breaking Changes doc lives here: https://docs.google.com/document/d/1riyv36EEkkP-T1LK2CHeKyfR_uKCjI5sm1sb4CqqYu0/edit

The additional QA test results live here: https://docs.google.com/document/d/1aZbeyw_DCwnsoOlV-5KL8yLauA8I_FQMV7uI0jhP4o0/edit#heading=h.onifjcf8uiqw

@bazarnov bazarnov self-assigned this Nov 9, 2023

vercel bot commented Nov 9, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Feb 26, 2024 1:33pm

Contributor

github-actions bot commented Nov 9, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Nov 9, 2023
@bazarnov bazarnov marked this pull request as ready for review November 9, 2023 12:51
@bazarnov bazarnov requested review from girarda and a team November 9, 2023 12:52
Collaborator

@lazebnyi lazebnyi left a comment

lgtm

Contributor

@girarda girarda left a comment

The change makes sense to me. Can you add a test to confirm the fix? (Fine to do as a follow-up PR since we want to unblock the users.)

@bazarnov
Collaborator Author

bazarnov commented Nov 9, 2023

The pre-release version of the source-shopify has been published successfully:

cc @girarda

@bazarnov
Collaborator Author

The new pre-release build has been created:

2.0.0-dev.abdc590830

Pre-release Notes:

  • A small regression fix for the Customer Address stream, covering the case where the defaultAddress property is None (literal) instead of an object as expected from the API docs.
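
A None-safe read of defaultAddress could look like the sketch below. It only shows the general pattern; the helper name, the record shapes and the "id" key are assumptions for illustration, not the connector's actual fix.

```python
from typing import Any, Mapping, Optional


def default_address_id(customer: Mapping[str, Any]) -> Optional[Any]:
    """Return the id of the customer's default address, or None if there is none."""
    # `or {}` covers both a missing key and an explicit None value.
    default_address = customer.get("defaultAddress") or {}
    return default_address.get("id")


with_address = default_address_id({"defaultAddress": {"id": "addr-1"}})
with_none = default_address_id({"defaultAddress": None})
without_key = default_address_id({})
```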

Contributor

@girarda girarda left a comment

we tested this with ~20 customers. let's 🚢

@bazarnov
Collaborator Author

I'll merge this on Monday, February 26th, at 8:00 AM (PST)

@bazarnov bazarnov merged commit f509404 into master Feb 26, 2024
28 of 29 checks passed
@bazarnov bazarnov deleted the baz/source-shopify/fix-substream-state-filterring branch February 26, 2024 16:05
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
…re possible, fixed `STATE` collisions for `sub-streams` (airbytehq#32345)
Labels
  • area/connectors: Connector related issues
  • area/documentation: Improvements or additions to documentation
  • breaking-change: Don't merge me unless you are ready.
  • connectors/source/google-search-console
  • connectors/source/shopify
Development

Successfully merging this pull request may close these issues.

Source Shopify: product_variant uses id instead of updated_at as cursor field
9 participants