Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source Shopify: Fix Inventory Level Substream - Incremental Sync Flawed Filtering #32883

Conversation

cobobrien
Copy link
Contributor

@cobobrien cobobrien commented Nov 28, 2023

What

The incremental sync for Shopify Inventory Level substreams pulls a fraction of the data that it should. This is due to the combination of an unsortable Parent Stream (Location) and the functionality fo the filter_records_newer_than_state method. Detailed explanation with examples in the issue: #32817

How

Adds Incremental Inventory Levels by using the updated_at_min filter. Is far more performant than what existed previously due to no longer requiring the filter_records_newer_than_state functionality.

Previously incremental substreams pulls had to query all existing data for that resource without any filtering, and then filter out the older-than-state data after the fact in the connector runtime (via the filter_records_newer_than_state method). This way, only the newer-than-state data is pulled via the REST interface.

It runs in a fraction of the time of the old implementation and will pull all qualifying records

Recommended reading order

  1. source.py

🚨 User Impact 🚨

All qualifying Inventory Level records will be pulled. This incremental stream sync will also be much more performant than it currently is.

Pre-merge Actions

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

Copy link

vercel bot commented Nov 28, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Dec 4, 2023 4:32pm

Copy link
Contributor

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Nov 28, 2023
@cobobrien cobobrien changed the title (32817) Fix: Shopify Inventory Level Substream - Incremental Sync Flawed Filtering (32817) Source Shopify: Fix Inventory Level Substream - Incremental Sync Flawed Filtering Nov 28, 2023
@cobobrien cobobrien changed the title (32817) Source Shopify: Fix Inventory Level Substream - Incremental Sync Flawed Filtering Source Shopify: Fix Inventory Level Substream - Incremental Sync Flawed Filtering Nov 28, 2023
@cobobrien cobobrien changed the title Source Shopify: Fix Inventory Level Substream - Incremental Sync Flawed Filtering 🐛 Source Shopify: Fix Inventory Level Substream - Incremental Sync Flawed Filtering Nov 28, 2023
@marcosmarxm marcosmarxm requested a review from bazarnov November 29, 2023 13:58
Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments thanks for the contribution @cobobrien

@@ -408,7 +408,7 @@ def stream_slices(self, stream_state: Mapping[str, Any] = None, **kwargs) -> Ite
# avoid checking `deleted` records for substreams, a.k.a `Metafields` streams,
# since `deleted` records are not available, thus we avoid HTTP-400 errors.
if self.deleted_cursor_field not in record:
yield {self.slice_key: record[self.nested_record]}
yield {self.slice_key: record[self.nested_record]} | (stream_state or {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the reason of returning stream_state here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I required access to the original updated_at date (from the state file passed in as an argument to the sync execution) in the request_params method. I couldn't find a better approach:

  1. The stream_state in the request_params seems to be always an empty dict
  2. self.get_updated_state does no good here as we're looking for the value to use for the updated_at_min param
  3. This connector does not have a self.state property

I'm eager to know of a better way of achieving this as it felt a bit hacky to me too. Thanks @marcosmarxm 🙌

@cobobrien cobobrien requested a review from marcosmarxm December 4, 2023 16:57
@cobobrien
Copy link
Contributor Author

Left some comments thanks for the contribution @cobobrien

@marcosmarxm I have updated this PR. Could I please get a review?

@bazarnov
Copy link
Collaborator

bazarnov commented Dec 8, 2023

@cobobrien I'll take a look.

@bazarnov
Copy link
Collaborator

@cobobrien Thank you for your time spent on the problem, currently, I don't feel this PR would be needed in the light of this one: #32345 which will migrate the InventoryLevels from REST to BULK and will fix the state issues as well.

@marcosmarxm Should we still merge this in with what is being said? Overall the PR looks good.

@bazarnov
Copy link
Collaborator

The fix has been merged @marcosmarxm FYI.

@marcosmarxm
Copy link
Member

@cobobrien can you try latest version of the connector?

@marcosmarxm
Copy link
Member

Closed as #32345 solves the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

5 participants