-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Source Shopify: Add metafield streams #17962
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes are required. Please refer to the comments.
Also, let's try to cache the Products
stream, since we have it re-used a lot for sub-sequent streams. To do so, simply add use_cache = True
attribute to the Products
stream class, the cache will be created and should be re-used for all other subsequent stream classes where possible (if possible), this will also reduce the sync time dramatically, I think. WDYT?
airbyte-integrations/connectors/source-shopify/source_shopify/source.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-shopify/source_shopify/source.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-shopify/source_shopify/source.py
Outdated
Show resolved
Hide resolved
/test connector=connectors/source-shopify
Build FailedTest summary info:
|
/test connector=connectors/source-shopify
Build FailedTest summary info:
|
/test connector=connectors/source-shopify
Build FailedTest summary info:
|
/test connector=connectors/source-shopify
Build FailedTest summary info:
|
/test connector=connectors/source-shopify
Build FailedTest summary info:
|
/test connector=connectors/source-shopify
Build FailedTest summary info:
|
/test connector=connectors/source-shopify
Build PassedTest summary info:
|
/test connector=connectors/source-shopify
Build PassedTest summary info:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work.
airbyte-integrations/connectors/source-shopify/source_shopify/source.py
Outdated
Show resolved
Hide resolved
/publish connector=connectors/source-shopify
if you have connectors that successfully published but failed definition generation, follow step 4 here |
* 🎉 Source Shopify: Add metafield streams * Source Shopify: fix unittest * Source Shopify: docs update * Source Shopify: fix backward compatibility test * Source Shopify: fix schemas * Source Shopify: fix state filter * Source Shopify: refactor & optimize * Source Shopify: fix test privileges * Source Shopify: fix stream filter * Source Shopify: fix streams * Source Shopify: update abnormal state * Source Shopify: fix abnormal state streams * Source Shopify: fix streams * updated methods, formated code * Source Shopify: typo fix * auto-bump connector version Co-authored-by: Oleksandr Bazarnov <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]>
* Implement ColumnSortButton component * Updates component name; Moves component to ui/Table folder; Refactors formattedMessageId property into using render content as children directly; Removes minor SortIcon component * Update airbyte-webapp/src/App.tsx Co-authored-by: Edmundo Ruiz Ghanem <[email protected]> * Updates next properties: wasActive -> isActive, lowToLarge -> isAscending * Skip psql stop in acceptance test for gke (#18023) * Checks for iterator hasNext element (#18041) * Checks for iterator hasNext element * Fix linter with newline * Add Message Migration to Destination Connection Checks (#17954) * Add Message Migration to Destination Connection Checks * Fix test setup * Update helm release workflow (#18048) * Update workflow * Update trigger rules * fix: Update release workflow with abillity to add tags * Update workflow * Remove unused `airbyte-cli` (#18009) * 🐛 [low-code] $options shouldn't overwrite values that are already defined (#18060) * fix * Add missing test * remove prints * extract to method * rename * Add missing test * rename * bump * Update helm chart comments (#18072) * Update helm charts (#18073) * add test * fix chart.yaml * 16250 Destination Redis: Add SSH support (#17951) * 16250 Destination Redis: Add SSH support * 16250 Resolve port issue * 11679 Bump version * auto-bump connector version Co-authored-by: Octavia Squidington III <[email protected]> * Bump helm chart version reference to 0.40.20 (#18074) * Bump helm chart version reference to 0.40.20 * remove binary Co-authored-by: xpuska513 <[email protected]> Co-authored-by: Kyryl Skobylko <[email protected]> * Helm Chart: Create service annotations for airbyte-server (#17932) * Support annotations for airbyte-server as well, update version and update docs. * Fix auto-indent. Co-authored-by: Kyryl Skobylko <[email protected]> * Bmoric/remove dep server worker (#17894) * test [ci skip] * Autogenerated files * Add missing annotation * Remove unused json2Schema block from worker * Move tess * Missing deps and format * Fix test build * TMP * Add missing dependencies * PR comments * Tmp * [ci skip] Tmp * Fix acceptance test and add the seed dependency * Fix build * For diff * tmp * Build pass * make the worker to be on the platform only * fix setting.yaml * Fix pmd * Fix Cron * Add chart * Fix cron * Fix server build.gradle * Fix jar conflict * PR comments * Add cron micronaut environemnt * Updated connector catalog page (#18076) * Move the port forward outside of the main docker-compose (#17864) * Bump Airbyte version from 0.40.14 to 0.40.15 (#17970) Co-authored-by: benmoriceau <[email protected]> * 🎉 Source Shopify: Add metafield streams (#17962) * 🎉 Source Shopify: Add metafield streams * Source Shopify: fix unittest * Source Shopify: docs update * Source Shopify: fix backward compatibility test * Source Shopify: fix schemas * Source Shopify: fix state filter * Source Shopify: refactor & optimize * Source Shopify: fix test privileges * Source Shopify: fix stream filter * Source Shopify: fix streams * Source Shopify: update abnormal state * Source Shopify: fix abnormal state streams * Source Shopify: fix streams * updated methods, formated code * Source Shopify: typo fix * auto-bump connector version Co-authored-by: Oleksandr Bazarnov <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]> * fix check for streams that do not use a stream slicer (#18080) * fix check for streams that do not use a stream slicer * increment version and changelog before publish * tolerate database nulls in webhook operation configs (#18084) * Implement webhook operation in the sync workflow (#18022) Implements the webhook operation as part of the sync workflow. - Introduces the new activity implementation - Updates the various interfaces that pass input to get the relevant configs to the sync workflow - Hooks the new activity into the sync workflow - Passes the webhook configs along into the sync workflow job * Bump helm chart version reference to 0.40.22 (#18077) * Added new "filters" python file, along with a "hash" filter. This can… (#18000) * Added new "filters" python file, along with a "hash" filter. This can be extended to include other custom filters in the future. * Added additional comments * Moved usage of the hash_obj inside the conditional that confirms it exists * Moved the hash function call inside a condition to ensure that it exists * Fixed the application of the salt , so that it does not modify the hash unless it is actually passed in. * Added unit tests to validate new jinja hash functionality * Updated unit test to pass numeric value as a float instead of string * Removed unreferenced import to pytest * Updated version * format * format * format * format * format Co-authored-by: Alexandre Girard <[email protected]> * Bump helm chart version reference to 0.40.24 (#18081) * Bump helm chart version reference to 0.40.24 * Update .gitignore Co-authored-by: benmoriceau <[email protected]> Co-authored-by: Kyryl Skobylko <[email protected]> * SATs: allow new records in a sequential read for full refresh test (#17660) * SATs: allow new records in a sequential read for full refresh test * SATs: upd changelog * SATs: change the output when failing full refresh test * SATs: upd according to code review * Source facebook-marketing: remove `pixel` from custom conversions stream (#18045) * #744 source facebook-marketing: rm pixel from custom conversions stream * #744 source fb marketing: upd changelog * #744 source facebook-marketing - add custom_conversions to the test catalog * auto-bump connector version Co-authored-by: Octavia Squidington III <[email protected]> * #17506 fix klaviyo & marketo expected_records (#18101) Co-authored-by: Edmundo Ruiz Ghanem <[email protected]> Co-authored-by: terencecho <[email protected]> Co-authored-by: Ryan Fu <[email protected]> Co-authored-by: Jimmy Ma <[email protected]> Co-authored-by: Kyryl Skobylko <[email protected]> Co-authored-by: Evan Tahler <[email protected]> Co-authored-by: Alexandre Girard <[email protected]> Co-authored-by: Yevhen Sukhomud <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: xpuska513 <[email protected]> Co-authored-by: Prasanth <[email protected]> Co-authored-by: Benoit Moriceau <[email protected]> Co-authored-by: Amruta Ranade <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]> Co-authored-by: benmoriceau <[email protected]> Co-authored-by: Artem Inzhyyants <[email protected]> Co-authored-by: Oleksandr Bazarnov <[email protected]> Co-authored-by: Brian Lai <[email protected]> Co-authored-by: Michael Siega <[email protected]> Co-authored-by: Alexander Marquardt <[email protected]> Co-authored-by: Denys Davydov <[email protected]>
* 🎉 Source Shopify: Add metafield streams * Source Shopify: fix unittest * Source Shopify: docs update * Source Shopify: fix backward compatibility test * Source Shopify: fix schemas * Source Shopify: fix state filter * Source Shopify: refactor & optimize * Source Shopify: fix test privileges * Source Shopify: fix stream filter * Source Shopify: fix streams * Source Shopify: update abnormal state * Source Shopify: fix abnormal state streams * Source Shopify: fix streams * updated methods, formated code * Source Shopify: typo fix * auto-bump connector version Co-authored-by: Oleksandr Bazarnov <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]>
* Implement ColumnSortButton component * Updates component name; Moves component to ui/Table folder; Refactors formattedMessageId property into using render content as children directly; Removes minor SortIcon component * Update airbyte-webapp/src/App.tsx Co-authored-by: Edmundo Ruiz Ghanem <[email protected]> * Updates next properties: wasActive -> isActive, lowToLarge -> isAscending * Skip psql stop in acceptance test for gke (airbytehq#18023) * Checks for iterator hasNext element (airbytehq#18041) * Checks for iterator hasNext element * Fix linter with newline * Add Message Migration to Destination Connection Checks (airbytehq#17954) * Add Message Migration to Destination Connection Checks * Fix test setup * Update helm release workflow (airbytehq#18048) * Update workflow * Update trigger rules * fix: Update release workflow with abillity to add tags * Update workflow * Remove unused `airbyte-cli` (airbytehq#18009) * 🐛 [low-code] $options shouldn't overwrite values that are already defined (airbytehq#18060) * fix * Add missing test * remove prints * extract to method * rename * Add missing test * rename * bump * Update helm chart comments (airbytehq#18072) * Update helm charts (airbytehq#18073) * add test * fix chart.yaml * 16250 Destination Redis: Add SSH support (airbytehq#17951) * 16250 Destination Redis: Add SSH support * 16250 Resolve port issue * 11679 Bump version * auto-bump connector version Co-authored-by: Octavia Squidington III <[email protected]> * Bump helm chart version reference to 0.40.20 (airbytehq#18074) * Bump helm chart version reference to 0.40.20 * remove binary Co-authored-by: xpuska513 <[email protected]> Co-authored-by: Kyryl Skobylko <[email protected]> * Helm Chart: Create service annotations for airbyte-server (airbytehq#17932) * Support annotations for airbyte-server as well, update version and update docs. * Fix auto-indent. Co-authored-by: Kyryl Skobylko <[email protected]> * Bmoric/remove dep server worker (airbytehq#17894) * test [ci skip] * Autogenerated files * Add missing annotation * Remove unused json2Schema block from worker * Move tess * Missing deps and format * Fix test build * TMP * Add missing dependencies * PR comments * Tmp * [ci skip] Tmp * Fix acceptance test and add the seed dependency * Fix build * For diff * tmp * Build pass * make the worker to be on the platform only * fix setting.yaml * Fix pmd * Fix Cron * Add chart * Fix cron * Fix server build.gradle * Fix jar conflict * PR comments * Add cron micronaut environemnt * Updated connector catalog page (airbytehq#18076) * Move the port forward outside of the main docker-compose (airbytehq#17864) * Bump Airbyte version from 0.40.14 to 0.40.15 (airbytehq#17970) Co-authored-by: benmoriceau <[email protected]> * 🎉 Source Shopify: Add metafield streams (airbytehq#17962) * 🎉 Source Shopify: Add metafield streams * Source Shopify: fix unittest * Source Shopify: docs update * Source Shopify: fix backward compatibility test * Source Shopify: fix schemas * Source Shopify: fix state filter * Source Shopify: refactor & optimize * Source Shopify: fix test privileges * Source Shopify: fix stream filter * Source Shopify: fix streams * Source Shopify: update abnormal state * Source Shopify: fix abnormal state streams * Source Shopify: fix streams * updated methods, formated code * Source Shopify: typo fix * auto-bump connector version Co-authored-by: Oleksandr Bazarnov <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]> * fix check for streams that do not use a stream slicer (airbytehq#18080) * fix check for streams that do not use a stream slicer * increment version and changelog before publish * tolerate database nulls in webhook operation configs (airbytehq#18084) * Implement webhook operation in the sync workflow (airbytehq#18022) Implements the webhook operation as part of the sync workflow. - Introduces the new activity implementation - Updates the various interfaces that pass input to get the relevant configs to the sync workflow - Hooks the new activity into the sync workflow - Passes the webhook configs along into the sync workflow job * Bump helm chart version reference to 0.40.22 (airbytehq#18077) * Added new "filters" python file, along with a "hash" filter. This can… (airbytehq#18000) * Added new "filters" python file, along with a "hash" filter. This can be extended to include other custom filters in the future. * Added additional comments * Moved usage of the hash_obj inside the conditional that confirms it exists * Moved the hash function call inside a condition to ensure that it exists * Fixed the application of the salt , so that it does not modify the hash unless it is actually passed in. * Added unit tests to validate new jinja hash functionality * Updated unit test to pass numeric value as a float instead of string * Removed unreferenced import to pytest * Updated version * format * format * format * format * format Co-authored-by: Alexandre Girard <[email protected]> * Bump helm chart version reference to 0.40.24 (airbytehq#18081) * Bump helm chart version reference to 0.40.24 * Update .gitignore Co-authored-by: benmoriceau <[email protected]> Co-authored-by: Kyryl Skobylko <[email protected]> * SATs: allow new records in a sequential read for full refresh test (airbytehq#17660) * SATs: allow new records in a sequential read for full refresh test * SATs: upd changelog * SATs: change the output when failing full refresh test * SATs: upd according to code review * Source facebook-marketing: remove `pixel` from custom conversions stream (airbytehq#18045) * airbytehq#744 source facebook-marketing: rm pixel from custom conversions stream * airbytehq#744 source fb marketing: upd changelog * airbytehq#744 source facebook-marketing - add custom_conversions to the test catalog * auto-bump connector version Co-authored-by: Octavia Squidington III <[email protected]> * #17506 fix klaviyo & marketo expected_records (airbytehq#18101) Co-authored-by: Edmundo Ruiz Ghanem <[email protected]> Co-authored-by: terencecho <[email protected]> Co-authored-by: Ryan Fu <[email protected]> Co-authored-by: Jimmy Ma <[email protected]> Co-authored-by: Kyryl Skobylko <[email protected]> Co-authored-by: Evan Tahler <[email protected]> Co-authored-by: Alexandre Girard <[email protected]> Co-authored-by: Yevhen Sukhomud <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: xpuska513 <[email protected]> Co-authored-by: Prasanth <[email protected]> Co-authored-by: Benoit Moriceau <[email protected]> Co-authored-by: Amruta Ranade <[email protected]> Co-authored-by: Octavia Squidington III <[email protected]> Co-authored-by: benmoriceau <[email protected]> Co-authored-by: Artem Inzhyyants <[email protected]> Co-authored-by: Oleksandr Bazarnov <[email protected]> Co-authored-by: Brian Lai <[email protected]> Co-authored-by: Michael Siega <[email protected]> Co-authored-by: Alexander Marquardt <[email protected]> Co-authored-by: Denys Davydov <[email protected]>
Hi, @bazarnov I am looking at the source-shopify connector performance lately, especially to speed up syncs of metafields. I am wondering why you only enabled cache for the |
@loris My best guess here, we didn't realize the need. @artem1205 are there any difficulties with caching the As for me, this will not add more speed to the connector, since these streams are not Did you create the issue for this? @loris |
Hi @bazarnov. Thanks for the reply, I will create a new issue regarding Shopify metafields, but first I have some question you can answer to help me understand some Airbyte design choices to handle metafields with substreams. Considering the way metafields are modelized and exposed in Shopify:
I don't understand why metafields (and substream in general) are designed the way they are in Airbyte right now as separate standalone streams, with their own state, and which needs to use the parent stream query to fetch the resource IDs (hopefully, some queries can be cached, but it comes with memory challenge). My understanding, is that it was designed this way to make it possible in an Airbyte connection to sync only the substream (ie, the metafields) without syncing the parent stream (ie, the products), but I don't get why someone would want to do that. It would be easier to consider metafields as dependant streams of the parent resource, meaning:
WDYT? Are you aware of other Airbyte connectors which have these "required" streams (ie, you cannot enable stream A if you don't have stream B enabled) |
@loris Regarding your question:
me either, but there are Customers who need just 1 or 2 streams fetched independently, but in a nutshell - the
As for:
The Shopify connector is one among the few others that use this technique to fetch the streams explicitly independent providing the parent stream state and the substream state to fetch the substream based on the parent stream to provide the incremental functionality. As has already been said - they look independent in the UI but on the backend, they are synced based on the parent updates. If not, we will consider all the issues around the |
@bazarnov Thanks for the detailed answer.
As you can see, the
|
@loris have you noticed that the metafields could be added and regardless the actual customer object for instance? like the metafields field "name" could be updated without the update for the actual customer id directly using API? which means there is a possibility someone will update the properties of the metafields instead of updating the actual record id. that's why the metafields have their "updated_at" because they are not stateless. |
I'll double-check the actual logic how the metafields streams work, once i work on this issue in a few days, thanks you for your findings |
@bazarnov Just did some tests on a shopify store we own to validate which state we should use: Initial state
Adding a metafield
Updating a metafield
Deleting a metafield
As you can see, the only proper way to synchronize the metafields of a Shopify resource (and handle all operations: creation, modification and deletion of metafields) is to do a full refresh of the metafields of a given parent resource when the parent resource has its updated_at value changed. That's why I proposed (but not sure how it goes in the opposite direction of how Airbyte handles child stream) to sync metafields dependently of the parent streams sync: we only store the updated_at state of the parent stream, fetch updated parent resource, and for each fetched parent records fetch the associated metafields (no need for any metafields state). WDYT? |
@loris
It turns out that we're indeed doing like you propose: We sync the See the logic here: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-shopify/source_shopify/source.py#L257 This method produces the The ids are taken and passed to fetch all Initial sync sample:Parent stream
Substream:// Metafields Customer
[
{
"id": 1111,
"namespace": "custom",
"value": "Test\n",
"description": null,
"owner_id": 123,
"created_at": "2023-04-13T04:50:10-07:00",
"updated_at": "2023-04-13T04:50:10-07:00",
....
},
{
"id": 222,
"namespace": "custom",
"value": "Test15",
"description": null,
"owner_id": 123,
"created_at": "2023-04-13T04:50:10-07:00",
"updated_at": "2023-04-14T05:31:11-07:00",
....
}
] Statethe {
"state": {
"metafield_customers": {
"updated_at": "2023-04-14T05:31:11-07:00",
"customers": {
"updated_at": "2023-04-24T06:53:48-07:00"
}
}
}
} Because the Now, when we run the Subsequent Incremental syncParent stream
The We fetch the same customer again but we expect a different number of values, smaller ones. Substream:{
"id": 222,
"namespace": "custom",
"value": "Test15",
"description": null,
"owner_id": 123,
"created_at": "2023-04-13T04:50:10-07:00",
"updated_at": "2023-04-14T05:31:11-07:00",
....
} State{
"state": {
"metafield_customers": {
"updated_at": "2023-04-14T05:31:11-07:00",
"customers": {
"updated_at": "2023-04-24T06:53:48-07:00"
}
}
}
} The reason the if we fetch the records like you propose, this record will be omitted for the |
@bazarnov Thanks for the investigation, but how the current design (in code and that you described) can handle deletions of metafields? Is this a known issue / limitations that Airbyte will not propagate deletions of metafields (and thus we should warn customers and give them workarounds, like doing full refresh)? Or did I missed something? I still do understand the only way to handle deletions is to do a full refresh (aka, fetch all metafields for a given parent, and emit them all to the destination) for any
|
What
Resolving #5493
How
Add metafield streams
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changesTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.