Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: configure batch pipelines #658

Merged
merged 12 commits into from
Jun 29, 2023
Merged

feat: configure batch pipelines #658

merged 12 commits into from
Jun 29, 2023

Conversation

pnadolny13
Copy link
Contributor

Closes #604

I configured all pipelines that could support batch to use it. tap-github and tap-dynamodb use stream maps so I couldnt use batch for those.

I will likely revert a few of these after we see them work in CI since theyre somewhat unnecessary for some cases i.e. tap-snowflake for 3 records.

@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:35 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:35 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:35 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 27, 2023 23:41 — with GitHub Actions Inactive
This reverts commit f082cc2.
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:04 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:04 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:04 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:09 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:19 — with GitHub Actions Inactive
@pnadolny13
Copy link
Contributor Author

@kgpayne the tap-meltanohub fix you added did solve the problem and tests passed https://github.com/meltano/squared/actions/runs/5402924550/jobs/9845025953 expect for the reverse etl jobs that use target-yaml which is too old to support batch right now.

I removed all the dev commits and left only slack and cloudwatch as real batch use cases.

@pnadolny13 pnadolny13 temporarily deployed to test June 29, 2023 20:33 — with GitHub Actions Inactive
@pnadolny13 pnadolny13 merged commit 668917e into main Jun 29, 2023
@pnadolny13
Copy link
Contributor Author

pnadolny13 commented Jul 5, 2023

@edgarrmondragon I configured batch with cloudwatch and slack which do need state support. I didnt realize that wasnt supported. The interesting thing is that it seems like the state is still being tracked properly, at least in meltano cloud, I see the first tap-cloudwatch query in the log today says Submitting query for batch from: 2023-07-04T06:00:58 UTC - 2023-07-04T07:00:58 UTC. If state wasnt working properly it would have started at it start_date '2022-12-08. The runtimes look very similar before and after this merged so theres no obvious spike in sync time due to running a full refresh each time.

Any idea whats going on?

@edgarrmondragon edgarrmondragon deleted the configure_batch_pipelines branch July 5, 2023 20:54
@edgarrmondragon
Copy link
Contributor

@pnadolny13 Ok I didn't realize state was correctly incremented in batch, but looking at the code it seems to be handled correctly. Do let me know if you see anything that feels off with batch 🙏.

@kgpayne
Copy link
Contributor

kgpayne commented Jul 7, 2023

@edgarrmondragon I believe the STATE issue you are thinking of is limited to SQL-based Taps which override the BATCH methods (i.e. to export data using COPY or EXPORT, where the latest record needs to be identified separately from the export in order to emit state). In the case of slack and cloudwatch, I believe they are BATCHing from a stream of individual records, rather than using any kind of native export capability, so the Tap does still process each record and therefore can emit STATE messages as usual 🙂

@edgarrmondragon
Copy link
Contributor

@kgpayne that makes sense, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configure BATCH use case for testing
3 participants