feature: Add built-in support for BATCH message type (aka FAST_SYNC spec) #9

Closed
MeltyBot opened this issue Dec 18, 2020 · 3 comments · Fixed by #904

Comments


MeltyBot commented Dec 18, 2020

This feature would bring BATCH message type support to SDK taps and targets, beginning with the .jsonl.gz file format saved to local storage.
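For illustration, a batch file in the .jsonl.gz format (one JSON object per line, gzip-compressed) can be written to local storage with Python's standard library alone. This is a minimal sketch, not the SDK's implementation; the helper names `write_jsonl_gz` and `read_jsonl_gz` are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Union


def write_jsonl_gz(records: Iterable[Dict[str, Any]], path: Union[str, Path]) -> int:
    """Write records as gzip-compressed JSON Lines; return the record count.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    # Create the target directory on local storage if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_jsonl_gz(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a gzip-compressed JSON Lines file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A tap would write a file like this per batch and hand its path downstream; a target would read it back in the same way.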

Spec discussion:


Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/9

Originally created by @aaronsteers on 2020-12-18 18:12:08


Spec discussion (old)

This enhancement would add framework support for the new FAST_SYNC spec as described on the meltano thread (https://gitlab.com/meltano/meltano/-/issues/2364).

To kick off the discussion, what about this as a strawman spec:

List of spec changes to support Fast Sync (partial, wip):

  • register_batch_export_handler() - Registers a handler function that responds to batch export requests. The registration declares which file types and storage options the handler supports, along with the handler's priority relative to other handlers.
  • register_batch_import_handler() - Same as above, but for targets.
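One way the registration described in this strawman could look is sketched below. Everything beyond the name register_batch_export_handler() is an assumption made for illustration: the keyword arguments, the handler signature (stream name in, list of local file paths out), the `select_batch_export_handler` helper, and the rule that a higher priority number wins.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional

# Assumed handler signature: takes a stream name, returns local file paths.
BatchExportHandler = Callable[[str], List[str]]


@dataclass(frozen=True)
class HandlerRegistration:
    handler: BatchExportHandler
    file_types: FrozenSet[str]
    storage_options: FrozenSet[str]
    priority: int


# Module-level registry of export handlers (hypothetical).
_export_handlers: List[HandlerRegistration] = []


def register_batch_export_handler(
    handler: BatchExportHandler,
    *,
    file_types: Iterable[str],
    storage_options: Iterable[str],
    priority: int = 0,
) -> None:
    """Register a handler with its supported file types, storage, and priority."""
    _export_handlers.append(
        HandlerRegistration(
            handler, frozenset(file_types), frozenset(storage_options), priority
        )
    )


def select_batch_export_handler(
    file_type: str, storage: str
) -> Optional[BatchExportHandler]:
    """Return the highest-priority handler supporting the request, if any."""
    matches = [
        r
        for r in _export_handlers
        if file_type in r.file_types and storage in r.storage_options
    ]
    if not matches:
        return None
    # Assumption: a larger priority value takes precedence.
    return max(matches, key=lambda r: r.priority).handler
```

A register_batch_import_handler() for targets could mirror this with a second registry.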

In keeping with our other design practices, we would not require tap authors to know how to implement the BATCH message type. They would only need to return file paths in a form that we can pass on to the downstream client (per the spec work on https://gitlab.com/meltano/meltano/-/issues/2364).

Example:

In the case of a Redshift UNLOAD command, the handler passed to register_batch_export_handler() might execute the UNLOAD command (which saves the output to S3), download the resulting files locally, and return the corresponding local file paths.
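The Redshift example can be sketched as a handler factory. To keep the sketch runnable without a database or AWS credentials, the UNLOAD step and the S3 download are injected as callables; `make_unload_export_handler`, `run_unload`, and `download` are hypothetical names, and the handler signature is an assumption rather than part of any spec.

```python
from pathlib import Path
from typing import Callable, List


def make_unload_export_handler(
    run_unload: Callable[[str], List[str]],
    download: Callable[[str, Path], None],
    local_dir: str,
) -> Callable[[str], List[str]]:
    """Build an export handler from two injected steps (both hypothetical).

    run_unload(stream_name) is assumed to execute UNLOAD and return the
    remote object keys it produced; download(key, dest) is assumed to copy
    one remote object to a local path.
    """

    def handler(stream_name: str) -> List[str]:
        local_paths: List[str] = []
        for key in run_unload(stream_name):
            # Keep the remote file name, but place the file under local_dir.
            dest = Path(local_dir) / Path(key).name
            download(key, dest)
            local_paths.append(str(dest))
        return local_paths

    return handler
```

The tap author supplies only the two callables; the returned local file paths are what would be passed on to the downstream client.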


labelsync-manager bot added the kind/Feature (New feature or request) label Jun 23, 2022
aaronsteers moved this to Up Next in Office Hours Jul 26, 2022
aaronsteers changed the title from "Add built-in support for new FAST_SYNC spec (aka BATCH message type)" to "Add built-in support for BATCH message type (aka FAST_SYNC spec)" Jul 26, 2022

aaronsteers commented Jul 28, 2022

Noting from a talk with one user today: 20MM rows seems to be a turning point performance-wise.

When we test performance, it would be helpful to use 20-50MM rows as the evaluation benchmark.

tap-stackoverflow was suggested as a source that can generate this volume of records fairly easily. Or here.

aaronsteers moved this from Up Next to Discussed in Office Hours Aug 3, 2022
edgarrmondragon changed the title from "Add built-in support for BATCH message type (aka FAST_SYNC spec)" to "feat: Add built-in support for BATCH message type (aka FAST_SYNC spec)" Aug 9, 2022
edgarrmondragon changed the title from "feat: Add built-in support for BATCH message type (aka FAST_SYNC spec)" to "feature: Add built-in support for BATCH message type (aka FAST_SYNC spec)" Aug 9, 2022
@edgarrmondragon

I'm making some improvements to documentation and developer workflow that will come in handy for this work:
