Alternative implementation of a pubsub source - 2 #86

Open: istreeter wants to merge 2 commits into develop from alternative-pubsub-source-2

istreeter (Collaborator)

The pubsub Source from common-streams is a wrapper around the `Subscriber` provided by the 3rd-party pubsub SDK. That `Subscriber` is itself a wrapper around a lower-level GRPC stub.

This commit adds an alternative Source which wraps the GRPC stub directly, rather than the higher-level `Subscriber`.
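To make the layering concrete, here is a minimal Scala sketch of a source that drives the pull RPC itself instead of delegating to a `Subscriber`. The `PullStub` trait, `ReceivedMsg`, `DirectPullSource.pullBatches` and `FakeStub` are hypothetical stand-ins invented for this illustration; they are not the common-streams API or the generated Pub/Sub stub classes.

```scala
// Hypothetical stand-in for a message returned by a pull: its ack id and payload.
final case class ReceivedMsg(ackId: String, data: String)

// Hypothetical stand-in for the GRPC subscriber stub. Each call models one
// unary pull RPC and returns at most `maxMessages` messages (possibly none).
trait PullStub {
  def pull(subscription: String, maxMessages: Int): List[ReceivedMsg]
}

object DirectPullSource {
  // Issues `maxPulls` sequential pull calls against the stub and passes each
  // non-empty batch to `process`. Returns the total number of messages seen.
  // A real source would loop until cancelled and ack/modack via the same stub.
  def pullBatches(stub: PullStub, subscription: String, maxMessages: Int, maxPulls: Int)(
    process: List[ReceivedMsg] => Unit
  ): Int =
    (1 to maxPulls).foldLeft(0) { (received, _) =>
      val batch = stub.pull(subscription, maxMessages)
      if (batch.nonEmpty) process(batch)
      received + batch.size
    }
}

// In-memory fake that serves a fixed list of messages, so the sketch runs
// without a network connection.
final class FakeStub(messages: List[ReceivedMsg]) extends PullStub {
  private var remaining = messages

  def pull(subscription: String, maxMessages: Int): List[ReceivedMsg] = {
    val (batch, rest) = remaining.splitAt(maxMessages)
    remaining = rest
    batch
  }
}
```

With unary pulls the caller decides when to request the next batch, whereas a streaming pull keeps a long-lived stream open over which the server delivers messages.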

Compared with the previous (V1) Source implementation, the new (V2) Source has these differences in behaviour:

  • In the V1 source, ack extension periods were adjusted dynamically according to runtime heuristics of message processing times. In the V2 source, the ack extension period is a fixed configurable period.
  • The V1 source made a modack request (extending the ack deadline) immediately after receiving any message, whereas the V2 source does not modack a message unless its deadline is about to expire.
  • The V1 source periodically modacks all unacked messages currently held in memory. This is a problem for applications such as the Lake Loader, which can hold a very large number of unacked messages at any one time. The V2 source only modacks messages that are approaching their ack deadline.
  • The V2 source uses a smaller thread pool for GRPC callbacks. The V1 source needed a very large thread pool to avoid deadlocks in setups that opened a large number of streaming pulls.
  • The V2 source issues unary PullRequests to pubsub, whereas the V1 source opened StreamingPullRequests.
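The V2 ack-extension policy described in the list can be sketched as follows: each unacked message carries its current ack deadline, and on every tick the source selects only the messages whose deadline falls within a safety margin, extending them by a fixed, configurable period. `ModackConfig`, `AckTracker`, `safetyMargin` and `extension` are hypothetical names chosen for this illustration, not those used in common-streams, and the sketch leaves out sending the actual modack request, batching limits and error handling.

```scala
import scala.collection.mutable
import scala.concurrent.duration._

// Hypothetical settings for the V2-style policy: a fixed extension period, and
// a margin before expiry at which a message becomes due for a modack.
final case class ModackConfig(extension: FiniteDuration, safetyMargin: FiniteDuration)

final class AckTracker(config: ModackConfig) {
  // ackId -> current ack deadline, in epoch milliseconds
  private val deadlines = mutable.Map.empty[String, Long]

  // Record a freshly pulled message. No modack is sent at this point.
  def received(ackId: String, nowMillis: Long, initialDeadline: FiniteDuration): Unit =
    deadlines.update(ackId, nowMillis + initialDeadline.toMillis)

  // Forget a message once it has been acked.
  def acked(ackId: String): Unit = {
    deadlines.remove(ackId)
    ()
  }

  // Returns (sorted, for determinism) the ack ids whose deadline falls within
  // the safety margin, and records their extended deadline. The caller would
  // send these ids in a single modack request. Messages far from expiry are
  // skipped, so a large backlog of unacked messages does not produce a large modack.
  def dueForModack(nowMillis: Long): List[String] = {
    val threshold = nowMillis + config.safetyMargin.toMillis
    val due = deadlines.iterator.collect { case (id, deadline) if deadline <= threshold => id }.toList.sorted
    due.foreach(id => deadlines.update(id, nowMillis + config.extension.toMillis))
    due
  }

  def pending: Int = deadlines.size
}
```

Under the V1 behaviour described above, every periodic tick would instead modack all `pending` messages, regardless of how close each one is to its deadline.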

If this experimental V2 Source is successful, it is likely to replace the V1 Source in a future release of common-streams.

istreeter force-pushed the alternative-pubsub-source-2 branch 2 times, most recently from 7612dc7 to 8b83f58 (October 16, 2024 14:00)
istreeter force-pushed the alternative-pubsub-source-2 branch from 8b83f58 to 042f02a (November 8, 2024 11:11)