Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PubSub Source using Unary Pull #101

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open

Conversation

istreeter
Copy link
Contributor

The previous implementation of the PubSub Source was a wrapper around Subscriber provided by the 3rd-party pubsub sdk. That Subscriber is a wrapper around a lower-level GRPC stub. It used the "Streaming Pull" GRPC method to fetch messages from PubSub.

This commit abandons using the Subscriber and instead wraps the GRPC stub directly. It uses the "Unary Pull" GRPC method to fetch messages from PubSub.

We found that "Unary Pull" alleviates a problem in which PubSub occasionally re-delivers the same messages, causing downstream duplicates. The problem happened especially in apps like Lake Loader, which builds up a very large number of un-acked messages and then acks them all in one go at the end of a timed window.

Compared with the previous Source implementation it has these differences in behaviour:

  • Previously, the 3rd-party Subscriber managed ack extensions (a.k.a. modifying ack deadlines). Ack extension periods were adjusted dynamically according to runtime heuristics of message processing times. Whereas in this new Source, the ack extension period is a fixed configurable period.
  • Previously, the Subscriber periodically mod-acked all unacked messages currently held in memory. Whereas the new Source only mod-acks messages when they are approaching their ack deadline. This is an improvement for apps like Lake Loader which might have a very large number of outstanding unacked messages.
  • Unary Pull has slightly worse latency compared to Streaming Pull. I consider this ok for common-streams apps, where the latency caused by the Source is negligible compared to other latencies in the app.
  • Arguably, "Unary Pull" is a neater fit (less hacky) for a common-streams Source. Now, we simply pull a batch when a batch is needed.

The previous implementation of the PubSub Source was a wrapper around
`Subscriber` provided by the 3rd-party pubsub sdk. That `Subscriber` is
a wrapper around a lower-level GRPC stub. It used the "Streaming Pull"
GRPC method to fetch messages from PubSub.

This commit abandons using the `Subscriber` and instead wraps the GRPC
stub directly. It uses the "Unary Pull" GRPC method to fetch messages
from PubSub.

We found that "Unary Pull" alleviates a problem in which PubSub
occasionally re-delivers the same messages, causing downstream
duplicates. The problem happened especially in apps like Lake Loader,
which builds up a very large number of un-acked messages and then acks
them all in one go at the end of a timed window.

Compared with the previous Source implementation it has these
differences in behaviour:

- Previously, the 3rd-party `Subscriber` managed ack extensions (a.k.a.
  modifying ack deadlines). Ack extension periods were adjusted
  dynamically according to runtime heuristics of message processing
  times. Whereas in this new Source, the ack extension period is a fixed
  configurable period.
- Previously, the `Subscriber` periodically mod-acked all unacked
  messages currently held in memory. Whereas the new Source only
  mod-acks messages when they are approaching their ack deadline. This
  is an improvement for apps like Lake Loader which might have a very
  large number of outstanding unacked messages.
- Unary Pull has _slightly_ worse latency compared to Streaming Pull. I
  consider this ok for common-streams apps, where the latency caused
  by the Source is negligible compared to other latencies in the app.
- Arguably, "Unary Pull" is a neater fit (less hacky) for a
  common-streams Source. Now, we simply pull a batch when a batch is
  needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant