
PubSub: Deprecate several FlowControl settings and things in Message class #8796

Merged
merged 3 commits into googleapis:master on Jul 29, 2019

Conversation

plamut
Contributor

@plamut plamut commented Jul 26, 2019

As per an offline email thread, this PR marks several aspects of the Python PubSub client as deprecated. The PR assumes that the deprecation will happen in version 0.43.1 (current version: 0.43.0).

How to test

Steps to perform:
Verify that all discussed features are properly marked as deprecated, and that this is reflected in the docs.

To generate the docs, run the following from inside the pubsub/ directory:

$ nox -f noxfile.py -s docs

... and open the generated docs in the browser:

$ google-chrome docs/_build/html/index.html 

@plamut plamut added the api: pubsub Issues related to the Pub/Sub API. label Jul 26, 2019
@plamut plamut requested a review from anguillanneuf as a code owner July 26, 2019 15:03
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 26, 2019
@plamut plamut requested a review from busunkim96 July 26, 2019 15:03
@plamut plamut added type: docs Improvement to the documentation for an API. type: cleanup An internal cleanup or hygiene concern. labels Jul 26, 2019
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 29, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 29, 2019
@plamut plamut merged commit b32e4e9 into googleapis:master Jul 29, 2019
@TheKevJames

@plamut I can't seem to find any information on what these settings are being deprecated in favor of -- is there somewhere we can read the "offline email thread" to catch up on this? It would be great if we could get an upgrade path here, i.e. "instead of setting max_request_batch_size, please set..." or at least some info on what will be replacing these.

@plamut
Contributor Author

plamut commented Jul 29, 2019

@TheKevJames I can give an unofficial answer - these settings are/were specific to the Python implementation of the PubSub client, while clients for other languages do not have them. Since consistency across the various implementations is desired, the decision was made to deprecate some of these Python-specific parts of the client.

I will also try to get an official answer on this matter.

@kamalaboulhosn

@TheKevJames The deprecated items fit into three categories:

  1. Things that were not being used. Getting rid of these has no functional impact, e.g., max_requests.
  2. Leaked implementation details from early on in the client library's existence that one should not need to set, e.g., max_request_batch_latency and max_request_batch_size.
  3. Items that were exclusive to the Python library, don't address needs specific to that language, and are not APIs we want to have exposed at this time, e.g., resume_threshold.

If you have specific reasons for which the defaults of the properties are not sufficient, please let us know. If there are use cases where we need to consider adding more control, we can look at them across the Cloud Pub/Sub client libraries in all languages. For now, prior to marking the library GA, we'd like to err on the side of exposing too few controls rather than too many.

@TheKevJames

@kamalaboulhosn thanks for the response!

(1) and (3) definitely make sense to me, and I'm fully on board. I've never had occasion to tune resume_threshold anyway, so no harm done there.

I think (2) is where I get a bit confused, since there are a number of scenarios where the defaults might benefit from being tuned:

At $myworkplace we've tuned max_request_batch_latency and max_request_batch_size in response to our various use cases and payload sizes. Those two parameters together determine:

  • max number of concurrently leased tasks (and therefore memory usage)
  • polling interval (and thus the size and frequency of time slices in the thread scheduler)
  • average and worst case latency of tasks in the Pub/Sub queue

Since we care pretty heavily about memory usage, performance, and latency in $myapps, we've tuned these pretty heavily.

We've done a bunch of math and have an internal guide for tuning these values; would sharing it be helpful?

@kamalaboulhosn

@TheKevJames Are these things that max_bytes and max_messages are not sufficient to control? These are really the properties that were intended to be used for controlling memory. They should put a limit on the number of outstanding messages/bytes that are received by the client. These are probably even more important for memory usage because when these limits are reached, the client will push back to the server to stop sending more messages to the client. I don't think max_request_batch_latency and max_request_batch_size have any impact on this, because they only control the messages sent to callbacks after they have already been received by the client library.
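
For reference, setting those two looks roughly like this when subscribing (a minimal sketch; the project/subscription names and limit values are purely illustrative):

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    # Illustrative names -- substitute your own project and subscription.
    subscription_path = subscriber.subscription_path("my-project", "my-subscription")

    # Cap the messages/bytes the client holds at once; when either limit is
    # reached, the client pushes back and the stream pauses until load drops.
    flow_control = pubsub_v1.types.FlowControl(
        max_messages=100,            # illustrative value
        max_bytes=10 * 1024 * 1024,  # illustrative value (10 MiB)
    )

    def callback(message):
        message.ack()

    future = subscriber.subscribe(
        subscription_path, callback=callback, flow_control=flow_control
    )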

@TheKevJames

@kamalaboulhosn It's totally possible I misread the code and thus have an incorrect understanding! As far as I can tell, though:

  • the maximum number of concurrently leased tasks at peak is (f.max_messages * f.resume_threshold) + f.max_request_batch_size
  • the maximum memory usage in that case is thus (f.max_bytes * f.resume_threshold) + (f.max_request_batch_size * bytes_per_task)
  • those two values constrain each other, i.e. since the lesser of max_bytes and max_messages is used, the above two estimates are clamped to a maximum of min(max_tasks * bytes_per_task, max_memory)

Then pseudo-code for the SubscriptionClient would roughly be:

    def lease_more_tasks():
        # work_queue is a queue.Queue of received tasks; f is the FlowControl.
        start = time.monotonic()
        yield work_queue.get(block=True)  # block for the first item, so we always return >= 1

        for _ in range(f.max_request_batch_size - 1):
            remaining = f.max_request_batch_latency - (time.monotonic() - start)
            if remaining <= 0:
                break
            try:
                yield work_queue.get(block=True, timeout=remaining)
            except queue.Empty:
                break

Tuning max_request_batch_latency therefore provides a knob for the median latency of available tasks waiting to be handed to callbacks. The expected best-case time for Queue.get() off a full queue is no worse than 0.3 ms. That queue should be filling up as fast as grpc can make requests to the PubSub server, which should be fast enough to keep it filled, given those requests are batched. Therefore, we can expect:

  • avg_lease_latency ~= f.max_request_batch_size * 0.0003
  • worst_case_latency ~= f.max_request_batch_latency

Of course, some of that time is concurrent with task execution, so you would want to tune those values to be approximately equal to the expected time it takes to process your tasks in order to minimize time spent waiting on leasing.
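
Plugging illustrative numbers into those estimates (everything below is an assumption for the sake of the arithmetic, not a measurement from $myapps, and it assumes max_messages is the binding limit rather than max_bytes):

    # Illustrative values only -- assumptions, not measurements.
    max_messages = 100
    resume_threshold = 0.8
    max_request_batch_size = 100
    max_request_batch_latency = 0.05   # seconds
    bytes_per_task = 50 * 1024         # assumed average payload size
    per_get_overhead = 0.0003          # ~0.3 ms best-case Queue.get()

    peak_leased_tasks = max_messages * resume_threshold + max_request_batch_size
    peak_memory_bytes = peak_leased_tasks * bytes_per_task
    avg_lease_latency = max_request_batch_size * per_get_overhead
    worst_case_latency = max_request_batch_latency

    print(peak_leased_tasks)   # 180.0 tasks at peak
    print(peak_memory_bytes)   # 9216000 bytes (~8.8 MiB)
    print(avg_lease_latency)   # 0.03 s
    print(worst_case_latency)  # 0.05 s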

@kamalaboulhosn

@TheKevJames max_request_batch_size and max_request_batch_latency have nothing to do with controlling the number of messages received. They are used to control when the library sends acks, nacks, and modacks (ack deadline modifications) to the server, which control message leases. The library batches these requests up for efficiency reasons. These requests should be pretty small either way, relative to message size.

I believe the amount of memory consumed by messages is going to be ~min(max_bytes, max_messages * sizeof(message)). The key method is the _on_response method. It receives messages as they come in from the stream. If we aren't above MAX_LOAD, we add the message to a list of messages for which to dispatch the callback. Otherwise, we add it to a list of messages on hold that will be delivered once we are under the load threshold. These messages take up memory either way. The RESUME_THRESHOLD basically accounts for the fact that we may have overshot the limit (if we received a batch that had enough data to push us over the limit and had messages added to the hold list), meaning we have messages we can deliver without having to resume getting messages from the stream.
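
In rough pseudo-code, the decision described above looks something like this (a simplified sketch; the names are approximations, not the library's exact internals):

    MAX_LOAD = 1.0          # pause the stream above this fraction of the limits
    RESUME_THRESHOLD = 0.8  # resume once load falls back below this fraction

    def on_response(received_messages, manager):
        """Simplified sketch; `manager` stands in for the streaming pull
        manager, with `load` = leased bytes/messages as a fraction of
        max_bytes/max_messages. Names here are illustrative."""
        for message in received_messages:
            if manager.load < MAX_LOAD:
                manager.deliver(message)                  # schedule the user callback
            else:
                manager.messages_on_hold.append(message)  # already in memory, held

        # Overshooting is possible because a whole batch is accepted at once;
        # held messages can be delivered later without resuming the stream,
        # which stays paused until load drops below RESUME_THRESHOLD.
        if manager.load >= MAX_LOAD:
            manager.pause_stream()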

@TheKevJames

max_request_batch_size and max_request_batch_latency have nothing to do with controlling the number of messages received.

I'm definitely misunderstanding something, then! Would you mind pointing out the flaw in my following logic?

So at any given point, if you have more than max_messages * resume_threshold (or the bytes equivalent) leased tasks, the consumer is paused. If you have <= max_messages * resume_threshold leased tasks, the consumer gets resumed. At this point, you have at most max_messages * resume_threshold leased tasks and the consumer is beginning an iteration. That means we call this method, where max_items and max_latency were, respectively, max_request_batch_size and max_request_batch_latency prior to the deprecation. That method performs this operation, which would return up to max_items (i.e. max_request_batch_size) tasks, assuming we don't time out according to max_request_batch_latency first. Assuming no timeouts, that method would return max_request_batch_size new tasks. Does that not mean we may have max_messages * resume_threshold + max_request_batch_size tasks in memory at peak?

@kamalaboulhosn

This method is in QueueCallbackWorker. The only place that creates a QueueCallbackWorker is the Dispatcher, which passes dispatch_callback as the method to be called on the queue. That method only deals with sending acks/nacks/modacks.

The queue in the Dispatcher instance is set to be the one in the scheduler. This queue is passed into every constructed message, and a message populates it with acks, nacks, and modacks. None of this affects the messages that are actually fetched from the server.
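
So, in sketch form (names approximate, not the exact internals), acking a message just enqueues a small request for the Dispatcher to batch:

    import collections

    # Approximation of the flow described above; names are illustrative.
    AckRequest = collections.namedtuple("AckRequest", ["ack_id", "byte_size"])

    class Message:
        def __init__(self, ack_id, data, dispatcher_queue):
            self._ack_id = ack_id
            self.data = data
            self._queue = dispatcher_queue  # the scheduler's queue, shared with the Dispatcher

        def ack(self):
            # Only this small request object goes on the queue; the Dispatcher
            # batches these (that is what max_request_batch_size and
            # max_request_batch_latency governed), independent of how many
            # messages have been fetched from the server.
            self._queue.put(AckRequest(ack_id=self._ack_id, byte_size=len(self.data)))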

@TheKevJames

Gotcha, that makes sense. Thanks, that helps tremendously!
