
PubSub: Deprecate several FlowControl settings and things in Message class #8796

Merged
merged 3 commits into googleapis:master on Jul 29, 2019

Conversation

plamut
Contributor

@plamut plamut commented Jul 26, 2019

As per an offline email thread, this PR marks several aspects of the Python PubSub client as deprecated. The PR assumes that the deprecation will happen in version 0.43.1 (current version: 0.43.0).

How to test

Steps to perform:
Verify that all discussed features are properly marked as deprecated, and that this is reflected in the docs.

To generate the docs, run the following from inside the pubsub/ directory:

$ nox -f noxfile.py -s docs

... and open the generated docs in the browser:

$ google-chrome docs/_build/html/index.html 

@plamut plamut added the api: pubsub Issues related to the Pub/Sub API. label Jul 26, 2019
@plamut plamut requested a review from anguillanneuf as a code owner July 26, 2019 15:03
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 26, 2019
@plamut plamut requested a review from busunkim96 July 26, 2019 15:03
@plamut plamut added type: docs Improvement to the documentation for an API. type: cleanup An internal cleanup or hygiene concern. labels Jul 26, 2019
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 29, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 29, 2019
@plamut plamut merged commit b32e4e9 into googleapis:master Jul 29, 2019
@TheKevJames

@plamut I can't seem to find any information on what these settings are being deprecated in favor of -- is there somewhere we can read the "offline email thread" to catch up on this? It would be great if we could get an upgrade path here, i.e. "instead of setting max_request_batch_size, please set..." or at least some info on what will be replacing these.

@plamut
Contributor Author

plamut commented Jul 29, 2019

@TheKevJames I can give an unofficial answer - these settings are/were specific to the Python implementation of the PubSub client, while clients for other languages do not have them. Since consistency across the various implementations is desired, the decision was made to deprecate some of these Python-specific parts of the client.

I will also try to get an official answer on this matter.

@kamalaboulhosn

@TheKevJames The deprecated items fit into three categories:

  1. Things that were not being used. Getting rid of these has no functional impact, e.g., max_requests.
  2. Leaked implementation details from early on in the client library's existence that one should not need to set, e.g., max_request_batch_latency and max_request_batch_size.
  3. Items that were exclusive to the Python library, don't address needs specific to that language, and are not APIs we want to have exposed at this time, e.g., resume_threshold.

If you have specific reasons for which the defaults of the properties are not sufficient, please let us know. If there are use cases where we need to consider adding more control, we can look at them across the Cloud Pub/Sub client libraries in all languages. For now, prior to marking the library GA, we'd like to err on the side of exposing too few controls rather than too many.

@TheKevJames

@kamalaboulhosn thanks for the response!

(1) and (3) definitely make sense to me, and I'm fully on board. I've never had occasion to tune resume_threshold anyway, so no harm done there.

I think (2) is where I get a bit confused, since there are a number of scenarios where the defaults might benefit from being tuned:

At $myworkplace we've tuned max_request_batch_latency and max_request_batch_size in response to our various use cases and payload sizes. Those two parameters together determine:

  • max number of concurrently leased tasks (and therefore memory usage)
  • polling interval (and thus the size and frequency of time slices in the thread scheduler)
  • average and worst case latency of tasks in the Pub/Sub queue

Since we care pretty heavily about memory usage, performance, and latency in $myapps, we've tuned these pretty heavily.

We've done a bunch of math and have an internal guide for tuning these values; would sharing it be helpful?

@kamalaboulhosn

@TheKevJames Are these things that max_bytes and max_messages are not sufficient to control? These are really the properties that were intended to be used for controlling memory. They should put a limit on the number of outstanding messages/bytes that are received by the client. These are probably even more important for memory usage because when these limits are reached, the client will push back to the server to stop sending more messages to the client. I don't think max_request_batch_latency and max_request_batch_size have any impact on this, because they only control the messages sent to callbacks after they have already been received by the client library.
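
For reference, setting those two looks roughly like this when subscribing (a minimal sketch; the project/subscription names and limit values are purely illustrative):

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    # Illustrative names -- substitute your own project and subscription.
    subscription_path = subscriber.subscription_path("my-project", "my-subscription")

    # Cap the messages/bytes the client holds at once; when either limit is
    # reached, the client pushes back and the stream pauses until load drops.
    flow_control = pubsub_v1.types.FlowControl(
        max_messages=100,            # illustrative value
        max_bytes=10 * 1024 * 1024,  # illustrative value (10 MiB)
    )

    def callback(message):
        message.ack()

    future = subscriber.subscribe(
        subscription_path, callback=callback, flow_control=flow_control
    )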

@TheKevJames

@kamalaboulhosn It's totally possible I misread the code and thus have an incorrect understanding! As far as I can tell, though:

  • the maximum number of concurrently leased tasks at peak is (f.max_messages * f.resume_threshold) + f.max_request_batch_size
  • the maximum memory usage in that case is thus (f.max_bytes * f.resume_threshold) + (f.max_request_batch_size * bytes_per_task)
  • those two values constrain each other, i.e. since the lesser of max_bytes and max_messages is used, the above two estimates are clamped to a maximum of min(max_tasks * bytes_per_task, max_memory)

Then pseudo-code for the SubscriptionClient would roughly be:

    def lease_more_tasks():
        # work_queue is a queue.Queue of received tasks; f is the FlowControl.
        start = time.monotonic()
        yield work_queue.get(block=True)  # block for the first item, so we always return >= 1

        for _ in range(f.max_request_batch_size - 1):
            remaining = f.max_request_batch_latency - (time.monotonic() - start)
            if remaining <= 0:
                break
            try:
                yield work_queue.get(block=True, timeout=remaining)
            except queue.Empty:
                break

Tuning max_request_batch_latency therefore provides a knob for the median latency of available tasks waiting to be handed to callbacks. The expected best-case time for Queue.get() off a full queue is no worse than 0.3 ms. That queue should be filling up as fast as grpc can make requests to the PubSub server, which should be fast enough to keep it filled, given those requests are batched. Therefore, we can expect:

  • avg_lease_latency ~= f.max_request_batch_size * 0.0003
  • worst_case_latency ~= f.max_request_batch_latency

Of course, some of that time is concurrent with task execution, so you would want to tune those values to be approximately equal to the expected time it takes to process your tasks in order to minimize time spent waiting on leasing.
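
Plugging illustrative numbers into those estimates (everything below is an assumption for the sake of the arithmetic, not a measurement from $myapps, and it assumes max_messages is the binding limit rather than max_bytes):

    # Illustrative values only -- assumptions, not measurements.
    max_messages = 100
    resume_threshold = 0.8
    max_request_batch_size = 100
    max_request_batch_latency = 0.05   # seconds
    bytes_per_task = 50 * 1024         # assumed average payload size
    per_get_overhead = 0.0003          # ~0.3 ms best-case Queue.get()

    peak_leased_tasks = max_messages * resume_threshold + max_request_batch_size
    peak_memory_bytes = peak_leased_tasks * bytes_per_task
    avg_lease_latency = max_request_batch_size * per_get_overhead
    worst_case_latency = max_request_batch_latency

    print(peak_leased_tasks)   # 180.0 tasks at peak
    print(peak_memory_bytes)   # 9216000 bytes (~8.8 MiB)
    print(avg_lease_latency)   # 0.03 s
    print(worst_case_latency)  # 0.05 s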

@kamalaboulhosn

@TheKevJames max_request_batch_size and max_request_batch_latency have nothing to do with controlling the number of messages received. They are used to control when the library sends acks, nacks, and modacks (ack deadline modifications) to the server, which control message leases. The library batches these requests up for efficiency reasons. These requests should be pretty small either way, relative to message size.

I believe the amount of memory consumed by messages is going to be ~min(max_bytes, max_messages * sizeof(message)). The key method is the _on_response method. It receives messages as they come in from the stream. If we aren't above MAX_LOAD, we add the message to a list of messages for which to dispatch the callback. Otherwise, we add it to a list of messages on hold that will be delivered once we are under the load threshold. These messages take up memory either way. The RESUME_THRESHOLD basically accounts for the fact that we may have overshot the limit (if we received a batch that had enough data to push us over the limit and had messages added to the hold list), meaning we have messages we can deliver without having to resume getting messages from the stream.
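
In rough pseudo-code, the decision described above looks something like this (a simplified sketch; the names are approximations, not the library's exact internals):

    MAX_LOAD = 1.0          # pause the stream above this fraction of the limits
    RESUME_THRESHOLD = 0.8  # resume once load falls back below this fraction

    def on_response(received_messages, manager):
        """Simplified sketch; `manager` stands in for the streaming pull
        manager, with `load` = leased bytes/messages as a fraction of
        max_bytes/max_messages. Names here are illustrative."""
        for message in received_messages:
            if manager.load < MAX_LOAD:
                manager.deliver(message)                  # schedule the user callback
            else:
                manager.messages_on_hold.append(message)  # already in memory, held

        # Overshooting is possible because a whole batch is accepted at once;
        # held messages can be delivered later without resuming the stream,
        # which stays paused until load drops below RESUME_THRESHOLD.
        if manager.load >= MAX_LOAD:
            manager.pause_stream()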

@TheKevJames

max_request_batch_size and max_request_batch_latency have nothing to do with controlling the number of messages received.

I'm definitely misunderstanding something, then! Would you mind pointing out the flaw in my following logic?

So at any given point, if you have more than max_messages * resume_threshold (or the bytes equivalent) leased tasks, the consumer is paused. If you have <= max_messages * resume_threshold leased tasks, the consumer gets resumed. At this point, you have at most max_messages * resume_threshold leased tasks and the consumer is beginning an iteration. That means we call this method, where max_items and max_latency were, respectively, max_request_batch_size and max_request_batch_latency prior to the deprecation. That method performs this operation, which would return up to max_items (i.e. max_request_batch_size) tasks, assuming we don't time out according to max_request_batch_latency first. Assuming no timeouts, that method would return max_request_batch_size new tasks. Does that not mean we may have max_messages * resume_threshold + max_request_batch_size tasks in memory at peak?

@kamalaboulhosn

This method is in QueueCallbackWorker. The only place that creates a QueueCallbackWorker is the Dispatcher, which passes dispatch_callback as the method to be called on the queue. That method only deals with sending acks/nacks/modacks.

The queue in the Dispatcher instance is set to be the one in the scheduler. This queue is passed into every constructed message, and a message populates it with acks, nacks, and modacks. None of this affects the messages that are actually fetched from the server.
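
So, in sketch form (names approximate, not the exact internals), acking a message just enqueues a small request for the Dispatcher to batch:

    import collections

    # Approximation of the flow described above; names are illustrative.
    AckRequest = collections.namedtuple("AckRequest", ["ack_id", "byte_size"])

    class Message:
        def __init__(self, ack_id, data, dispatcher_queue):
            self._ack_id = ack_id
            self.data = data
            self._queue = dispatcher_queue  # the scheduler's queue, shared with the Dispatcher

        def ack(self):
            # Only this small request object goes on the queue; the Dispatcher
            # batches these (that is what max_request_batch_size and
            # max_request_batch_latency governed), independent of how many
            # messages have been fetched from the server.
            self._queue.put(AckRequest(ack_id=self._ack_id, byte_size=len(self.data)))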

@TheKevJames

Gotcha, that makes sense. Thanks, that helps tremendously!
