Rewrite stream manager back pressure algorithm #1785
base: master
Conversation
@@ -76,6 +76,9 @@ class StMgrClientMgr {

   sp_int64 high_watermark_;
   sp_int64 low_watermark_;

   // Counters for remote instance traffic, this is used for back pressure
Can we add more description here as to what the key and value mean?
Sure.
sp_int32 StMgrClientMgr::FindBusiestTaskOnStmgr(const sp_string& _stmgr_id) {
  sp_int32 task_id;
  sp_int64 max = 0;
  for (auto iter = instance_stats_[_stmgr_id].begin();
What if instance_stats_[_stmgr_id] does not exist?
If instance_stats_[_stmgr_id] does not exist, that means we don't have any traffic to that stmgr, therefore it is impossible for it to trigger back pressure.
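To make the discussion above concrete, here is a hedged, self-contained sketch of how the per-stmgr traffic counters and FindBusiestTaskOnStmgr could fit together. The type aliases and the -1 sentinel for "no traffic recorded" are assumptions for illustration, not the PR's exact code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for Heron's sp_int32/sp_int64/sp_string typedefs.
using sp_int32 = int32_t;
using sp_int64 = int64_t;
using sp_string = std::string;

// Key: destination stream manager id.
// Value: per-task byte counters (task id -> bytes sent since last cleared).
std::map<sp_string, std::map<sp_int32, sp_int64>> instance_stats_;

// Returns the task on _stmgr_id that has received the most bytes, i.e. the
// most likely contributor to congestion. If instance_stats_ has no entry for
// _stmgr_id, no traffic was ever sent there, so back pressure could not have
// originated from it; this sketch returns -1 as a sentinel in that case.
sp_int32 FindBusiestTaskOnStmgr(const sp_string& _stmgr_id) {
  sp_int32 task_id = -1;
  sp_int64 max = 0;
  for (auto iter = instance_stats_[_stmgr_id].begin();
       iter != instance_stats_[_stmgr_id].end(); ++iter) {
    if (iter->second > max) {
      max = iter->second;
      task_id = iter->first;
    }
  }
  return task_id;
}
```

Note that `instance_stats_[_stmgr_id]` uses `std::map::operator[]`, which default-constructs an empty inner map when the key is absent, so the loop simply runs zero times for an unknown stmgr.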
void StMgrClientMgr::SendTupleStreamMessage(sp_int32 _task_id, const sp_string& _stmgr_id,
                                            const proto::system::HeronTupleSet2& _msg) {
  auto iter = clients_.find(_stmgr_id);
  CHECK(iter != clients_.end());

  instance_stats_[_stmgr_id][_task_id] += _msg.GetCachedSize();
Where are you clearing this? Shouldn't this be cleared on a regular basis?
What makes you think a += operator is clearing it?? I am totally confused...
Oops, replied too fast. Good point, we need to clear it after back pressure is triggered.
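A minimal sketch of the accumulate-then-clear lifecycle discussed above. RecordTraffic and ClearTrafficStats are hypothetical helper names, assuming the counters are reset once back pressure has fired and a culprit task has been chosen, so the next event is blamed on fresh traffic rather than stale history.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using sp_int32 = int32_t;
using sp_int64 = int64_t;
using sp_string = std::string;

// Destination stmgr id -> (task id -> bytes sent since last cleared).
std::map<sp_string, std::map<sp_int32, sp_int64>> instance_stats_;

// Called on every outgoing tuple set: accumulate the serialized size
// against the destination (stmgr, task) pair.
void RecordTraffic(const sp_string& _stmgr_id, sp_int32 _task_id, sp_int64 _bytes) {
  instance_stats_[_stmgr_id][_task_id] += _bytes;
}

// Called once back pressure has been triggered for _stmgr_id and the
// busiest task has been identified: reset that stmgr's counters so they
// only ever reflect the traffic since the previous trigger.
void ClearTrafficStats(const sp_string& _stmgr_id) {
  instance_stats_[_stmgr_id].clear();
}
```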
The current approach has limitations, but it also has simplicity and transparency. It's easy for someone to rationalize that one agent invoked back pressure and all the spouts stopped. With this new approach, understanding the system becomes a more complex problem. @congwang can you discuss what metrics, logging, etc. should be (or are being) included to help someone troubleshooting understand the current state of the system and how it got there? For example, why some bolts might be emitting and not others, or visualizations around the reference counting mechanism, or which agents are contributing to the backpressure and in what order. Also, the shortcomings and the features of the new approach are well described above, but could you expand on the specifics of the new algorithms at play, either here or in another doc or state diagram? I think that will be critical to understand both when troubleshooting and when reading the code.
@billonahill The new one is not any harder to understand: we still throttle things, just that we only throttle the relevant things on the path, in a smart way. The reference counting might be slightly harder to understand, but think about the cases where paths overlap; it is natural to have. The logging is extended to include the task id rather than just the stmgr id, to reflect this change. But if you feel we need more logging in some places for troubleshooting purposes, I am very happy to add it (actually I removed some debugging logs before sending this PR). The metrics should be the same as before. I don't understand your question about bolts emitting tuples, this totally depends on their code, right? I am not sure about the doc, perhaps we need to add a page under
Thanks @congwang, yes we'll certainly want docs under website/content/docs so we might as well start there. For the reference counting I understand the reason for the feature; I was questioning how the effect of the feature would be made clear to the user during backpressure. I wanted to make sure we've thought about how to help a user answer these questions (for example) when backpressure is occurring, ideally without having to sift too much through logs:
I haven't yet reviewed the code, but wanted to hear your thoughts on these so I'd know what to expect. From past experience, understanding backpressure hasn't always been easy with the existing implementation. My question about the bolts not emitting was because, with the new approach, you would stop upstream bolts from emitting while others would continue to emit, as I understand it. I was using that as an example of something we might need to clearly point out to our users (or not?).
@billonahill I will open a separate issue on GitHub for the doc; since my English writing skills are not good at all, I expect someone else could write it. We do have metrics for the back pressure initiator. Of course we should not throttle the downstream ones, otherwise no one will consume the packets, deadlock!! ;-)
For documenting the specifics of the algorithm, a README targeted at just Heron developers would suffice, if you're not comfortable writing the public documentation. This is a critical thing to describe clearly, to make it possible to best review and maintain the code.
@billonahill I believe I already clarified the high-level overview of the new algorithm in the description of this PR. If anything is not clear, please point it out; I am happy to add more. BTW, I don't think we should cover any code details in the description.
@@ -148,9 +148,6 @@ void StMgrClient::HandleHelloResponse(void*, proto::stmgr::StrMgrHelloResponse*
     Stop();
   }
   delete _response;
   if (client_manager_->DidAnnounceBackPressure()) {
It failed the unit test test_back_pressure_stmgr_reconnect: when a stmgr X under backpressure disconnects and reconnects, stmgr X is supposed to receive a backpressure notice, according to test_back_pressure_stmgr_reconnect.
Follow-up question: how do you maintain state, e.g. backpressure_starters_, after a stmgr restarts/reconnects?
@huijunw That should be handled in HandleConnectionClose().
@huijunw I already noticed the test case failure, will fix it.
I have a question on point 2.
@huijunw It is not a guess: the topology is in the pplan, so it is 100% accurate. FindBusiestTaskOnStmgr() is the best effort we can make to find out which one to blame, but we can't predict the future; if we blame the wrong one this time, the next time BP is triggered we can catch up.
The current back pressure algorithm we use in stream manager is problematic in several ways:
1. It simply throttles all spouts in the whole topology when congestion happens in even just one container, which unfairly penalizes irrelevant paths in the topology;
2. It does nothing to the bolts in the middle, even though they generate tuples too and hence also contribute to the congestion;
3. At any given time, only one instance in the entire topology can trigger back pressure; we don't allow multiple instances to trigger back pressure again, because all spouts are already throttled and throttling them again would be redundant.
We could improve it by:
1. Stream managers should learn all the paths in the topology and penalize all the bolts and spouts upstream along those paths, because they all could contribute to the congestion;
2. For congestion on the data path between two stream managers, there is no way to define "upstream" by the design of Heron, but we can add counters to track which instances' packets contribute most to the congestion and apply back pressure on behalf of those instances;
3. Back pressure should be able to trigger multiple times in a whole topology, to reflect the fact that multiple instances on different paths of the topology can experience congestion independently. Therefore, we need real "reference counting" to make this work correctly.
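Point 3 above can be sketched as a small reference-counting scheme. This is an illustrative sketch, not the PR's actual code: it assumes each congested downstream independently requests back pressure on an upstream task, and the task resumes only when every requester has withdrawn.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// task id -> number of downstream congestion points currently demanding
// back pressure on that task. Several congested instances may blame the
// same upstream task when their paths overlap, hence the counting.
std::map<int32_t, int32_t> bp_refcount_;

// Returns true when this request is the first one, i.e. the task should
// actually be throttled now.
bool StartBackPressure(int32_t task_id) {
  return ++bp_refcount_[task_id] == 1;
}

// Returns true when the last requester has withdrawn, i.e. the task
// should be resumed now.
bool StopBackPressure(int32_t task_id) {
  if (--bp_refcount_[task_id] == 0) {
    bp_refcount_.erase(task_id);
    return true;
  }
  return false;
}
```

With this shape, two independent congestion points can both trigger back pressure on the same task; the task only resumes after both have cleared, which is exactly the correctness property plain one-shot back pressure cannot give.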
This resolves #1567.