
Rewrite stream manager back pressure algorithm #1785

Open

wants to merge 2 commits into base: master

Conversation

@congwang (Contributor) commented Mar 31, 2017

The current back pressure algorithm we use in stream manager is problematic in several ways:

  1. It simply throttles all spouts in the whole topology when congestion happens in even just one container. This unfairly penalizes irrelevant paths in the topology;

  2. It does nothing to throttle the bolts in the middle of the topology, even though they also generate tuples and hence contribute to the congestion;

  3. Only one instance in the entire topology can trigger back pressure at a time; we don't allow multiple instances to trigger back pressure because all spouts are already throttled, so there is no need to throttle them again.

We could improve it by:

  1. Stream managers need to learn all the paths in the topology and penalize all the bolts and spouts upstream along the congested path, because any of them could contribute to the congestion;

  2. For congestion on the data path between two stream managers, there is no way to define "upstream" given Heron's design, but we can add counters to track which instances contribute most to the congestion and apply back pressure on behalf of those instances;

  3. Back pressure should be able to be triggered multiple times in a topology, reflecting the fact that multiple instances on different paths can become congested independently. Therefore, we need real "reference counting" to make this work correctly (see the sketch below).

This resolves #1567.
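
As a rough illustration of the reference counting mentioned in point 3, here is a minimal sketch (the class and member names below are hypothetical, not this PR's actual code): each congested instance takes a reference on back pressure, and upstream throttling is lifted only when the last reference is released, so congestion on one path does not mask or prolong congestion on another.

#include <cstdint>
#include <set>

// Hypothetical sketch of per-instance reference counting for back pressure.
class BackPressureRefCount {
 public:
  // Returns true if this call transitions us from "no back pressure" to "back pressure".
  bool Start(int32_t task_id) {
    bool was_empty = starters_.empty();
    starters_.insert(task_id);
    return was_empty;
  }

  // Returns true if this call releases the last reference, i.e. back pressure can be lifted.
  bool Stop(int32_t task_id) {
    starters_.erase(task_id);
    return starters_.empty();
  }

  bool InBackPressure() const { return !starters_.empty(); }

 private:
  std::set<int32_t> starters_;  // task ids currently asking for back pressure
};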

@@ -76,6 +76,9 @@ class StMgrClientMgr {

sp_int64 high_watermark_;
sp_int64 low_watermark_;

// Counters for remote instance traffic, this is used for back pressure
Contributor

Can we add more description here as to what the key and value mean?

Contributor Author

Sure.
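
For illustration, the expanded comment could read something like the following. The nested-map shape is inferred from the instance_stats_[_stmgr_id][_task_id] usage later in this diff, sp_string/sp_int32/sp_int64 are Heron's existing typedefs, and the exact declaration is an assumption rather than a quote from the PR:

// instance_stats_[stmgr_id][task_id] accumulates the number of bytes sent to the
// remote stream manager `stmgr_id` on behalf of the local instance `task_id`.
// When the connection to that stream manager becomes congested, the busiest
// task_id in this map is blamed and back pressure is applied on its behalf.
std::map<sp_string, std::map<sp_int32, sp_int64>> instance_stats_;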

sp_int32 StMgrClientMgr::FindBusiestTaskOnStmgr(const sp_string& _stmgr_id) {
sp_int32 task_id;
sp_int64 max = 0;
for (auto iter = instance_stats_[_stmgr_id].begin();
Contributor

What if instance_stats_[_stmgr_id] does not exist?

Contributor Author

If instance_stats_[_stmgr_id] does not exist, that means we don't have any traffic to that stmgr, so it is impossible for it to trigger back pressure.
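
For context, here is a minimal sketch of how the loop shown in the hunk above could complete; everything past the lines visible in the diff is a reconstruction for illustration, not the PR's actual code:

sp_int32 StMgrClientMgr::FindBusiestTaskOnStmgr(const sp_string& _stmgr_id) {
  sp_int32 task_id = -1;  // initialized here for the sketch; the diff leaves it uninitialized
  sp_int64 max = 0;
  // Walk the per-task byte counters for this remote stmgr and pick the task
  // that has sent the most data; that task gets blamed for the congestion.
  for (auto iter = instance_stats_[_stmgr_id].begin();
       iter != instance_stats_[_stmgr_id].end(); ++iter) {
    if (iter->second > max) {
      max = iter->second;
      task_id = iter->first;
    }
  }
  return task_id;
}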

void StMgrClientMgr::SendTupleStreamMessage(sp_int32 _task_id, const sp_string& _stmgr_id,
const proto::system::HeronTupleSet2& _msg) {
auto iter = clients_.find(_stmgr_id);
CHECK(iter != clients_.end());

instance_stats_[_stmgr_id][_task_id] += _msg.GetCachedSize();
Contributor

Where are you clearing this? Shouldn't this be cleared on a regular basis?

Contributor Author

What makes you think a += operator is clearing it?? I am totally confused...

Contributor Author

Oops, replied too fast. Good point, we need to clear it after back pressure is triggered.
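
One possible shape for that clearing step, as a sketch only (the helper name ClearInstanceStats is hypothetical): once back pressure has been triggered on behalf of a remote stmgr, reset its byte counters so the next congestion episode starts counting from zero.

void StMgrClientMgr::ClearInstanceStats(const sp_string& _stmgr_id) {
  auto it = instance_stats_.find(_stmgr_id);
  if (it != instance_stats_.end()) {
    it->second.clear();  // drop per-task byte counts for this remote stmgr
  }
}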

@billonahill (Contributor)

The current approach has limitations, but it also has simplicity and transparency. It's easy for someone to rationalize that one agent invoked back pressure and all the spouts stopped. With this new approach, understanding the system becomes a more complex problem. @congwang can you discuss what metrics, logging, etc. should be (or are being) included to help someone troubleshooting to understand the current state of the system and how it got there? For example, why some bolts might be emitting and not others, or visualizations around the reference counting mechanism, or which agents are contributing to the backpressure and in what order.

Also, the shortcomings and the features of the new approach are well described above, but could you expand on the specifics of the new algorithms at play, either here or in another doc or state diagram? I think that will be a critical thing to understand both when troubleshooting and when reading the code.

@congwang (Contributor Author)

@billonahill The new approach is not any harder to understand: we still throttle things, we just throttle only the relevant things along the path, in a smarter way. The reference counting might be slightly harder to understand, but if you think about cases where paths overlap, it falls out naturally.

The logging is extended to include the task id rather than just the stmgr id, to reflect this change. But if you feel we need more logging in some places for troubleshooting purposes, I am very happy to add it (actually I removed some debugging logs before sending this PR).

The metrics should be the same as before; instance_metric_map_ should include both bolts and spouts. People can figure out whether a bolt is under back pressure with this metric.

I don't understand your question about bolts emitting tuples; that totally depends on their code, right?

I am not sure about the doc; perhaps we need to add a page under website/content/docs/?

@billonahill (Contributor)

Thanks @congwang, yes we'll certainly want docs under website/content/docs so might as well start there.

For the reference counting I understand the reason for the feature, I was questioning how the effect of the feature would be made clear to the user during backpressure. I wanted to make sure we've thought about how to help a user answer these questions (for example) when backpressure is occurring, ideally without having to sift too much through logs:

  • Which components are responsible for the back pressure and in what order? Should this be a metric?
  • Which ones have released their call for back pressure, which are remaining?
  • Are bolts not emitting because they're under backpressure, or because they're not receiving tuples?

I haven't yet reviewed the code, but wanted to hear your thoughts on these so I'd know what to expect. From past experience, understanding backpressure hasn't always been easy with the existing implementation.

My question about the bolts not emitting was because with the new approach you would stop upstream bolts from emitting but others would continue to emit, as I understand it. I was using that as an example for something that we might need to clearly point out to our users (or not?).

@objmagic added this to the 0.15.1 or later milestone on Mar 31, 2017
@congwang (Contributor Author)

@billonahill I will open a separate issue on GitHub for the doc; since my English writing is not great, I expect someone else could write it.

We do have metrics for the back pressure initiator, METRIC_TIME_SPENT_BACK_PRESSURE_INIT, and METRIC_TIME_SPENT_BACK_PRESSURE_AGGR too. They serve the purpose you are asking about here. Unfortunately, we don't have a way to show the path.

Of course we should not throttle downstream ones, otherwise no one will consume the packets: deadlock!! ;-)

@billonahill (Contributor)

For documenting the specifics of the algorithm, a README targeted at Heron developers would suffice, if you're not comfortable writing the public documentation. This is a critical thing to describe clearly so that the code can be properly reviewed and maintained.

@congwang (Contributor Author)

@billonahill I believe I already gave a high-level overview of the new algorithm in the description of this PR. If anything is not clear, please point it out and I am happy to add more. BTW, I don't think we should cover code details in the description.

@@ -148,9 +148,6 @@ void StMgrClient::HandleHelloResponse(void*, proto::stmgr::StrMgrHelloResponse*
Stop();
}
delete _response;
if (client_manager_->DidAnnounceBackPressure()) {
Contributor

This failed the unit test test_back_pressure_stmgr_reconnect.
When a stmgr X that is in back pressure disconnects and reconnects, stmgr X is supposed to receive a back pressure notice, according to test_back_pressure_stmgr_reconnect.

Contributor

Follow-up question:
How do you maintain state, e.g. backpressure_starters_, after a stmgr restarts/reconnects?

Contributor Author

@huijunw That should be handled in HandleConnectionClose().
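
As a sketch of the kind of cleanup being described (the function name and the exact members touched are assumptions; the real logic lives in the stream manager's connection-close handling): when a stmgr connection goes away, its traffic counters and any back pressure references it holds should be dropped so that state does not leak across reconnects.

void StMgrClientMgr::HandleStMgrConnectionClose(const sp_string& _stmgr_id) {
  // Forget the per-task byte counters accumulated for this remote stmgr.
  instance_stats_.erase(_stmgr_id);
  // Hypothetical: also release any back pressure references attributed to it,
  // e.g. entries in backpressure_starters_, so the reference count stays accurate.
}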

Contributor Author

@huijunw I already noticed the test case failure and will fix it.

@huijunw (Contributor) commented Apr 26, 2017

I have a question on point 2.
Is it possible that the guess of the "upstream" instance between two stmgrs is wrong? What happens if the guess is wrong?
If FindBusiestTaskOnStmgr guesses wrong, then GetUpstreamInstances returns a different task id, and the back pressure routes to a different connection?

@congwang (Contributor Author) commented May 1, 2017

@huijunw It is not a guess; the topology is in the pplan, so it is 100% accurate. FindBusiestTaskOnStmgr() is the best effort we can do to find out which instance to blame, but we can't predict the future; if we blame the wrong one, then the next time BP is triggered we can catch up.
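
To make the flow concrete, here is a rough sketch of how the pieces discussed above could fit together. FindBusiestTaskOnStmgr and GetUpstreamInstances are named in this thread, but their exact signatures, the wrapper function, and the ThrottleInstance helper are assumptions for illustration:

void StMgrClientMgr::StartBackPressureOnStmgr(const sp_string& _stmgr_id) {
  // 1. Blame the local task that pushed the most bytes to the congested stmgr (heuristic).
  sp_int32 blamed_task = FindBusiestTaskOnStmgr(_stmgr_id);
  // 2. Look up every instance upstream of the blamed task in the physical plan (pplan);
  //    the pplan lookup itself is exact, only the choice of blamed_task is a best effort.
  std::vector<sp_int32> upstream = GetUpstreamInstances(blamed_task);
  // 3. Throttle only those upstream instances instead of every spout in the topology.
  for (sp_int32 task : upstream) {
    ThrottleInstance(task);
  }
}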


Successfully merging this pull request may close these issues: Increased stream manager memory usage.

5 participants