Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added histogram latency information to Hystrix dashboard stream #3986

Merged
merged 19 commits into from
Aug 29, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion docs/root/operations/admin.rst
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,7 @@ The fields are:
In Envoy, service unavailable response will cause **outlier detection** - removing a node off the
load balancer pool, but requests are not rejected as a result. Therefore, this counter is always
set to '0'.
* Latency information is currently unavailable.
* Latency information represent cumulative data from start of the clusters operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/represent/represents/
s/start/the start/

Mean latency is currently not available.


57 changes: 49 additions & 8 deletions source/extensions/stat_sinks/hystrix/hystrix.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include "common/http/headers.h"

#include "absl/strings/str_cat.h"
#include "absl/strings/str_split.h"

namespace Envoy {
namespace Extensions {
Expand Down Expand Up @@ -41,6 +42,16 @@ void ClusterStatsCache::printRollingWindow(absl::string_view name, RollingWindow
out_str << std::endl;
}

std::string ClusterStatsCache::printTimingHistogram() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this addHistogramToStream and let it take in a stringstream?

std::stringstream out_str;
bool is_first = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a place where a join operation would make sense, but I can see that there is already a stylized approach to building these strings in this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to consider changing the code to use join?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you, TODO is fine as well.

for (std::pair<std::string, double> element : timing_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::pair<std::string, double>&

HystrixSink::addDoubleToStream(element.first, element.second, out_str, is_first);
is_first = false;
}
return out_str.str();
}

// Add new value to rolling window, in place of oldest one.
void HystrixSink::pushNewValue(RollingWindow& rolling_window, uint64_t value) {
if (rolling_window.empty()) {
Expand All @@ -65,7 +76,8 @@ uint64_t HystrixSink::getRollingValue(RollingWindow rolling_window) {
}

void HystrixSink::updateRollingWindowMap(const Upstream::ClusterInfo& cluster_info,
ClusterStatsCache& cluster_stats_cache) {
ClusterStatsCache& cluster_stats_cache,
std::unordered_map<std::string, double>& histogram) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a 'using' shortcut for this map type, declared in the HystrixSink scope, I think.

const std::string cluster_name = cluster_info.name();
Upstream::ClusterStats& cluster_stats = cluster_info.stats();
Stats::Scope& cluster_stats_scope = cluster_info.statsScope();
Expand Down Expand Up @@ -100,6 +112,8 @@ void HystrixSink::updateRollingWindowMap(const Upstream::ClusterInfo& cluster_in
uint64_t total = errors + timeouts + success + rejected;
pushNewValue(cluster_stats_cache.total_, total);

cluster_stats_cache.timing_ = histogram;

ENVOY_LOG(trace, "{}", printRollingWindows());
}

Expand All @@ -116,6 +130,11 @@ void HystrixSink::addIntToStream(absl::string_view key, uint64_t value, std::str
addInfoToStream(key, std::to_string(value), info, is_first);
}

void HystrixSink::addDoubleToStream(absl::string_view key, double value, std::stringstream& info,
bool is_first) {
addInfoToStream(key, std::to_string(value), info, is_first);
}

void HystrixSink::addInfoToStream(absl::string_view key, absl::string_view value,
std::stringstream& info, bool is_first) {
if (!is_first) {
Expand Down Expand Up @@ -159,7 +178,7 @@ void HystrixSink::addHystrixCommand(ClusterStatsCache& cluster_stats_cache,
addIntToStream("rollingCountResponsesFromCache", 0, ss);

// Envoy's "circuit breaker" has similar meaning to hystrix's isolation
// so we count upstream_rq_pending_overflow and present it as ss
// so we count upstream_rq_pending_overflow and present it as rollingCountSemaphoreRejected
addIntToStream("rollingCountSemaphoreRejected", rejected, ss);

// Hystrix's short circuit is not similar to Envoy's since it is triggered by 503 responses
Expand All @@ -172,11 +191,7 @@ void HystrixSink::addHystrixCommand(ClusterStatsCache& cluster_stats_cache,
addIntToStream("rollingCountBadRequests", 0, ss);
addIntToStream("currentConcurrentExecutionCount", 0, ss);
addIntToStream("latencyExecute_mean", 0, ss);

// TODO trabetti : add histogram information once available by PR #2932
addInfoToStream(
"latencyExecute",
"{\"0\":0,\"25\":0,\"50\":0,\"75\":0,\"90\":0,\"95\":0,\"99\":0,\"99.5\":0,\"100\":0}", ss);
addInfoToStream("latencyExecute", "{" + cluster_stats_cache.printTimingHistogram() + "}", ss);
addIntToStream("propertyValue_circuitBreakerRequestVolumeThreshold", 0, ss);
addIntToStream("propertyValue_circuitBreakerSleepWindowInMilliseconds", 0, ss);
addIntToStream("propertyValue_circuitBreakerErrorThresholdPercentage", 0, ss);
Expand Down Expand Up @@ -304,6 +319,31 @@ void HystrixSink::flush(Stats::Source&) {
incCounter();
std::stringstream ss;
Upstream::ClusterManager::ClusterInfoMap clusters = server_.clusterManager().clusters();

// Save a map of the relevant histograms per cluster in a convenient format.
std::unordered_map<std::string, std::unordered_map<std::string, double>> time_histograms;
for (const Stats::ParentHistogramSharedPtr histogram : server_.stats().histograms()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should be using the Stats::Source::cachedHistograms() method on the Source that's being passed in (

virtual const std::vector<ParentHistogramSharedPtr>& cachedHistograms() PURE;
) to get the histograms.

// histogram->name() on clusters of the format "cluster.cluster_name.histogram_name"
// i.e. "cluster.service1.upstream_rq_time".
std::vector<std::string> split_name = absl::StrSplit(histogram->name(), ".");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set the return type to std::vector<absl::string_view> (weird StrSplit return type allows casting to a bunch of different types), this won't require us to do a bunch of string copies.

Also, I think it's slightly better to pass in the the delimiter as a char ('.') rather than a string (really it's a char*) if it's only a single character because it will allow the library to choose a non-multi character overload.

if (split_name[0] == "clusters" && split_name[2] == "upstream_rq_time") {
std::unordered_map<std::string, double> hist_map;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the copy later by doing the following:

std::unordered_map<std::string, double>& hist_map = time_histograms[split_name[1]];

for (size_t i = 0; i < histogram->cumulativeStatistics().supportedQuantiles().size(); ++i) {
if (histogram->cumulativeStatistics().supportedQuantiles()[i] == 0.999) {
// Envoy provide 99.9 percentile, while Hystrix shows 99.5, so there is a small
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just add that to the supported quantiles? It seems like a reasonable percentile to care about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can? I have no idea if this will effect anything else.. I'll try and see if any test fails.

// inaccuracy, which is probably ok since the dashboard is for general information.
// Alternatively, we could approximate by the average between 99 and 99.9.
hist_map["99.5"] = histogram->cumulativeStatistics().computedQuantiles()[i];
} else {
hist_map[std::to_string(
int(100 * histogram->cumulativeStatistics().supportedQuantiles()[i]))] =
histogram->cumulativeStatistics().computedQuantiles()[i];
}
}
time_histograms[split_name[1]] = hist_map;
}
}

for (auto& cluster : clusters) {
Upstream::ClusterInfoConstSharedPtr cluster_info = cluster.second.get().info();

Expand All @@ -314,7 +354,8 @@ void HystrixSink::flush(Stats::Source&) {
}

// update rolling window with cluster stats
updateRollingWindowMap(*cluster_info, *cluster_stats_cache_ptr);
updateRollingWindowMap(*cluster_info, *cluster_stats_cache_ptr,
time_histograms[cluster_info->name()]);

// append it to stream to be sent
addClusterStatsToStream(
Expand Down
34 changes: 23 additions & 11 deletions source/extensions/stat_sinks/hystrix/hystrix.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ struct ClusterStatsCache {
void printToStream(std::stringstream& out_str);
void printRollingWindow(absl::string_view name, RollingWindow rolling_window,
std::stringstream& out_str);
std::string printTimingHistogram();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line?

std::string cluster_name_;

// Rolling windows
Expand All @@ -34,6 +36,8 @@ struct ClusterStatsCache {
RollingWindow total_;
RollingWindow timeouts_;
RollingWindow rejected_;

std::unordered_map<std::string, double> timing_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this to be cached if we're just going to regenerate the map each time?

Also, why not make this a double to double instead by storing the quantile as a double rather than a string?

Copy link
Contributor Author

@trabetti trabetti Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this to be cached if we're just going to regenerate the map each time?

I still have an open question if it worth to do any kind of calculation over the percentile values, instead of giving the values since the beginning of the cluster's operation. Otherwise, I can pass it directly without using the cache.

why not make this a double to double instead by storing the quantile as a double rather than a string?

Just to overcome the inconsistent representation - "99.5" vs. all other values which are integers (25, 50, 99, etc..), and if I pass them as doubles, dashboard do not accept. It was convenient to handle this here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just mapping "99.5" to 99.5? and "25" to 25.0?

Unless there's a big efficiency issue I'd vote for doing this type coercion closer to where it's consumed. Otherwise commenting why you are doing this would be really helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mapping the string representation of the quantile to its current value from the histogram

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean in the PR; comments; I mean adding comments in the code. Document both what it's doing and also why you think this is helpful from a code-structure or performance perspective.

};

typedef std::unique_ptr<ClusterStatsCache> ClusterStatsCachePtr;
Expand Down Expand Up @@ -78,7 +82,8 @@ class HystrixSink : public Stats::Sink, public Logger::Loggable<Logger::Id::hyst
* Calculate values needed to create the stream and write into the map.
*/
void updateRollingWindowMap(const Upstream::ClusterInfo& cluster_info,
ClusterStatsCache& cluster_stats_cache);
ClusterStatsCache& cluster_stats_cache,
std::unordered_map<std::string, double>& histogram);
/**
* Clear map.
*/
Expand All @@ -94,27 +99,34 @@ class HystrixSink : public Stats::Sink, public Logger::Loggable<Logger::Id::hyst
*/
uint64_t getRollingValue(RollingWindow rolling_window);

private:
/**
* Format the given key and absl::string_view value to "key"="value", and adding to the
* Format the given key and value to "key"=value, and adding to the stringstream.
*/
static void addInfoToStream(absl::string_view key, absl::string_view value,
std::stringstream& info, bool is_first = false);

/**
* Format the given key and double value to "key"=<string of uint64_t>, and adding to the
* stringstream.
*/
void addStringToStream(absl::string_view key, absl::string_view value, std::stringstream& info,
bool is_first = false);
static void addDoubleToStream(absl::string_view key, double value, std::stringstream& info,
bool is_first);

/**
* Format the given key and uint64_t value to "key"=<string of uint64_t>, and adding to the
* Format the given key and absl::string_view value to "key"="value", and adding to the
* stringstream.
*/
void addIntToStream(absl::string_view key, uint64_t value, std::stringstream& info,
bool is_first = false);
static void addStringToStream(absl::string_view key, absl::string_view value,
std::stringstream& info, bool is_first = false);

/**
* Format the given key and value to "key"=value, and adding to the stringstream.
* Format the given key and uint64_t value to "key"=<string of uint64_t>, and adding to the
* stringstream.
*/
void addInfoToStream(absl::string_view key, absl::string_view value, std::stringstream& info,
bool is_first = false);
static void addIntToStream(absl::string_view key, uint64_t value, std::stringstream& info,
bool is_first = false);

private:
/**
* Generate HystrixCommand event stream.
*/
Expand Down