Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added histogram latency information to Hystrix dashboard stream #3986

Merged
merged 19 commits into from
Aug 29, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion docs/root/operations/admin.rst
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,7 @@ The fields are:
In Envoy, service unavailable response will cause **outlier detection** - removing a node off the
load balancer pool, but requests are not rejected as a result. Therefore, this counter is always
set to '0'.
* Latency information is currently unavailable.
* Latency information represent cumulative data from start of the clusters operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/represent/represents/
s/start/the start/

Mean latency is currently not available.


4 changes: 2 additions & 2 deletions source/common/stats/histogram_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,8 @@ HistogramStatisticsImpl::HistogramStatisticsImpl(const histogram_t* histogram_pt
}

const std::vector<double>& HistogramStatisticsImpl::supportedQuantiles() const {
static const std::vector<double> supported_quantiles = {0, 0.25, 0.5, 0.75, 0.90,
0.95, 0.99, 0.999, 1};
static const std::vector<double> supported_quantiles = {0, 0.25, 0.5, 0.75, 0.90,
0.95, 0.99, 0.995, 0.999, 1};
return supported_quantiles;
}

Expand Down
62 changes: 53 additions & 9 deletions source/extensions/stat_sinks/hystrix/hystrix.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,16 @@
#include "common/http/headers.h"

#include "absl/strings/str_cat.h"
#include "absl/strings/str_split.h"

namespace Envoy {
namespace Extensions {
namespace StatSinks {
namespace Hystrix {

const uint64_t HystrixSink::DEFAULT_NUM_BUCKETS;

static const std::vector<double> hystrix_quantiles = {0, 0.25, 0.5, 0.75, 0.90,
0.95, 0.99, 0.995, 1};
ClusterStatsCache::ClusterStatsCache(const std::string& cluster_name)
: cluster_name_(cluster_name) {}

Expand All @@ -41,6 +43,14 @@ void ClusterStatsCache::printRollingWindow(absl::string_view name, RollingWindow
out_str << std::endl;
}

void ClusterStatsCache::addHistogramToStream(std::stringstream& ss) {
bool is_first = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a place where a join operation would make sense, but I can see that there is already a stylized approach to building these strings in this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to consider changing the code to use join?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you, TODO is fine as well.

for (std::pair<std::string, double> element : timing_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::pair<std::string, double>&

HystrixSink::addDoubleToStream(element.first, element.second, ss, is_first);
is_first = false;
}
}

// Add new value to rolling window, in place of oldest one.
void HystrixSink::pushNewValue(RollingWindow& rolling_window, uint64_t value) {
if (rolling_window.empty()) {
Expand All @@ -65,7 +75,8 @@ uint64_t HystrixSink::getRollingValue(RollingWindow rolling_window) {
}

void HystrixSink::updateRollingWindowMap(const Upstream::ClusterInfo& cluster_info,
ClusterStatsCache& cluster_stats_cache) {
ClusterStatsCache& cluster_stats_cache,
std::unordered_map<std::string, double>& histogram) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a 'using' shortcut for this map type, declared in the HystrixSink scope, I think.

const std::string cluster_name = cluster_info.name();
Upstream::ClusterStats& cluster_stats = cluster_info.stats();
Stats::Scope& cluster_stats_scope = cluster_info.statsScope();
Expand Down Expand Up @@ -100,6 +111,8 @@ void HystrixSink::updateRollingWindowMap(const Upstream::ClusterInfo& cluster_in
uint64_t total = errors + timeouts + success + rejected;
pushNewValue(cluster_stats_cache.total_, total);

cluster_stats_cache.timing_ = histogram;

ENVOY_LOG(trace, "{}", printRollingWindows());
}

Expand All @@ -116,6 +129,11 @@ void HystrixSink::addIntToStream(absl::string_view key, uint64_t value, std::str
addInfoToStream(key, std::to_string(value), info, is_first);
}

void HystrixSink::addDoubleToStream(absl::string_view key, double value, std::stringstream& info,
bool is_first) {
addInfoToStream(key, std::to_string(value), info, is_first);
}

void HystrixSink::addInfoToStream(absl::string_view key, absl::string_view value,
std::stringstream& info, bool is_first) {
if (!is_first) {
Expand Down Expand Up @@ -159,7 +177,7 @@ void HystrixSink::addHystrixCommand(ClusterStatsCache& cluster_stats_cache,
addIntToStream("rollingCountResponsesFromCache", 0, ss);

// Envoy's "circuit breaker" has similar meaning to hystrix's isolation
// so we count upstream_rq_pending_overflow and present it as ss
// so we count upstream_rq_pending_overflow and present it as rollingCountSemaphoreRejected
addIntToStream("rollingCountSemaphoreRejected", rejected, ss);

// Hystrix's short circuit is not similar to Envoy's since it is triggered by 503 responses
Expand All @@ -172,11 +190,10 @@ void HystrixSink::addHystrixCommand(ClusterStatsCache& cluster_stats_cache,
addIntToStream("rollingCountBadRequests", 0, ss);
addIntToStream("currentConcurrentExecutionCount", 0, ss);
addIntToStream("latencyExecute_mean", 0, ss);

// TODO trabetti : add histogram information once available by PR #2932
addInfoToStream(
"latencyExecute",
"{\"0\":0,\"25\":0,\"50\":0,\"75\":0,\"90\":0,\"95\":0,\"99\":0,\"99.5\":0,\"100\":0}", ss);
ss << ", \"latencyExecute\": {";
cluster_stats_cache.addHistogramToStream(ss);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass the key into the method as a string_view as you do for the others? This would allow the method to handle the brackets, etc. You can still concatenate them directly to the underlying stringstream directly IIUC.

ss << "}";
// addInfoToStream("latencyExecute", "{" + cluster_stats_cache.printTimingHistogram() + "}", ss);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line?

addIntToStream("propertyValue_circuitBreakerRequestVolumeThreshold", 0, ss);
addIntToStream("propertyValue_circuitBreakerSleepWindowInMilliseconds", 0, ss);
addIntToStream("propertyValue_circuitBreakerErrorThresholdPercentage", 0, ss);
Expand Down Expand Up @@ -304,6 +321,32 @@ void HystrixSink::flush(Stats::Source&) {
incCounter();
std::stringstream ss;
Upstream::ClusterManager::ClusterInfoMap clusters = server_.clusterManager().clusters();

// Save a map of the relevant histograms per cluster in a convenient format.
std::unordered_map<absl::string_view, std::unordered_map<std::string, double>, StringViewHash>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a string_view as the key for a map is a little dangerous. You need to be sure that the string that's being used to generate the string_view lasts longer than that element in the map. This is mostly true here, but I think there's a small edge case where it might be possible to cause a segfault. You're grabbing a vector of histogram shared pointers, but that goes away at the end of the scope, which means it's possible for one of the histograms that's holding one of the referenced strings to be destroyed by the time you do the lookup in the map. I'd suggest just using a regular string as the key.

Also, can't we use QuantileLatencyMap here instead of typing out the entire type?

time_histograms;
for (const Stats::ParentHistogramSharedPtr histogram : server_.stats().histograms()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should be using the Stats::Source::cachedHistograms() method on the Source that's being passed in (

virtual const std::vector<ParentHistogramSharedPtr>& cachedHistograms() PURE;
) to get the histograms.

// histogram->name() on clusters of the format "cluster.cluster_name.histogram_name"
// i.e. "cluster.service1.upstream_rq_time".
const std::vector<absl::string_view> split_name = absl::StrSplit(histogram->name(), '.');
if (split_name[0] == "cluster" && split_name[2] == "upstream_rq_time") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check split_name.size()>2 first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean there could be a case where the histogram name is made up of more parts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more parts would be OK. less parts would crash. I would prefer to be locally paranoid and have an assertion rather than a SEGV if that occurs.

std::unordered_map<std::string, double>& hist_map = time_histograms[split_name[1]];
for (size_t i = 0; i < histogram->cumulativeStatistics().supportedQuantiles().size(); ++i) {
if (std::find(hystrix_quantiles.begin(), hystrix_quantiles.end(),
histogram->cumulativeStatistics().supportedQuantiles()[i]) !=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a temp ref for supportedQuartiles()? Would make this long std::find call easier to read.

hystrix_quantiles.end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be handling for a case where the find() fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't it suppose to return hystrix_quantiles.end() if it fails?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, but is that expected? Should you handle that case?

if (histogram->cumulativeStatistics().supportedQuantiles()[i] == 0.995) {
hist_map["99.5"] = histogram->cumulativeStatistics().computedQuantiles()[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this require special-casing? add comment in code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the Hystrix formatting requirement for these strings? Could we possibly take advantage of sprintf formatting options to format all of them the same way instead of using an if statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only the 99.5 is not an int. all others are integers and it didn't work when i tried sending these as double.
{"95": 12.500000, "100": 16000.000000, "99": 15300.000000, "90": 7.100000, "0": 1.000000, "99.5": 15650.000000, "25": 1.060345, "50": 2.040000, "75": 3.070833}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the requirements? Would "95.0" be okay? Or does it have to be exactly "95"? There are some float formatting options in sprintf that allow you to limit the number of decimals on floats, which might allow you to get what you want without the if statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sprintf with %g might do what you want. But are you using that? Or fmt::format?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you might want to consider using integers scaled by 10.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing, you have the solution for everything 😀 it worked. I am keeping it as a double to make it more clear, does it improve a lot to scale by 10 and use integer?

} else {
hist_map[std::to_string(
int(100 * histogram->cumulativeStatistics().supportedQuantiles()[i]))] =
histogram->cumulativeStatistics().computedQuantiles()[i];
}
}
}
}
}

for (auto& cluster : clusters) {
Upstream::ClusterInfoConstSharedPtr cluster_info = cluster.second.get().info();

Expand All @@ -314,7 +357,8 @@ void HystrixSink::flush(Stats::Source&) {
}

// update rolling window with cluster stats
updateRollingWindowMap(*cluster_info, *cluster_stats_cache_ptr);
updateRollingWindowMap(*cluster_info, *cluster_stats_cache_ptr,
time_histograms[cluster_info->name()]);

// append it to stream to be sent
addClusterStatsToStream(
Expand Down
34 changes: 23 additions & 11 deletions source/extensions/stat_sinks/hystrix/hystrix.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ struct ClusterStatsCache {
void printToStream(std::stringstream& out_str);
void printRollingWindow(absl::string_view name, RollingWindow rolling_window,
std::stringstream& out_str);
void addHistogramToStream(std::stringstream& ss);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line?

std::string cluster_name_;

// Rolling windows
Expand All @@ -34,6 +36,8 @@ struct ClusterStatsCache {
RollingWindow total_;
RollingWindow timeouts_;
RollingWindow rejected_;

std::unordered_map<std::string, double> timing_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this to be cached if we're just going to regenerate the map each time?

Also, why not make this a double to double instead by storing the quantile as a double rather than a string?

Copy link
Contributor Author

@trabetti trabetti Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this to be cached if we're just going to regenerate the map each time?

I still have an open question if it worth to do any kind of calculation over the percentile values, instead of giving the values since the beginning of the cluster's operation. Otherwise, I can pass it directly without using the cache.

why not make this a double to double instead by storing the quantile as a double rather than a string?

Just to overcome the inconsistent representation - "99.5" vs. all other values which are integers (25, 50, 99, etc..), and if I pass them as doubles, dashboard do not accept. It was convenient to handle this here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just mapping "99.5" to 99.5? and "25" to 25.0?

Unless there's a big efficiency issue I'd vote for doing this type coercion closer to where it's consumed. Otherwise commenting why you are doing this would be really helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mapping the string representation of the quantile to its current value from the histogram

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean in the PR; comments; I mean adding comments in the code. Document both what it's doing and also why you think this is helpful from a code-structure or performance perspective.

};

typedef std::unique_ptr<ClusterStatsCache> ClusterStatsCachePtr;
Expand Down Expand Up @@ -78,7 +82,8 @@ class HystrixSink : public Stats::Sink, public Logger::Loggable<Logger::Id::hyst
* Calculate values needed to create the stream and write into the map.
*/
void updateRollingWindowMap(const Upstream::ClusterInfo& cluster_info,
ClusterStatsCache& cluster_stats_cache);
ClusterStatsCache& cluster_stats_cache,
std::unordered_map<std::string, double>& histogram);
/**
* Clear map.
*/
Expand All @@ -94,27 +99,34 @@ class HystrixSink : public Stats::Sink, public Logger::Loggable<Logger::Id::hyst
*/
uint64_t getRollingValue(RollingWindow rolling_window);

private:
/**
* Format the given key and absl::string_view value to "key"="value", and adding to the
* Format the given key and value to "key"=value, and adding to the stringstream.
*/
static void addInfoToStream(absl::string_view key, absl::string_view value,
std::stringstream& info, bool is_first = false);

/**
* Format the given key and double value to "key"=<string of uint64_t>, and adding to the
* stringstream.
*/
void addStringToStream(absl::string_view key, absl::string_view value, std::stringstream& info,
bool is_first = false);
static void addDoubleToStream(absl::string_view key, double value, std::stringstream& info,
bool is_first);

/**
* Format the given key and uint64_t value to "key"=<string of uint64_t>, and adding to the
* Format the given key and absl::string_view value to "key"="value", and adding to the
* stringstream.
*/
void addIntToStream(absl::string_view key, uint64_t value, std::stringstream& info,
bool is_first = false);
static void addStringToStream(absl::string_view key, absl::string_view value,
std::stringstream& info, bool is_first = false);

/**
* Format the given key and value to "key"=value, and adding to the stringstream.
* Format the given key and uint64_t value to "key"=<string of uint64_t>, and adding to the
* stringstream.
*/
void addInfoToStream(absl::string_view key, absl::string_view value, std::stringstream& info,
bool is_first = false);
static void addIntToStream(absl::string_view key, uint64_t value, std::stringstream& info,
bool is_first = false);

private:
/**
* Generate HystrixCommand event stream.
*/
Expand Down
10 changes: 5 additions & 5 deletions test/common/stats/thread_local_store_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -610,9 +610,9 @@ TEST_F(HistogramTest, BasicHistogramSummaryValidate) {

const std::string h1_expected_summary =
"P0: 1, P25: 1.025, P50: 1.05, P75: 1.075, P90: 1.09, P95: 1.095, "
"P99: 1.099, P99.9: 1.0999, P100: 1.1";
const std::string h2_expected_summary =
"P0: 0, P25: 25, P50: 50, P75: 75, P90: 90, P95: 95, P99: 99, P99.9: 99.9, P100: 100";
"P99: 1.099, P99.5: 1.0995, P99.9: 1.0999, P100: 1.1";
const std::string h2_expected_summary = "P0: 0, P25: 25, P50: 50, P75: 75, P90: 90, P95: 95, "
"P99: 99, P99.5: 99.5, P99.9: 99.9, P100: 100";

for (size_t i = 0; i < 100; ++i) {
expectCallAndAccumulate(h2, i);
Expand All @@ -639,8 +639,8 @@ TEST_F(HistogramTest, BasicHistogramMergeSummary) {
}
EXPECT_EQ(1, validateMerge());

const std::string expected_summary =
"P0: 0, P25: 25, P50: 50, P75: 75, P90: 90, P95: 95, P99: 99, P99.9: 99.9, P100: 100";
const std::string expected_summary = "P0: 0, P25: 25, P50: 50, P75: 75, P90: 90, P95: 95, P99: "
"99, P99.5: 99.5, P99.9: 99.9, P100: 100";

NameHistogramMap name_histogram_map = makeHistogramMap(store_->histograms());
EXPECT_EQ(expected_summary, name_histogram_map["h1"]->cumulativeStatistics().summary());
Expand Down
14 changes: 14 additions & 0 deletions test/server/http/admin_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ TEST_P(AdminStatsTest, StatsAsJson) {
90.0,
95.0,
99.0,
99.5,
99.9,
100.0
],
Expand Down Expand Up @@ -160,6 +161,10 @@ TEST_P(AdminStatsTest, StatsAsJson) {
"interval": null,
"cumulative": 109.9
},
{
"interval": null,
"cumulative": 109.95
},
{
"interval": null,
"cumulative": 109.99
Expand Down Expand Up @@ -201,6 +206,10 @@ TEST_P(AdminStatsTest, StatsAsJson) {
"interval": 109.9,
"cumulative": 209.8
},
{
"interval": 109.95,
"cumulative": 209.9
},
{
"interval": 109.99,
"cumulative": 209.98
Expand Down Expand Up @@ -262,6 +271,7 @@ TEST_P(AdminStatsTest, UsedOnlyStatsAsJson) {
90.0,
95.0,
99.0,
99.5,
99.9,
100.0
],
Expand Down Expand Up @@ -297,6 +307,10 @@ TEST_P(AdminStatsTest, UsedOnlyStatsAsJson) {
"interval": 109.9,
"cumulative": 209.8
},
{
"interval": 109.95,
"cumulative": 209.9
},
{
"interval": 109.99,
"cumulative": 209.98
Expand Down