-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: Reorganize tags into a map<prefix, std::vector<tag>> to avoid initiating multiple regexes that don't match. #2582
stats: Reorganize tags into a map<prefix, std::vector<tag>> to avoid initiating multiple regexes that don't match. #2582
Conversation
…ing multiple regexes that don't match. Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
There was some computation in here of prefix strings were not necessarily "."-delimeted, which could potentially used for a prefix-filter independent of the map lookup. However that's not used currently, so just collect the prefixes we are actually going to use for the map. Signed-off-by: Joshua Marantz <[email protected]>
…void asan issues in test. Signed-off-by: Joshua Marantz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give this one a first pass as maintainer, but @mrice32 might also want to take a look since he reviewed your last change related to the regex improvements.
Assigning to @alyssawilk for final senior maintainer review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, 3x speed up - Fantastic! First review pass below.
Also please keep Release Notes in the PR description even if you think it's n/a. I suspect your entire set of PRs merits a release note but I'm fine if you land the release note in the final PR but having the field reminds us to check in :-)
include/envoy/stats/stats.h
Outdated
* | ||
* The storage for the prefix is owned by the TagExtractor. | ||
* | ||
* @return absl::string_view returnsthe prefix, or an empty string_view if none was found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returns the prefix
// Constructor to fill map. | ||
TagNameValues() : name_regex_pairs_(getRegexMapping()) {} | ||
// Returns the list of descriptors. | ||
const std::vector<Descriptor>& descriptor_vec() const { return descriptor_vec_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe Envoy does snake case for accessors, so descriptorVec()
: name_(name), prefix_(std::string(extractRegexPrefix(regex))), | ||
regex_(RegexUtil::parseRegex(regex)) {} | ||
|
||
std::string TagExtractorImpl::extractRegexPrefix(absl::string_view regex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could totally use some explanatory comments for those of us who haven't been living in this code for a month :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough; I put them in the header though:
/**
* Examines a regex string, looking for the pattern: ^alphanumerics_with_underscores\.
* Returns "alphanumerics_with_underscores" if that pattern is found, empty-string otherwise.
* @param regex absl::string_view the regex to scan for prefixes.
* @return std::string the prefix, or "" if no prefix found.
*/
throw EnvoyException(fmt::format( | ||
"No regex specified for tag specifier and no default regex for name: '{}'", name)); | ||
} | ||
if (regex.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you've moved the default lookup to the function calling createTagExtrator? I believe you should be updating the comments in the .h file accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this is substantial enough refactor I'd be inclined to call it Medium but your call :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, and added comments for all new functions (and some old ones).
source/common/stats/stats_impl.h
Outdated
@@ -60,12 +64,27 @@ class TagProducerImpl : public TagProducer { | |||
*/ | |||
std::string produceTags(const std::string& metric_name, std::vector<Tag>& tags) const override; | |||
|
|||
void addExtractor(TagExtractorPtr extractor); | |||
void addExtractorsMatching(absl::string_view name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment up all new functions.
Existing Envoy code has a pretty low bar for comments but I'd like to raise it :-)
It also makes things easier on your reviewers - I'll do one more pass with comments added as it'll be faster than you waiting on me reading and grokking all of the tag code :-)
…changed in this PR. Signed-off-by: Joshua Marantz <[email protected]>
…than awkward reverse iteration. Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly I just have a bunch of nits. This is cool stuff!
include/envoy/stats/stats.h
Outdated
namespace absl { | ||
class string_view; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do this instead of (presumably) including the string_view header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dependency management in bazel, mainly. Let me look at that again.
|
||
return name_regex_pairs; | ||
void TagNameValues::addRegex(const std::string& name, const std::string& regex) { | ||
// Descriptor& descriptor = descriptor_map_[name]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a vestigial comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. Sorry about all these. Internally I usually don't put the space after the "//" and depend on the linter to find these for me so I can clean them before mailing a review. But I've been just having envoy's formatter fix my code and then it just puts the space back in. I need a new pattern for things to clean up before review :)
descriptor_vec_.emplace_back(); | ||
Descriptor& descriptor = descriptor_vec_.back(); | ||
descriptor.name = name; | ||
descriptor.regex = regex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Descriptor
had a constructor, I think you can eliminate a bunch of boilerplate here and also allow the members of Descriptor
to be const.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
// Mapping from the names above to their respective regex strings. | ||
// std::unordered_map<std::string, Descriptor> descriptor_map_; | ||
// std::vector<const Descriptor* /* owned by descriptor_map_ */> descriptor_vec_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More vestigial comments? Or hints of what's to come?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, just vestigial.
source/common/stats/stats_impl.cc
Outdated
} | ||
|
||
void TagProducerImpl::addExtractor(TagExtractorPtr extractor) { | ||
absl::string_view prefix = extractor->prefixToken(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const absl::string_view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
source/common/stats/stats_impl.cc
Outdated
@@ -4,13 +4,17 @@ | |||
|
|||
#include <algorithm> | |||
#include <chrono> | |||
#include <list> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used. deleted.
test/common/stats/stats_impl_test.cc
Outdated
std::vector<TagExtractorPtr> tag_extractors_; | ||
/** | ||
* Reimplements TagProducerImpl::produceTags, but extracts the tags in reverse order. | ||
* His helps demonstrate that the order extractors does not matter to the end result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: His -> This, and "order of extractors"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
test/common/stats/stats_impl_test.cc
Outdated
* His helps demonstrate that the order extractors does not matter to the end result, | ||
* assuming we don't care about tag-order. This is in large part correct by design because | ||
* stat_name is not mutated until all the extraction is done. | ||
* @param metric_name std::string a name of Stats::Metric (Counter, Gauge, Histogram). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no parameter called metric_name
, did you mean stat_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the var to metric_name for consistency with the interface in Stats.
test/common/stats/stats_impl_test.cc
Outdated
* stat_name is not mutated until all the extraction is done. | ||
* @param metric_name std::string a name of Stats::Metric (Counter, Gauge, Histogram). | ||
* @param tags std::vector a set of Stats::Tag. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Add a @return
doc string. Not normally required for tests, but since the input params are well documented might as well do it for return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
source/common/stats/stats_impl.cc
Outdated
for (absl::string_view::size_type i = 1; i < regex.size(); ++i) { | ||
if (!absl::ascii_isalnum(regex[i]) && (regex[i] != '_')) { | ||
if (i > 1) { | ||
bool last_char = i == regex.size() - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Joshua Marantz <[email protected]>
Back to you, @alyssawilk :) |
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Please merge master to pick up #2613, this should fix CI TSAN. |
Signed-off-by: Joshua Marantz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Will probably do a final pass tomorrow just to make sure I didn't miss anything, but awesome work!
addDefaultExtractors(const envoy::config::metrics::v2::StatsConfig& config); | ||
|
||
/** | ||
* Iterates over every tag extractor that might possibly match stat_name, calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "that might possibly match stat_name" seems a little unspecific. Do you think it might make sense to be specific about what exact criterion this method checks against (prefix for those that have it) to determine the set over which the function is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm coy about this implementation detail in the interface doc because I am weighing whether it makes sense to also scan for substrings. In particular, the rq_xxx paterns are slow, and can be eliminated with a quick scan for "rq" on the input string, so it may not be prefixes in the future. I want to do some more targeted tests for that after more of the low hanging fruit is plucked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, since the primary goal of the method is to select on certain criteria, I think that should be explicitly laid out in the documentation since the only other way to determine it is to check the implementation. Could we maybe create a bullet list of the criteria here, and for now it'll just show the prefix, and later we can add to that if things change? Also, I thought only public methods required doxygen documentation in Envoy. Has this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE show what actually happens in the current impl, and update when that changes: SGTM. Coming.
RE doxygen: I would like to know what the actual post-processing flow is. I think there's a bunch of redundancy in the doxygen comments. In the past when I've used doxygen for C++ I've done it somewhat differently, commenting the vars in the formal-list, so you don't have to repeat the variable name or the type. So right now I'm just trying to be consistent but I wonder what the overall benefit is.
However @alyssawilk asked specifically for more function-doc here and in general in Envoy and I agree with that. It helps the reader to more easily follow unfamiliar code so I'm all over that. I wouldn't mind being somewhat terser though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the history around Doxygen usage in Envoy can be summed up as the fact that Doxygen style was adopted as a convention, but rather than actually running Doxygen on the source Envoy development has historically involved just human reading the Doxygen comments in the header files. So the quirks in usage of Doxygen style can generally be traced to this somewhat docs usage pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more nits.
source/common/stats/stats_impl.cc
Outdated
for (const auto& desc : Config::TagNames::get().descriptorVec()) { | ||
if (desc.name == name) { | ||
addExtractor(Stats::TagExtractorImpl::createTagExtractor(desc.name, desc.regex)); | ||
++num_found; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is this expected to never go above 1? If so, should we toss in an ASSERT
at the end of this method? (Or if we're really confident, we could save some time and just break when found :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is expected to go above 1. That's one of the anticipated changes I have in the wings; my theory (which I need to isolate a test for) is that it's faster to do two different regexes when there are two different ways to generate a tag. I think some of the catastrophic backtracking was due to too many alternatives in one regex, but I haven't totally wrapped my arms around that yet as I was focused on trying to avoid regexes completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - that makes sense. Since this is a future feature, can we toss an assert in for now (since it's only a one-liner), and remove if the implementation changes just to keep this PR self-contained?
source/common/stats/stats_impl.cc
Outdated
std::vector<Tag>& tags) const { | ||
tags.insert(tags.end(), default_tags_.begin(), default_tags_.end()); | ||
void TagProducerImpl::addExtractorsMatching(absl::string_view name) { | ||
int num_found = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefer fixed-size primitives (int64_t
for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reading of https://google.github.io/styleguide/cppguide.html#Integer_Types is that small integers can be int, though I really don't care that much. I don't see an override for that in https://github.com/envoyproxy/envoy/blob/master/STYLE.md -- did I miss it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, you're right. I just think it's preferable to use the explicitly sized integers everywhere (especially in non-test code) rather than making a somewhat subjective exception for small values. Not a big deal, and if you want to leave as is, that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about this at all, but a recursive-grep through envoy source shows lots of existing 'int' usage. Some of them are related to system-calls for that type, but there's plenty of others that look like they are counting small things. So I'm inclined to keep this as 'int'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
source/common/stats/stats_impl.cc
Outdated
} | ||
if (num_found == 0) { | ||
throw EnvoyException(fmt::format( | ||
"No regex specified for tag specifier and no default regex for name: '{}'", name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not sure how opinionated others are on this, but I generally prefer checking context-specific errors like this in the caller since this method has nothing to do with checking for the presence of a config-specified regex. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough I can just return num_found.
const absl::string_view token = absl::string_view(stat_name.data(), dot); | ||
const auto iter = tag_extractor_prefix_map_.find(token); | ||
if (iter != tag_extractor_prefix_map_.end()) { | ||
for (const TagExtractorPtr& tag_extractor : iter->second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefer const auto &
in range-based loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the fact that iter->second ranges over TagExtractorPtr is not locally obvious and it's more readable to be explicit. Moreover https://google.github.io/styleguide/cppguide.html#auto talks about when 'auto' being acceptable when the actual type name is cumbersome or obvious, and I think neither is the case here.
I don't see an override of this policy in https://github.com/envoyproxy/envoy/blob/master/STYLE.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. I didn't notice that the type of the vector wasn't spelled out nearby (as denoted in one of the allowed cases in the style guide).
source/common/stats/stats_impl.cc
Outdated
tags.insert(tags.end(), default_tags_.begin(), default_tags_.end()); | ||
IntervalSetImpl<size_t> remove_characters; | ||
forEachExtractorMatching( | ||
metric_name, [&remove_characters, &tags, metric_name](const TagExtractorPtr& tag_extractor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not &metric_name
in the capture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question...I was thinking it made sense to capture the reference only if it was going to be modified in the lambda body.
The first example in https://google.github.io/styleguide/cppguide.html#Lambda_expressions supports my pattern but doesn't explicitly require it. I don't see any overriding commentary in https://github.com/envoyproxy/envoy/blob/master/STYLE.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, since metric_name
is a const ref already, I think the only difference would be whether you make a copy or not, since the lambda should retain the constness of the ref. IIUC, this is a similar tradeoff to passing a string by value or by const ref in a function if the function is not going to modify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, there should be no performance difference here. I felt like prefixing a variable with "&" was a way for me to tell the reader "I'm going to be modifying this state in the lambda body". Since I'm not planning on modifying it, I felt like I should omit the &.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't think I worded that very well. I think there is a performance difference here. IIUC, without the &
, you are making a local copy of the string for the lambda's use, which is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL: const refs get value-ized by C++ in a lambda capture. Right you are. Switched to using & in the capture for metric_name.
…f 0. Also renames structure member variables in Descriptor in well_known_names.cc to end with _ per Envoy style guide (overriding the Google style guide which states that structs don't suffix member variables with _). Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
…was a const ref to begin with. Signed-off-by: Joshua Marantz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, defer to @alyssawilk for senior maintainer review and merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great - now both fastrer and easier to understand :-) Only two little nits on my side.
source/common/stats/stats_impl.cc
Outdated
// I anticipate possibly changing the default tag regexes so that more than one regex can | ||
// yield the same tag, on the theory that this will reduce regex backtracking. But at the | ||
// moment, this doesn't happen, so the flow isn't well tested this. When we start exploiting | ||
// this, and it's tested, we can simply remove this assert. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a TODO ? If not I'd suggest rewriting as a statement of what is, since speculation of what will happen may be more confusing than not.
Also
the flow isn't well tested this -> this flow isn't well tested?
} | ||
if (regex.empty()) { | ||
throw EnvoyException(fmt::format( | ||
"No regex specified for tag specifier and no default regex for name: '{}'", name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you just moved this, but as long as you're in here mind adding a unit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done & done
…pty regex. Signed-off-by: Joshua Marantz <[email protected]>
…o avoid initiating multiple regexes that don't match. (envoyproxy#2582)" This reverts commit 9c728a0.
Thanks for all the reviewing!
…On Thu, Feb 15, 2018 at 4:31 PM, alyssawilk ***@***.***> wrote:
Merged #2582 <#2582>.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2582 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AB2kPXoPL7Uby9L1NMSs9BL4dilOMppYks5tVKIXgaJpZM4SBdks>
.
|
Description:
Improves startup speed from about 18 seconds to 7 seconds on the 10k example from #2063.
This is step 5 in the the plan to improve startup performance, and the first one (in time-order) to actually improve speed. This also removes a previously existing restriction that a tag can only be computed from one regex; now a tag could be computed by multiple different regexes, although no such change has occurred in this PR.
Note that the code which collects the regexes has been refactored to be more functional, and thus easier to change in the an upcoming PR, but the regexes have not been changed at all.
Risk Level: Medium -- this is startup critical but I think it's functionally bounded and well tested.
Testing:
//test/...
manually hacked main_common to exit(0) before loop and timed server startup with 10k clusters.
Release Notes:
None yet, will add release notes when the chain of PRs is done.