Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: Reorganize tags into a map<prefix, std::vector<tag>> to avoid initiating multiple regexes that don't match. #2582

Merged
merged 17 commits into from
Feb 15, 2018

Conversation

jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Feb 11, 2018

Description:
Improves startup speed from about 18 seconds to 7 seconds on the 10k example from #2063.

This is step 5 in the the plan to improve startup performance, and the first one (in time-order) to actually improve speed. This also removes a previously existing restriction that a tag can only be computed from one regex; now a tag could be computed by multiple different regexes, although no such change has occurred in this PR.

Note that the code which collects the regexes has been refactored to be more functional, and thus easier to change in the an upcoming PR, but the regexes have not been changed at all.

Risk Level: Medium -- this is startup critical but I think it's functionally bounded and well tested.

Testing:
//test/...
manually hacked main_common to exit(0) before loop and timed server startup with 10k clusters.

Release Notes:
None yet, will add release notes when the chain of PRs is done.

…ing multiple regexes that don't match.

Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
There was some computation in here of prefix strings were not
necessarily "."-delimeted, which could potentially used for a
prefix-filter independent of the map lookup.  However that's not used
currently, so just collect the prefixes we are actually going to use
for the map.

Signed-off-by: Joshua Marantz <[email protected]>
…void asan issues in test.

Signed-off-by: Joshua Marantz <[email protected]>
@dnoe dnoe requested a review from mrice32 February 12, 2018 15:20
@dnoe dnoe self-assigned this Feb 12, 2018
Copy link
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this one a first pass as maintainer, but @mrice32 might also want to take a look since he reviewed your last change related to the regex improvements.

@htuch
Copy link
Member

htuch commented Feb 13, 2018

Assigning to @alyssawilk for final senior maintainer review.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, 3x speed up - Fantastic! First review pass below.

Also please keep Release Notes in the PR description even if you think it's n/a. I suspect your entire set of PRs merits a release note but I'm fine if you land the release note in the final PR but having the field reminds us to check in :-)

*
* The storage for the prefix is owned by the TagExtractor.
*
* @return absl::string_view returnsthe prefix, or an empty string_view if none was found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns the prefix

// Constructor to fill map.
TagNameValues() : name_regex_pairs_(getRegexMapping()) {}
// Returns the list of descriptors.
const std::vector<Descriptor>& descriptor_vec() const { return descriptor_vec_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe Envoy does snake case for accessors, so descriptorVec()

: name_(name), prefix_(std::string(extractRegexPrefix(regex))),
regex_(RegexUtil::parseRegex(regex)) {}

std::string TagExtractorImpl::extractRegexPrefix(absl::string_view regex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could totally use some explanatory comments for those of us who haven't been living in this code for a month :-)

Copy link
Contributor Author

@jmarantz jmarantz Feb 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough; I put them in the header though:

  /**
   * Examines a regex string, looking for the pattern: ^alphanumerics_with_underscores\.
   * Returns "alphanumerics_with_underscores" if that pattern is found, empty-string otherwise.
   * @param regex absl::string_view the regex to scan for prefixes.
   * @return std::string the prefix, or "" if no prefix found.
   */

throw EnvoyException(fmt::format(
"No regex specified for tag specifier and no default regex for name: '{}'", name));
}
if (regex.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you've moved the default lookup to the function calling createTagExtrator? I believe you should be updating the comments in the .h file accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this is substantial enough refactor I'd be inclined to call it Medium but your call :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, and added comments for all new functions (and some old ones).

@@ -60,12 +64,27 @@ class TagProducerImpl : public TagProducer {
*/
std::string produceTags(const std::string& metric_name, std::vector<Tag>& tags) const override;

void addExtractor(TagExtractorPtr extractor);
void addExtractorsMatching(absl::string_view name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment up all new functions.

Existing Envoy code has a pretty low bar for comments but I'd like to raise it :-)
It also makes things easier on your reviewers - I'll do one more pass with comments added as it'll be faster than you waiting on me reading and grokking all of the tag code :-)

Copy link
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly I just have a bunch of nits. This is cool stuff!

namespace absl {
class string_view;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this instead of (presumably) including the string_view header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependency management in bazel, mainly. Let me look at that again.


return name_regex_pairs;
void TagNameValues::addRegex(const std::string& name, const std::string& regex) {
// Descriptor& descriptor = descriptor_map_[name];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a vestigial comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Sorry about all these. Internally I usually don't put the space after the "//" and depend on the linter to find these for me so I can clean them before mailing a review. But I've been just having envoy's formatter fix my code and then it just puts the space back in. I need a new pattern for things to clean up before review :)

descriptor_vec_.emplace_back();
Descriptor& descriptor = descriptor_vec_.back();
descriptor.name = name;
descriptor.regex = regex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Descriptor had a constructor, I think you can eliminate a bunch of boilerplate here and also allow the members of Descriptor to be const.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Mapping from the names above to their respective regex strings.
// std::unordered_map<std::string, Descriptor> descriptor_map_;
// std::vector<const Descriptor* /* owned by descriptor_map_ */> descriptor_vec_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More vestigial comments? Or hints of what's to come?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, just vestigial.

}

void TagProducerImpl::addExtractor(TagExtractorPtr extractor) {
absl::string_view prefix = extractor->prefixToken();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const absl::string_view

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -4,13 +4,17 @@

#include <algorithm>
#include <chrono>
#include <list>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used. deleted.

std::vector<TagExtractorPtr> tag_extractors_;
/**
* Reimplements TagProducerImpl::produceTags, but extracts the tags in reverse order.
* His helps demonstrate that the order extractors does not matter to the end result,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: His -> This, and "order of extractors"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* His helps demonstrate that the order extractors does not matter to the end result,
* assuming we don't care about tag-order. This is in large part correct by design because
* stat_name is not mutated until all the extraction is done.
* @param metric_name std::string a name of Stats::Metric (Counter, Gauge, Histogram).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no parameter called metric_name, did you mean stat_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the var to metric_name for consistency with the interface in Stats.

* stat_name is not mutated until all the extraction is done.
* @param metric_name std::string a name of Stats::Metric (Counter, Gauge, Histogram).
* @param tags std::vector a set of Stats::Tag.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add a @return doc string. Not normally required for tests, but since the input params are well documented might as well do it for return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (absl::string_view::size_type i = 1; i < regex.size(); ++i) {
if (!absl::ascii_isalnum(regex[i]) && (regex[i] != '_')) {
if (i > 1) {
bool last_char = i == regex.size() - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dnoe
dnoe previously approved these changes Feb 14, 2018
@dnoe
Copy link
Contributor

dnoe commented Feb 14, 2018

Back to you, @alyssawilk :)

@htuch
Copy link
Member

htuch commented Feb 15, 2018

Please merge master to pick up #2613, this should fix CI TSAN.

Copy link
Member

@mrice32 mrice32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will probably do a final pass tomorrow just to make sure I didn't miss anything, but awesome work!

addDefaultExtractors(const envoy::config::metrics::v2::StatsConfig& config);

/**
* Iterates over every tag extractor that might possibly match stat_name, calling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "that might possibly match stat_name" seems a little unspecific. Do you think it might make sense to be specific about what exact criterion this method checks against (prefix for those that have it) to determine the set over which the function is called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm coy about this implementation detail in the interface doc because I am weighing whether it makes sense to also scan for substrings. In particular, the rq_xxx paterns are slow, and can be eliminated with a quick scan for "rq" on the input string, so it may not be prefixes in the future. I want to do some more targeted tests for that after more of the low hanging fruit is plucked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, since the primary goal of the method is to select on certain criteria, I think that should be explicitly laid out in the documentation since the only other way to determine it is to check the implementation. Could we maybe create a bullet list of the criteria here, and for now it'll just show the prefix, and later we can add to that if things change? Also, I thought only public methods required doxygen documentation in Envoy. Has this changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE show what actually happens in the current impl, and update when that changes: SGTM. Coming.

RE doxygen: I would like to know what the actual post-processing flow is. I think there's a bunch of redundancy in the doxygen comments. In the past when I've used doxygen for C++ I've done it somewhat differently, commenting the vars in the formal-list, so you don't have to repeat the variable name or the type. So right now I'm just trying to be consistent but I wonder what the overall benefit is.

However @alyssawilk asked specifically for more function-doc here and in general in Envoy and I agree with that. It helps the reader to more easily follow unfamiliar code so I'm all over that. I wouldn't mind being somewhat terser though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the history around Doxygen usage in Envoy can be summed up as the fact that Doxygen style was adopted as a convention, but rather than actually running Doxygen on the source Envoy development has historically involved just human reading the Doxygen comments in the header files. So the quirks in usage of Doxygen style can generally be traced to this somewhat docs usage pattern.

Copy link
Member

@mrice32 mrice32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more nits.

for (const auto& desc : Config::TagNames::get().descriptorVec()) {
if (desc.name == name) {
addExtractor(Stats::TagExtractorImpl::createTagExtractor(desc.name, desc.regex));
++num_found;
Copy link
Member

@mrice32 mrice32 Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is this expected to never go above 1? If so, should we toss in an ASSERT at the end of this method? (Or if we're really confident, we could save some time and just break when found :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is expected to go above 1. That's one of the anticipated changes I have in the wings; my theory (which I need to isolate a test for) is that it's faster to do two different regexes when there are two different ways to generate a tag. I think some of the catastrophic backtracking was due to too many alternatives in one regex, but I haven't totally wrapped my arms around that yet as I was focused on trying to avoid regexes completely.

Copy link
Member

@mrice32 mrice32 Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - that makes sense. Since this is a future feature, can we toss an assert in for now (since it's only a one-liner), and remove if the implementation changes just to keep this PR self-contained?

std::vector<Tag>& tags) const {
tags.insert(tags.end(), default_tags_.begin(), default_tags_.end());
void TagProducerImpl::addExtractorsMatching(absl::string_view name) {
int num_found = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer fixed-size primitives (int64_t for example)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of https://google.github.io/styleguide/cppguide.html#Integer_Types is that small integers can be int, though I really don't care that much. I don't see an override for that in https://github.com/envoyproxy/envoy/blob/master/STYLE.md -- did I miss it?

Copy link
Member

@mrice32 mrice32 Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, you're right. I just think it's preferable to use the explicitly sized integers everywhere (especially in non-test code) rather than making a somewhat subjective exception for small values. Not a big deal, and if you want to leave as is, that's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about this at all, but a recursive-grep through envoy source shows lots of existing 'int' usage. Some of them are related to system-calls for that type, but there's plenty of others that look like they are counting small things. So I'm inclined to keep this as 'int'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

}
if (num_found == 0) {
throw EnvoyException(fmt::format(
"No regex specified for tag specifier and no default regex for name: '{}'", name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not sure how opinionated others are on this, but I generally prefer checking context-specific errors like this in the caller since this method has nothing to do with checking for the presence of a config-specified regex. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough I can just return num_found.

const absl::string_view token = absl::string_view(stat_name.data(), dot);
const auto iter = tag_extractor_prefix_map_.find(token);
if (iter != tag_extractor_prefix_map_.end()) {
for (const TagExtractorPtr& tag_extractor : iter->second) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer const auto & in range-based loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the fact that iter->second ranges over TagExtractorPtr is not locally obvious and it's more readable to be explicit. Moreover https://google.github.io/styleguide/cppguide.html#auto talks about when 'auto' being acceptable when the actual type name is cumbersome or obvious, and I think neither is the case here.

I don't see an override of this policy in https://github.com/envoyproxy/envoy/blob/master/STYLE.md

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. I didn't notice that the type of the vector wasn't spelled out nearby (as denoted in one of the allowed cases in the style guide).

tags.insert(tags.end(), default_tags_.begin(), default_tags_.end());
IntervalSetImpl<size_t> remove_characters;
forEachExtractorMatching(
metric_name, [&remove_characters, &tags, metric_name](const TagExtractorPtr& tag_extractor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not &metric_name in the capture?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question...I was thinking it made sense to capture the reference only if it was going to be modified in the lambda body.

The first example in https://google.github.io/styleguide/cppguide.html#Lambda_expressions supports my pattern but doesn't explicitly require it. I don't see any overriding commentary in https://github.com/envoyproxy/envoy/blob/master/STYLE.md

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, since metric_name is a const ref already, I think the only difference would be whether you make a copy or not, since the lambda should retain the constness of the ref. IIUC, this is a similar tradeoff to passing a string by value or by const ref in a function if the function is not going to modify it.

Copy link
Contributor Author

@jmarantz jmarantz Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, there should be no performance difference here. I felt like prefixing a variable with "&" was a way for me to tell the reader "I'm going to be modifying this state in the lambda body". Since I'm not planning on modifying it, I felt like I should omit the &.

Copy link
Member

@mrice32 mrice32 Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't think I worded that very well. I think there is a performance difference here. IIUC, without the &, you are making a local copy of the string for the lambda's use, which is unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: const refs get value-ized by C++ in a lambda capture. Right you are. Switched to using & in the capture for metric_name.

…f 0.

Also renames structure member variables in Descriptor in
well_known_names.cc to end with _ per Envoy style guide (overriding
the Google style guide which states that structs don't suffix member
variables with _).

Signed-off-by: Joshua Marantz <[email protected]>
…was a const ref to begin with.

Signed-off-by: Joshua Marantz <[email protected]>
dnoe
dnoe previously approved these changes Feb 15, 2018
Copy link
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, defer to @alyssawilk for senior maintainer review and merge.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great - now both fastrer and easier to understand :-) Only two little nits on my side.

// I anticipate possibly changing the default tag regexes so that more than one regex can
// yield the same tag, on the theory that this will reduce regex backtracking. But at the
// moment, this doesn't happen, so the flow isn't well tested this. When we start exploiting
// this, and it's tested, we can simply remove this assert.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a TODO ? If not I'd suggest rewriting as a statement of what is, since speculation of what will happen may be more confusing than not.

Also
the flow isn't well tested this -> this flow isn't well tested?

}
if (regex.empty()) {
throw EnvoyException(fmt::format(
"No regex specified for tag specifier and no default regex for name: '{}'", name));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you just moved this, but as long as you're in here mind adding a unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done & done

@alyssawilk alyssawilk merged commit 9c728a0 into envoyproxy:master Feb 15, 2018
@jmarantz jmarantz deleted the prefix-and-map-dispatch branch February 15, 2018 21:35
lita pushed a commit to lita/envoy that referenced this pull request Feb 15, 2018
…o avoid initiating multiple regexes that don't match. (envoyproxy#2582)"

This reverts commit 9c728a0.
@jmarantz
Copy link
Contributor Author

jmarantz commented Feb 15, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants