Build complex automatons more efficiently (#66901)
This change substantially reduces the CPU and heap usage of StringMatcher when processing large, complex patterns.

The improvement is achieved by switching the order in which we perform concatenation and union for common styles of wildcard patterns.

Given a set of wildcard strings:
- "*-logs-*"
- "*-metrics-*"
- "web-*-prod-*"
- "web-*-staging-*"

The old implementation would perform steps roughly like:

    minimize {
        union {
            concatenate { MATCH_ANY, "-logs-", MATCH_ANY }
            concatenate { MATCH_ANY, "-metrics-", MATCH_ANY }
            concatenate { "web-", MATCH_ANY, "-prod-", MATCH_ANY }
            concatenate { "web-", MATCH_ANY, "-staging-", MATCH_ANY }
        }
    }

The outer minimize would require determinizing the automaton, which was highly inefficient.

The new implementation is:

    minimize {
        union {
            concatenate {
                MATCH_ANY,
                minimize {
                    union { "-logs-", "-metrics-" }
                }
                MATCH_ANY
            }
            concatenate {
                minimize {
                    union {
                        concatenate { "web-", MATCH_ANY, "-prod-" }
                        concatenate { "web-", MATCH_ANY, "-staging-" }
                    }
                }
                MATCH_ANY
            }
        }
    }

By performing a union of the inner strings before concatenating the MATCH_ANY ("*"), the time and heap space spent on determinizing the automaton is greatly reduced.

Backport of: #66724
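The regrouping step described in this message can be illustrated without any automaton library. The following is a minimal sketch, not the code from this commit: the class name `WildcardGrouping`, the `Shape` enum, and the `group` method are hypothetical, and escaping of `*` is ignored for simplicity. It only shows how patterns could be bucketed by whether they start and/or end with `*`, so that the inner strings of each bucket can be unioned first and the shared MATCH_ANY concatenated once per bucket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WildcardGrouping {

    /** Hypothetical labels for where the outer "*" sits; not names from the commit. */
    enum Shape { EXACT, PREFIX, SUFFIX, CONTAINS }

    /**
     * Buckets patterns by their leading/trailing "*" and strips that outer
     * wildcard, leaving the inner string that would be unioned with its peers.
     * Assumption: "*" is never escaped; a lone "*" is kept as an EXACT body.
     */
    static Map<Shape, List<String>> group(List<String> patterns) {
        Map<Shape, List<String>> groups = new LinkedHashMap<>();
        for (String p : patterns) {
            boolean leading = p.length() > 1 && p.startsWith("*");
            boolean trailing = p.length() > 1 && p.endsWith("*");
            Shape shape;
            String body;
            if (leading && trailing) {
                shape = Shape.CONTAINS;              // "*body*"
                body = p.substring(1, p.length() - 1);
            } else if (trailing) {
                shape = Shape.PREFIX;                // "body*"
                body = p.substring(0, p.length() - 1);
            } else if (leading) {
                shape = Shape.SUFFIX;                // "*body"
                body = p.substring(1);
            } else {
                shape = Shape.EXACT;                 // no outer wildcard
                body = p;
            }
            groups.computeIfAbsent(shape, k -> new ArrayList<>()).add(body);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> patterns =
            List.of("*-logs-*", "*-metrics-*", "web-*-prod-*", "web-*-staging-*");
        // Each bucket's bodies would be unioned (and minimized) before the
        // outer MATCH_ANY is concatenated once for the whole bucket.
        group(patterns).forEach((shape, bodies) ->
            System.out.println(shape + " -> " + bodies));
    }
}
```

For the four example patterns, the two `*...*` patterns land in one bucket and the two `web-...*` patterns in another, which mirrors the two inner `union` blocks in the new plan. The bodies in the second bucket still contain an inner `*`, which a real implementation would turn into a nested concatenation.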
Showing 2 changed files with 90 additions and 10 deletions.