-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #9550 - Avoid expensive calls inside BuildKeyVarList #9556
Conversation
bool is_simple_string; // Whether the Key potentially includes a Regular Expression pattern | ||
std::shared_ptr<RE2> case_insensitive_pattern; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReqReportVariables
: during GetInput, we try to see if the Key
is a regex or not, and we store: a pre-compiled RE2 case insensitive pattern, and a boolean flag to indicate whether it's supposed to be a Regex or a normal match.
Note: the boolean could be omitted by just checking case_insensitive_pattern == nullptr
. I'm loosely following what DataOutputs.cc does.
EnergyPlus/src/EnergyPlus/DataOutputs.hh
Lines 74 to 83 in 6e02a12
struct OutputReportingVariables | |
{ | |
OutputReportingVariables(EnergyPlusData &state, std::string const &KeyValue, std::string const &VariableName); | |
std::string const key; | |
std::string const variableName; | |
bool is_simple_string = true; | |
std::shared_ptr<RE2> pattern = nullptr; | |
std::shared_ptr<RE2> case_insensitive_pattern = nullptr; | |
}; |
for (int i = 1; i <= op->NumOfReqVariables; ++i) { | ||
auto &reqRepVar = op->ReqRepVars(i); | ||
|
||
// SUBROUTINE LOCAL VARIABLE DECLARATIONS: | ||
int Loop; | ||
int Loop1; | ||
bool Dup; | ||
auto &op(state.dataOutputProcessor); | ||
|
||
for (Loop = MinIndx; Loop <= MaxIndx; ++Loop) { | ||
if (op->ReqRepVars(Loop).Key.empty()) continue; | ||
if (!UtilityRoutines::SameString(op->ReqRepVars(Loop).VarName, VariableName)) continue; | ||
if (!(UtilityRoutines::SameString(op->ReqRepVars(Loop).Key, KeyedValue) || RE2::FullMatch(KeyedValue, "(?i)" + op->ReqRepVars(Loop).Key))) | ||
if (!UtilityRoutines::SameString(reqRepVar.VarName, VarName)) { | ||
continue; | ||
|
||
// A match. Make sure doesn't duplicate | ||
|
||
op->ReqRepVars(Loop).Used = true; | ||
Dup = false; | ||
for (Loop1 = 1; Loop1 <= op->NumExtraVars; ++Loop1) { | ||
if (op->ReqRepVars(op->ReportList(Loop1)).frequency == op->ReqRepVars(Loop).frequency) { | ||
Dup = true; | ||
} else { | ||
continue; | ||
} | ||
// So Same Report Frequency | ||
if (op->ReqRepVars(op->ReportList(Loop1)).SchedPtr != op->ReqRepVars(Loop).SchedPtr) Dup = false; | ||
} | ||
|
||
if (!Dup) { | ||
++op->NumExtraVars; | ||
if (op->NumExtraVars == op->NumReportList) { | ||
op->ReportList.redimension(op->NumReportList += 100, 0); | ||
} | ||
op->ReportList(op->NumExtraVars) = Loop; | ||
if (!reqRepVar.Key.empty() && !(reqRepVar.is_simple_string && UtilityRoutines::SameString(reqRepVar.Key, KeyedValue)) && | ||
!(!reqRepVar.is_simple_string && RE2::FullMatch(KeyedValue, *(reqRepVar.case_insensitive_pattern)))) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally the array was traversed multiple times:
- A first time to check the first and last index that match the VarName
- A second time partially, between that min and max index in BuildKeyVarList (filtering out blank keys, and indices where VarName didn't match)
- A third time partially, between that min and max index in AddBlankKeys (filtering out non blank keys, and indices where VarName didn't match)
The other and main thing was with the RE2 in BuildKeyVarList
- Anytime a Key didn't match "normally", it would build a single use RE2 pattern again (which is very expensive) and try to match, regardless of whether the Key contained any special characters making it a potential regex
- That meant that a single Output:Variable could lead to the creation of potentially hundreds or thousands of RE2 patterns
- HospitalBasline is a good (albeit extreme) example because it has a lot of variables that are exact-keyed for node variables, eg
Output:Variable,Floor 1 Cafe VAV Box Outlet Node Name,System Node Standard Density Volume Flow Rate,hourly;
. - Each Node will end up triggering a call to SetupOutputVariable for
System Node Standard Density Volume Flow Rate
and every time (except that one time where the Key matches exactly in the first pass) that single Output:Variable will lead to the creation of an expensive RE2 pattern - There are 130
System Node Standard Density Volume Flow Rate
variables requested in that file. 1420 unique node names are in that file. So there are1420 * (130 - 1) = 183 180
RE2 patterns being created, versus zero with this PR.
- HospitalBasline is a good (albeit extreme) example because it has a lot of variables that are exact-keyed for node variables, eg
time ./Products/energyplus -w $chig_epw -d out-delete -D -r ../EnergyPlus/testfiles/HospitalBaseline.idf
Build Type | develop | PR | Runtime |
---|---|---|---|
Debug | 180.644 | 120.531 | -33% |
Release | 14.6902 | 11.715 | -20% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 🔥 🔥
Dup = true; | ||
} else { | ||
continue; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no point keeping looking as soon as we've found it's a Dup.
// TODO: I think it's supposed to be upper case already, but tests aren't doing that at least... | ||
const std::string FreqStringUpper = UtilityRoutines::MakeUPPERCase(FreqString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if this was really true (and then I should just modify the tests to be upper case as well) or not.
@@ -558,10 +444,12 @@ namespace OutputProcessor { | |||
// SUBROUTINE ARGUMENT DEFINITIONS: | |||
|
|||
// SUBROUTINE PARAMETER DEFINITIONS: | |||
static std::vector<std::string> const PossibleFreq({"deta", "time", "hour", "dail", "mont", "runp", "envi", "annu"}); | |||
static std::vector<std::string> const PossibleFreqs({"DETA", "TIME", "HOUR", "DAIL", "MONT", "RUNP", "ENVI", "ANNU"}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid consistently upper casing stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is already merged, but ... these things can/should be static constexpr std::array<std::string_view>
which are compiled into the executable rather than constructed (I understand that this is static
and so only constructed once, but once is still more than zero times.
src/EnergyPlus/OutputProcessor.cc
Outdated
bool is_simple_string = !UtilityRoutines::isKeyRegexLike(reqRepVar.Key); | ||
reqRepVar.is_simple_string = is_simple_string; | ||
if (!is_simple_string) { | ||
reqRepVar.case_insensitive_pattern = std::make_shared<RE2>("(?i)" + reqRepVar.Key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if this has special regex chars, if so compile the pattern and store.
for (int i = 1; i <= op->NumOfReqVariables; ++i) { | ||
auto &reqRepVar = op->ReqRepVars(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amirroth 's comment on a commit (which is much harder to see / find) here: bd6ce05#r78895379
📌 Most EnergyPlus uses of
std::map
orstd::unordered_map
are misguided, i.e., a map is overkill because the expected number of elements is small and/or the elements are compile time constants which maps cannot be. Then there are situations like this which look like good fits for a map--potentially large number of dynamic items--and we are rolling with anArray1D
.
0m24,836s => 0m23,870s
…egex are EXPENSIVE 0m24,836s => 0m23,870s => 0m19,816s
…ildKeyVarList/AddBlankKeys), check if a Key could be a Regex and cache an RE2 case insentitive pattern
…outines so I can test against it
…ing UtilityRoutines
11e0d6f
to
2494907
Compare
TEST_F(EnergyPlusFixture, DataOutputs_isKeyRegexLike) | ||
{ | ||
std::vector<std::pair<std::string, bool>> test_cases{ | ||
{"*", false}, | ||
{"This is the first one", false}, | ||
{"This is another.*one", true}, | ||
{"This is (a|some) ones?", true}, | ||
{"Zone 1.1", true}, // The `.` could mean any character (though in this case it's not meant as a regex really) | ||
{"Cafétéria", false}, // !std::isalnum, but not a regex | ||
}; | ||
|
||
for (auto &[s, expectedIsRegexLike] : test_cases) { | ||
EXPECT_EQ(expectedIsRegexLike, DataOutputs::isKeyRegexLike(s)) << "isKeyRegexLike: Failed for " << s; | ||
} | ||
|
||
for (auto &[s, expectedIsRegexLike] : test_cases) { | ||
EXPECT_EQ(expectedIsRegexLike, DataOutputs::isKeyRegexLikeOri(s)) << "isKeyRegexLikeOri: Failed for " << s; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original implementation fails for Cafétéria
(assumes it's a regex when there's no chance it is). isKeyRegexLikeOri is extracted from
EnergyPlus/src/EnergyPlus/DataOutputs.cc
Lines 141 to 146 in 2a26c7b
if (KeyValue == "*") return; | |
for (auto const &c : KeyValue) { | |
if (c == ' ' || c == '_' || std::isalnum(c)) continue; | |
is_simple_string = false; | |
break; | |
} |
[ RUN ] EnergyPlusFixture.DataOutputs_isKeyRegexLike
/home/julien/Software/Others/EnergyPlus/tst/EnergyPlus/unit/OutputProcessor.unit.cc:6088: Failure
Expected equality of these values:
expectedIsRegexLike
Which is: false
DataOutputs::isKeyRegexLikeOri(s)
Which is: true
isKeyRegexLikeOri: Failed for Cafétéria
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test bed: https://godbolt.org/z/Y4n74orbK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me ask a high-level question. To what extent do EnergyPlus users actually use general-purpose regex for variable keys? Or do they mostly use just plain *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you define "general-purpose" please? Not sure I follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I meant "general" rather than "general-purpose". Basically, anything other than a simple "*" with nothing else around it. If those are the majority of the regex use cases, then you can probably create a special case for it that doesn't use regex at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you mean .*
or .+
? I doubt people use that, they just use *
then.
I'd bet most use case are prefix/suffix filtering. Eg if you name your zones FLOOR1-Zone1
etc, you can get repporting for FLOOR1-.*
. Same if you named the nodes of a loop like LOOPNAME NODENAME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A prefix is also a special case that doesn't need the regex library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you know what I am getting at. If the cost is in compiling the regex and the common usages are special cases that can be handled using has_prefix
, has_suffix
, or even no explicit matching at all (*
), then we can reduce the frequency of regex compilation even further. I'm going to just leave this here. Maybe what you've already done (:fire:) is the 80/20 sweet spot here and no additional optimization is needed at this time.
@jmarrec this is an incredible improvement, thanks! Looks like there is still one more failing unit test, but otherwise CI is happy. Let me know if you need anything from me to help resolve that, but I'm assuming it's a straightforward tweak to get it working. Thanks! |
@Myoldmopar I don't know why decent CI does that, but the build results posted here are just for an older commit 2494907 (that I pushed on purpose with a failing test). Maybe it's because I immediately pushed the fix right after and decent ci has a kind of time delay? Anyways, if you click on the latest commit you'll see all green results, aside from the SolarShading GPU test usual issue (we must address it or disable that test, or just rerun it if it fails or something.... CI should have a 100% pass rate all the time unless there is a change). see ee41ce7 |
You're right, all clean! Not sure why it got confused, but this is green and ready to go in. Thanks!! |
Pull request overview
Please see #9556 (comment) for explanations.
Pull Request Author
Add to this list or remove from it as applicable. This is a simple templated set of guidelines.
Reviewer
This will not be exhaustively relevant to every PR.