MultiLineBuildLogIndication performance and bug fixes #59
Since classes that inherit from Indication do not declare @JsonProperty("pattern") in their constructors, the JSON writer will use the default constructor and populate fields using the getter with the matching name.

This leads to one particularly bad bug, where MultiLineBuildLogIndication returns a modified pattern from its overridden getPattern() during JSON serialization. Every time a user modifies a FailureCause containing a MultiLineBuildLogIndication, the pattern is rewritten when stored in MongoDB:

1. User enters a MultiLineBuildLogIndication: email.com
2. This is stored: (?m)(?s)^[\r\n]*?email.com[^\r\n]*?$
3. This is searched for in logs: (?m)(?s)^[\r\n]*?(?m)(?s)^[\r\n]*?email.com[^\r\n]*?$[^\r\n]*?$
4. Next time the cause is saved: (?m)(?s)^[\r\n]*?(?m)(?s)^[\r\n]*?(?m)(?s)^[\r\n]*?email.com[^\r\n]*?$[^\r\n]*?$[^\r\n]*?$

Rinse and repeat.

Change-Id: Id1233b66d29aa948813a29bcc01f353c4d245304
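The compounding described in the commit message above can be reproduced without any JSON library. The following is a minimal sketch, not the plugin's code: `WrappingIndication`, `saveViaGetter`, `saveViaRawField` and `getUserProvidedExpression` are hypothetical names, and the "save" step is simulated by persisting whatever a getter returns. The prefix and suffix strings are copied from the stored pattern in step 2.

```java
class PatternRoundTrip {
    // Prefix and suffix copied from the stored pattern in the example above.
    static final String PREFIX = "(?m)(?s)^[\\r\\n]*?";
    static final String SUFFIX = "[^\\r\\n]*?$";

    /** Hypothetical stand-in for an indication whose getter decorates the raw pattern. */
    static class WrappingIndication {
        private final String pattern;

        WrappingIndication(String pattern) {
            this.pattern = pattern;
        }

        /** Raw text exactly as it was handed to the constructor. */
        String getUserProvidedExpression() {
            return pattern;
        }

        /** Decorated form intended for matching only. */
        String getPattern() {
            return PREFIX + pattern + SUFFIX;
        }
    }

    /** Buggy save (simulated): persists whatever the decorating getter returns. */
    static String saveViaGetter(String stored) {
        return new WrappingIndication(stored).getPattern();
    }

    /** Safer save (simulated): persists the raw expression, so saving is idempotent. */
    static String saveViaRawField(String stored) {
        return new WrappingIndication(stored).getUserProvidedExpression();
    }

    public static void main(String[] args) {
        String stored = "email.com";
        // Each save through the decorating getter adds one more prefix/suffix layer.
        for (int save = 1; save <= 3; save++) {
            stored = saveViaGetter(stored);
            System.out.println("after save " + save + ": " + stored);
        }
        System.out.println("raw field: " + saveViaRawField("email.com"));
    }
}
```

The point of the sketch is only that a getter with side-effect-like decoration is unsafe as a serialization source; keeping the raw expression and the matching pattern behind separate accessors avoids the growth.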
Test data
---------
Log: 49MB log @ 280,000 lines
Multiline pattern: Simple pattern, matching +20 lines at end of log
Pre-fix: 32 seconds (note that the hardcoded timeout is 10 seconds)
Post-fix: 1.7 seconds

The fix
-------
The current implementation relies on Scanner.findWithinHorizon() with a horizon of 10k. Every time this large window does not turn up a match, the sliding window advances by just one line (or several if consecutive empty lines are found). This essentially means that every line in a huge log is scanned hundreds of times.

To avoid grabbing extremely large captures, we now scan the log in 10kb chunks with 5kb overlaps (~50 lines for regular logs).

This patch also fixes an issue where the multiline pattern always needed to match the beginning of a line.

Change-Id: I5eeddb3555f965dde463015766f1800c5bcb772b
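The chunk-with-overlap idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `ChunkedScan`, `findInChunks` and `containsMatch` are hypothetical names, sizes are counted in chars rather than bytes, and the sketch only reports whether a match exists. With a window of 10,000 chars and an overlap of 5,000, the window advances 5,000 chars per step, so each character is scanned at most twice instead of once per line of the sliding window.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.util.regex.Pattern;

class ChunkedScan {
    /**
     * Returns true if the pattern is found in some window of the input.
     * Windows hold at most chunkSize chars; consecutive windows share overlap chars.
     * Assumption: a match that straddles a boundary and is longer than the overlap
     * can be missed, and anchors such as $ may also match at a window edge.
     */
    static boolean findInChunks(Reader in, Pattern pattern, int chunkSize, int overlap)
            throws IOException {
        if (overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        char[] buf = new char[chunkSize];
        int filled = 0; // number of valid chars currently in buf
        while (true) {
            boolean eof = false;
            // Top up the window until it is full or the input ends.
            while (filled < chunkSize) {
                int n = in.read(buf, filled, chunkSize - filled);
                if (n < 0) {
                    eof = true;
                    break;
                }
                filled += n;
            }
            if (pattern.matcher(CharBuffer.wrap(buf, 0, filled)).find()) {
                return true;
            }
            if (eof) {
                return false;
            }
            // Carry the tail of this window over as the head of the next one.
            System.arraycopy(buf, filled - overlap, buf, 0, overlap);
            filled = overlap;
        }
    }

    /** Convenience wrapper for in-memory text. */
    static boolean containsMatch(String log, Pattern pattern, int chunkSize, int overlap) {
        try {
            return findInChunks(new StringReader(log), pattern, chunkSize, overlap);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String log = "step 1 ok\nstep 2 ok\nERROR\nat line 7\n";
        Pattern p = Pattern.compile("ERROR\\nat line \\d+");
        System.out.println(containsMatch(log, p, 10_000, 5_000));
    }
}
```

The overlap bounds the longest boundary-straddling match that is guaranteed to be seen, which is the trade-off for avoiding very large captures.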
At least one test failure seems legit. Rebuilding, since some of the test failures were due to timezone issues.
Yep, the three remaining test failures seem legit.
One test case was a bit broken (it only worked because of a somewhat faulty compiled pattern in previous versions of MultilineBuildLogIndication). This test case was replaced with a more stable one that utilizes the block timeout. QuadrupleDupleLineReader() is not used for multiline scanning since it does not use Reader.readline().

Another test case used a deprecated schedule method and did not receive causes because of this.

Change-Id: I6cf0dfc81869aad965c3d2731cfb8f9121da584f
Thanks for looking at the code, Bobby. I have updated the test cases.
Hang on. Just noticed the checkstyle.
Change-Id: I8f633d61f0abc0b2ef98adb33c9799c87e9b3436
It's hard to tell; does this change require a minor version bump? Or is the behaviour consistent enough with previous versions to just bump the micro version?
Thanks!
Hard to say indeed. Multiline + Mongo is broken today, so that fix should go in as soon as possible. The performance changes should be a drop-in replacement. But looking at the release history, performance fixes have motivated minor releases in the past (see 1.15 -> 1.16). Or perhaps it was the XSS fix that motivated that version jump?
Yes, it was the size of the XSS fix that prompted the minor bump instead of a micro bump.
This fork provides some needed MultiLineBuildLogIndication
performance and bug fixes (see commit messages for more details):
JSON Serialization
Since classes that inherit from Indication do not declare
@JsonProperty("pattern") in their constructors, the JSON
writer will use the default constructor and populate fields
using the getter with the matching name.
This leads to one particularly bad bug, where
MultiLineBuildLogIndication returns a faulty modified pattern
from its overridden getPattern() during JSON serialization.
Performance fixes.
Test data:
Log: 49MB log @ 280,000 lines
Multiline pattern: Simple pattern, matching +20 lines at end of log
Pre-fix: 32 seconds (Note that the hardcoded timeout is 10 seconds)
Post-fix: 1.7 seconds
Multiline patterns could only match at the beginning of a line