Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiLineBuildLogIndication performance and bug fixes #59

Merged
merged 4 commits into from
Oct 19, 2016

Conversation

GLundh
Copy link
Member

@GLundh GLundh commented Oct 12, 2016

This fork provides some needed MultiLineBuildLogIndication
performance and bug fixes (see commit messages for more details):

JSON Serialization

Since inherited classes from Indication does not declare
@JsonProperty("pattern") in their constructors, the JSON
writer will use the default constructor and populate fields
using a matching (by name) getter.

This leads to one particular bad bug, where
MultiLineBuildLogIndication returns a faulty modified pattern
using the @Overidden getPattern() during JSON serialization.

Performance fixes.

Test data:
Log: 49MB log @ 280.000 lines
Multiline pattern: Simple pattern, matching +20 lines at end of log
Pre-fix: 32 seconds (Note that the hardcoded timeout is 10 seconds)
Post-fix: 1.7 seconds

Multiline patterns could only match beginning of a line

Since inherited classes from Indication does not declare
@JsonProperty("pattern") in their constructors, the JSON
writer will use the default constructor and populate fields
using a matching (by name) getter.

This leads to one particular bad bug, where
MultiLineBuildLogIndication returns a modified pattern
using the @Overidden getPattern() during JSON serialization.

Everytime a user modify a FailureCause containing a
MultiLineBuildLogIndication it will be rewritten when stored
in mongodb:

1. User enters a MultiLineBuildLogIndication:
email.com

2. This is stored:
(?m)(?s)^[\r\n]*?email.com[^\r\n]*?$

3. This is searched for in Logs:
(?m)(?s)^[\r\n]*?(?m)(?s)^[\r\n]*?email.com[^\r\n]*?$[^\r\n]*?$

4. Next time the Cause is saved:
(?m)(?s)^[\r\n]*?(?m)(?s)^[\r\n]*?(?m)(?s)^[\r\n]*?email.com
[^\r\n]*?$[^\r\n]*?$[^\r\n]*?$

Rinse and repeat.

Change-Id: Id1233b66d29aa948813a29bcc01f353c4d245304
Test data
---------
Log: 49MB log @ 280.000 lines
Multiline pattern: Simple pattern, matching +20 lines at end of log
Pre-fix: 32 seconds (Note that the hardcoded timeout is 10 seconds)
Post-fix: 1.7 seconds

The fix
-------
Current implementation relies on Scanner.findWithinHorizon() with
a horizon of 10k.

Every time this large window does not turn up a match, the sliding
window increases just one line (or several if consecutive empty
lines are found).

This essentially means that every line in a huge log is scanned 100s
of times.

To avoid grabbing extremely large captures, we now scan the log in
10kb chunks with 5kb overlaps (~50 lines for regular logs).

This patch also fixes an issue where the multiline pattern always
needed to match the beginning of a line.

Change-Id: I5eeddb3555f965dde463015766f1800c5bcb772b
@GLundh GLundh changed the title Fix: Broken JSON serialization MultiLineBuildLogIndication performance and bug fixes Oct 14, 2016
@rsandell
Copy link
Member

At least one test failure seems legit, rebuilding since some of the tests where due to timezone issues.

@rsandell rsandell closed this Oct 14, 2016
@rsandell rsandell reopened this Oct 14, 2016
@rsandell
Copy link
Member

Yepp, the three remaining test failures seems legit.

One testcase was a bit broken (it only worked due to a somewhat
faulty compiled pattern in previous versions of
MultilineBuildLogIndication). This test case was replaced with a
more stable one that utilizes the block timeout.

QuadrupleDupleLineReader() is not used for Multiline scanning
since it does not use Reader.readline().

Another testcase utilized a deprecated schedule method and did
not receive Causes due to this.

Change-Id: I6cf0dfc81869aad965c3d2731cfb8f9121da584f
@GLundh
Copy link
Member Author

GLundh commented Oct 18, 2016

Thanks for looking at the code Bobby. I have updated the testcases.

@GLundh
Copy link
Member Author

GLundh commented Oct 18, 2016

Hang on. Just noticed the checkstyle.

Change-Id: I8f633d61f0abc0b2ef98adb33c9799c87e9b3436
@rsandell rsandell merged commit 97baff4 into jenkinsci:master Oct 19, 2016
@rsandell
Copy link
Member

It's hard to tell; does this change require a minor version bump? Or is the behaviour consistent enough with previous versions to just bump the micro version?

@GLundh
Copy link
Member Author

GLundh commented Oct 19, 2016

Thanks!

It's hard to tell; does this change require a minor version bump?

Hard to say indeed.

Multiline + Mongo is broken today. So that fix should go in as soon as possible.

The performance stuff should be drop in replacement. But looking at the release history, performance fixes have motivated major releases in the past (see 1.15 -> 1.16). Or perhaps it was the xss-fix that motivated that version jump?

@rsandell
Copy link
Member

Yes, it was the size of the xss fix that prompted the minor bump instead of a micro bump.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants