Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise Scanner performance #226

Merged
merged 6 commits into from
Sep 25, 2018
Merged

Conversation

Liquidsoul
Copy link
Contributor

This is a rebase of #207

✅ What I did:

  • squash of the commits
  • integrate the upstream changes about range reporting to fix the tests during the rebase process
  • remove Swift 3 from the build matrix

❌What I did not do:

  • benchmark if there is an actual performance improvement

yonaskolb
yonaskolb previously approved these changes Sep 1, 2018
Copy link
Collaborator

@yonaskolb yonaskolb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 😄👍
Would you be able to add a changelog entry and reference the original author as well?

@yonaskolb yonaskolb mentioned this pull request Sep 1, 2018
@Liquidsoul
Copy link
Contributor Author

@yonaskolb PR rebased and entry added to the changelog

@yonaskolb
Copy link
Collaborator

yonaskolb commented Sep 2, 2018

So I did some benchmarking using SwagGen as a test harness and using XCTest to measure just the stencil file generation

Results in seconds:

So while this PR improves things slightly it seems something was added after the original #207 that slowed down things even more.

After some more investigation it seems #167 caused the slowdown
before: 0.25
after: 6.0

@yonaskolb
Copy link
Collaborator

@ilyapuchka as the author of #167 do you have any other insights here?

@yonaskolb
Copy link
Collaborator

Interestingly just timed #225 and it was 0.37s

@ilyapuchka
Copy link
Collaborator

@yonaskolb I suppose something around here might have caused performance issues https://github.com/stencilproject/Stencil/pull/167/files#diff-91493e167a3f6240a7cc916d9504d439R25

@ilyapuchka
Copy link
Collaborator

I noticed recently that parsing things like func some() {{% some tag %} fails due to usage of {{%. I see that there are some changes to the parsing of starting tokens here and wonder if this is still the case with this change and, if so, if it can be addressed @Liquidsoul

@Liquidsoul
Copy link
Contributor Author

I've tried to tokenize func some() {{% if %} using this updated Lexer and the existing one. Here is the results:

  • the new lexer does not handle this well and crashes even.
  • the old lexer fails to parse the if tag but does not stop the execution.

So the new lexer does solve the issue and introduces a new problem.
Instead of failing to capture the tag as in the old one the new lexer crashes.
I see that as a blocking regression.

The fact that we did not have a testing context showing the improvement in #207 makes it hard to see this as a working optimisation. And since the original author does not seem to use Stencil anymore maybe we should just drop this entirely, see if this reported again and just address the existing performance issue.

What do you think?

@ilyapuchka
Copy link
Collaborator

I agree, we should better address performance regression introduced by #167

@ilyapuchka
Copy link
Collaborator

@yonaskolb can you create a PR with your performance tests?

@djbe
Copy link
Contributor

djbe commented Sep 10, 2018

  • Swift 4.1 #228 has been merged so this should be next.
  • Swift 4.1 #228 introduced a performance loss in this commit: 93ccc56
  • Investigate if String/Substring is the performance issue, and how to solve this.

@djbe
Copy link
Contributor

djbe commented Sep 10, 2018

@yonaskolb You've mentioned a few times that you timed the tests. Are you just running them locally and checking how long yourself (with time diff or something), or do you have an actual test case we could run on CI?

@yonaskolb
Copy link
Collaborator

I've just been timing locally using an XCTest in SwagGen while pointing to specific Stencil commit. I don't know of a way to run performance tests on CI as to fail on performance regressions you need to set a baseline and that won't be stable across different machines

@djbe
Copy link
Contributor

djbe commented Sep 10, 2018

I don't think we need a test that "fails", more a test that might report the change, either in CI, or here in the PR. For example, SwiftLint reports performance changes in the PR: realm/SwiftLint#2396.

@djbe
Copy link
Contributor

djbe commented Sep 10, 2018

Now that whole process would be a separate PR, but just for this commit, so we can test locally, a simple test that runs a few times, and logs the average time would be nice.

@AliSoftware
Copy link
Collaborator

We definitely need to configure Danger on those repos at some point ❤️

@ilyapuchka
Copy link
Collaborator

@Liquidsoul I created #235 for mentioned issue

@djbe
Copy link
Contributor

djbe commented Sep 11, 2018

So, going through the comments again:

@Liquidsoul
Copy link
Contributor Author

@djbe the {{% error already exists in the current code base, this is not something that's introduced by this change.
However, as I said here, this PR introduces a crash instead of a parsing failure when there is a syntax error in the stencil file.
So as I said, if we want to implement this optimisation, we may wish to rewrite this to ensure that no regressions are introduced.

@djbe
Copy link
Contributor

djbe commented Sep 12, 2018 via email

content = substring
return (string, result)
}
for (index, char) in zip(0..., content.unicodeScalars) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a performance difference between this and content.unicodeScalars.enumerated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's bad practice in Swift to use .enumerated() just to be able to iterate with an index.

@yonaskolb
Copy link
Collaborator

The new test will have to be updated to the Spectre 9.0 format

@djbe
Copy link
Contributor

djbe commented Sep 23, 2018

@yonaskolb What do you mean the new format? The only difference that I saw in that PR, was just an XCTestCase class added in each file, which this PR already has since it's been rebased on master.

return result
} else if char == tokenChar {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else  {
  foundChar = char == tokenChar
}

let startIndex = content.index(content.startIndex, offsetBy: index)
let result = String(content[..<startIndex])
content = String(content[content.index(after: startIndex)...])
range = range.upperBound..<originalContent.index(range.upperBound, offsetBy: index + 1)
Copy link
Collaborator

@ilyapuchka ilyapuchka Sep 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why content and range are needed when result is returned right after they are calculated? I can't see them being used. ah, it's a state mutation 🤦‍♂️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use ... instead of + 1 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it took me a while before I grokked the whole thing 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ... gives a ClosedRange instead of a Range, so we can't use that unfortunately.

if foundChar && char == Scanner.tokenEndDelimiter {
let startIndex = content.index(content.startIndex, offsetBy: index)
let result = String(content[..<startIndex])
content = String(content[content.index(after: startIndex)...])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use prefix/suffix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if it impacts performance.

content = String(content[startIndex...])
range = range.upperBound..<originalContent.index(range.upperBound, offsetBy: index - 1)
return (char, result)
} else if char == Scanner.tokenStartDelimiter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else {
  foundBrace = char == Scanner.tokenStartDelimiter
}

@ilyapuchka
Copy link
Collaborator

Can we add a performance test?

@djbe
Copy link
Contributor

djbe commented Sep 23, 2018

I don't know, you can only really test it with really (really) large templates, or a whole bunch of templates. Maybe leave that for a separate PR?

@ilyapuchka
Copy link
Collaborator

we could just add whatever project you are using right now to measure improvement and use its templates in a test?

@djbe
Copy link
Contributor

djbe commented Sep 23, 2018

I'm using SwaGen (what @yonaskolb was using), it's a whole project, with a whole bunch of templates, and the test isn't perfect as it's a full render, not just the lexer.

@djbe djbe force-pushed the faster-scanner branch 2 times, most recently from c44698c to f0eca34 Compare September 23, 2018 11:44
@djbe
Copy link
Contributor

djbe commented Sep 23, 2018

I've added a performance test based on the template in #224, repeated a few times 😆.

There's no reporting yet, we'll leave that for a different PR (Danger).

@djbe djbe mentioned this pull request Sep 24, 2018
@djbe
Copy link
Contributor

djbe commented Sep 25, 2018

@yonaskolb @ilyapuchka any final thoughts? I'd be ideal to have this in for 0.13.

@djbe djbe mentioned this pull request Sep 25, 2018
Copy link
Collaborator

@AliSoftware AliSoftware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would definitely need a PR dedicated to add even a little doc-comment on that code; always confusing especially for someone who doesn't work on it daily like me 😉

Sources/Lexer.swift Show resolved Hide resolved
ethorpe and others added 6 commits September 26, 2018 00:33
…ment under Ubuntu

Cleanup readability a little bit
Rewrite original scan function so it's available. Syntax improvements

Fix deprecation warnings in Lexer

Cleanup some syntax issues

lexer

t

t
@djbe djbe merged commit 6f9bb3e into stencilproject:master Sep 25, 2018
@Liquidsoul Liquidsoul deleted the faster-scanner branch June 4, 2019 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants