EOF_CHAR isn't the common case, moving those first #365
base: main
Conversation
I was looking into the parser, wondering whether LittleDict might be faster there than Dict. Maybe not, since the parser doesn't seem speed-critical compared to the compiler/optimizer - or at least it wasn't; now more code will be precompiled. It doesn't seem to hurt to rearrange the checks here: this likely compiles to branch instructions, and mispredicted branches are slow. A LittleDict of some common ASCII letters might be even faster. Note, this isn't tested or benchmarked, but I tried to be very careful not to mess up the copy-pasting.
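To make the idea concrete, here is a minimal sketch of the kind of rearrangement meant here - the names (`kind_of`, `is_ident_start`) and the sentinel definition are placeholders, not JuliaSyntax's actual dispatch code:

```julia
const EOF_CHAR = typemax(Char)  # stand-in; the lexer defines its own EOF sentinel
is_ident_start(c::Char) = isletter(c) || c == '_'

# Checks ordered by guessed frequency: identifiers, whitespace and digits
# dominate real source, while EOF occurs exactly once per input stream,
# so its check moves from first to last.
function kind_of(c::Char)
    if is_ident_start(c)
        :identifier
    elseif isspace(c)
        :whitespace
    elseif isdigit(c)
        :number
    elseif c == EOF_CHAR
        :eof
    else
        :other
    end
end
```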
Codecov Report

@@           Coverage Diff           @@
##             main     #365   +/-   ##
=======================================
  Coverage   96.56%   96.56%
=======================================
  Files          14       14
  Lines        4161     4161
=======================================
  Hits         4018     4018
  Misses        143      143
I'm a bit confused about these lines: I think they must work, it's just unclear why nothing similar is needed for the lines above for hyphen-minus. But this must have worked...
Does the order matter for correctness rather than just speed? That's the only idea I have for why my last commit worked around a bug. If you trust this, you could merge.
There's a very basic benchmarking script in test/benchmark.jl - does it show any improvement before vs. after this change? If you're working on performance improvements, please do prove to yourself (and everyone else :-) ) that they actually make a difference. For example, the compiler may be able to recognize very simple if-else chains and do something more efficient with them than comparing everything in order. I don't know if that happens in this case, but it could make any actual rearrangements here have no effect on the final runtime. The order could matter if there's any overlap between the categories. I think there's not, but it's worth checking whether any of the explicit characters might also be matched by one of the predicates.
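For a quick spot-check outside that script, something along these lines should work (assuming BenchmarkTools is installed; `parseall` with `GreenNode` is used here only as a convenient whole-pipeline entry point):

```julia
using BenchmarkTools, JuliaSyntax

# Use a large real source file as input; the package's own parser.jl will do.
src = read(joinpath(dirname(pathof(JuliaSyntax)), "parser.jl"), String)

# Run this once on main and once on the PR branch, then compare medians.
@btime JuliaSyntax.parseall(JuliaSyntax.GreenNode, $src);
```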
At first I just meant to move the check for EOF_CHAR, as it's obviously not the most common case. I believe the checks run in source order; the compiler can't have any idea of the true distribution. I'm making a best guess, but I haven't benchmarked.
If some of the checks are not as simple, maybe the compiler takes that into account. I've also thought about that possibility, which is why I had the emit at the top.
Please do benchmark it; all performance work needs benchmarking. Without that, it's easy to make code changes which don't matter - needlessly churning the code or introducing complexity. My intuition says it would be best to focus on optimizing _next_token. You could also work out an approximation for the true distribution of characters if that matters - for example, just compute a histogram of character frequencies.
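A rough histogram like the following would do - this is just a sketch over a directory of .jl files, not anything in the package:

```julia
# Count character frequencies over a corpus of real Julia source files,
# sorted most common first, to guide the ordering of the lexer's checks.
function char_histogram(dir::AbstractString)
    counts = Dict{Char,Int}()
    for (root, _, files) in walkdir(dir), f in files
        endswith(f, ".jl") || continue
        for c in read(joinpath(root, f), String)
            counts[c] = get(counts, c, 0) + 1
        end
    end
    return sort!(collect(counts); by = last, rev = true)
end

# For example, over Julia's own Base sources:
# char_histogram(joinpath(Sys.BINDIR, "..", "share", "julia", "base"))
```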
A good part of this was effectively done in #372, and it does have some measurable performance benefit.
So _next_token should be optimized, have a fast path? It calls lex_identifier, which was optimized - and isn't that the main bottleneck? And is_identifier_start_char isn't the most critical; did you mean is_identifier_char, which is in the new fast path there?

It's interesting to see the table: I would have thought checking for the most likely character first (and maybe the 2nd or 3rd most likely) would be faster, possibly followed by a lookup table. Lookups aren't that fast, though this one is only 16 bytes. A 16-entry table can be made fast with SIMD instructions, I understand, but otherwise I'm not convinced; the bit manipulation is also not the fastest?

[There are tests for BigInt literals, i.e. with the big macro not used, it seems, so there's also no need to test for BigFloat? I'm thinking this is correct: the parser just parses a string and hands it to the macro. I'm just thinking how much work it would be to get rid of BigFloat in Julia; it seems to be no problem for the parser, but BigInt can't be gotten rid of because of it.]
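For what it's worth, the two shapes being compared look roughly like this - a short branch chain over the likeliest ranges versus a precomputed bitmask lookup. Both are sketches for ASCII only, not JuliaSyntax's actual is_identifier_char, which also has to handle Unicode:

```julia
# Branch chain: cheap when the first one or two ranges cover most input.
is_ascii_ident_branchy(c::Char) =
    ('a' <= c <= 'z') || ('A' <= c <= 'Z') ||
    ('0' <= c <= '9') || c == '_' || c == '!'

# Bitmask lookup: bit i of IDENT_MASK is set iff Char(i) is an identifier
# character, so the test is one shift and one mask for any ASCII input.
const IDENT_MASK = let m = UInt128(0)
    for c in ['a':'z'; 'A':'Z'; '0':'9'; '_'; '!']
        m |= UInt128(1) << UInt8(c)
    end
    m
end

is_ascii_ident_table(c::Char) =
    c <= '\x7f' && (IDENT_MASK >> UInt8(c)) % Bool
```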
While some optimisation based solely on "mental work" can be useful, I find it better to work with real data from profiling and benchmarks. So if you want to work on the performance of this package (or any package, for that matter), it would be a good idea to come with concrete data (in the form of profiles) showing that certain functions take up a significant amount of time, and that the changes you propose improve those times (in the form of benchmarks). Computers are complicated, and trying to guess the effect of certain code changes is very hard. As it is right now, it feels like we are just kind of guessing, which is not a way to do incremental performance improvements.
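In case it helps, a minimal profiling session with the stdlib Profile might look like this (again assuming `parseall` over a large file as a stand-in workload):

```julia
using Profile, JuliaSyntax

src = read(joinpath(dirname(pathof(JuliaSyntax)), "parser.jl"), String)
JuliaSyntax.parseall(JuliaSyntax.GreenNode, src)  # warm up / compile first

Profile.clear()
@profile for _ in 1:100
    JuliaSyntax.parseall(JuliaSyntax.GreenNode, src)
end
Profile.print(; mincount = 10)  # show only frames hit at least 10 times
```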
No description provided.