Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Styler is slow #558

Open
rillig opened this issue Oct 16, 2019 · 11 comments
Open

Styler is slow #558

rillig opened this issue Oct 16, 2019 · 11 comments

Comments

@rillig
Copy link

rillig commented Oct 16, 2019

Formatting a real-life example file takes 17 seconds on my computer. That's really much because the files is just 1215 lines. Styling this file should take at most 1 second.

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Oct 16, 2019

Yes, but you get the follwing styled right compared to the fast formatter formatR:

We also have features such as:

We are working on various features such as:

Often there is the trade-off between:

  • fast code
  • readable / maintainable code

We choose an approach that is quite elegant, flexible and customizable (read more here). As of v1.3. , we can cache on top-level expression level which should mitigate the problem, at least for repeated styling.

Now the ad block is over :-)


Indeed you are right, styler is not fast. Although it used to be twice as slow (#78), it's also bothering me. To identify more bottlenecks, I think we'd need to:

  • make use of profiling (which is quite difficult because of recursive function calls, but I think @krlmlr found a way to do it).
  • move some code to C or C++, but I am not an expert on that so I'd leave that to others.
  • identify other shortcuts from looking at logic in code (for people like me who are very familiar with the codebase). This includes moving certain rules into one transformer function, so the visitor has less function calls to make.
  • import more namespaces and avoid ::. No idea what the speed implications are, but we can spare the .onLoad() call (compare Benchmark study: Implicit imports #685).
  • use [['x']] instead of $ [[ subsetting much slower than $ tidyverse/tibble#780 (comment).
  • general tibble speed things: https://github.com/tidyverse/tibble/issues?q=speed.
  • remove as.character(transformers) since we have transformer name and version (Speed improvement in caching  #679).
  • In parse_transform_serialize_r_block() or parse_transform_serialize_r(): only pass a subset of all transformers to apply_transformers(), namely remove those who we know wont ever be applied (e.g. force_assignment_op() must not be applied to all nests if we know there is not a single instance of EQ_ASSIGN. This will have some speed-up for transformers that are expensive to run but hardly change any nest. Note that we should use a function to subset the transformers (e.g. subset_tranformers(), and that function should be packed in the style guide and not be hardcoded in styler (Remove unused transformers before processing #711).

We need to distinguish:

  • changes that will make styler pareto faster because it involves operations that are always faster when used an alternative to the current implementation. This can be tested easily.
  • changes that involve a trade off, e.g. will make it faster on cached code and make it slower on uncached code, or ignored code or aligned code etc. Or will make it faster on cached code if this code has structure x and slower when it has structure y. In particular changing the way caching works (e.g. when we introduced shallow parse tables, see 2921c7e and similar.

I can't say much more about that right now, unfortunately. If you want to contribute to any of the above or you have other suggestions, please let me know.

I think we should also establish benchmarking with CI services to understand the speed implications of new features more holistically.

We use {touchstone} to benchmark every PR: https://github.com/lorenzwalthert/touchstone

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Dec 12, 2019

Anyone interested in doing speed profiling with the new proffr package?

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Feb 4, 2020

#578 is now ready for testing (and looking for feedback). Styling your real world file form above took my notebook 40s (with some other tasks going on) in the first round. Then, I made 14 changes to the file (in like 10 different top-level expressions, you have about 190 top-level expressions, including 32 comments) and restyled. It took 10s. For all untouched top-level expressions, the cache was used. Assuming you run styler and change only a few top level expressions yields significant speed boosts. I agree 10s is still a lot but it's 4x faster than before. 🎉

Edit: We released v1.3.2 on CRAN that contains #578 and some fixes required later.

@y9c
Copy link

y9c commented Mar 23, 2020

@lorenzwalthert

Thanks for the improvement, but the latest version of styler is still quite slow...

$ R --version
R version 3.6.3 (2020-02-29) -- "Holding the Windsock"

$ R -e 'packageVersion("styler")'
[1] ‘1.3.2’

$ wc -l test.R
1243

$ time cat test.R | R --slave --no-restore --no-save -e "con <- file(\"stdin\");styler::style_text(readLines(con));close(con)" > /dev/null
cat test.R  0.00s user 0.00s system 75% cpu 0.001 total
R --slave --no-restore --no-save -e  > /dev/null  16.71s user 0.08s system 99% cpu 16.817 total

It is very similar with this issue, I still take ~17s for a single file.

@lorenzwalthert
Copy link
Collaborator

Yes. This is why this issue is still open. Caching only improves speed on repeated styling.

@y9c
Copy link

y9c commented Mar 23, 2020

I don't know much about the principles of styler, but I wonder if adding asynchronous programming can help, which is widely used in the formatter of other languages.

@lorenzwalthert
Copy link
Collaborator

Can you elaborate a bit on that?

@y9c
Copy link

y9c commented Mar 23, 2020

I suppose most of the functions in styler don't need to run sequentially, thus wrapping those functions by promises package in R might bring some improvement. Just in the same way as the improved shiny app.
This may not fit the styler, I may be wrong...

@lorenzwalthert
Copy link
Collaborator

We can parallelize on different levels. The most outer way is over files, as in #617. Indeed we could also parallelize over expressions at some stage but this was not a priority because I worked on caching.

@rillig
Copy link
Author

rillig commented Mar 23, 2020

Instead of tweaking the implementation details, I wonder whether the algorithms that are used here are appropriate. Using parallelism will not reduce the overall CPU time, choosing a different algorithm may.

Is it possible to use a simpler formatting algorithm for the easy cases and resort to the current algorithm in the really complicated cases? In other programming languages, formatting can be done in O(n).

@lorenzwalthert
Copy link
Collaborator

What do you mean exactly by "the algorithm"? How the core of styler works is probably best documented in this vignette. I think it would be a lot of work to change the core of styler, because that would be a fundamental change and for me, the advantages of the approach taken are significant (described above). But if you have suggestions, I am open to discuss them. I don't even know if styler is O(n) (depending on how you measure n here I think it might be), I guess we could do a little empirical investigation to figure that out...

I don't see why parallelism would not reduce overall CPU time in any case, #370 certainly in many cases because additional start-up time is made up for with parallelization. Maybe it does not help when styling a single file, I agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants