Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve column handling speed, fix infinite loops #1597

Merged
merged 1 commit into from
Mar 17, 2022

Conversation

aschmitz
Copy link
Contributor

@aschmitz aschmitz commented Mar 17, 2022

The primary change here is that the column calculation algorithm now attempts to render columns as if they were the full remaining height on the page, rather than to render the entire content regardless of how long the page is.

The worst-case behavior is effectively the same as that of the previous algorithm (as at least one additional pass would be required to determine how high balanced columns should be, but this applies in both cases), but if content would not fit on the page, this can bail and set the page immediately, rather than continuing significantly more calculations.

As an associated benefit, this closes #1020, as we now handle multiple column breaks (which would force a page break) correctly. A test for this behavior is included.

In unscientific tests, this appears to be a significant speedup in worst-case column calculations, for example taking a 100+ page multi-column render from 45 minutes to under two minutes. A more realistic document render time was cut by about 50% (from a starting point of ~five minutes).

It is certainly plausible that there exist better algorithms for determining column heights (such as calculating possible breakpoints for children), but this provides significantly improved execution time over the current algorithm in both pathological and real-world tests, with minimal algorithmic changes. I certainly won't be offended if this gets entirely rewritten in the path to v55/56, but it might be nice to have in the meantime.

The primary change here is that the column calculation algorithm now
attempts to render columns as if they were the full remaining height on
the page, rather than to render the entire content regardless of how
long the page is.

The worst-case behavior is effectively the same as that of the previous
algorithm (as at least one additional pass would be required to
determine how high balanced columns should be, but this applies in both
cases), but if content would not fit on the page, this can bail and set
the page immediately, rather than continuing significantly more
calculations.

As an associated benefit, this closes Kozea#1020, as we now handle multiple
column breaks (which would force a page break) correctly. A test for
this behavior is included.
@liZe liZe added the performance Too slow renderings label Mar 17, 2022
@liZe liZe added this to the 55.0 milestone Mar 17, 2022
@liZe
Copy link
Member

liZe commented Mar 17, 2022

The worst-case behavior is effectively the same as that of the previous algorithm (as at least one additional pass would be required to determine how high balanced columns should be, but this applies in both cases), but if content would not fit on the page, this can bail and set the page immediately, rather than continuing significantly more calculations.

That’s a nice improvement to avoid useless loops.

As an associated benefit, this closes #1020, as we now handle multiple column breaks (which would force a page break) correctly. A test for this behavior is included.

Thanks a lot for that 💜.

In unscientific tests, this appears to be a significant speedup in worst-case column calculations, for example taking a 100+ page multi-column render from 45 minutes to under two minutes. A more realistic document render time was cut by about 50% (from a starting point of ~five minutes).

🎉

It is certainly plausible that there exist better algorithms for determining column heights

Probably. But I’ve been talking with a researcher during a W3C conference who was working on this topic, and he told me that it wasn’t a solved problem and that the "best" way to find column height had yet to be found.

(such as calculating possible breakpoints for children)

That’s the biggest problem left in my opinion.

but this provides significantly improved execution time over the current algorithm in both pathological and real-world tests, with minimal algorithmic changes.

Sure.

I certainly won't be offended if this gets entirely rewritten in the path to v55/56, but it might be nice to have in the meantime.

We’ll keep this at least for version 55. Version 56 may include a lot of changes about the non-standard layouts, so… we’ll see!

@liZe liZe merged commit 58583fd into Kozea:master Mar 17, 2022
@aschmitz
Copy link
Contributor Author

Excellent, thanks! I let some benchmarks run on a sample document overnight, just for reference:

Commit Mean [s] Min [s] Max [s] Relative
5ad3b1d 255.381 ± 2.174 253.637 260.935 2.09 ± 0.02
ad6e0d7 122.222 ± 0.801 120.753 123.317 1.00

I really should pull in the WeasyPerf stuff to at least benchmark those as well, though I don't know if this would touch them because they probably don't use columns much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Too slow renderings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Infinite loop in multi-column layout after first (page-)break-before
2 participants