perf: improve column handling speed, fix infinite loops #1597

aschmitz · 2022-03-17T05:42:15Z

The primary change here is that the column calculation algorithm now attempts to render columns as if they were the full remaining height on the page, rather than to render the entire content regardless of how long the page is.

The worst-case behavior is effectively the same as that of the previous algorithm (as at least one additional pass would be required to determine how high balanced columns should be, but this applies in both cases), but if content would not fit on the page, this can bail and set the page immediately, rather than continuing significantly more calculations.

As an associated benefit, this closes #1020, as we now handle multiple column breaks (which would force a page break) correctly. A test for this behavior is included.

In unscientific tests, this appears to be a significant speedup in worst-case column calculations, for example taking a 100+ page multi-column render from 45 minutes to under two minutes. A more realistic document render time was cut by about 50% (from a starting point of ~five minutes).

It is certainly plausible that there exist better algorithms for determining column heights (such as calculating possible breakpoints for children), but this provides significantly improved execution time over the current algorithm in both pathological and real-world tests, with minimal algorithmic changes. I certainly won't be offended if this gets entirely rewritten in the path to v55/56, but it might be nice to have in the meantime.

The primary change here is that the column calculation algorithm now attempts to render columns as if they were the full remaining height on the page, rather than to render the entire content regardless of how long the page is. The worst-case behavior is effectively the same as that of the previous algorithm (as at least one additional pass would be required to determine how high balanced columns should be, but this applies in both cases), but if content would not fit on the page, this can bail and set the page immediately, rather than continuing significantly more calculations. As an associated benefit, this closes Kozea#1020, as we now handle multiple column breaks (which would force a page break) correctly. A test for this behavior is included.

liZe · 2022-03-17T08:07:43Z

The worst-case behavior is effectively the same as that of the previous algorithm (as at least one additional pass would be required to determine how high balanced columns should be, but this applies in both cases), but if content would not fit on the page, this can bail and set the page immediately, rather than continuing significantly more calculations.

That’s a nice improvement to avoid useless loops.

As an associated benefit, this closes #1020, as we now handle multiple column breaks (which would force a page break) correctly. A test for this behavior is included.

Thanks a lot for that 💜.

In unscientific tests, this appears to be a significant speedup in worst-case column calculations, for example taking a 100+ page multi-column render from 45 minutes to under two minutes. A more realistic document render time was cut by about 50% (from a starting point of ~five minutes).

🎉

It is certainly plausible that there exist better algorithms for determining column heights

Probably. But I’ve been talking with a researcher during a W3C conference who was working on this topic, and he told me that it wasn’t a solved problem and that the "best" way to find column height had yet to be found.

(such as calculating possible breakpoints for children)

That’s the biggest problem left in my opinion.

but this provides significantly improved execution time over the current algorithm in both pathological and real-world tests, with minimal algorithmic changes.

Sure.

I certainly won't be offended if this gets entirely rewritten in the path to v55/56, but it might be nice to have in the meantime.

We’ll keep this at least for version 55. Version 56 may include a lot of changes about the non-standard layouts, so… we’ll see!

aschmitz · 2022-03-17T15:04:54Z

Excellent, thanks! I let some benchmarks run on a sample document overnight, just for reference:

Commit	Mean [s]	Min [s]	Max [s]	Relative
`5ad3b1d`	255.381 ± 2.174	253.637	260.935	2.09 ± 0.02
`ad6e0d7`	122.222 ± 0.801	120.753	123.317	1.00

I really should pull in the WeasyPerf stuff to at least benchmark those as well, though I don't know if this would touch them because they probably don't use columns much.

liZe added the performance Too slow renderings label Mar 17, 2022

liZe added this to the 55.0 milestone Mar 17, 2022

liZe merged commit 58583fd into Kozea:master Mar 17, 2022

liZe mentioned this pull request Mar 17, 2022

Problems with Footnotes when using Columns #1598

Closed

liZe mentioned this pull request May 11, 2022

Really slow rendering of a 3 column layout #1642

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: improve column handling speed, fix infinite loops #1597

perf: improve column handling speed, fix infinite loops #1597

aschmitz commented Mar 17, 2022 •

edited

Loading

liZe commented Mar 17, 2022

aschmitz commented Mar 17, 2022

perf: improve column handling speed, fix infinite loops #1597

perf: improve column handling speed, fix infinite loops #1597

Conversation

aschmitz commented Mar 17, 2022 • edited Loading

liZe commented Mar 17, 2022

aschmitz commented Mar 17, 2022

aschmitz commented Mar 17, 2022 •

edited

Loading