Performance on large PDF #545

Closed
alecasciaro opened this issue Dec 2, 2017 · 12 comments
Labels
performance Too slow renderings

Comments

@alecasciaro

Hi,

I'm trying to convert an HTML page that generates a 52-page PDF.
WeasyPrint needs 1.40 minutes to create the 52-page PDF.
I looked for a solution by removing the images and all the CSS (thinking that some CSS conversion might be slow), but rendering time only improved by 20 seconds.
Can someone give me some advice on how to improve performance?
Thanks

@liZe
Member

liZe commented Dec 2, 2017

WeasyPrint is known to be slow. The only general advice I can give you is to use Python 3.6 and the latest version of WeasyPrint. If you can share your HTML source, I can try to find why your document takes an especially long time to get rendered.

@alecasciaro
Author

@liZe liZe added the performance Too slow renderings label Dec 4, 2017
@liZe
Member

liZe commented Dec 4, 2017

@alecasciaro Thank you for this example. It takes about 2:20 minutes for me, with:

  • about 20 seconds to get, parse and apply HTML/CSS,
  • about 20 seconds to create the formatting structure,
  • about 40 seconds to create the layout,
  • about 60 seconds to create the fixed boxes and page margins layout,
  • a couple of seconds to generate the PDF.

There's a big performance problem with the fixed boxes or page margins. I'll try to find where it comes from.
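
For readers who want a similar breakdown for their own document, here is a minimal timing sketch. It is not how the numbers above were measured (the thread does not say), and it is coarser than the five stages listed: it assumes the public `HTML(...)`, `render()` and `write_pdf()` calls of WeasyPrint, which only separate parsing, layout and PDF generation. The names `timed`, `report` and `document.html` are hypothetical.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage, results=timings):
    """Record the wall-clock duration of one stage, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = time.perf_counter() - start


def report(results):
    """Return one line per stage with its duration and share of the total."""
    total = sum(results.values())
    lines = []
    for stage, seconds in results.items():
        share = 100 * seconds / total if total else 0.0
        lines.append(f"{stage}: {seconds:.2f} s ({share:.0f}%)")
    return lines


# Usage sketch (assumes WeasyPrint is installed; "document.html" is made up):
# from weasyprint import HTML
# with timed("fetch and parse HTML"):
#     html = HTML("document.html")
# with timed("CSS and layout"):
#     document = html.render()
# with timed("PDF generation"):
#     document.write_pdf("document.pdf")
# print("\n".join(report(timings)))
```

For a finer split (formatting structure versus page margins, for example), running the same script under `python -m cProfile` is probably the simpler route.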

@alecasciaro
Author

OK, thank you @liZe, I really appreciate it.

@eligiobz

@liZe Is there any progress on this issue? I'd like to know if there's something in particular to look at to help improve this.

@liZe
Member

liZe commented Feb 16, 2018

OK, I've found the source of the problem. More than half of the time is taken by downloading photos.

A solution to avoid that is to render PDFs on the same server as your website, and use something that gets the resources from disk rather than through the network. That's what Flask-WeasyPrint or Django-WeasyPrint do, for example; you can take a look at how they work and duplicate the behaviour for your own use.
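
As a rough illustration of that idea (not the actual code of Flask-WeasyPrint or Django-WeasyPrint), a custom URL fetcher can map URLs under your own site to files on disk and hand everything else to the normal fetcher. This sketch assumes your WeasyPrint version accepts a `url_fetcher` callable that returns a dict with a `string` key holding bytes and an optional `mime_type`. The helper `make_local_fetcher`, the URL prefix and the directory are hypothetical.

```python
import mimetypes
from pathlib import Path
from urllib.parse import unquote


def make_local_fetcher(site_prefix, static_root, fallback):
    """Build a fetcher serving URLs under site_prefix from static_root."""
    root = Path(static_root).resolve()

    def fetch(url):
        if url.startswith(site_prefix):
            # Drop any fragment or query string, then decode %xx escapes.
            remainder = url[len(site_prefix):].split("#")[0].split("?")[0]
            candidate = (root / unquote(remainder).lstrip("/")).resolve()
            # Only serve real files that stay inside static_root.
            if candidate.is_file() and root in candidate.parents:
                result = {"string": candidate.read_bytes()}
                mime_type, _ = mimetypes.guess_type(candidate.name)
                if mime_type:
                    result["mime_type"] = mime_type
                return result
        # Anything else goes through the regular (network) fetcher.
        return fallback(url)

    return fetch


# Usage sketch (assumes WeasyPrint is installed; the URL and path are made up):
# from weasyprint import HTML, default_url_fetcher
# fetcher = make_local_fetcher(
#     "https://example.com/static/", "/srv/site/static", default_url_fetcher)
# HTML("https://example.com/report/", url_fetcher=fetcher).write_pdf("report.pdf")
```

The path check matters because the URLs come from the document being rendered: without it, a crafted `../` path could read files outside the static directory.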

@liZe liZe closed this as completed Feb 16, 2018
@fcaldas

fcaldas commented Feb 21, 2018

@liZe Just a question: does WeasyPrint download the images in parallel or serially?

@alecasciaro
Author

alecasciaro commented Feb 21, 2018

@liZe, I tried removing all images and all CSS, but rendering time only improved by 20 seconds.
In your case, the server is a test environment, that's why it's slow. I don't think the issue is strictly related to images. If you want, I can remove the images so you can test better.

Thanks for your time.

@liZe
Member

liZe commented Feb 21, 2018

Just a question: does WeasyPrint download the images in parallel or serially?

They're downloaded serially. It would be better to have parallel downloads, but that would not be as interesting as in a browser: layout is only done once in WeasyPrint and depends on image sizes, whereas it's done again and again in a browser while images are still downloading.

I can't find any simple solution right now, but we may find a smart way to asynchronously download images when creating the boxes and wait for them if they're not downloaded yet during the layout. If you're interested, that's a cool new issue to open 😉!
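
Until something like that exists inside WeasyPrint, a simpler workaround can be sketched outside of it: download the images in parallel before rendering, then serve them from memory through a custom fetcher. This is a different technique from the asynchronous loading described above, since it requires knowing the image URLs in advance. `prefetch`, `make_cached_fetcher` and the `download` callable are hypothetical; `download` is assumed to return a dict holding the bytes under `string`, so the result can be reused (a one-shot file object could not).

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch(urls, download, max_workers=8):
    """Download every URL concurrently; return {url: result} for successes."""
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep order
    cache = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(download, url) for url in unique}
        for url, future in futures.items():
            try:
                cache[url] = future.result()
            except Exception:
                # Leave failures out of the cache; the fallback fetcher
                # gets a chance to retry them during rendering.
                pass
    return cache


def make_cached_fetcher(cache, fallback):
    """Build a fetcher that answers from cache and falls back otherwise."""
    def fetch(url):
        if url in cache:
            return cache[url]
        return fallback(url)
    return fetch


# Usage sketch (assumes WeasyPrint is installed; image_urls and
# download_bytes are placeholders you would have to provide):
# from weasyprint import HTML, default_url_fetcher
# cache = prefetch(image_urls, download_bytes)
# fetcher = make_cached_fetcher(cache, default_url_fetcher)
# HTML("https://example.com/report/", url_fetcher=fetcher).write_pdf("report.pdf")
```

Threads are enough here because the work is network-bound; the layout itself still runs serially.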

@liZe
Member

liZe commented Feb 21, 2018

@liZe, as I said, I tried removing all images and all CSS, but rendering time only improved by 20 seconds.

Sorry, I forgot this comment.

With only background-image disabled, WeasyPrint generates a 91-page PDF in 1:05 for me (2:20 with background-image enabled). It's slower than a browser, but it's within WeasyPrint's poor standards (between 0.1 and 1 second per page for large real-life documents).
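
The per-page figures behind that comparison can be worked out quickly, assuming both runs produce the same 91 pages (the helper name is made up):

```python
def seconds_per_page(minutes, seconds, pages):
    """Convert a m:ss rendering time into seconds per page."""
    return (60 * minutes + seconds) / pages


print(round(seconds_per_page(1, 5, 91), 2))   # 0.71, background-image disabled
print(round(seconds_per_page(2, 20, 91), 2))  # 1.54, background-image enabled
```

So the run without background images sits inside the 0.1 to 1 second range, while the run with them falls outside it.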

Other bugs have been reported with performance problems (see #553 or #483 for example), but these issues were focused on special cases that were outstandingly slow. Your case just shows that WeasyPrint is generally slow 😢, there's unfortunately no magic wand to fix your problem.

I try hard to improve performance and memory use on a regular basis (see #384 and #70 for example), but it's endless, very frustrating work. I can sometimes spend days getting a 5% speed improvement … or often nothing.

I'm open to suggestions (and parallel download of resources is a good one), but I didn't find another specific point of failure I can work on for your document.

@Saksow

Saksow commented Apr 5, 2018

Hi @liZe, it's been a while 😄 I am interested in learning the best practices for designing your HTML/CSS template and using WeasyPrint, in order to get the fastest result possible. Are there written guidelines, or can you list a few points? Thanks!

@jnoortheen

@liZe As you know, in the Python world, speeding things up often involves writing part of the code in Cython. Even though it is tedious at first, it will not affect users installing WeasyPrint through pip if binary wheels are available. Another option is to try Nuitka for certain performance-critical modules and compile only those.


6 participants