
Proposal to palliate: Memory leak following rendering #611

Closed
goinnn opened this issue Apr 3, 2018 · 12 comments
Labels: feature (New feature that should be supported)

Comments

goinnn commented Apr 3, 2018

Hi,

We generate bills (PDFs) from a Celery task, and this month we generated a PDF with a 50-page breakdown. Until now every PDF had only one page, and we never had any problem.

You can see the problem in the image below; this Sunday I did a manual restart and the memory went back to a normal level.

(graph: memory usage over time)

I read issues #220 and #70, and we think we have a nice way to palliate this problem:

```python
import os
import tempfile

from weasyprint import HTML


def _render_pdf(rendered_html, stylesheets):
    return HTML(string=rendered_html).write_pdf(stylesheets=stylesheets)


def render_pdf(rendered_html, stylesheets=None):
    # This condition (or another heuristic) decides when forking is worth it.
    if len(rendered_html) < 50000:
        return _render_pdf(rendered_html, stylesheets)
    else:
        # Created before the fork, so parent and child share the same open file.
        f = tempfile.NamedTemporaryFile()
        cpid = os.fork()
        if not cpid:
            # Child: render the PDF, write it to the shared file and exit
            # immediately, without running any cleanup handlers.
            pdf = _render_pdf(rendered_html, stylesheets)
            f.write(pdf)
            f.flush()
            os._exit(0)
        else:
            # Parent: wait for the child, then read the rendered PDF back.
            os.waitpid(cpid, 0)
            f.seek(0)
            pdf = f.read()
            f.close()
            return pdf
```

With this code we fork the main process. There is still a memory peak, but once the child process finishes, the memory returns to its regular value.

What do you think about it?

PS: Thanks for this great project!

goinnn commented Apr 4, 2018

FYI: rendering the same PDF with WeasyPrint==0.36 increased our memory usage by ~330MB; with WeasyPrint==0.42.2 it increased by ~415MB.

liZe commented Apr 18, 2018

Thank you for this issue, and for taking the time to read #220 and #70!

> With this code we fork the main process. There is still a memory peak, but once the child process finishes, the memory returns to its regular value.
>
> What do you think about it?

Putting rendering steps into separate tasks looks like a good idea. Beyond memory consumption, it may also be really helpful for parallelization.

It's the way to go, but I'm scared 😄.

Your problem is a good pretext to start working on that, but I'll feel ready to merge only when we have a solid solution that can be used elsewhere in WeasyPrint and works reliably on various operating systems. As we now target Python 3 only, we have many useful tools to achieve this in a clean way, but we have to choose them well.
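
For illustration only, a minimal sketch of that kind of tool, using `concurrent.futures` from the standard library (the helper names are placeholders, not part of WeasyPrint, and this is not a decision on the final design):

```python
# Run the rendering in a dedicated worker process; this also works on
# platforms that cannot fork (e.g. Windows). When the executor is shut
# down, the worker exits and its memory is returned to the operating system.
from concurrent.futures import ProcessPoolExecutor

from weasyprint import HTML


def _render_pdf(rendered_html, stylesheets=None):
    return HTML(string=rendered_html).write_pdf(stylesheets=stylesheets)


def render_pdf_isolated(rendered_html, stylesheets=None):
    with ProcessPoolExecutor(max_workers=1) as executor:
        return executor.submit(_render_pdf, rendered_html, stylesheets).result()
```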

About your memory problem specifically: some well-known memory leaks have been fixed in the current master branch (mainly caches that are no longer global to the process), and there's no known memory leak left. If you could find some time to test it, that would be really helpful! You can also try playing with the garbage collector. Forking is just an efficient workaround for a memory problem that shouldn't exist 😉.
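
For example, a minimal sketch of what "playing with the garbage collector" can look like (whether it helps depends on where the memory is actually held):

```python
import gc

from weasyprint import HTML


def render_and_collect(rendered_html, stylesheets=None):
    pdf = HTML(string=rendered_html).write_pdf(stylesheets=stylesheets)
    # Force a full collection once the rendering objects are unreachable.
    gc.collect()
    return pdf
```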

> FYI: rendering the same PDF with WeasyPrint==0.36 increased our memory usage by ~330MB; with WeasyPrint==0.42.2 it increased by ~415MB.

That's bad news. Is it possible to get samples that show this problem? Memory use decreased by at least ~10% between 0.36 and 0.42.x with my informal testing suite.

One last thing: if you can, use Python 3.6.

liZe added the feature label on Apr 18, 2018
goinnn commented Apr 20, 2018

> That's bad news. Is it possible to get samples that show this problem? Memory use decreased by at least ~10% between 0.36 and 0.42.x with my informal testing suite.

This PDF contains sensitive information, but I think I can replace it with "lorem ipsum" text.

> One last thing: if you can, use Python 3.6.

Currently we use Python 2.7, but Python 3.x (probably 3.6) is coming soon.

I am very busy right now, but in 1-2 months I think I will be able to find a few hours to create an HTML example, and I will test it with Python 3.6 and different WeasyPrint versions.

Please give me some time, and I will be back ;-)

rubgombar1 commented

Hi all!

I work with Pablo (goinnn) and I have a bit more time to prepare the HTML and test PDF creation with Python 2.7 and Python 3.6. I have run tests with this HTML and different Python and WeasyPrint versions, getting the following results:

| Python version | WeasyPrint version | Memory consumption | Execution time (seconds) |
| --- | --- | --- | --- |
| 2.7 | 0.36 | ~393MB | 18.02 |
| 2.7 | 0.42.2 | ~460MB | 13.68 |
| 2.7 | 0.42.3 | ~459MB | 14.58 |
| 3.6 | 0.36 | ~245MB | 12.51 |
| 3.6 | 0.42.2 | ~222MB | 12.13 |
| 3.6 | 0.42.3 | ~220MB | 12.06 |

To reproduce these figures, you can use the following test HTML:
lorem_ipsum.html.tar.gz
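
For anyone who wants to reproduce this, here is a minimal Python 3 sketch of the kind of measurement involved (not the exact script used for the table; it assumes the extracted file is named lorem_ipsum.html and uses the standard resource module for peak resident memory, so it is Unix-only):

```python
import resource
import time

from weasyprint import HTML

with open('lorem_ipsum.html') as fd:
    html = fd.read()

start = time.perf_counter()
HTML(string=html).write_pdf('lorem_ipsum.pdf')
elapsed = time.perf_counter() - start

# ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
print('Memory consumption: ~{:.0f}MB, execution time: {:.2f}s'.format(peak_mb, elapsed))
```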

Regards!

liZe commented Apr 27, 2018

@rubgombar1 Thank you for this table. I'm trying hard to keep both memory consumption and execution time as low as possible. I only run my memory/speed tests with the latest Python version, which is why I missed the memory growth between 0.36 and 0.42.3 (we do get the ~10% decrease I mentioned in my previous comment, but only on Python 3.6).

A lot of work has been done to use plain dicts wherever possible, to take advantage of the big improvements made to dicts in Python 3.5 and 3.6.

> Currently we use Python 2.7, but Python 3.x (probably 3.6) is coming soon.

Now you know how good Python 3.6 is for you! The memory gap between Python 3.5 and Python 3.6 is a big one.

goinnn commented May 1, 2018

> Now you know how good Python 3.6 is for you! The memory gap between Python 3.5 and Python 3.6 is a big one.

:-)

Using the fork solution (first comment), here is a new memory graph. It is similar to the previous one, but in the previous graph I did a manual restart, while in this one the memory went down by itself.

(graph: memory usage with the fork solution)

liZe commented Jan 4, 2019

Small update (the results differ from the previous table because they were obtained on a different computer and system).

| Python version | WeasyPrint version | Memory consumption | Execution time (seconds) |
| --- | --- | --- | --- |
| 3.7 | 0.42.3 | ~265MB | 13.5 |
| 3.7 | 44 | ~265MB | 11.0 |

I'd like to know if there's really a memory leak (and change the issue title accordingly). @goinnn @rubgombar1 Do you have news about your problem?

goinnn commented Jan 18, 2019

@liZe I'm sorry, we still use Python 2.7 :-(

We will be able to say more in several months.

liZe commented Jul 25, 2019

A dedicated website has now been created to follow performance and memory use. It makes me feel that I can close this issue for now 😄.

Version 49 is dedicated to performance and bug fixes; the current perf branch already gives impressive results (together with changes included in cssselect2).

Feel free to open a new ticket if you find a memory leak that needs to be fixed!

liZe closed this as completed on Jul 25, 2019
goinnn commented Mar 3, 2020

Hi @liZe, we are finally migrating from Python 2.7 to Python 3.7.

This month or next month we will launch it in the production environment; then we will be able to report back.

This is our memory leak during the billing process (on the 1st of every month). We generate approximately 1800 PDFs + 2400 Excel files (it depends on the month).

(graph: memory usage during the billing process)

As you can see, our memory graph is uglier than two years ago, because we currently generate a lot more bills (PDFs & Excel files) and because two years ago we forked a process when generating PDFs with many pages. We decided that was too risky, so we no longer fork; we use a Celery task for this process, configured with:

`--max-memory-per-child=200000`

So Celery restarts the worker when it uses too much memory.
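
For reference, the equivalent setting in the Celery configuration (a sketch, not our exact config file; per the Celery documentation the value is expressed in kilobytes of resident memory):

```python
# celeryconfig.py -- restart a worker child once it grows past this limit.
worker_max_memory_per_child = 200000  # kilobytes
```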

@liZe thanks again for this project.

We will talk again in several weeks.

goinnn commented Apr 1, 2020

Hi @liZe, here is our new metric.

(graph: memory usage during the billing process, after the migration)

Currently, with Python 3.6.10, memory (almost) does not increase.

But these data are not conclusive; our billing this month is smaller than the previous month because of COVID-19.

We will talk again in several weeks, when COVID-19 is out of our lives.

Take care!

Thanks!!

liZe commented Apr 2, 2020

> Currently, with Python 3.6.10, memory (almost) does not increase.

That’s good news.

> But these data are not conclusive; our billing this month is smaller than the previous month because of COVID-19.

Of course.

> We will talk again in several weeks, when COVID-19 is out of our lives.

😄

Take care…
