WeasyPrint consuming a lot of memory when rendering tables with 5000 rows #1104

gaurav1999 · 2020-04-22T20:06:15Z

Hi, WeasyPrint has made my life a lot easier, but I recently noticed that it consumes a lot of memory, and on top of that, the memory from previous calls adds up on every print call. I am running the latest version of WeasyPrint, 51.

Python: 3.7
Distro: Fedora Workstation 31

        df = self.get_df(limit=True)
        no_of_col = len(df.columns)
        include_index = not isinstance(df.index, pd.RangeIndex)
        pd.set_option('display.width', 1000)
        pd.set_option('colheader_justify', 'center')

        html_string = """<html>
                        <head></head>
                        <style>
                                .tablestyle {
                                    font-size: 10pt;
                                    font-family: Arial;
                                    border-collapse: collapse;
                                    border: 1px solid silver;

                                }

                                .tablestyle td, th {
                                    padding: 5px;
                                }

                                .tablestyle tr:nth-child(even) {
                                    background: #E0E0E0;
                                }

                                .tablestyle tr:hover {
                                    background: silver;
                                    cursor: pointer;
                                }
                        </style>
                        <body>
                        """+ df.to_html(classes='tablestyle')+"""</body></html>"""

        # size_css = weasyprint.CSS(string=("@page {size: A3; margin: 0in 0.44in 0.2in 0.44in;}"))
        # pdf = weasyprint.HTML(string=html_string).write_pdf(stylesheets=[size_css])
        # gc.collect()
        # return pdf

Note: I commented out the last lines of the code for testing purposes.

I am trying to print a pandas DataFrame. Everything works well except for memory usage and CSS.

Version 1.
I simply printed the page without specifying the size, and the styling on tables worked.

Version 2.
I introduced size_css because my content was large and I needed A3 paper. After that, the styling on tables stopped working, and I am not sure why.

I also noticed performance issues when I ran this on 1000+ rows: it eats up a lot of memory, and I am not sure why. I read issue #220 about this and tried @font-face, but it did not help.

I ran this once and it used 1.4 GB of RAM; on a second run right after the first, usage added up to 2.1 GB.

I thought I might need to call gc.collect() manually, but it has no effect, hence it is commented out in the code.

I also thought the HTML string might be getting too big, so I tested without rendering any PDF, but it turns out to be less than 10 MB.
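A quick way to check the size of the generated markup without rendering anything is to measure the encoded string. The sketch below is only illustrative: `build_table_html` is a hypothetical stand-in for `df.to_html(classes='tablestyle')`, and the row and column counts are made up.

```python
from html import escape


def build_table_html(rows, css_class="tablestyle"):
    """Hypothetical stand-in for df.to_html(): build a plain HTML table from a list of rows."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f'<table class="{css_class}"><tbody>{body}</tbody></table>'


def html_size_mib(html_string):
    """Size of the markup in MiB once encoded as UTF-8."""
    return len(html_string.encode("utf-8")) / (1024 * 1024)


# Made-up data: 5000 rows of 10 columns.
rows = [[f"value {r}-{c}" for c in range(10)] for r in range(5000)]
html_string = "<html><body>" + build_table_html(rows) + "</body></html>"
print(f"{html_size_mib(html_string):.2f} MiB")
```

If the markup is only a few MiB while the process uses more than a GB, the memory is going into the rendering step and not into the string itself.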

When I limit the dataset to something small, such as 50-100 rows, it behaves quite well, and memory does not add up on subsequent prints the way it does with large ones.

I will attach the table's CSV for your testing, along with the rendered PDF, where you can see the table styling difference I mentioned.

Thanks!


gaurav1999 · 2020-04-22T20:12:40Z

Data_and_pdf.zip

In this archive, the data is that of the large table, and you will see:

1. My rows printed are just 4999
2. My table formatting is not like mentioned in styling

Note: I queried 10,000 rows out of this data of 50,000 rows.

There is another file, Boys0604_2212.pdf, which shows that before I added the @page size CSS, the table CSS was rendered in the PDF.

Thanks.

liZe · 2020-04-22T20:21:56Z

#70 is probably interesting to read and could give an idea of the memory levels to expect when rendering long tables.

I’ll check your example as soon as possible.

gaurav1999 · 2020-04-22T20:48:09Z

Yes, I think I checked that issue out. You mentioned StyleDict and the deduplication of some rules. I might not be aware of those, but they are probably not in my code.

Can you point out what I might need to improve in my code in this regard?

Also, an update:

I think the CSS is applied to some extent. If you look at Boys.pdf and Test.pdf, the borders are there, but the cell highlighting seems to be what's missing.
So I think I should reframe my issue as follows:

Once I applied @page to resize my page to A3, these CSS rules in the style block:

.tablestyle tr:nth-child(even) {
    background: #E0E0E0;
}
.tablestyle tr:hover {
    background: silver;
    cursor: pointer;
}

have no effect.

Thanks for looking into this, really appreciate it.

gaurav1999 · 2020-04-30T19:21:59Z

Any updates on this issue?

liZe · 2020-06-10T06:38:51Z

I’m back, sorry for the delay…

My rows printed are just 4999

They’re 5000, the first one is 0 😉.
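The off-by-one comes from the zero-based index that is written as the first column of the table. A small sketch in plain Python, using `range` as a stand-in for the DataFrame's default index:

```python
# A default index for 5000 rows starts at 0, so the last label is 4999.
index = range(5000)

print(index[0])    # first label: 0
print(index[-1])   # last label: 4999
print(len(index))  # number of rows: 5000
```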

2. My table formatting is not like mentioned in styling

There’s no reason why it shouldn’t work. Maybe there’s a problem in the CSS you generate? Could you please provide the generated HTML file?

I ran this once and it used 1.4 GB of RAM; on a second run right after the first, usage added up to 2.1 GB.

It’s not normal to have such a difference. If the variable holding the first document is deleted (by using del or going out of scope), most of the memory should be freed (it is for me). Could you please provide a simple Python script with this problem?
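A minimal script along these lines could look like the sketch below. It assumes that `tracemalloc` is a good enough proxy (it only sees allocations made through Python's allocator, not memory held by native libraries), and `fake_render` is a hypothetical placeholder for `weasyprint.HTML(string=html_string).write_pdf()`.

```python
import gc
import tracemalloc


def measure_runs(render, runs=3):
    """Call render() several times; return (retained, peak) traced bytes per run."""
    results = []
    for _ in range(runs):
        tracemalloc.start()
        try:
            document = render()
            del document   # drop the only reference to the result
            gc.collect()   # also collect reference cycles
            retained, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        results.append((retained, peak))
    return results


def fake_render():
    """Placeholder workload; swap in the real WeasyPrint call to reproduce the issue."""
    rows = [f"<tr><td>{i}</td></tr>" for i in range(50_000)]
    return "".join(rows).encode("utf-8")


for run, (retained, peak) in enumerate(measure_runs(fake_render), start=1):
    print(f"run {run}: retained {retained / 1024:.0f} KiB, peak {peak / 1024:.0f} KiB")
```

If the retained figure stays small and roughly constant from run to run, the memory is being released once the document goes away; a figure that keeps growing would point to something still holding references.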

gaurav1999 · 2020-06-10T11:00:53Z

Hey liZe, thanks for getting back. It seems the row count thing was my own fault, I get it now 👍, and the same goes for the styling. About performance, I will get back to you.

I am really embarrassed about making a typo that messed up my styling.

When I revisited the code after a long time to give you samples, I realised my mistake, thanks to you :)

gaurav1999 · 2020-06-10T12:07:46Z

The code I am using is:

def get_pdf(self):
    df = self.get_df()
    no_of_col = len(df.columns)
    html_string = """<html>
                <head></head>
                <style>
                        .tablestyle {
                            font-size: 11pt;
                            font-family: Arial;
                            border-collapse: collapse;
                            border: 1px solid silver;
                        }
                        .tablestyle td, th {
                            padding: 5px;
                        }
                        .tablestyle tr:nth-child(even) {
                            background: #E0E0E0;
                        }
                        .tablestyle tr:hover {
                            background: silver;
                            cursor: pointer;
                        }
                </style>
                <body>
                """ + df.to_html(classes='tablestyle') + """</body></html>"""
    if no_of_col > config.get("SOME_CONFIG_FLAG"):
        size_css = weasyprint.CSS(string=("@page {size: A3; margin: 0in 0.44in 0.2in 0.44in;}"))
    else:
        size_css = weasyprint.CSS(string=("@page {size: A4;}"))

    pdf = weasyprint.HTML(string=html_string).write_pdf(stylesheets=[size_css])
    del html_string
    return pdf

I am modifying this code for the export features in the apache/incubator-superset project, in the file viz.py.

When I downloaded a chart with 6000 rows as a PDF, I got the response, but it initially consumed 1.6 GB of RAM. When I launched a second request after the first one finished, the number jumped to 2.3 GB. Later I launched two more requests and the number jumped further to 3.9 GB. I am not sure why this is happening, and it is of course not good when multiple people are using the web app and printing charts.
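To put numbers on this growth from inside the web process, the resident set size can be logged before and after each export. The sketch below is an assumption about how one might do that on Linux (it reads `/proc/self/status`, which does not exist on other platforms); `parse_vm_rss_kib`, `current_rss_kib` and `log_rss` are hypothetical helper names, not part of Superset or WeasyPrint.

```python
from pathlib import Path


def parse_vm_rss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def current_rss_kib():
    """Resident set size of this process in KiB, or None where /proc is unavailable."""
    try:
        return parse_vm_rss_kib(Path("/proc/self/status").read_text())
    except OSError:
        return None


def log_rss(label):
    """Print the current resident set size with a label and return it."""
    rss = current_rss_kib()
    if rss is None:
        print(f"{label}: RSS unavailable on this platform")
    else:
        print(f"{label}: {rss / 1024:.1f} MiB resident")
    return rss


before = log_rss("before export")
# ... call get_pdf() here ...
after = log_rss("after export")
```

Logging both values for several consecutive requests shows whether the resident size returns to its earlier level between exports or keeps climbing.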

I will be posting the CSV data and the PDF that gets printed.

So it seems the styling is working and I am getting all the rows; in the end, performance is the huge bottleneck.

Thanks for taking a look. I will be happy to assist by providing a modified Superset branch if you want to test this yourself on apache/superset.

Test_Flight_Data1006_173.zip

liZe · 2023-08-31T18:36:51Z

Related to #1950 and #1923.

liZe · 2024-08-03T07:11:09Z

With recent versions of WeasyPrint, there’s a difference of less than 20% between rendering long tables and the same number of divs. WeasyPrint still uses too much memory, but tables are now not that much worse than other boxes.
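One way to reproduce this kind of comparison is to generate two documents with the same number of boxes, one as table rows and one as plain divs, and render each under the same measurement. The generators below are an illustrative sketch with hypothetical names; the rendering step itself is left out and only indicated in a comment.

```python
def make_table_html(n_rows):
    """Document with n_rows single-cell table rows."""
    rows = "".join(f"<tr><td>row {i}</td></tr>" for i in range(n_rows))
    return f"<html><body><table>{rows}</table></body></html>"


def make_div_html(n_rows):
    """Document with the same text in n_rows plain divs."""
    divs = "".join(f"<div>row {i}</div>" for i in range(n_rows))
    return f"<html><body>{divs}</body></html>"


table_html = make_table_html(5000)
div_html = make_div_html(5000)

# Each string can then be passed to weasyprint.HTML(string=...).write_pdf()
# while measuring time and memory, to compare the two cases.
print(table_html.count("<tr>"), div_html.count("<div>"))
```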

Rendering times have been improved with 50456df too.

liZe added the performance Too slow renderings label Jun 10, 2020

liZe changed the title ~~WeasyPrint consuming a lot of memory when rendering tables of size 5000 rows~~ WeasyPrint consuming a lot of memory when rendering tables with 5000 rows Jan 16, 2021

liZe closed this as completed Aug 3, 2024
