Skip to content

A tiny python PDF generation library which is not based on wkhtmltopdf.

License

Notifications You must be signed in to change notification settings

lennyerik/pyhtmltopdf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pyhtmltopdf

A tiny python PDF generation library which is intentionally not based on wkhtmltopdf. Rather, it uses Chromium with the playwright library (a python implementation of Google's Puppeteer protocol).

Installation

pip install git+https://github.com/lennyerik/pyhtmltopdf.git

Or, for development:

git clone https://github.com/lennyerik/pyhtmltopdf.git
cd pyhtmltopdf
pip install -e ".[dev]"

Usage

Simple use-cases can use the from_file, from_url and from_string (or their async-enabled variants afrom_file, afrom_url, afrom_string) functions:

from pyhtmltopdf import from_file

# Convert input.html to output.pdf
from_file(
    "./input.html",
    "output.pdf",
    render_options={
        "margin": {
            "top": "3cm",
            "left": "2cm",
            "right": "2cm",
            "bottom": "3cm",
        },
    },
)


from pyhtmltopdf import from_url

# We can also write the file ourselves
out_file = open("output.pdf", "wb")
out_file.write(from_url(
    "https://example.com/",
    header_html="<p style=\"font-size: 12pt;\">This is a demo header</p>",
    footer_html="<p style=\"font-size: 12pt;\">Page No: <span class=\"pageNumber\"></span></p>",
    render_options={
        "margin": {
            "top": "2cm",
            "bottom": "2cm",
        }
    },
))

If you already have a chromium browser installed, you can add executable_path to the launch_options like so:

from_file(
    "./input.html",
    "output.pdf",
    launch_options={
        # This example uses Brave as the chromium-based browser
        "executable_path": "/usr/bin/brave",
    }
    render_options={
        "margin": {
            "top": "3cm",
            "left": "2cm",
            "right": "2cm",
            "bottom": "3cm",
        },
    },
)

In case you want to process multiple PDFs, the class based API is faster, since it only spins up one Chromium instance:

from pyhtmltopdf import HTMLToPDFConverter

with HTMLToPDFConverter() as converter:
    converter.from_url(
        "https://example.com/",
        "output.pdf",
    )

# Or, alternatively

converter = HTMLToPDFConverter(launch_options={
    # Launch options are passed in here
    "executable_path": "/usr/bin/brave",
})
converter.init()
converter.from_url(
    "https://example.com/",
    "output.pdf",
)
converter.finish()

Or even asynchronously:

from pyhtmltopdf import AHTMLToPDFConverter

async with AHTMLToPDFConverter() as converter:
    await converter.from_url(
        "https://example.com/",
        "output.pdf",
    )


# Or, alternatively

converter = AHTMLToPDFConverter(launch_options={
    # Launch options are passed in here
    "executable_path": "/usr/bin/brave",
})
await converter.init()
await converter.from_url(
    "https://example.com/",
    "output.pdf",
)
await converter.finish()

API

All from_x functions have the following parameters:

  • file_path / url / string: The input HTML / URL / string to process
  • output_path: An optional output path to save the PDF to. Defaults to None
  • header_html: An optional HTML string for the page header. Defaults to ""
  • footer_html: An optional HTML string for the page header. Defaults to ""
  • render_options: Can be any of these PDF rendering options

Additionally, the top-level from_x functions as well as the constructors of the HTMLToPDFConverter and AHTMLToPDFConverter classes take the launch_options argument which can be any of these launch options.

Development

To format the code, install with dev dependencies and run

black .

To run the unit tests, install with dev dependencies and execute

python -m unittest test

You can also set a specific browser:

CHROMIUM=/usr/bin/brave python -m unittest test

Why not wkhtmltopdf?

pyhtmltopdf uses an up-to-date version of Chromium, enabling use of features such as flexbox, which are not supported by wkhtmltopdf's old version of WebKit. Furthermore, the current status of the wkhtmltopdf project is questionable:

  • the version of Qt it uses is unsupported since 2015
  • it requires a patch to qt in order to enable full header and footer support
  • the version of WebKit it uses is over 4 years old

About

A tiny python PDF generation library which is not based on wkhtmltopdf.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages