Cut LLM costs and boost processing speed by transforming verbose HTML into efficient Emmet notation

emmetify/emmetify-py

Emmetify 🚀

Cut your LLM processing costs by up to 90% by transforming verbose HTML into efficient Emmet notation, without losing structural integrity.

Why Emmetify? 🤔

  • 💰 Drastically Reduce Costs - Process HTML with your LLM agents at a fraction of the cost by using our efficient Emmet-based compression
  • 🎯 Maintain Performance - Your LLM agents can still generate XPath and CSS selectors with the same accuracy using the compressed format
  • 🔌 Seamless Integration - Emmet syntax is well-understood by all major LLMs thanks to its 10+ years of widespread use in frontend development
  • ⚡ Fast Processing - Fewer tokens mean faster processing times for your HTML analysis tasks

How It Works 🛠️

Emmetify converts complex HTML structures into concise Emmet notation. For example:

<div class="container">
    <header class="header">
        <nav class="nav">
            <ul class="nav-list">
                <li class="nav-item"><a href="#">Link</a></li>
            </ul>
        </nav>
    </header>
</div>

Becomes:

div.container>header.header>nav.nav>ul.nav-list>li.nav-item>a[href=#]

Measured with the OpenAI Tokenizer, this simple transformation reduces the token count:

  • HTML: 59 tokens
  • Emmet: 20 tokens

That's 66% fewer tokens while preserving all structural information! And this is just with default settings.
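The percentage above follows directly from the two token counts. A minimal sketch of the arithmetic, where `token_savings` is a hypothetical helper for illustration and not part of Emmetify:

```python
def token_savings(html_tokens: int, emmet_tokens: int) -> float:
    """Return the percentage of tokens saved by converting HTML to Emmet."""
    if html_tokens <= 0:
        raise ValueError("html_tokens must be positive")
    return 100 * (html_tokens - emmet_tokens) / html_tokens


# Token counts taken from the example above (59 for HTML, 20 for Emmet).
print(f"{token_savings(59, 20):.1f}% fewer tokens")  # prints "66.1% fewer tokens"
```

The same calculation can be applied to counts from any tokenizer to compare configurations on your own pages.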

You can achieve even higher compression rates (up to 90%, or even more depending on the HTML structure) by using advanced configuration options:

  • Removing unnecessary tags
  • Simplifying attributes
  • Optimizing class names
  • Shortening URLs

Check our documentation for detailed optimization strategies and their impact on token reduction.

The Technology Behind It 🔍

Emmetify leverages Emmet notation, a mature syntax that has been a standard in web development for over a decade. Developers typically use Emmet to expand short abbreviations into HTML:

div.container>h1{Title}+p{Content}

↓ Expands to ↓

<div class="container">
    <h1>Title</h1>
    <p>Content</p>
</div>

Emmetify uses this well-established syntax in reverse, converting verbose HTML back into this concise format that LLMs can understand just as well as raw HTML.
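To make the reverse direction concrete, here is a toy converter built only on Python's standard `html.parser`. It is a sketch of the general idea, not Emmetify's actual implementation: the names `Node`, `TreeBuilder`, `render` and `html_to_emmet` are invented for this example, and it assumes well-formed HTML with attribute values that contain no spaces.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.text = ""
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def render(node):
    """Render one element as tag#id.class[attr=value]{text}>children."""
    attrs = dict(node.attrs)
    element_id = attrs.pop("id", None)
    classes = (attrs.pop("class", None) or "").split()
    out = node.tag
    if element_id:
        out += f"#{element_id}"
    out += "".join(f".{name}" for name in classes)
    for name, value in attrs.items():
        out += f"[{name}]" if value is None else f"[{name}={value}]"
    if node.text:
        out += "{" + node.text + "}"
    if node.children:
        out += ">" + render_siblings(node.children)
    return out


def render_siblings(nodes):
    """Join siblings with '+', grouping any sibling that has children."""
    parts = [render(node) for node in nodes]
    if len(parts) > 1:
        parts = [f"({p})" if n.children else p for n, p in zip(nodes, parts)]
    return "+".join(parts)


def html_to_emmet(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return render_siblings(builder.root.children)


html = '<div class="container"><h1>Title</h1><p>Content</p></div>'
print(html_to_emmet(html))  # prints "div.container>h1{Title}+p{Content}"
```

Feeding it the expanded HTML from the example above reproduces the original abbreviation. A production converter also has to handle malformed markup, escaping, and the simplification options described later.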

Installation 🔧

pip install emmetify

Usage 💻

Basic Usage

from emmetify import Emmetifier
import requests

emmetifier = Emmetifier()
html = requests.get("https://example.com").text
emmet = emmetifier.emmetify(html)["result"]
print(emmet)

Advanced HTML Simplification ⚡

Transform verbose HTML into its most essential form while preserving navigational structure. This mode intelligently:

  • Skips non-essential HTML tags
  • Prioritizes important attributes
  • Removes redundant information

For example, this verbose HTML:

<link rel="stylesheet" href="style.css">
<div id="main" class="container" style="color: red;" data-test="ignore">Example</div>

Becomes this concise Emmet notation:

div#main.container{Example}

Much shorter, yet retains all necessary information for LLM navigation and processing!
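The filtering idea behind this mode can be sketched in a few lines. The tag and attribute lists below are assumptions chosen to reproduce the example above; they are not Emmetify's actual defaults, and `simplify` is a hypothetical helper:

```python
# Hypothetical lists for illustration only; Emmetify's real rules may differ.
SKIP_TAGS = {"link", "script", "style", "meta"}
KEEP_ATTRS = ("id", "class", "href")


def simplify(elements):
    """Drop skipped tags and keep only the prioritized attributes."""
    simplified = []
    for tag, attrs in elements:
        if tag in SKIP_TAGS:
            continue  # non-essential tag: omit it entirely
        kept = {name: value for name, value in attrs.items() if name in KEEP_ATTRS}
        simplified.append((tag, kept))
    return simplified


elements = [
    ("link", {"rel": "stylesheet", "href": "style.css"}),
    ("div", {"id": "main", "class": "container",
             "style": "color: red;", "data-test": "ignore"}),
]
print(simplify(elements))  # prints [('div', {'id': 'main', 'class': 'container'})]
```

Only the `div` with its `id` and `class` survives, which is what the `div#main.container{Example}` output encodes.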

Advanced Usage:

from emmetify import Emmetifier
import requests
import openai

# Configure HTML simplification
emmetifier = Emmetifier(config={
    "html": {
        "skip_tags": True,
        "prioritize_attributes": True
    }
})

# Fetch and process HTML
html = requests.get("https://example.com").text
result = emmetifier.emmetify(html)["result"]
print(result)

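# Use with your favorite LLM
llm = openai.OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = f"Get a list of XPath selectors for all the links on the following page: {result}"
response = llm.chat.completions.create(
    model="gpt-4o-mini",
    # Send the full prompt, which embeds the Emmet result
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)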

Supported Formats 📊

  • ✅ HTML
  • 🚧 XML (Coming Soon)
  • 🚧 JSON (Coming Soon)
