# Reliable and easy-to-set-up way to deploy Crawl4ai #180

**@sean-cofinance:**

Hey everyone,

The final step of development, deployment, is often the most challenging, and I'm sure many of you will agree. Could someone share their experience on the best way to deploy Crawl4AI, and which options are worth considering?

Thank you in advance for your answers and thoughts!
**@chanmathew:** @sean-cofinance I've just recently set this up on Modal.com, which was a pretty smooth exercise. Here's my code if it helps:

```python
import modal

# Build a custom container image with the necessary dependencies;
# it is passed to the functions below via the `image=` argument.
crawler = (
    modal.Image.debian_slim(python_version="3.10")
    .pip_install_from_requirements("requirements.txt")
    .run_commands(
        "apt-get update",
        "apt-get install -y software-properties-common",
        "apt-add-repository non-free",
        "apt-add-repository contrib",
        "playwright install-deps chromium",
        "playwright install chromium",
        "playwright install",
    )
)

from crawl4ai import AsyncWebCrawler
from pydantic import BaseModel, Field
from fastapi import Header, HTTPException
from jwt import decode, PyJWTError  # for a JWT-based authorization strategy

app = modal.App("crawler")


class CrawlRequest(BaseModel):
    url: str
    bypass_cache: bool = Field(default=False)
    # other kwargs accepted by crawler.arun() can be added here


# Define the function that will be executed in the container
@app.function(image=crawler)
@modal.web_endpoint(method="POST", docs=True)
async def crawl(request: CrawlRequest, authorization: str = Header(...)):
    # You will want to have your own authorization strategy here to protect
    # your endpoint, e.g. validate `authorization` as a JWT and raise
    # HTTPException(status_code=401) on failure.
    print(f"Crawling URL: {request}")

    # Create an instance of AsyncWebCrawler
    async with AsyncWebCrawler(verbose=True) as crawler:
        # Run the crawler on the given URL, forwarding any kwargs
        # that were explicitly set on the request
        crawl_kwargs = request.dict(exclude_unset=True)
        try:
            result = await crawler.arun(**crawl_kwargs)
            print(result)
            return result
        except Exception as e:
            error_message = f"Error during crawling: {str(e)}"
            print(error_message)
            return {"error": error_message}


# Entrypoint that will be used to trigger the crawler when testing locally
@app.local_entrypoint()
async def main(url: str):
    result = crawl.remote(CrawlRequest(url=url))
    print(result)
    return result
```

My requirements.txt:
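A minimal sketch of what that file would need to contain, inferred from the imports above (the entries and lack of version pins are assumptions, not the author's actual file):

```text
# Hypothetical requirements.txt, inferred from the imports above
crawl4ai
playwright
fastapi
pydantic
PyJWT
```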
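For anyone trying this, the usual Modal workflow for a script like the one above would look roughly like the following; the file name `crawler.py`, the endpoint URL, and the token are placeholders, not values from this thread:

```bash
# Invoke the local entrypoint once for a quick test
modal run crawler.py --url https://example.com

# Deploy the web endpoint; Modal prints the endpoint URL on success
modal deploy crawler.py

# Call the deployed endpoint (URL and token are placeholders)
curl -X POST "https://<your-workspace>--crawler-crawl.modal.run" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```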
**@sean-cofinance:** @chanmathew, I can't wait to try this out today! Thank you so much. This is really intriguing, and I'm super excited about it!