Docker Image #155

Open
vikaskookna opened this issue Oct 11, 2024 · 23 comments
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@vikaskookna

I created an AWS Lambda Docker image, and it fails on this line:
from crawl4ai import AsyncWebCrawler

{
  "errorMessage": "[Errno 30] Read-only file system: '/home/sbx_user1051'",
  "errorType": "OSError",
  "requestId": "",
  "stackTrace": [
    "  File \"/var/lang/lib/python3.12/importlib/__init__.py\", line 90, in import_module\n    return _bootstrap._gcd_import(name[level:], package, level)\n",
    "  File \"<frozen importlib._bootstrap>\", line 1387, in _gcd_import\n",
    "  File \"<frozen importlib._bootstrap>\", line 1360, in _find_and_load\n",
    "  File \"<frozen importlib._bootstrap>\", line 1331, in _find_and_load_unlocked\n",
    "  File \"<frozen importlib._bootstrap>\", line 935, in _load_unlocked\n",
    "  File \"<frozen importlib._bootstrap_external>\", line 995, in exec_module\n",
    "  File \"<frozen importlib._bootstrap>\", line 488, in _call_with_frames_removed\n",
    "  File \"/var/task/lambda_function.py\", line 3, in <module>\n    from crawl4ai import AsyncWebCrawler\n",
    "  File \"/var/lang/lib/python3.12/site-packages/crawl4ai/__init__.py\", line 3, in <module>\n    from .async_webcrawler import AsyncWebCrawler\n",
    "  File \"/var/lang/lib/python3.12/site-packages/crawl4ai/async_webcrawler.py\", line 8, in <module>\n    from .async_database import async_db_manager\n",
    "  File \"/var/lang/lib/python3.12/site-packages/crawl4ai/async_database.py\", line 8, in <module>\n    os.makedirs(DB_PATH, exist_ok=True)\n",
    "  File \"<frozen os>\", line 215, in makedirs\n",
    "  File \"<frozen os>\", line 225, in makedirs\n"
  ]
}
@akamf commented Oct 11, 2024

I had the same issue and bypassed it by setting the DB path to '/tmp/' (the only writable directory in AWS Lambda) before importing the crawl4ai package. My solution:

import os
from pathlib import Path

os.makedirs('/tmp/.crawl4ai', exist_ok=True)
DB_PATH = '/tmp/.crawl4ai/crawl4ai.db'
Path.home = lambda: Path("/tmp")

from crawl4ai import AsyncWebCrawler

Hope this works for you as well.
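A related trick (a sketch of the same idea, not from the thread): crawl4ai resolves its storage folder from the home directory, so pointing the `HOME` environment variable at `/tmp` before the import redirects `Path.home()` and `os.path.expanduser("~")` on POSIX systems in one step.

```python
import os
from pathlib import Path

# On AWS Lambda only /tmp is writable; point HOME there *before*
# importing crawl4ai so every home-derived path lands under /tmp.
os.environ["HOME"] = "/tmp"
os.makedirs("/tmp/.crawl4ai", exist_ok=True)

# Path.home() now resolves through the HOME variable on POSIX:
print(Path.home())  # /tmp

# Import only after the environment is patched:
# from crawl4ai import AsyncWebCrawler
```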

@vikaskookna (Author)

OK, I will try this @akamf.
Did you create a Lambda layer or a Docker image? When I tried with a layer, it exceeded the 250 MB limit; how did you manage this?

@vikaskookna (Author)

After doing what you mentioned, I got this error:
Error processing https://chatclient.ai: BrowserType.launch: Executable doesn't exist at /home/sbx_user1051/.cache/ms-playwright/chromium-1134/chrome-linux/chrome

@akamf commented Oct 11, 2024

I created a Docker image where I installed Playwright and its dependencies, and then Chromium via Playwright. The Docker image is really big though (because of Playwright, I guess), so I'm currently working on optimizing it.

But our latest Dockerfile looks like this:

FROM amazonlinux:2 AS build

RUN curl -sL https://rpm.nodesource.com/setup_16.x | bash - && \
    yum install -y nodejs gcc-c++ make python3-devel \
    libX11 libXcomposite libXcursor libXdamage libXext libXi libXtst cups-libs \
    libXScrnSaver pango at-spi2-atk gtk3 iputils libdrm nss alsa-lib \
    libgbm fontconfig freetype freetype-devel ipa-gothic-fonts

RUN npm install -g playwright && \
    PLAYWRIGHT_BROWSERS_PATH=/ms-playwright-browsers playwright install chromium

FROM public.ecr.aws/lambda/python:3.11

WORKDIR ${LAMBDA_TASK_ROOT}

COPY requirements.txt .
RUN pip3 install --upgrade pip && \
    pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}" --verbose

COPY --from=build /usr/lib /usr/lib
COPY --from=build /usr/local/lib /usr/local/lib
COPY --from=build /usr/bin /usr/bin
COPY --from=build /usr/local/bin /usr/local/bin
COPY --from=build /ms-playwright-browsers /ms-playwright-browsers

ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright-browsers

COPY handler.py .

CMD [ "handler.main" ]

I don't know if this is the best solution, but it works for us. Like I said, I'm working on some optimisation for it.

@vikaskookna (Author) commented Oct 11, 2024

Thanks @akamf. I tried this, but it gave me these errors. I'm using an M1 Mac and built the image using this command:

docker build --platform linux/amd64 -t fetchlinks .

/var/task/playwright/driver/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /var/task/playwright/driver/node)
/var/task/playwright/driver/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /var/task/playwright/driver/node)
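Those GLIBC lines mean the `node` binary in Playwright's driver was linked against a newer glibc than the runtime image provides; `amazonlinux:2` ships glibc 2.26, older than the required 2.27/2.28. One way to confirm which C library a container image provides (a diagnostic sketch, assuming a glibc-based Linux image):

```python
import platform

# platform.libc_ver() inspects the running interpreter's binary and
# reports the linked C library, e.g. ('glibc', '2.26') on amazonlinux:2.
libname, version = platform.libc_ver()
print(libname, version)
```

If the reported version is below what the driver needs, the fix is to build on (or target) a base image with a newer glibc rather than copying binaries between images.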

@unclecode unclecode self-assigned this Oct 12, 2024
@unclecode (Owner)

Hi @vikaskookna @akamf

By next week, I will create the Dockerfile and also upload the Docker image to Docker Hub. I hope this helps you as well.

@unclecode unclecode added the enhancement (New feature or request) and question (Further information is requested) labels Oct 12, 2024
@unclecode unclecode changed the title from "Fails on aws lambda" to "Docker Image" Oct 12, 2024
@vikaskookna (Author)

@unclecode is the Docker image ready?

@unclecode (Owner)

@vikaskookna This weekend, hopefully everything will be ready.

@vikaskookna (Author)

@unclecode is it ready? Can you please share the link?

@unclecode (Owner)

@vikaskookna I released 0.3.72. I'm now working on Docker; once I'm done with testing, I'll share it soon. Stay tuned please.

@vikaskookna (Author)

@unclecode any updates, can you give any estimates ?

@unclecode (Owner) commented Nov 2, 2024

@unclecode any updates, can you give any estimates ?

It's in this coming week's plan 🤞

@vikaskookna (Author)

@akamf @unclecode I have tried to create a Lambda function with your code as well as the recently released Docker image, and I'm getting this error:
Error: fork/exec /lambda-entrypoint.sh: exec format error Runtime.InvalidEntrypoint

Here's my docker file

FROM unclecode/crawl4ai:latest as build

FROM public.ecr.aws/lambda/python:3.11

WORKDIR ${LAMBDA_TASK_ROOT}

COPY requirements.txt .
RUN pip3 install --upgrade pip && \
    pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}" --verbose

COPY --from=build /usr/lib /usr/lib
COPY --from=build /usr/local/lib /usr/local/lib
COPY --from=build /usr/bin /usr/bin
COPY --from=build /usr/local/bin /usr/local/bin

COPY lambda_function.py .

CMD [ "lambda_function.lambda_handler" ]

@unclecode (Owner)

@vikaskookna I'm not very familiar with Lambda and AWS, but based on my research, I can suggest a few things. The issue seems to be with your Dockerfile build. You don't need to pull the crawl4ai image; instead, install the library with pip. Here are some points to consider.

Your Dockerfile has this structure:

FROM unclecode/crawl4ai:latest as build    # Using your image as base
FROM public.ecr.aws/lambda/python:3.11     # Then switching to Lambda base

The issue is that you are doing a multi-stage build without properly carrying over the necessary components from the crawl4ai image. When you switch to the Lambda base image, you are only copying specific directories (/usr/lib, /usr/local/lib, etc.), which isn't sufficient to keep the crawl4ai library functional.

Here's how you should modify your Dockerfile:

  1. Instead of using the crawl4ai Docker image as a base and trying to copy system libraries (which can cause compatibility issues), you should:

    • Start directly with the AWS Lambda Python base image
    • Install the necessary system dependencies
    • Install your package directly via pip (pip install crawl4ai)
  2. The Runtime.InvalidEntrypoint error you're seeing is likely because:

    • The copied system libraries might not be compatible with the Lambda environment
    • The Lambda environment expects specific file permissions and configurations
  3. If you need specific features from crawl4ai, you should:

    # In lambda_function.py
    from crawl4ai import whatever_you_need
    
    def lambda_handler(event, context):
        # your code using crawl4ai
        return {
            'statusCode': 200,
            'body': 'Success'
        }

AWS Lambda has specific requirements and constraints that require a different approach to containerization.
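The three numbered points above could be sketched as a single-stage Dockerfile (an untested sketch assembled from this advice plus akamf's earlier dependency list; the system package set is an assumption, and the glibc caveat discussed earlier in the thread may still apply):

```dockerfile
# Sketch only: single-stage build, no cross-image copying.
FROM public.ecr.aws/lambda/python:3.11

# System libraries Chromium typically needs (names assumed for the
# Amazon Linux 2 base; trim or extend as required).
RUN yum install -y \
    libX11 libXcomposite libXcursor libXdamage libXext libXi libXtst \
    cups-libs pango gtk3 nss alsa-lib libgbm libdrm && \
    yum clean all

# Install crawl4ai (and its Playwright dependency) via pip.
RUN pip3 install --upgrade pip && pip3 install crawl4ai

# Keep Playwright's browsers at a fixed path that exists at runtime.
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright-browsers
RUN playwright install chromium

COPY lambda_function.py ${LAMBDA_TASK_ROOT}
CMD [ "lambda_function.lambda_handler" ]
```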

@unclecode (Owner)

@vikaskookna Whenever you're able to handle this, I'd appreciate it if you could share a clean way to deploy it on AWS Lambda. We may add it to our documentation and credit you as a collaborator. Please do your best to fix and deploy it.

@vikaskookna (Author)

@unclecode yes, I will share if things start to work. Right now I'm getting this error in Lambda; can you please advise?

[LOG] 🚀 Crawl4AI 0.3.73
[ERROR] 🚫 arun(): Failed to crawl https://example.com, error: Browser.new_context: Target page, context or browser has been closed
END RequestId: 7fa65902-acfd-45d4-b55a-1a77337f65f4
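One way to narrow this down (a diagnostic sketch, not from the thread; the flag set is an assumption based on common sandboxed-container setups) is to drive Playwright directly with the Chromium flags usually needed in restricted environments, bypassing crawl4ai:

```python
import asyncio

# Chromium flags commonly required in containerized/sandboxed
# environments such as Lambda (assumed, not confirmed by the thread).
LAMBDA_CHROMIUM_ARGS = [
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--single-process",
]

async def probe_playwright(url: str = "https://example.com") -> str:
    # Imported inside the function so this module also loads in
    # environments where playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(args=LAMBDA_CHROMIUM_ARGS)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
        return title

# From a Lambda handler: asyncio.run(probe_playwright())
```

If this direct launch fails with the same "browser has been closed" error, the problem sits in the Playwright/Chromium setup rather than in crawl4ai itself.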

@unclecode (Owner)

@vikaskookna This is a challenging one :)) since I have to reproduce it in Lambda and I don't have much experience with it. But the error shows that it couldn't create the browser context on the machine running the Lambda function. We'll keep this issue open to see how it goes; I'll try to find time to experiment with Lambda myself and then share my experience.

@vikaskookna (Author)

I figured out the Lambda Docker image, but this error seems to be coming from the library. On some digging, I found out that this is a Playwright issue.

@unclecode (Owner)

@vikaskookna any update or progress on this issue? Were you able to fix it and get it running on Lambda?

@vikaskookna (Author)

No @unclecode, I couldn't figure out this browser-closed issue.

@unclecode (Owner)

@vikaskookna You are still trying to run it on Lambda, right?

@tanwar4 commented Dec 23, 2024

@vikaskookna Can you share the Docker image snippet? I am unable to create the Docker image using a multi-stage build; crawl4ai fails to start.

@unclecode (Owner)

@tanwar4 I'd appreciate it if you shared your details with me, because I am currently redesigning the entire Docker setup from the ground up based on the issues I've faced. The new approach is very different and much better than before. I believe your information will help me as well. In a week or two, we will have this crazy Docker version. ;)
