Docker Image #155
Comments
I had the same issue and worked around it by setting DB_PATH to a location under '/tmp/' (the only writable directory in AWS Lambda) before importing the crawl4ai package. My solution:
import os
from pathlib import Path
os.makedirs('/tmp/.crawl4ai', exist_ok=True)
DB_PATH = '/tmp/.crawl4ai/crawl4ai.db'
Path.home = lambda: Path("/tmp")
from crawl4ai import AsyncWebCrawler
Hope this works for you as well. |
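The workaround above can also be wrapped in a small helper. This is only a sketch: `redirect_home` is a hypothetical name, not part of crawl4ai, and it assumes the package resolves its data directory from the home directory at import time, as the workaround implies.

```python
import os
from pathlib import Path


def redirect_home(base_dir: str = "/tmp") -> Path:
    """Point HOME and Path.home() at a writable directory.

    Hypothetical helper (not part of crawl4ai). Returns the data
    directory it creates under base_dir.
    """
    base = Path(base_dir)
    data_dir = base / ".crawl4ai"
    data_dir.mkdir(parents=True, exist_ok=True)
    # On POSIX, Path.home() is derived from $HOME, so setting it covers
    # code that expands "~"; the explicit patch mirrors the workaround.
    os.environ["HOME"] = str(base)
    Path.home = classmethod(lambda cls: base)
    return data_dir


# Usage (assumption: the import has to come after the redirect):
# redirect_home("/tmp")
# from crawl4ai import AsyncWebCrawler
```

Using `classmethod` keeps `Path.home()` callable on both the class and its instances, which the plain lambda does not guarantee.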
ok I will try this @akamf |
After doing what you mentioned I got this error |
I created a Docker image where I installed Playwright and its dependencies and then chromium with playwright. The Docker image is really big though (because of Playwright I guess), so I'm currently working on optimizing it. But our latest Dockerfile looks like this:
I don't know if this is the best solution, but it works for us. Like I said, I'm working on some optimisation for it. |
Thanks @akamf, I tried this but it gave me these errors. I'm using an M1 Mac and built the image using this command
/var/task/playwright/driver/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /var/task/playwright/driver/node) |
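The error above says the Node binary bundled with Playwright needs glibc 2.27 or newer, and the runtime's libm does not provide it. A quick way to check which glibc a given image ships is sketched below; `parse_version` and `glibc_satisfies` are hypothetical helper names, and the 2.27 threshold is taken from the error message only.

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn '2.27' into (2, 27) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_satisfies(found: str, required: str = "2.27") -> bool:
    """Return True if the found glibc version is at least the required one."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    # platform.libc_ver() returns (lib, version); both may be empty strings
    # when the C library cannot be identified.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"glibc {version}; meets 2.27: {glibc_satisfies(version)}")
    else:
        print("glibc version could not be determined")
```

Running this inside the container (rather than on the build machine) shows whether the base image is the limiting factor.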
By next week, I will create the Dockerfile and also upload the Docker image to Docker Hub. I hope this can also help you. |
@unclecode is docker image ready? |
@vikaskookna This weekend, hopefully everything will be ready. |
@unclecode is it ready, can you please share link? |
@vikaskookna I released 0.3.72. I'm now working on Docker, and once testing is done I'll share it. Stay tuned, please. |
@unclecode any updates, can you give any estimates ? |
It's in this coming week's plan 🤞 |
@akamf @unclecode I tried to create a Lambda function with your code as well as the recently released Docker image, and I'm getting this error. Here's my Dockerfile
|
@vikaskookna I'm not very familiar with Lambda and AWS, but based on my research, I can suggest a few things. The issue seems to be with your Dockerfile build. You don't need to pull the unclecode/crawl4ai image; instead, install crawl4ai using pip. Here are some points to consider. Your Dockerfile has this structure:
FROM unclecode/crawl4ai:latest as build # Using your image as base
FROM public.ecr.aws/lambda/python:3.11 # Then switching to Lambda base
The problem is that you are doing a multi-stage build but not properly carrying over the necessary components from the Crawl4AI image. When you switch to the Lambda base image, you are only copying specific directories (
Here's how you should modify your Dockerfile:
AWS Lambda has specific requirements and constraints that require a different approach to containerization. |
@vikaskookna Whenever you're able to handle this, I'd appreciate it if you could share a nice way to deploy it on AWS lambda. We may add it to our documentation and acknowledge you as a collaborator. Please do your best to fix and deploy it. |
@unclecode Yes, I will share if things start to work. Right now I'm getting this error in Lambda, can you please advise? [LOG] 🚀 Crawl4AI 0.3.73 |
@vikaskookna This is a challenging one :)) since I have to do it in Lambda and I don't have much experience with it. But this shows that it couldn't create an instance of the browser on the machine running the Lambda functions. We'll keep this issue open to see how it goes, and I'll try to find time to try it in Lambda myself and share what I learn. |
I figured out the Lambda Docker image, but this error seems to be coming from the library. After some digging I found out that this is a Playwright issue |
@vikaskookna Any update or progress on this issue? Were you able to fix it and get it running on Lambda? |
No @unclecode I couldn't figure out this Browser closed issue |
@vikaskookna You are still trying to run it on lambda right? |
@vikaskookna Can you share the docker image snippet? I am unable to create the docker image using multi stage build, crawl4ai fails to start. |
@tanwar4 I'd appreciate it if you shared your details with me, because I am currently redesigning the entire Docker concept from the ground up based on the issues I've faced. The new approach is very different and much better than before. I believe your information will also help me. In a week or two, we will have this crazy Docker version. ;) |
I created an AWS Lambda Docker image, and it fails on this line
from crawl4ai import AsyncWebCrawler