
deployed crawl4AI tool not working - exception=AttributeError('copy is not supported.')> #349

Open
makispl opened this issue Dec 13, 2024 · 7 comments

makispl commented Dec 13, 2024

Hi everyone,

I've made a custom scraping tool that incorporates crawl4AI and use it in a crew of agents (crewAI). The crew runs perfectly locally and the custom_tool.py scrapes efficiently. However, when the crew is deployed on the crewAI+ enterprise platform, the tool does not work properly and returns no scraped data at all.

The respective agent's output is:
"Unfortunately, I encountered persistent issues when attempting to use the Crawl4AI Crawler tool on all provided competitor URLs. As a result, I was unable to scrape the pricing plan data…" which means that the 'custom_tool.py' does not work as it is supposed to.

From the enterprise logs, all I can get is:

future: <Task finished name='Task-402' coro=<Connection.run() done, defined at /usr/local/lib/python3.12/site-packages/playwright/_impl/_connection.py:272> exception=AttributeError('`copy` is not supported.')>

File "/usr/local/lib/python3.12/site-packages/crewai/tools/tool_usage.py", line 168, in _use

After trying to resolve it with the help of Windsurf/Cursor, their suggestions boil down to this:

The error you're encountering on the CrewAI+ enterprise platform appears to be related to a serialization issue with the Playwright browser instance. The error AttributeError('`copy` is not supported.') suggests that there's a problem with copying or serializing the browser state, which is likely happening because the enterprise platform handles processes differently than your local environment.
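For illustration, the failure mode described above can be reproduced with a toy object that, like Playwright's live connection and page objects, refuses to be copied. FakePage is a made-up stand-in here, not Playwright's actual class:

```python
import copy

class FakePage:
    """Stand-in for a live browser page: it wraps resources (sockets,
    event loops) that cannot be duplicated, so it refuses to be copied."""
    def __deepcopy__(self, memo):
        raise AttributeError("`copy` is not supported.")

try:
    # Anything that tries to deep-copy such an object (for example, a
    # platform serializing tool state across processes) hits this error.
    copy.deepcopy(FakePage())
except AttributeError as exc:
    print(exc)  # prints: `copy` is not supported.
```

This is why the same code can work locally (nothing tries to copy the object) yet fail on a platform that serializes tool state.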

Could someone more experienced with similar issues help?

As always, @unclecode I'd appreciate your help on that.

@unclecode (Owner) commented:

Hi @makispl, the issue you are seeing is related to how the enterprise platform tries to serialize the browser or page objects from Playwright. These objects cannot be copied or pickled. This usually occurs if the scraping logic or Crawl4AI usage crosses process boundaries. To fix it, keep all Crawl4AI usage and the scraping process within the same agent or process. Instead of returning browser objects or passing them around, just return the final scraped results as plain data. Adjusting your code in this way should prevent the serialization attempts that cause the error.

I have not tried to use this with CrewAI, and to be honest, I haven't used CrewAI at all, so I'm not very familiar with it. Therefore, it's difficult for me to engage with this unless someone, like yourself, can work on it. I can help you somehow; if we can fix it, we could create a wrapper and make it available for people who want to use Crawl4AI in the CrewAI library.
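A minimal sketch of the pattern this comment describes, with a stubbed fetch_page standing in for the real Crawl4AI/Playwright call (the names ScraperTool and fetch_page are illustrative, not the actual Crawl4AI or CrewAI API):

```python
import asyncio
import json

async def fetch_page(url: str) -> dict:
    # Stub for the real crawler call (e.g. Crawl4AI driving Playwright).
    # Crucially, it returns only plain data, never the page object itself.
    return {"url": url, "content": "<scraped text>"}

class ScraperTool:
    """Tool wrapper that keeps all browser work inside one method and
    hands back only plain, serializable data."""

    def _run(self, url: str) -> str:
        # The crawler lives and dies entirely inside this call; no
        # browser or page object ever leaves the method.
        result = asyncio.run(fetch_page(url))
        # Return a JSON string: plain data that the platform can copy,
        # pickle, or log without touching any Playwright object.
        return json.dumps(result)
```

With this shape, whatever the enterprise platform does with the tool's return value, there is nothing left for it to serialize except strings.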

unclecode self-assigned this Dec 14, 2024

makispl commented Dec 14, 2024

Thanks for the reply, @unclecode!

I implemented your suggestions (with some help from Windsurf/Cascade), approaching the issue like this:

  • Kept the scraping logic entirely within the async_run method.

  • Returned the final result as a JSON string immediately after scraping.

  • Ensured no Playwright objects are passed outside the method.

However, the issue still persists. Could you please take a look at the script I just emailed you (cannot attach .py here)?

Your help is invaluable! Once we fix this, it could pave the way for effective scraping for many CrewAI users—a wrapper would be a piece of cake! 🍰

@unclecode (Owner) commented:

@makispl I can see now. Okay, sure, I will check the email. To speed it up a little, please attach a file that contains all of the code required to test it and send it to my email. If it isn't in one file, please create one and put everything in it, so that I can open it in my VS Code and run it. Then I will see what I can do. If there are any requirements that must be installed, please also include a requirements.txt file. Let's see how it goes.


makispl commented Dec 15, 2024

@unclecode to verify whether it works on the crewAI platform, each time I modify the script I have to deploy it there and, using its API (via my app), check whether data is scraped or not. I don't have any access to the platform's log files, except for the following snippet the crewAI team shared with me:

future: <Task finished name='Task-402' coro=<Connection.run() done, defined at /usr/local/lib/python3.12/site-packages/playwright/_impl/_connection.py:272> exception=AttributeError('`copy` is not supported.')>

File "/usr/local/lib/python3.12/site-packages/crewai/tools/tool_usage.py", line 168, in _use

So, running the tool locally (as is), it works well. This is why I wanted your experienced view on it, so you can check the things that might be responsible for this bad behaviour (serialization, usage that crosses process boundaries, etc.). That said, if you want to check the custom_tool.py I emailed you locally, you can either:

  1. Check it directly like:
from dotenv import load_dotenv
from custom_tool import Crawl4AITool

def main():
    # Load environment variables from .env file
    load_dotenv()
    
    # Create an instance of the competitor detection tool
    scraper_tool = Crawl4AITool()
    
    # Test URL
    test_url = "https://www.schedulethreads.com/pricing"
    
    try:
        # Run the competitor detection
        result = scraper_tool._run(test_url)
        print("\nTiers' Scraping Results:")
        print(result)
    except Exception as e:
        print(f"Error occurred: {str(e)}")

if __name__ == "__main__":
    main()
  2. Check it along with the whole crew of agents (crewAI), by installing crewai etc. I don't think that would be useful, as I have already checked it and the tool also runs well within the crewAI framework locally. In addition, I confirm that the tool is always used within only one agent.

Please let me know if I can provide you with any additional information.

@unclecode (Owner) commented:

@makispl I will try to run this locally. If I can't, I will let you know. If I can, I will start fresh, open a CrewAI project, and use Crawl4ai as a custom tool. Perhaps this will help me get a better handle on the issue.


makispl commented Dec 16, 2024

Exactly! But please keep in mind that, most probably, the custom tool will run perfectly with crewAI locally and you won't see any issues. Those issues only appear when the crew is deployed on the crewAI platform.

@unclecode (Owner) commented:

@makispl Such a challenge haha, ok we will see
