scripts from the js_snippets folder are not installed via pip #348

Open
blghtr opened this issue Dec 13, 2024 · 7 comments
Labels
bug Something isn't working


@blghtr

blghtr commented Dec 13, 2024

Hi!

    × Unexpected error in crawl_web at line 11 in load_js_script (.venv\lib\site-packages\crawl4ai\js_snippet\__init__.py):
    Error: Script update_image_dimensions not found in the folder C:\Users\Gamer\PycharmProjects\scraper\.venv\Lib\site-packages\crawl4ai\js_snippet

    Code context:
     6     current_script_path = os.path.dirname(os.path.realpath(__file__))
     7     # Get the path of the script to load
     8     script_path = os.path.join(current_script_path, script_name + '.js')
     9     # Check if the script exists
    10     if not os.path.exists(script_path):
    11 →       raise ValueError(f"Script {script_name} not found in the folder {current_script_path}")
    12     # Load the content of the script
    13     with open(script_path, 'r') as f:
    14         script_content = f.read()
    15     return script_content

This error is returned when running the quick start example.
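The code context above shows how the loader resolves a snippet: it looks for `<script_name>.js` in the same folder as `js_snippet/__init__.py` and raises `ValueError` if the file is missing, which is what happens when the `.js` files were not shipped in the installed package. A minimal standalone sketch of that lookup follows; unlike the library, which derives the folder from `__file__`, it takes a hypothetical `folder` argument so it can be tried in isolation.

```python
import os


def load_js_script(script_name: str, folder: str) -> str:
    """Return the contents of <folder>/<script_name>.js.

    Mirrors the lookup shown in the traceback; `folder` is a hypothetical
    parameter here, whereas the library derives it from __file__.
    """
    script_path = os.path.join(folder, script_name + ".js")
    # A pip install that omitted the .js package data ends up on this branch.
    if not os.path.exists(script_path):
        raise ValueError(f"Script {script_name} not found in the folder {folder}")
    with open(script_path, "r", encoding="utf-8") as f:
        return f.read()
```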

@1933211129

Yes, I have also encountered this problem. The code in the provided Colab notebook does not run and reports this same error. My local deployment of 0.4.1 works fine.

@requizm

requizm commented Dec 14, 2024

For now, I recommend downloading the JS files from the repo and manually copying them to /path/to/python/Lib/site-packages/crawl4ai/js_snippet
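A rough sketch of that workaround in Python, assuming you have a local clone of the repository. `installed_snippet_dir` and `copy_js_snippets` are hypothetical helper names, not part of crawl4ai: the first uses `importlib.util.find_spec` to locate the installed package's `js_snippet` folder, the second copies every `.js` file from the clone into it.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import List, Optional


def installed_snippet_dir(package: str = "crawl4ai") -> Optional[Path]:
    """Return <site-packages>/<package>/js_snippet, or None if the package is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    # For a regular package, the first search location is its directory.
    return Path(list(spec.submodule_search_locations)[0]) / "js_snippet"


def copy_js_snippets(repo_snippet_dir: Path, dest_dir: Path) -> List[str]:
    """Copy every .js file from a local checkout into dest_dir and return the copied names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for js_file in sorted(repo_snippet_dir.glob("*.js")):
        shutil.copy2(js_file, dest_dir / js_file.name)
        copied.append(js_file.name)
    return copied
```

For example, with a clone at `./crawl4ai` (an assumed location), `copy_js_snippets(Path("crawl4ai/crawl4ai/js_snippet"), installed_snippet_dir())` would copy the snippets into the installed package.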

@Udbhav8

Udbhav8 commented Dec 15, 2024

Also @unclecode, even after doing this, the `js_only` argument is still broken.

My problem was coming from this code block:
```python
try:
    # Set up download handling
    if self.browser_config.accept_downloads:
        page.on("download", lambda download: asyncio.create_task(self._handle_download(download)))

    # Handle page navigation and content loading
    if not config.js_only:
        await self.execute_hook('before_goto', page, context=context)

        try:
            response = await page.goto(
                url,
                wait_until=config.wait_until,
                timeout=config.page_timeout
            )
        except Error as e:
            raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")

        await self.execute_hook('after_goto', page, context=context)

        status_code = response.status
        response_headers = response.headers
    else:
        status_code = 200
        response_headers = {}
```

Basically, `response.status` fails because `response` is `None`, so we probably just need to handle that condition and return a 200 when the hook is not there.
It also seems that `log_console=True` no longer prints any logs from the executing `js_code`; this worked in 0.4.1.
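The `None` handling suggested above could look like the following minimal sketch. `status_and_headers` is a hypothetical helper, not crawl4ai code. Playwright documents that `page.goto()` can return `None` (for example when navigating to `about:blank` or to the same URL with a different hash), so the sketch falls back to a 200 status and empty headers in that case.

```python
from typing import Any, Dict, Optional, Tuple


def status_and_headers(response: Optional[Any]) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers), defaulting to (200, {}) when there is no response."""
    if response is None:
        # No main-resource response: avoid the AttributeError on response.status.
        return 200, {}
    return response.status, dict(response.headers)
```

In the quoted block, `status_code = response.status` and `response_headers = response.headers` would then collapse into `status_code, response_headers = status_and_headers(response)`.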

@wwwrookie

Me too.

@unclecode
Copy link
Owner

Hey everyone, please update to 0.4.22. @wwwrookie @requizm @blghtr @Udbhav8 @1933211129

unclecode self-assigned this Dec 16, 2024
unclecode added the bug (Something isn't working) label Dec 16, 2024
@unclecode
Owner

@Udbhav8 I have resolved the issue and will ship the fix in version 0.4.23 (or 0.4.3), after collecting a few other issues into the same patch. For `js_code`, can you please share your code snippet? Thanks.


6 participants