[BUG] Error: Page.wait_for_function: EvalError due to Content Security Policy restrictions #370

Closed
HamdiBarkous opened this issue Dec 25, 2024 · 3 comments
Labels: bug (Something isn't working)

HamdiBarkous commented Dec 25, 2024

Issue:
The scraper encounters an EvalError while attempting to crawl https://www.tradingview.com/broker/FOREXcom/. The error is triggered by the page's Content Security Policy (CSP), which does not allow 'unsafe-eval' as a script source.

Code snippet to reproduce:

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url='https://www.tradingview.com/broker/FOREXcom/',
        )
        print(result.markdown)

asyncio.run(main())

Observed Error:

Error: Page.wait_for_function: EvalError: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the Content Security Policy directive.

Complete Error Trace:

[ERROR]... × https://www.tradingview.com/broker/FOREXcom/... | Error:
× Unexpected error in _crawl_web at line 528 in wrap_api_call (.venv/lib/python3.10/site-packages/playwright/_impl/_connection.py):
  Error: Page.wait_for_function: EvalError: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: "script-src https://static.tradingview.com/static/ blob: https://*.ampproject.org/ https://*.paypal.com/ https://platform.twitter.com/ https://platform.x.com/ https://songbird.cardinalcommerce.com/edge/v1/ https://checkout.razorpay.com/ https://cdn.checkout.com/ 'nonce-v+WIeNdKFxEFsPPe9saCNA=='".

  at eval (<anonymous>)
  at predicate (eval at evaluate (:234:30), <anonymous>:11:37)
  at next (eval at evaluate (:234:30), <anonymous>:32:31)

  Code context:
  523           parsed_st = _extract_stack_trace_information_from_stack(st, is_internal)
  524           self._api_zone.set(parsed_st)
  525           try:
  526               return await cb()
  527           except Exception as error:
  528 →             raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
  529           finally:
  530               self._api_zone.set(None)
  531
  532       def wrap_api_call_sync(
  533           self, cb: Callable[[], Any], is_internal: bool = False
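For context, here is a minimal Playwright-only sketch of the failure and the usual Playwright-level workaround, bypass_csp=True on the browser context. This is an illustration only, not crawl4ai's actual fix (which is not shown in this thread):

# Minimal Playwright-only sketch (independent of crawl4ai). The page's CSP
# refuses eval'd strings, which is what wait_for_function's string predicate
# relies on; creating the context with bypass_csp=True exempts Playwright's
# injected scripts from CSP enforcement.
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(bypass_csp=True)  # CSP no longer blocks eval
        page = await context.new_page()
        await page.goto("https://www.tradingview.com/broker/FOREXcom/")
        # Without bypass_csp=True this call raises the EvalError shown above.
        await page.wait_for_function("() => document.readyState === 'complete'")
        await browser.close()

asyncio.run(main())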

unclecode self-assigned this Dec 26, 2024

unclecode (Owner) commented:
@HamdiBarkous Thanks for the report. Yes, that's a bug, and I have already resolved it; the fix will ship in the next version, 0.4.24.

unclecode added the bug label Dec 26, 2024
mozou commented Dec 27, 2024

I also encountered the same problem, though my case is a little unusual. I ran the code from https://colab.research.google.com/drive/1REChY6fXQf-EaVYLv0eHEWvzlYxGm0pd?usp=sharing#scrollTo=qUBKGpn3yZQN directly.

The first run with headless=True succeeded, and the second run with headless=False also succeeded, but from the third run onward the execution failed regardless of the headless setting.

import asyncio
import json

from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy


async def crawl_dynamic_content_pages_method_3():
    print("\n--- Advanced Multi-Page Crawling with JavaScript Execution using `wait_for` ---")

    async with AsyncWebCrawler(verbose=True, headless=True) as crawler:
        url = "https://github.com/microsoft/TypeScript/commits/main"
        session_id = "typescript_commits_session"
        all_commits = []

        js_next_page = """
        const commits = document.querySelectorAll('li.Box-sc-g0xbh4-0 h4');
        if (commits.length > 0) {
            window.firstCommit = commits[0].textContent.trim();
        }
        const button = document.querySelector('a[data-testid="pagination-next-button"]');
        if (button) button.click();
        """

        wait_for = """() => {
            const commits = document.querySelectorAll('li.Box-sc-g0xbh4-0 h4');
            if (commits.length === 0) return false;
            const firstCommit = commits[0].textContent.trim();
            return firstCommit !== window.firstCommit;
        }"""

        schema = {
            "name": "Commit Extractor",
            "baseSelector": "li.Box-sc-g0xbh4-0",
            "fields": [
                {
                    "name": "title",
                    "selector": "h4.markdown-title",
                    "type": "text",
                    "transform": "strip",
                },
            ],
        }
        extraction_strategy = JsonCssExtractionStrategy(schema, verbose=True)

        for page in range(3):  # Crawl 3 pages
            result = await crawler.arun(
                url=url,
                session_id=session_id,
                css_selector="li.Box-sc-g0xbh4-0",
                extraction_strategy=extraction_strategy,
                js_code=js_next_page if page > 0 else None,
                wait_for=wait_for if page > 0 else None,
                js_only=page > 0,
                bypass_cache=True,
                headless=False,
            )

            assert result.success, f"Failed to crawl page {page + 1}"

            commits = json.loads(result.extracted_content)
            all_commits.extend(commits)

            print(f"Page {page + 1}: Found {len(commits)} commits")

        await crawler.crawler_strategy.kill_session(session_id)
        print(f"Successfully crawled {len(all_commits)} commits across 3 pages")


if __name__ == "__main__":
    asyncio.run(crawl_dynamic_content_pages_method_3())

Complete Error Trace:

D:\python-project\Crawl4AI -learning\page_test.py:47: DeprecationWarning: Cache control boolean flags are deprecated and will be removed in version 0.5.0. Use 'cache_mode' parameter instead.
  result = await crawler.arun(
[ERROR]... × https://github.com/microsoft/TypeScript/commits/ma... | Error: 
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ × Unexpected error in _crawl_web at line 528 in wrap_api_call (E:\python-project\lib\site-                            │
│ packages\playwright\_impl\_connection.py):                                                                            │
│   Error: Page.wait_for_function: EvalError: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not   │
│ an allowed source of script in the following Content Security Policy directive: "script-src                           │
│ github.githubassets.com".                                                                                             │
│                                                                                                                       │
│   at eval (<anonymous>)                                                                                               │
│   at predicate (eval at evaluate (:234:30), <anonymous>:11:37)                                                        │
│   at next (eval at evaluate (:234:30), <anonymous>:32:31)                                                             │
│                                                                                                                       │
│   Code context:                                                                                                       │
│   523           parsed_st = _extract_stack_trace_information_from_stack(st, is_internal)                              │
│   524           self._api_zone.set(parsed_st)                                                                         │
│   525           try:                                                                                                  │
│   526               return await cb()                                                                                 │
│   527           except Exception as error:                                                                            │
│   528 →             raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None                          │
│   529           finally:                                                                                              │
│   530               self._api_zone.set(None)                                                                          │
│   531                                                                                                                 │
│   532       def wrap_api_call_sync(                                                                                   │
│   533           self, cb: Callable[[], Any], is_internal: bool = False                                                │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
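As an aside on the DeprecationWarning at the top of that trace: below is a minimal sketch of the replacement parameter it points to, assuming the CacheMode enum that recent crawl4ai versions export for this purpose:

# Sketch of the non-deprecated cache parameter named by the warning above.
# Assumes crawl4ai exports a CacheMode enum with a BYPASS member.
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://github.com/microsoft/TypeScript/commits/main",
            cache_mode=CacheMode.BYPASS,  # replaces the deprecated bypass_cache=True
        )
        print(result.success)

asyncio.run(main())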

unclecode (Owner) commented:

@mozou When you run it on Colab, you can't set headless mode to false because no graphical display is available there. I suspect the later failures came from Colab's memory. I checked the Colab notebook, updated some of the code that was still using the old syntax, and tested everything. Everything works well now, so please give it another try.
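Not from this thread, but for completeness: if a headed (headless=False) browser is genuinely needed in a display-less environment such as Colab, one common workaround is a virtual display via xvfb, for example with the pyvirtualdisplay package:

# Hypothetical workaround sketch, not part of this issue's fix: run a virtual
# X display so headless=False works without a real screen. Assumes xvfb
# (apt-get install -y xvfb) and pyvirtualdisplay (pip install pyvirtualdisplay)
# are installed first.
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1920, 1080))
display.start()
# ... run the crawler with headless=False here ...
display.stop()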
