IE 11 is not supported. For an optimal experience visit our site on another browser #208

thetesttoy · 2024-10-27T13:39:13Z

When scraping the Douban movie rankings, I get the message "IE 11 is not supported. For an optimal experience, visit our site on another browser". I encountered the same problem when scraping data from other websites. Could you please tell me how to solve this? This is my code:

from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
import json
from crawl4ai.chunking_strategy import RegexChunking
from crawl4ai import AsyncWebCrawler
import asyncio

async def main():
    # Define the extraction schema
    schema = {
        "name": "Douban Movies",
        "baseSelector": "div.item",  # Base selector for each movie item
        "fields": [
            {
                "name": "title",
                "selector": "div.info div.hd a span.title:first-child",
                "type": "text",
            },
            {
                "name": "rating",
                "selector": "div.star span.rating_num",
                "type": "text",
            },
            {
                "name": "quote",
                "selector": "div.info div.bd p.quote span.inq",
                "type": "text",
            },
            {
                "name": "info",
                "selector": "div.info div.bd p:first-child",
                "type": "text",
            },
            {
                "name": "image",
                "selector": "div.pic img",
                "type": "attribute",
                "attribute": "src"
            }
        ],
    }

    # Set request headers (note: this dict is never passed to the crawler below)
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8"
    }

    # Create the crawler instance and run it
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://movie.douban.com/top250",
            extraction_strategy=JsonCssExtractionStrategy(schema, verbose=True)
        )
        
        # Parse the results
        extracted_data = json.loads(result.extracted_content)
        print(f"Extracted {len(extracted_data)} movies")
        print(json.dumps(extracted_data, ensure_ascii=False, indent=2))

if __name__ == "__main__":
    asyncio.run(main())
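
One way to narrow this down is to check whether the HTML that came back is the fallback page rather than the real ranking page. The following is a minimal sketch, not part of the original script: `is_unsupported_browser_page` is a hypothetical helper, and the assumption that the banner text appears verbatim in the returned HTML comes only from the message quoted above.

```python
# Hypothetical helper: detects the fallback banner quoted in this issue.
BANNER = "IE 11 is not supported"


def is_unsupported_browser_page(html):
    """Return True if the fetched page contains the IE 11 fallback banner."""
    # Treat None or empty input as "no banner found".
    return BANNER.lower() in (html or "").lower()


# Inline strings are used here so the sketch runs on its own.
blocked = "<p>IE 11 is not supported. For an optimal experience visit our site on another browser.</p>"
normal = '<div class="item"><span class="title">Some movie</span></div>'

print(is_unsupported_browser_page(blocked))
print(is_unsupported_browser_page(normal))
```

In the script above, the same check could presumably be applied to the raw HTML of the crawl result before parsing `extracted_content`, so that a fallback page is reported clearly instead of yielding an empty extraction.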

unclecode · 2024-11-04T08:01:59Z

@thetesttoy I ran your code on my Mac and it worked fine. Could you provide more details about your specs and platform? The issue may be specific to your environment, so please make sure everything is installed and running properly. With those details I can confirm whether it works as expected on my side.
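
The platform details requested above can be collected with a few standard-library calls. This is a minimal sketch using only the standard library. `environment_report` is a hypothetical helper name, and it assumes the package is installed under the distribution name `crawl4ai`.

```python
import platform
import sys
from importlib import metadata


def environment_report(package="crawl4ai"):
    """Collect OS, architecture, Python version, and the installed package version."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution name is an assumption; report clearly if it is missing.
        pkg_version = "not installed"
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        package: pkg_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a reply gives the maintainer the OS, architecture, Python version, and library version in one place.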

unclecode self-assigned this Nov 4, 2024

unclecode added the bug and question labels Nov 4, 2024
