Suggested fix for the error "'>' not supported between instances of 'NoneType' and 'int'" triggered when the fetched max_id is None #535

Open
2513502304 opened this issue Jan 10, 2025 · 2 comments

@2513502304
Contributor

Crawling all comments of a given Weibo post raises the following error.

Error log:

2025-01-10 13:07:32 httpx INFO (_client.py:1773) - HTTP Request: GET https://m.weibo.cn/comments/hotflow?id=5117251109785706&mid=5117251109785706&max_id_type=0&max_id=5118541873545341 "HTTP/1.1 200 OK"
2025-01-10 13:07:34 MediaCrawler ERROR (core.py:215) - [WeiboCrawler.get_note_comments] may be been blocked, err:'>' not supported between instances of 'NoneType' and 'int'
2025-01-10 13:07:34 MediaCrawler INFO (core.py:103) - [WeiboCrawler.start] Weibo Crawler finished ...

After reading the source, I found this line in the get_note_all_comments function in media_platform/weibo.client.py:

max_id: int = comments_res.get("max_id")

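When get returns None for max_id, the next loop iteration passes None to get_note_comments instead of the int it expects. Many comments are still uncrawled at that point, so the loop should not stop. Instead, get_note_comments should catch the exception and keep crawling. Change this code in get_note_comments: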

if max_id > 0:
    params.update({"max_id": max_id})

to

try:
    if max_id > 0:
        params.update({"max_id": max_id})
except ValueError as e:
    print(e)

This resolves the issue.
A screenshot of the final fix:
image
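
As an alternative to wrapping the comparison in try/except, the None case can be checked explicitly before comparing. The following is a minimal sketch, not the project's actual code: build_comment_params is a hypothetical helper, and its parameter names are only modeled on the query string in the log above.

```python
from typing import Optional


def build_comment_params(note_id: str, max_id: Optional[int] = None,
                         max_id_type: int = 0) -> dict:
    """Hypothetical helper: build the query parameters for one comments page."""
    params = {"id": note_id, "mid": note_id, "max_id_type": max_id_type}
    # Compare only when max_id is a real integer. A None value skips the
    # branch instead of raising TypeError on `None > 0`.
    if isinstance(max_id, int) and max_id > 0:
        params["max_id"] = max_id
    return params


print(build_comment_params("5117251109785706", None))
print(build_comment_params("5117251109785706", 5118541873545341))
```

Whether a request without max_id continues from where the crawl left off or starts again from the first page is not established in this thread, so the calling loop may still need its own stop condition.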

@2513502304
Contributor Author

Emmm...
The exception type being caught should be TypeError or Exception instead.
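
A small sketch of why the handler type matters: in Python 3, comparing None with an int raises TypeError, so an `except ValueError` clause does not intercept it. compare_with_handler is a hypothetical function written only for this demonstration.

```python
def compare_with_handler(max_id, handler):
    """Evaluate `max_id > 0`, catching only exceptions of type `handler`."""
    try:
        return max_id > 0
    except handler as exc:
        return f"caught {type(exc).__name__}"


# TypeError (or the broader Exception) intercepts the failed comparison.
print(compare_with_handler(None, TypeError))
print(compare_with_handler(None, Exception))

# ValueError does not match, so the TypeError propagates to the caller.
try:
    compare_with_handler(None, ValueError)
except TypeError as exc:
    print(f"escaped the ValueError handler: {exc}")
```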

@NanmiCoder
Owner

Could you contribute a PR?
