Suggested fix for the error '>' not supported between instances of 'NoneType' and 'int', triggered when max_id is None #535

2513502304 · 2025-01-10T05:26:55Z

Crawling all comments of a specified Weibo post raises the following error.

Error message:

2025-01-10 13:07:32 httpx INFO (_client.py:1773) - HTTP Request: GET https://m.weibo.cn/comments/hotflow?id=5117251109785706&mid=5117251109785706&max_id_type=0&max_id=5118541873545341 "HTTP/1.1 200 OK"
2025-01-10 13:07:34 MediaCrawler ERROR (core.py:215) - [WeiboCrawler.get_note_comments] may be been blocked, err:'>' not supported between instances of 'NoneType' and 'int'
2025-01-10 13:07:34 MediaCrawler INFO (core.py:103) - [WeiboCrawler.start] Weibo Crawler finished ...

After reading the source, I found the following line in the get_note_all_comments function in media_platform/weibo/client.py:

max_id: int = comments_res.get("max_id")

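When get returns None for max_id, the next loop iteration passes None to get_note_comments instead of the int it expects. Many comments are still uncrawled at that point, so the loop should not stop. Instead, the exception should be caught inside get_note_comments so that crawling continues. In get_note_comments, change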

if max_id > 0:
    params.update({"max_id": max_id})

to

try:
    if max_id > 0:
        params.update({"max_id": max_id})
except ValueError as e:
    print(e)

This resolves the problem.
The final fix was attached as a screenshot.

2513502304 · 2025-01-10T05:44:41Z

Emmm...
The exception type being caught should be TypeError or Exception instead.
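
To illustrate the correction above, here is a minimal sketch, not the project's actual code. It first shows that comparing None with an int raises TypeError (so an `except ValueError` clause would not catch it), and then shows one possible alternative to try/except: an explicit None check before the comparison. The helper names `build_comment_params` and `compare_raises_type_error` are hypothetical and exist only for this example; the parameter keys mirror those visible in the request URL in the log.

```python
from typing import Any, Dict, Optional


def compare_raises_type_error() -> bool:
    """Return True if `None > 0` raises TypeError rather than ValueError."""
    max_id = None
    try:
        max_id > 0  # comparing NoneType with int is not supported
    except ValueError:
        return False  # not reached: the comparison does not raise ValueError
    except TypeError:
        return True
    return False


def build_comment_params(note_id: str, max_id: Optional[int],
                         max_id_type: int = 0) -> Dict[str, Any]:
    """Hypothetical helper: build query params, omitting max_id when it is None or <= 0."""
    params: Dict[str, Any] = {"id": note_id, "mid": note_id, "max_id_type": max_id_type}
    # The explicit None check means `None > 0` is never evaluated.
    if max_id is not None and max_id > 0:
        params["max_id"] = max_id
    return params
```

With this guard, a None max_id simply results in a request without the max_id parameter. Whether that is the right behavior for the remaining comments depends on how the Weibo endpoint paginates, which this sketch does not attempt to verify.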

NanmiCoder · 2025-01-10T06:39:33Z

Could you contribute a PR?
