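Since 2023-07-14, the Baidu uid of each thread's last replier in the web forum home page thread list API, when fetched as an ancient client version (_client_version=6.0.x), has become the thread author's uid, while name and name_show remain correct #208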

Closed
n0099 opened this issue Jul 9, 2024 · 6 comments
Labels: discussion

n0099 commented Jul 9, 2024

https://n0099.net/tbm/v1/client_tester.php?type=posts&forum=模拟城市&pn=1&rn=30&client_version=6.0.9999 archive

$ curl https://n0099.net/tbm/v1/client_tester.php\?type\=posts\&forum\=模拟城市\&pn\=1\&rn\=30\&client_version\=6.0.9999 \
| jq '.thread_list | map({ (.id | tostring): .last_replyer }) | add | ."5564277912"'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  140k    0  140k    0     0   292k      0 --:--:-- --:--:-- --:--:--  292k
{
  "name_show": "canguanzhe",
  "type": 1,
  "is_verify": 0,
  "id": 3448717235,
  "name": "canguanzhe"
}

uid 3448717235 belongs to SY_Here, the author of tid 5564277912. It is not 720938213, the uid of canguanzhe, who is the actual last replier of that thread.
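
The mismatch described above can be checked mechanically. The following is a minimal sketch, not the project's own code: the function name is made up for illustration, and it assumes the thread author's uid and username are already known from another source (the thread list output above only confirms the shape of `last_replyer`).

```python
def last_replier_uid_overwritten(last_replyer: dict, author_uid: int, author_name: str) -> bool:
    """Return True when the uid points at the thread author but the name does not.

    A thread whose author really is the last replier has a matching name,
    so it is not flagged.
    """
    return last_replyer["id"] == author_uid and last_replyer["name"] != author_name


# `last_replyer` object returned for tid 5564277912 (copied from the output above)
last_replyer = {
    "name_show": "canguanzhe",
    "type": 1,
    "is_verify": 0,
    "id": 3448717235,
    "name": "canguanzhe",
}

# The author's uid and username are taken from the issue text.
print(last_replier_uid_overwritten(last_replyer, author_uid=3448717235, author_name="SY_Here"))
```

Comparing the name as well as the uid is what separates this bug from the legitimate case where the author replied last.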

Tracing back the number of threads whose last replier uid equals the thread author uid, grouped by:
  • Posting time

SELECT COUNT(*), to_timestamp("postedAt")::date
FROM tbmc_f97650_thread WHERE "latestReplierUid" = "authorUid"
GROUP BY to_timestamp("postedAt")::date
ORDER BY to_timestamp("postedAt")::date DESC;
count to_timestamp
... ...
373 2023-07-21
365 2023-07-20
469 2023-07-19
392 2023-07-18
120 2023-07-17
113 2023-07-16
111 2023-07-15
53 2023-07-14
19 2023-07-13
17 2023-07-12
12 2023-07-11
11 2023-07-10
... ...
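
One way to read a starting date off daily counts like these is to look for the first day whose count is much larger than the previous day's. This is only a sketch: the helper name and the 2x threshold are arbitrary assumptions chosen for illustration, and the numbers are copied from the posting-time table.

```python
from datetime import date

# (day, count) pairs copied from the posting-time table, oldest first
daily_counts = [
    (date(2023, 7, 10), 11),
    (date(2023, 7, 11), 12),
    (date(2023, 7, 12), 17),
    (date(2023, 7, 13), 19),
    (date(2023, 7, 14), 53),
    (date(2023, 7, 15), 111),
    (date(2023, 7, 16), 113),
    (date(2023, 7, 17), 120),
    (date(2023, 7, 18), 392),
    (date(2023, 7, 19), 469),
    (date(2023, 7, 20), 365),
    (date(2023, 7, 21), 373),
]


def first_jump(counts, factor=2.0):
    """Return the first day whose count exceeds `factor` times the previous day's count."""
    for (_, prev), (day, cur) in zip(counts, counts[1:]):
        if prev > 0 and cur > factor * prev:
            return day
    return None


print(first_jump(daily_counts))  # 2023-07-14
```

A different threshold would pick a different day (the jump on 2023-07-18 is larger in relative terms), so the result depends on this choice.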


  • Time first crawled
SELECT COUNT(*), to_timestamp("createdAt")::date
FROM tbmc_f97650_thread WHERE "latestReplierUid" = "authorUid"
GROUP BY to_timestamp("createdAt")::date
ORDER BY to_timestamp("createdAt")::date DESC;
count to_timestamp
... ...
374 2023-07-21
372 2023-07-20
475 2023-07-19
398 2023-07-18
121 2023-07-17
110 2023-07-16
96 2023-07-15
51 2023-07-14
19 2023-07-13
19 2023-07-12
12 2023-07-11
12 2023-07-10
... ...


  • Time last updated
SELECT COUNT(*), to_timestamp("updatedAt")::date
FROM tbmc_f97650_thread WHERE "latestReplierUid" = "authorUid"
GROUP BY to_timestamp("updatedAt")::date
ORDER BY to_timestamp("updatedAt")::date DESC;
count to_timestamp
... ...
359 2023-07-21
419 2023-07-20
495 2023-07-19
239 2023-07-18
4 2023-07-17
2 2023-07-15
1 2023-07-14
5 2023-07-13
10 2023-07-12
3 2023-07-11
5 2023-07-10
... ...


  • Time last seen by the crawler
SELECT COUNT(*), to_timestamp("lastSeenAt")::date
FROM tbmc_f97650_thread WHERE "latestReplierUid" = "authorUid"
GROUP BY to_timestamp("lastSeenAt")::date
ORDER BY to_timestamp("lastSeenAt")::date DESC;
count to_timestamp
... ...
14 2023-07-21
18 2023-07-20
8 2023-07-19
13 2023-07-18
17 2023-01-14
37 2023-01-13
46 2023-01-12
36 2023-01-11
34 2023-01-10
47 2023-01-09
33 2023-01-08
... ...


Only the time last seen by the crawler shows no correlation.

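n0099 changed the title from "Since 2023-07-14, the Baidu uid of each thread's last replier in the web forum home page thread list API, when fetched as an ancient client version (__client_version=6.0.x), has become the thread author's uid, while name and name_show remain correct" to "Since 2023-07-14, the Baidu uid of each thread's last replier in the web forum home page thread list API, when fetched as an ancient client version (__client_version=6.0.x), has become the thread author uid, while name and name_show remain correct" Jul 9, 2024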
n0099 commented Jul 9, 2024

SELECT COUNT(*), to_timestamp("takenAt")::date
FROM tbmcr_thread
WHERE "latestReplierUid" IS NOT NULL
  AND "takenAt" > date_part('epoch', '2023-01-01'::date)
GROUP BY to_timestamp("takenAt")::date
ORDER BY to_timestamp("takenAt")::date DESC;
count to_timestamp
... ...
1 2023-08-01
2 2023-07-28
1 2023-07-26
1 2023-07-25
1 2023-07-22
1 2023-07-20
20 2023-07-18
1 2023-07-17
4 2023-07-16
3 2023-07-15
2 2023-07-14
3 2023-07-12
3 2023-07-11
1 2023-07-10
1 2023-07-08
1 2023-07-07
5 2023-07-05
3 2023-07-04
1 2023-07-03
1 2023-07-02
3 2023-06-29
... ...

No correlation here either.

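n0099 changed the title from "Since 2023-07-14, the Baidu uid of each thread's last replier in the web forum home page thread list API, when fetched as an ancient client version (__client_version=6.0.x), has become the thread author uid, while name and name_show remain correct" to "Since 2023-07-14, the Baidu uid of each thread's last replier in the web forum home page thread list API, when fetched as an ancient client version (_client_version=6.0.x), has become the thread author uid, while name and name_show remain correct" Jul 10, 2024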
n0099 commented Jul 10, 2024

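  • Trend of storing the only 3 fields available for a thread's last replier (Baidu uid, Baidu username, Tieba override ID) as a partially filled Tieba user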
SELECT COUNT(*), to_timestamp("createdAt")::date FROM tbmc_user
WHERE portrait = '' AND "portraitUpdatedAt" IS NULL AND gender IS NULL AND "fansNickname" IS NULL AND icon IS NULL AND "ipGeolocation" IS NULL
GROUP BY to_timestamp("createdAt")::date
ORDER BY to_timestamp("createdAt")::date DESC;
count to_timestamp
4 2022-12-29
7 2022-12-28
7 2022-12-27
5 2022-12-26
1 2022-12-25
5 2022-12-23
6 2022-12-22
3 2022-12-21
10 2022-12-20
4 2022-12-19
6 2022-12-18
9 2022-12-17
15 2022-12-16
16 2022-12-15
9 2022-12-14
6 2022-12-13
4 2022-12-12
4 2022-12-11
10 2022-12-10
11 2022-12-09
7 2022-12-08
5 2022-12-07
5 2022-12-06
9 2022-12-05
8 2022-12-04
8 2022-12-03
9 2022-12-02
14 2022-12-01
9 2022-11-30
9 2022-11-29
17 2022-11-28
10 2022-11-27
18 2022-11-26
21 2022-11-25
11 2022-11-24
12 2022-11-23
12 2022-11-22
12 2022-11-21
23 2022-11-20
15 2022-11-19
5 2022-11-18
8 2022-11-17
15 2022-11-16
13 2022-11-15
5 2022-11-14
5 2022-11-13
9 2022-11-12
6 2022-11-11
10 2022-11-10
14 2022-11-09
6 2022-11-08
11 2022-11-07
8 2022-11-06
9 2022-11-05
8 2022-11-04
12 2022-11-03
14 2022-11-02
11 2022-11-01
8 2022-10-31
4 2022-10-30
3 2022-10-29
1 2022-10-28
4 2022-10-27
8 2022-10-26
2 2022-10-25
4 2022-10-24
1 2022-10-23
2 2022-10-22
3 2022-10-21
2 2022-10-20
5 2022-10-19
1 2022-10-18
6 2022-10-17
3 2022-10-16
10 2022-10-15
14 2022-10-14
6 2022-10-13
18 2022-10-12
11 2022-10-11
13 2022-10-10
20 2022-10-09
9 2022-10-08
12 2022-10-07
7 2022-10-06
5 2022-10-05
7 2022-10-04
10 2022-10-03
8 2022-10-02
14 2022-10-01
9 2022-09-30
6 2022-09-29
16 2022-09-28
14 2022-09-27
10 2022-09-26
9 2022-09-25
8 2022-09-24
6 2022-09-23
8 2022-09-22
7 2022-09-21
6 2022-09-20
14 2022-09-19
13 2022-09-18
19 2022-09-17
11 2022-09-16
18 2022-09-15
14 2022-09-14
19 2022-09-13
8 2022-09-12
8 2022-09-11
16 2022-09-10
20 2022-09-09
19 2022-09-08
10 2022-09-07
14 2022-09-06
9 2022-09-05
12 2022-09-04
4 2022-09-03
14 2022-09-02
13 2022-09-01
10 2022-08-31
16 2022-08-30
19 2022-08-29
14 2022-08-28
11 2022-08-27
9 2022-08-26
7 2022-08-25
14 2022-08-24
18 2022-08-23
24 2022-08-22
46 2022-08-21
17 2022-08-20
12 2022-08-19
7 2022-08-18
27 2022-08-17
15 2022-08-16
12 2022-08-15
10 2022-08-14
12 2022-08-13
11 2022-08-12
6 2022-08-11
7 2022-08-10
5 2022-08-09
12 2022-08-08
17 2022-08-07
29 2022-08-06
5 2022-08-05
8 2022-08-04
14 2022-08-03
13 2022-08-02
8 2022-08-01
8 2022-07-31
12 2022-07-30
10 2022-07-29
9 2022-07-28
15 2022-07-27
72 2022-07-26
16 2022-07-25

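I don't know why partially filled Tieba users exist only between July and December 2022.
If a later recursive crawl of replies and sub-replies finds this last replier (matched by the same Baidu uid), the user's full information gets filled in; if the replier is not found (for example in a flooding attack that bumps an old thread and then self-deletes), nothing is filled in. That should be orthogonal to the time period: a miss can happen at any time, yet the table above shows that from December 2022 onward the replier is always found. This may mean the last replier's Baidu uid had already started turning into the thread author's uid back then.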
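
The fill-in behaviour described above can be sketched as follows. This is a simplified illustration, not the crawler's actual code: the class, the function names and the placeholder values are hypothetical, only two of the columns from the query are modelled, and it assumes the thread author's own post is always crawled in full.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    uid: int
    name: Optional[str] = None
    display_name: Optional[str] = None
    portrait: str = ""             # stays empty until a full record is crawled
    gender: Optional[int] = None   # likewise only known from a full record

    @property
    def is_partial(self) -> bool:
        # Simplified version of the WHERE clause in the query above
        return self.portrait == "" and self.gender is None


users: Dict[int, User] = {}


def save_last_replier(uid: int, name: str, display_name: str) -> None:
    """Store the 3 thread-list fields, unless this uid is already known."""
    users.setdefault(uid, User(uid, name, display_name))


def save_full_user(user: User) -> None:
    """Called when the same uid turns up while crawling replies or sub-replies."""
    users[user.uid] = user


# Correct uid, replier never found afterwards: the record stays partial.
save_last_replier(720938213, "canguanzhe", "canguanzhe")

# Overwritten uid: the record is keyed by the author's uid, and the author is
# later crawled in full (placeholder values), so no partial record survives.
save_last_replier(3448717235, "canguanzhe", "canguanzhe")
save_full_user(User(3448717235, "SY_Here", "SY_Here", portrait="hypothetical-portrait", gender=0))

print(sorted(uid for uid, user in users.items() if user.is_partial))
```

Under these assumptions, once the API returns the author's uid for every last replier, partial records stop appearing, which is consistent with the table ending in December 2022.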

n0099 added a commit to n0099/open-tbm that referenced this issue Jul 10, 2024
+ entity class `LatestReplier` that has one-to-one relationship with entity `ThreadPost.LatestReplier`
* replace field `LatestReplierUid` with foreign key to entity `LatestReplier` @ ThreadPost.cs, also affects `ThreadRevision`, `ThreadSaver` & `CrawlPost.CrawlReplies()`

* replace field `_latestRepliers` with `_latestRepliersKeyByUnique` to reuse latest repliers with same `UniqueLatestReplier` for `FillFromRequestingWith602()` @ ThreadCrawlFacade.cs

* now will invoke `ThreadLatestReplierSaver.Save()`
FieldRevisionIgnorance @ `Save()`
* no longer ignore revision for field `ThreadPost.LatestReplierUid` @ `FieldRevisionIgnorance()`
@ ThreadSave.cs
@ c#
lumina37 added the discussion label Jul 12, 2024
lumina37 (Owner) commented

RIP. From now on, the only way to counter old-thread bumping is to raise the scan frequency.

n0099 commented Jul 12, 2024

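Is there no ban API that can ban using only the Baidu username or the override ID? #146 (comment)
As I recall, before the second version of the override ID (described in the original as "利空id吃的第二版覆盖ID") was rolled out everywhere, a ban could be issued using only the Baidu username #77 (comment)
Attackers who bump old threads and then self-delete can still pick a sub-reply under some reply in a chatter thread with thousands or tens of thousands of floors, and then self-delete before a full recursive crawl under the 10/30 rps limit finds their Baidu uid.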
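
A rough back-of-the-envelope estimate shows how wide that window can be. Every number below is an assumption made for illustration: 30 replies per page, one sub-reply request per reply, and "10/30 rps" read as two alternative rate limits of 10 and 30 requests per second.

```python
import math


def estimated_crawl_seconds(reply_count: int, replies_per_page: int,
                            sub_reply_requests: int, rps: float) -> float:
    """Seconds needed to send every request once at a fixed request rate."""
    reply_pages = math.ceil(reply_count / replies_per_page)
    return (reply_pages + sub_reply_requests) / rps


# Hypothetical thread with 10,000 floors
for rps in (10, 30):
    seconds = estimated_crawl_seconds(10_000, 30, 10_000, rps)
    print(f"{rps} rps: about {seconds / 60:.1f} minutes")
```

Under these assumptions a single full pass takes several minutes even at the higher rate, which leaves an attacker plenty of time to self-delete.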

lumina37 (Owner) commented

> Is there no ban API that can ban using only the Baidu username or the override ID?

There really isn't one.
