mongo-shake v2.4.7: errors flood the log when syncing MongoDB cluster data to Kafka #430

Closed
dx8439 opened this issue Aug 27, 2020 · 13 comments
Labels
bug (Something isn't working), mark (something needs to be noticed)

Comments

dx8439 commented Aug 27, 2020

When mongo-shake v2.4.7 syncs MongoDB cluster data to Kafka, errors flood the log and messages back up in Kafka. The mongo-shake log is as follows:

[2020/08/27 15:06:37 CST] [INFO] Collector-worker-2 transfer retransmit:false send [1] logs. reply_acked [-1[-1, 4294967295]], list_unack [0]
[2020/08/27 15:06:37 CST] [WARN] Collector-worker-2 transfer oplogs failed with reply value -1
[2020/08/27 15:06:37 CST] [EROR] KafkaWriter json marshal data[json: unsupported value: NaN] error[json: unsupported value: NaN]
[2020/08/27 15:06:37 CST] [INFO] Collector-worker-2 transfer retransmit:false send [1] logs. reply_acked [-1[-1, 4294967295]], list_unack [0]
[2020/08/27 15:06:37 CST] [WARN] Collector-worker-2 transfer oplogs failed with reply value -1
[2020/08/27 15:06:37 CST] [EROR] KafkaWriter json marshal data[json: unsupported value: NaN] error[json: unsupported value: NaN]
[2020/08/27 15:06:37 CST] [INFO] Collector-worker-2 transfer retransmit:false send [1] logs. reply_acked [-1[-1, 4294967295]], list_unack [0]
[2020/08/27 15:06:37 CST] [WARN] Collector-worker-2 transfer oplogs failed with reply value -1
[2020/08/27 15:06:37 CST] [EROR] KafkaWriter json marshal data[json: unsupported value: NaN] error[json: unsupported value: NaN]

vinllen (Collaborator) commented Aug 28, 2020

Turn on DEBUG logging and check the output. It looks like the oplog being read is empty, but in practice there should never be an empty oplog.

vinllen added the mark (something needs to be noticed) label Aug 28, 2020

dx8439 (Author) commented Aug 28, 2020

[2020/08/28 17:15:15 CST] [EROR] KafkaWriter json marshal data[json: unsupported value: NaN] error[json: unsupported value: NaN]
[2020/08/28 17:15:15 CST] [INFO] Collector-worker-2 transfer retransmit:false send [1] logs. reply_acked [-1[-1, 4294967295]], list_unack [0]
[2020/08/28 17:15:15 CST] [WARN] Collector-worker-2 transfer oplogs failed with reply value -1
[2020/08/28 17:15:15 CST] [DEBG] Tunnel message checksum value 0xebd3030e
[2020/08/28 17:15:15 CST] [EROR] KafkaWriter json marshal data[json: unsupported value: NaN] error[json: unsupported value: NaN]
[2020/08/28 17:15:15 CST] [INFO] Collector-worker-2 transfer retransmit:false send [1] logs. reply_acked [-1[-1, 4294967295]], list_unack [0]
[2020/08/28 17:15:15 CST] [WARN] Collector-worker-2 transfer oplogs failed with reply value -1
[2020/08/28 17:15:15 CST] [DEBG] Tunnel message checksum value 0xebd3030e

dongkun-nb commented

I hit the same error with mongoshake-2.4.10. Error log:

[2020/09/03 08:06:59 UTC] [WARN] Collector-worker-3 transfer oplogs failed with reply value -1
[2020/09/03 08:06:59 UTC] [DEBG] Tunnel message checksum value 0x29ed874d
[2020/09/03 08:06:59 UTC] [EROR] KafkaWriter json marshal data[json: unsupported value: -Inf] error[json: unsupported value: -Inf]
[2020/09/03 08:06:59 UTC] [INFO] Collector-worker-3 transfer retransmit:false send [1] logs. reply_acked [-1[-1, 4294967295]], list_unack [0]

dongkun-nb commented

It looks like Go's json library raises an error when it encounters special values such as NaN and -Inf.

dongkun-nb commented Sep 3, 2020

Oddly, another mongo-shake instance that reads from the same source but runs in direct mode does not report this error. Both mongo-shake instances are version 2.4.10.

dongkun-nb commented

@vinllen Is there any progress on this issue?

dx8439 (Author) commented Sep 25, 2020

> @vinllen Is there any progress on this issue?

Not resolved yet. Syncing to MongoDB works fine; only syncing to Kafka fails.

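vinllen (Collaborator) commented Sep 25, 2020

I will assign people to track this down soon. Could you send the oplogs from around the time of the error? The checkpoint shows the point in time up to which data has been synced, so pull the oplogs written after that point and take a look.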

vinllen (Collaborator) commented Sep 29, 2020

@dx8439 @dongkun-nb If convenient, please contact me on DingTalk and I will give you a build to try. DingTalk ID: e6ccxf0

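vinllen (Collaborator) commented Oct 14, 2020

The cause is that some of the user's fields hold NaN, inf, or -inf. JSON serialization cannot handle these values because they are not part of the JSON specification. They also should not appear in MongoDB fields; when they do, it usually means the source side wrote bad data in the first place.
nlohmann/json#1599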

vinllen added the bug (Something isn't working) label Oct 14, 2020

vinllen (Collaborator) commented Oct 14, 2020

MongoShake's fix: when NaN, Inf, or -Inf is found, an Error is logged and the current oplog is not pushed to Kafka, but the program does not block and continues with the following oplogs.

vinllen (Collaborator) commented Oct 14, 2020

Version 2.4.15 has been released.

vinllen closed this as completed Oct 14, 2020

zzggf7777 commented

Has this issue been resolved in 2.4.15?

4 participants