instanceid overflow problem #166

Open

weipatty opened this issue Sep 19, 2018 · 0 comments

weipatty commented Sep 19, 2018

Figure 1: Because of the overflow, the value passed to SetHoldPaxosLogCount is 300.

Figure 2: The function that overflows

Figure 3: Related functions

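As the figures show, GetOldestInstanceIDofFile returns an int, so its result overflows once instanceid > int max. That in turn corrupts the value that agentmonitor periodically passes to SetHoldPaxosLogCount, so only 300 paxoslog entries are kept ahead of the snapshot.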
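To make the narrowing concrete, here is a minimal C++ sketch. The function names are hypothetical stand-ins, not the code from the screenshots; it only assumes that the stored instance ID is a 64-bit unsigned value and that the accessor returns int, as described above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for an accessor that returns int.
// Converting an out-of-range uint64_t to int keeps only the low 32 bits on
// common platforms (the result is implementation-defined before C++20).
int OldestInstanceIdAsInt(uint64_t stored_id) {
    return static_cast<int>(stored_id);
}

// The same accessor with a return type wide enough for any instance ID.
uint64_t OldestInstanceIdAsU64(uint64_t stored_id) {
    return stored_id;
}

// True when the ID can be represented by int without loss.
bool FitsInInt(uint64_t stored_id) {
    return stored_id <= static_cast<uint64_t>(std::numeric_limits<int>::max());
}
```

Any caller that derives a log-retention count from the int version would be working from a wrong oldest ID once `FitsInInt` is false, which is consistent with the symptom reported here. Returning the 64-bit type end to end would avoid the problem.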

Taken together with the following:

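About PhxPaxos killing itself after LoadCheckpointState
First, the purpose of the self-kill is to let the program restart with the new Checkpoint state machine data, which raises the question of how the restart happens. PhxPaxos only kills the process; it does not restart it, so developers have to handle the restart themselves. Inside WeChat we generally use a daemon process to bring the worker process back up automatically.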

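Second, when you use PhxPaxos's multi-Group feature and many Groups have fallen far behind, each Group has to align its Checkpoint on its own, so each Group goes through one self-kill. With 100 Groups, the program may need 100 restarts to finish Checkpoint alignment, which is very inefficient. In that case developers need to add some delay or waiting inside LoadCheckpointState, according to their own business characteristics, so that a single self-kill completes Checkpoint alignment for more Groups.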
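One way to picture the "delay inside LoadCheckpointState" advice is a small rendezvous helper, sketched below. `CheckpointRendezvous`, `ArriveAndWait` and `Arrived` are hypothetical names, not part of PhxPaxos; the sketch assumes each Group's LoadCheckpointState can call into a shared object before returning, and it uses simple polling rather than anything PhxPaxos provides.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical helper, not a PhxPaxos API: each Group reports that its
// checkpoint data is in place, then waits (up to max_wait) for the others.
class CheckpointRendezvous {
public:
    explicit CheckpointRendezvous(std::size_t expected_groups)
        : expected_(expected_groups) {}

    // Returns true if all expected Groups arrived before the deadline,
    // false if the wait timed out and the caller should proceed anyway.
    bool ArriveAndWait(std::chrono::milliseconds max_wait) {
        arrived_.fetch_add(1);
        const auto deadline = std::chrono::steady_clock::now() + max_wait;
        while (arrived_.load() < expected_) {
            if (std::chrono::steady_clock::now() >= deadline) {
                return false;
            }
            // Poll at a coarse interval; a sketch, not a tuned wait.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return true;
    }

    std::size_t Arrived() const { return arrived_.load(); }

private:
    const std::size_t expected_;
    std::atomic<std::size_t> arrived_{0};
};
```

The timeout matters: a Group that waits forever for peers that never need a Checkpoint would block the restart entirely, so the sketch lets it give up and proceed.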

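For the business, the effect is this: if one node falls even slightly behind (5 minutes), it enters Checkpoint transfer mode. That is a heavy operation, and if it cannot finish within 5 minutes the node is 5 minutes behind again and re-enters Checkpoint mode, in an endless loop.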
