Lotus sync fall behind under load #4522

Closed
jsign opened this issue Oct 21, 2020 · 5 comments · Fixed by #4541

Comments

@jsign
Contributor

jsign commented Oct 21, 2020

I have a Lotus node which has ~300 deals not in DealStorageActive.

At this point, I see that the node has a hard time syncing and falls behind. CPU usage is < 10% and memory looks far from being under pressure. It looks to me like there might be some mutex contention or similar, but I haven't dug into the pprofs.

I'm attaching three pprof outputs that might be worth checking: goroutine, heap, and a 25s trace.

pprofStacksHeapAndTrace.zip

Lotus version: v1.1.0
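
For anyone trying to collect the same kinds of profiles (goroutine, heap, a 25s trace, and the 45s CPU profile requested later in the thread), here is a minimal sketch in Go. It assumes the node serves Go's standard `net/http/pprof` endpoints under `/debug/pprof/`; the base address `http://127.0.0.1:1234` and the helper names `profileURLs` and `fetch` are assumptions for illustration, not something confirmed in this report.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"strconv"
	"strings"
)

// profileURLs builds the standard net/http/pprof endpoint URLs for the
// profile kinds mentioned in this issue. The base address is an assumption.
func profileURLs(base string, traceSecs, cpuSecs int) map[string]string {
	root := strings.TrimRight(base, "/") + "/debug/pprof/"
	return map[string]string{
		"goroutine": root + "goroutine?debug=2", // full stack dump as text
		"heap":      root + "heap",
		"trace":     root + "trace?seconds=" + strconv.Itoa(traceSecs),
		"cpu":       root + "profile?seconds=" + strconv.Itoa(cpuSecs),
	}
}

// fetch downloads one URL into the named file.
func fetch(u, path string) error {
	resp, err := http.Get(u)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s: %s", u, resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	base := flag.String("base", "http://127.0.0.1:1234", "node address (assumed)")
	doFetch := flag.Bool("fetch", false, "download the profiles instead of only listing URLs")
	flag.Parse()

	urls := profileURLs(*base, 25, 45)
	names := make([]string, 0, len(urls))
	for n := range urls {
		names = append(names, n)
	}
	sort.Strings(names) // stable output order

	for _, n := range names {
		fmt.Println(n, urls[n])
		if *doFetch {
			if err := fetch(urls[n], n+".out"); err != nil {
				fmt.Fprintln(os.Stderr, err)
			}
		}
	}
}
```

Run without flags, it only prints the URLs; with `-fetch` it writes `goroutine.out`, `heap.out`, `trace.out`, and `cpu.out` to the current directory. The CPU and heap files can then be opened with `go tool pprof`, and the trace with `go tool trace`.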

@Kubuxu
Contributor

Kubuxu commented Oct 21, 2020

Can you also take a CPU profile?

@jsign
Contributor Author

jsign commented Oct 21, 2020

@Kubuxu, 45s CPU profile:
cpu45s.out.zip

@jsign
Contributor Author

jsign commented Oct 21, 2020

@Kubuxu, this might be helpful too:
logs2hs.txt.zip

@jimpick
Contributor

jimpick commented Oct 21, 2020

Sounds like it could be #4427 ... I was able to fix it by commenting out a single line of code.

@jsign
Contributor Author

jsign commented Oct 21, 2020

In this case, I can see it making progress in lotus sync wait (Validated XXX messages), but only in slow, small bursts. I feel this is closely related to the number of in-progress deals: I have >30 other Lotus nodes with less load, and they sync up faster.

So maybe that change will help, but since this is the most important Lotus node we have running, I'm a bit wary of trying potentially unsafe changes. It might be a good clue for the devs, though.

@Kubuxu Kubuxu linked a pull request Oct 22, 2020 that will close this issue

3 participants