
feat: fork tendermint client creator to NOT use mutex #6

Merged
jeffwoooo merged 5 commits into main from feat/concurrent-archival-query on Dec 1, 2021

Conversation

@kjessec (Contributor) commented Nov 22, 2021

  • The default tendermint client uses an RWMutex
  • This prevents sync and query from executing concurrently
  • Disable the RWMutex to see whether (see the sketch after this list):
    • performance improves
    • it is safe
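
To make the idea concrete, here is a minimal, self-contained sketch (toy types, not tendermint's actual code) of why a single shared mutex couples query latency to block processing, and what removing the lock from the query path changes:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyClient loosely mimics a local ABCI client that guards every call
// with one shared mutex. Names are illustrative, not tendermint's.
type toyClient struct {
	mtx sync.Mutex
}

// slowQuery simulates an expensive historical query. With useLock=true
// (upstream behaviour) it holds the shared mutex for the whole query;
// with useLock=false (what this PR experiments with) it does not.
func (c *toyClient) slowQuery(useLock bool, d time.Duration) {
	if useLock {
		c.mtx.Lock()
		defer c.mtx.Unlock()
	}
	time.Sleep(d)
}

// deliverBlock stands in for the sync path, which always takes the lock.
func (c *toyClient) deliverBlock() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
}

func main() {
	c := &toyClient{}
	for _, useLock := range []bool{true, false} {
		go c.slowQuery(useLock, 500*time.Millisecond)
		time.Sleep(10 * time.Millisecond) // let the query start first
		start := time.Now()
		c.deliverBlock()
		fmt.Printf("useLock=%v: block delivery delayed by %v\n", useLock, time.Since(start))
	}
}
```

With the lock, block delivery waits for the query to finish; without it, delivery proceeds immediately.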

@jeffwoooo

It panicked while running; on investigation:

  • the app client must also implement the service.Service interface
  • the abcicli.Client interface was only embedded (composited), with no implementation behind it

While fixing these two issues I ended up copying local_client.go wholesale (localClient is unexported, so its implementation could not be reused directly).

Once fixed, queries run fine; I will now test performance and stability while syncing.
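
As a hypothetical illustration of the second bullet (not the actual code): in Go, embedding an interface satisfies the compiler, but if no concrete implementation is ever assigned, the embedded value stays nil and any call through it panics at runtime, which is why the implementation had to be copied rather than merely composed.

```go
package main

import "fmt"

// Service stands in for an interface like tendermint's service.Service.
type Service interface {
	Start() error
}

// brokenClient embeds ("composites") Service without ever assigning an
// implementation: the type checker is happy, but the field is nil.
type brokenClient struct {
	Service
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered from panic:", r) // nil pointer dereference
		}
	}()
	c := &brokenClient{}
	_ = c.Start() // panics at runtime: no concrete Service behind the embed
}
```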

@jeffwoooo

I ran some Apache Bench tests comparing the main branch (with mutex) against this branch (without mutex), and there seems to be no performance improvement.

main branch with mutex - a2d6343
this branch without mutex - edc40db

Document Path: /staking/validators/terravaloper1259cmu5zyklsdkmgstxhwqpe0utfe5hhyty0at/delegations?height=5354666
Document Length: 40452 bytes

  1. Concurrency Level: 1 & Total requests: 10
with mutex
Time taken for tests:   48.678 seconds
Requests per second:    0.21 [#/sec] (mean)
Time per request:       4867.779 [ms] (mean)
Time per request:       4867.779 [ms] (mean, across all concurrent requests)
Transfer rate:          8.18 [Kbytes/sec] received

without mutex
Time taken for tests:   48.794 seconds
Requests per second:    0.20 [#/sec] (mean)
Time per request:       4879.447 [ms] (mean)
Time per request:       4879.447 [ms] (mean, across all concurrent requests)
Transfer rate:          8.16 [Kbytes/sec] received
  2. Concurrency Level: 10 & Total requests: 50
with mutex
Time taken for tests:   57.891 seconds
Requests per second:    0.86 [#/sec] (mean)
Time per request:       11578.123 [ms] (mean)
Time per request:       1157.812 [ms] (mean, across all concurrent requests)
Transfer rate:          34.40 [Kbytes/sec] received

without mutex
Time taken for tests:   55.611 seconds
Requests per second:    0.90 [#/sec] (mean)
Time per request:       22244.443 [ms] (mean)
Time per request:       1112.222 [ms] (mean, across all concurrent requests)
Transfer rate:          35.81 [Kbytes/sec] received
  3. Concurrency Level: 20 & Total requests: 50
with mutex
Time taken for tests:   55.539 seconds
Requests per second:    0.90 [#/sec] (mean)
Time per request:       22215.505 [ms] (mean)
Time per request:       1110.775 [ms] (mean, across all concurrent requests)
Transfer rate:          35.85 [Kbytes/sec] received

without mutex
Time taken for tests:   55.611 seconds
Requests per second:    0.90 [#/sec] (mean)
Time per request:       22244.443 [ms] (mean)
Time per request:       1112.222 [ms] (mean, across all concurrent requests)
Transfer rate:          35.81 [Kbytes/sec] received
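
For anyone reproducing these numbers without ab, a rough Go equivalent of the runs above (the endpoint URL, port, and request counts are assumptions based on the output, not the exact command used):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	const (
		totalRequests = 50 // "Total requests" in the runs above
		concurrency   = 10 // "Concurrency Level" in the runs above
		// Host/port are assumptions; adjust to the node's REST endpoint.
		url = "http://localhost:1317/staking/validators/terravaloper1259cmu5zyklsdkmgstxhwqpe0utfe5hhyty0at/delegations?height=5354666"
	)

	jobs := make(chan struct{}, totalRequests)
	durations := make(chan time.Duration, totalRequests)
	for i := 0; i < totalRequests; i++ {
		jobs <- struct{}{}
	}
	close(jobs)

	start := time.Now()
	for w := 0; w < concurrency; w++ {
		go func() {
			for range jobs {
				t0 := time.Now()
				resp, err := http.Get(url)
				if err == nil {
					io.Copy(io.Discard, resp.Body) // drain the body like ab does
					resp.Body.Close()
				}
				durations <- time.Since(t0)
			}
		}()
	}

	var sum time.Duration
	for i := 0; i < totalRequests; i++ {
		sum += <-durations
	}
	elapsed := time.Since(start)
	fmt.Printf("time taken: %v, req/s: %.2f, mean per request: %v\n",
		elapsed, float64(totalRequests)/elapsed.Seconds(),
		sum/time.Duration(totalRequests))
}
```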

@kjessec (Contributor, Author) commented Nov 23, 2021

@jeffwoooo the PR was more about allowing block injection while a query takes longer than a block interval

@jeffwoooo

Thank you for the reply; that's right.
So now I want to check whether a long query slows down block time.

I am testing with a request to /staking/validators/terravaloper1p54hc4yy2ajg67j645dn73w3378j6k05vmx9r9/delegations?height=5354666,
but the request does not seem to slow block time even on the main branch, since it only takes about 7 seconds.

Is there a query that takes much longer than 7 seconds?

@kjessec (Contributor, Author) commented Nov 24, 2021

@jeffwoooo let's try terravaloper15zcjduavxc5mkp8qcqs9eyhwlqwdlrzy6jln3m instead, with differing heights.

@jeffwoooo

@kjessec Hi!

While testing with queries, I wondered why a long query doesn't slow down syncing, and finally figured out that the block executor (i.e. sync) does not use the same mutex as the query client.

The multi app conn creates 4 abci clients, each with its own mutex, and assigns them to the Mempool, Consensus, Query, and Snapshot connections respectively.

So, at least as far as locking goes, a long query should not slow down syncing.

Would you please check the following and confirm?

https://github.com/tendermint/tendermint/blob/849461aab254014f6b7976710400ea59f83265ce/proxy/multi_app_conn.go#L85-L121
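
To make that structure easier to see without opening the link, here is a toy model (self-contained, not tendermint's code) of the shape being described: the multi-app conn asks a single client creator for one ABCI client per connection, and whether those clients can block one another depends entirely on whether the creator hands each of them its own mutex or a shared one, which is exactly the question revisited later in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// localClient and clientCreator are stand-ins for abcicli.Client and the
// proxy-level client creator; they are not the real types.
type localClient struct {
	mtx *sync.Mutex
}

type clientCreator struct {
	shared *sync.Mutex // nil => every client gets its own mutex
}

func (cc *clientCreator) NewABCIClient() *localClient {
	if cc.shared != nil {
		return &localClient{mtx: cc.shared} // all four connections contend
	}
	return &localClient{mtx: new(sync.Mutex)} // connections are independent
}

func main() {
	// Toggle the shared mutex to compare the two interpretations.
	cc := &clientCreator{ /* shared: new(sync.Mutex) */ }
	conns := make(map[string]*localClient)
	for _, name := range []string{"mempool", "consensus", "query", "snapshot"} {
		conns[name] = cc.NewABCIClient()
	}
	fmt.Println("query and consensus share a mutex:",
		conns["query"].mtx == conns["consensus"].mtx)
}
```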

@kjessec (Contributor, Author) commented Nov 24, 2021

@jeffwoooo oh, this may well be the case. Hmm, weird -- I thought we had observed a case where the entire sync process was blocked during long queries.

Maybe we should close this PR?

@jeffwoooo

Yes, I agree we should close it.

I thought I had observed that case too, but I'm sure it was caused by some other issue that we don't know about yet.

Let's open another issue if we see that case again.

@jeffwoooo jeffwoooo closed this Nov 24, 2021
@jeffwoooo

It turned out that the 4 local clients use the same mutex, handed to them by the local client creator.
Reopening this.
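
A self-contained reconstruction of what was found (toy types, paraphrased from memory of tendermint v0.34's proxy client creator and local ABCI client; treat names and shapes as approximate, not authoritative): the creator allocates one mutex and hands the same pointer to every client it creates, and each client then locks that mutex around every call, including queries.

```go
package main

import (
	"fmt"
	"sync"
)

// Application stands in for types.Application.
type Application interface {
	Query(path string) string
}

// localClient stands in for the local ABCI client: every call is guarded
// by the mutex that was handed in at construction time.
type localClient struct {
	mtx *sync.Mutex
	app Application
}

func (c *localClient) QuerySync(path string) string {
	c.mtx.Lock() // the lock this PR removes/forks for the query path
	defer c.mtx.Unlock()
	return c.app.Query(path)
}

// localClientCreator stands in for the local client creator: note that the
// SAME mutex is given to every client it creates, which is why the four
// connection clients end up serialized against each other.
type localClientCreator struct {
	mtx *sync.Mutex
	app Application
}

func NewLocalClientCreator(app Application) *localClientCreator {
	return &localClientCreator{mtx: new(sync.Mutex), app: app}
}

func (l *localClientCreator) NewABCIClient() *localClient {
	return &localClient{mtx: l.mtx, app: l.app}
}

type dummyApp struct{}

func (dummyApp) Query(path string) string { return "ok: " + path }

func main() {
	creator := NewLocalClientCreator(dummyApp{})
	queryCli := creator.NewABCIClient()
	consensusCli := creator.NewABCIClient()
	fmt.Println("shared mutex:", queryCli.mtx == consensusCli.mtx) // true
	fmt.Println(queryCli.QuerySync("/staking/validators"))
}
```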

@jeffwoooo jeffwoooo reopened this Nov 26, 2021

@jeffwoooo left a comment


I want to test running sync under heavy load.

@jeffwoooo

It is still slowing down syncing; I suspect DB-side locking.

Jeff Woo and others added 2 commits November 30, 2021 15:21
* wip

* feat: subscribe for cache channel

* test: test cache module

Co-authored-by: Jesse Chung <[email protected]>
@jeffwoooo jeffwoooo merged commit a595edc into main Dec 1, 2021
@jeffwoooo jeffwoooo deleted the feat/concurrent-archival-query branch December 1, 2021 07:00
jeffwoooo pushed a commit that referenced this pull request Dec 22, 2021
This reverts commit a595edc.

# Conflicts:
#	bin/v0.34.x/rpc/cache.go