Skip to content
This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

[R4R] Upgrade from 0301 to 0315 #87

Merged
merged 138 commits into from
May 21, 2019
Merged

[R4R] Upgrade from 0301 to 0315 #87

merged 138 commits into from
May 21, 2019

Conversation

rickyyangz
Copy link
Contributor

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

thanethomson and others added 30 commits February 20, 2019 09:45
As per #3043, this adds a ticker to sync the WAL every 2s while the WAL is running.

* Flush WAL every 2s

This adds a ticker that flushes the WAL every 2s while the WAL is
running. This is related to #3043.

* Fix spelling

* Increase timeout to 2mins for slower build environments

* Make WAL sync interval configurable

* Add TODO to replace testChan with more comprehensive testBus

* Remove extraneous debug statement

* Remove testChan in favour of using system time

As per
tendermint/tendermint#3300 (comment),
this removes the `testChan` WAL member and replaces the approach with a
system time-oriented one. In this new approach, we keep track of the
system time at which each flush and periodic flush successfully
occurred.

The naming of the various functions is also updated here to be more
consistent with "flushing" as opposed to "sync'ing".

* Update naming convention and ensure lock for timestamp update

* Add Flush method as part of WAL interface

Adds a `Flush` method as part of the WAL interface to enforce the idea
that we can manually trigger a WAL flush from outside of the WAL. This
is employed in the consensus state management to flush the WAL prior to
signing votes/proposals, as per tendermint/tendermint#3043 (comment)

* Update CHANGELOG_PENDING

* Remove mutex approach and replace with DI

The dependency injection approach to dealing with testing concerns could
allow similar effects to some kind of "testing bus"-based approach. This
commit introduces an example of this, where instead of relying on
(potentially fragile) timing of things between the code and the test, we
inject code into the function under test that can signal the test
through a channel.

This allows us to avoid the `time.Sleep()`-based approach previously
employed.

* Update comment on WAL flushing during vote signing

Co-Authored-By: thanethomson <[email protected]>

* Simplify flush interval definition

Co-Authored-By: thanethomson <[email protected]>

* Expand commentary on WAL disk flushing

Co-Authored-By: thanethomson <[email protected]>

* Add broken test to illustrate WAL sync test problem

Removes test-related state (dependency injection code) from the WAL data
structure and adds test code to illustrate the problem with using
`WALGenerateNBlocks` and `wal.SearchForEndHeight` to test periodic
sync'ing.

* Fix test error messages

* Use WAL group buffer size to check for flush

A function is added to `libs/autofile/group.go#Group` in order to return
the size of the buffered data (i.e. data that has not yet been flushed
to disk). The test now checks that, prior to a `time.Sleep`, the group
buffer has data in it. After the `time.Sleep` (during which time the
periodic flush should have been called), the buffer should be empty.

* Remove config root dir removal from #3291

* Add godoc for NewWAL mentioning periodic sync
* cs: update wal comments

Follow-up to tendermint/tendermint#3300

* Update consensus/wal.go

Co-Authored-By: melekes <[email protected]>
The test was sometimes failing due to processFlushTicks being called too
early. The solution is to call wal#Start later in the test.
* green pubsub tests :OK:

* get rid of clientToQueryMap

* Subscribe and SubscribeUnbuffered

* start adapting other pkgs to new pubsub

* nope

* rename MsgAndTags to Message

* remove TagMap

it does not bring any additional benefits

* bring back EventSubscriber

* fix test

* fix data race in TestStartNextHeightCorrectly

```
Write at 0x00c0001c7418 by goroutine 796:
  github.com/tendermint/tendermint/consensus.TestStartNextHeightCorrectly()
      /go/src/github.com/tendermint/tendermint/consensus/state_test.go:1296 +0xad
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Previous read at 0x00c0001c7418 by goroutine 858:
  github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:1631 +0x1366
  github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:1476 +0x8f
  github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:667 +0xa1e
  github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:628 +0x794

Goroutine 796 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:878 +0x659
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1119 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1117 +0x4ee
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1034 +0x2ee
  main.main()
      _testmain.go:214 +0x332

Goroutine 858 (running) created at:
  github.com/tendermint/tendermint/consensus.(*ConsensusState).startRoutines()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:334 +0x221
  github.com/tendermint/tendermint/consensus.startTestRound()
      /go/src/github.com/tendermint/tendermint/consensus/common_test.go:122 +0x63
  github.com/tendermint/tendermint/consensus.TestStateFullRound1()
      /go/src/github.com/tendermint/tendermint/consensus/state_test.go:255 +0x397
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
```

* fixes after my own review

* fix formatting

* wait 100ms before kicking a subscriber out

+ a test for indexer_service

* fixes after my second review

* no timeout

* add changelog entries

* fix merge conflicts

* fix typos after Thane's review

Co-Authored-By: melekes <[email protected]>

* reformat code

* rewrite indexer service in the attempt to fix failing test

tendermint/tendermint#3227

* Revert "rewrite indexer service in the attempt to fix failing test"

This reverts commit 0d9107a098230de7138abb1c201877c246e89ed1.

* another attempt to fix indexer

* fixes after Ethan's review

* use unbuffered channel when indexing transactions

Refs tendermint/tendermint#3227 (comment)

* add a comment for EventBus#SubscribeUnbuffered

* format code
* reject the shared secret if is all zeros in case the blacklist was not
sufficient

* Add test that verifies lower order pub-keys are rejected at the DH step

* Update changelog

* fix typo in test-comment
* bound mempool memory usage

Closes #3079

* rename SizeBytes to TxsTotalBytes

and other small fixes after Zarko's review

* rename MaxBytes to MaxTxsTotalBytes

* make ErrMempoolIsFull more informative

* expose mempool's txs_total_bytes via RPC

* test full response

* fixes after Ethan's review

* config: rename mempool.size to mempool.max_txs

tendermint/tendermint#3248 (comment)

* test more cases

tendermint/tendermint#3248 (comment)

* simplify test

* Revert "config: rename mempool.size to mempool.max_txs"

This reverts commit 39bfa3696177aa46195000b90655419a975d6ff7.

* rename count back to n_txs

to make a change non-breaking

* rename max_txs_total_bytes to max_txs_bytes

* format code

* fix TestWALPeriodicSync

The test was sometimes failing due to processFlushTicks being called too
early. The solution is to call wal#Start later in the test.

* Apply suggestions from code review
* libs/common: TrapSignal accepts logger as a first parameter

 and does not block anymore
* previously it was dumping "captured ..." msg to os.Stdout
* TrapSignal should not be responsible for blocking thread of execution

Refs #3238

* exit with zero (0) code upon receiving SIGTERM/SIGINT

Refs #3238

* fix formatting in docs/app-dev/abci-cli.md

Co-Authored-By: melekes <[email protected]>

* fix formatting in docs/app-dev/abci-cli.md

Co-Authored-By: melekes <[email protected]>
Just a minor followup on the review if #3347: Fixes a comment. [#3347 (comment)](tendermint/tendermint#3347 (comment))
* rename WAL#Flush to WAL#FlushAndSync

- rename auto#Flush to auto#FlushAndSync
- cleanup WAL interface to not leak implementation details!
  * remove Group()
  * add WALReader interface and return it in SearchForEndHeight()
- add interface assertions

Refs #3337

* replace WALReader with io.ReadCloser
* docs: explain create_empty_blocks configurations

Closes #3307

* Vagrantfile: install nodejs for docs

* update docs instructions

npm install does not make sense since there's no packages.json file

* explain broadcast_tx_* tx format

Closes #536

* docs: explain how transaction ordering works

Closes #2904

* bring in consensus parameters explained

* example for create_empty_blocks_interval

* bring in explanation from tendermint/tendermint#2487 (comment)

* link to formatting instead of duplicating info
This issue is related to #3107
This is a first renaming/refactoring step before reworking and removing heartbeats.
As discussed with @liamsi , we preferred to go for a couple of independent and separate PRs to simplify review work.

The changes:

    Help to clarify the relation between the validator and remote signer endpoints
    Differentiate between timeouts and deadlines
    Prepare to encapsulate networking related code behind RemoteSigner in the next PR

My intention is to separate and encapsulate the "network related" code from the actual signer.

SignerRemote ---(uses/contains)--> SignerValidatorEndpoint <--(connects to)--> SignerServiceEndpoint ---> SignerService (future.. not here yet but would like to decouple too)

All reconnection/heartbeat/whatever code goes in the endpoints. Signer[Remote/Service] do not need to know about that.

I agree Endpoint may not be the perfect name. I tried to find something "Go-ish" enough. It is a common name in go-kit, kubernetes, etc.

Right now:
SignerValidatorEndpoint:

    handles the listener
    contains SignerRemote
    Implements the PrivValidator interface
    connects and sets a connection object in a contained SignerRemote
    delegates PrivValidator some calls to SignerRemote which in turn uses the conn object that was set externally

SignerRemote:

    Implements the PrivValidator interface
    read/writes from a connection object directly
    handles heartbeats

SignerServiceEndpoint:

    Does most things in a single place
    delegates to a PrivValidator IIRC.

* cleanup

* Refactoring step 1

* Refactoring step 2

* move messages to another file

* mark for future work / next steps

* mark deprecated classes in docs

* Fix linter problems

* additional linter fixes
* fix dirty data in peerset

startInitPeer before PeerSet add the peer,
once mconnection start and Receive of one
Reactor faild, will try to remove it from PeerSet
while PeerSet still not contain the peer. Fix
this by change the order.

* fix test FilterDuplicate

* fix start/stop race

* fix err
When remove peer, block pool simple remove bpPeer,
but do not stop timer, that cause stopError for recorrected
peers. Stop timer when remove from pool.
1."abci_query": rpcserver.NewRPCFunc(c.ABCIQuery, "path,data,prove")
"validators": rpcserver.NewRPCFunc(c.Validators, "height"),
the parameters and function do not match, cause index out of range error.
2. the prove of query is forced to be true, while default option is false.
3. fix the wrong key of merkle
…om 1.1.0 to 1.3.0 (#3357)

* deps: update gogo/proto from 1.1.1 to 1.2.1

- verified changes manually
  git diff 636bf030~ ba06b47c --stat -- ':!*.pb.go' ':!test'

* deps: update golang/protobuf from 1.1.0 to 1.3.0

- verified changes manually
  git diff b4deda0~ c823c79 -- ':!*.pb.go' ':!test'
Before we're using it to get a round state in tests. Now it can be done
by calling csX.GetRoundState. We will need to rewrite
TestStateSlashingPrevotes and TestStateSlashingPrecommits, which are
commented right now, to not rely on EventDataRoundState#RoundState
field.

Refs #1527
…… (#3048)

* make BlockTimeIota a consensus parameter, not a locally configurable option

Refs #2920

* make TimeIota int64 ms

Refs #2920

* update Gopkg.toml

* fixes after Ethan's review

* fix TestRemoteSignerProposalSigningFailed

* update changelog
Fixes: #3378

* Add stats to cleveldb implementation

* update changelog

* remote TODO

also
- sort keys
- preallocate memory

* fix const initializer []string literal is not a constant

* add test
 - update docker image on circleci
 - remove GOCACHE=off from Makefile (see:
 https://tip.golang.org/doc/go1.12#gocache)
 - update badge in readme
 - update in scripts/install
 - update Vagrantfile
 - update in networks/remote/integration.sh
 - tools/build/Makefile
* fix failure in TestProposerFrequency

* Add test to check priority order after updates

* Changed applyRemovals() and removed Remove()

Changed applyRemovals() similar to applyUpdates()
Removed function Remove()
Updated comments

* review comments

* simplify applyRemovals and add more comments

* small correction in comment

* Fix check in test

* Fix priority check for centering, address review comments

* fix assert for priority centering

* review comments

* review comments

* cleanup and review comments

added upper limit check for validator voting power
moved check for empty validator set earlier
moved panic on potential negative set length in verifyRemovals
added more tests

* review comments
https://tools.ietf.org/html/rfc6962#section-2.1
"The largest power of two less than the number of items" is actually correct!

    For n > 1, let k be the largest power of two smaller than n (i.e.,
    k < n <= 2k).
We're pinning repos without releases because it's very easy to upgrade all the dependencies by executing dep ensure --upgrade. Instead, we should just never run this command directly, only dep ensure --upgrade <some repo>. And we can defend that in PRs.

Refs #3374

The problem with pinning to exact revisions: people who import Tendermint as a library (e.g. abci/types) are stuck with these revisions even though the code they import may not even use them.
Although the version we were pinning to is from Nov. 2016 there were no substantial changes:

    jmhodges/levigo@2b8c778 added go-modules support (no code changes)
    jmhodges/levigo@853d788 added a badge to the readme

closes #3381
Update Gopkg.lock via dep ensure --update golang.org/x/crypto

see #3391 (comment) (nothing to review here really).
ebuchman and others added 16 commits April 15, 2019 08:16
* [adr] Peer behaviour adr updates
* [docs] fix Behaved function signature
* [adr] typo fix in code example
What happened:

New code was supposed to fall back to last height changed when/if it
failed to find validators at checkpoint height (to make release
non-breaking).

But because we did not check if validator set is empty, the fall back
logic was never executed => resulting in LoadValidators returning an
empty validator set for cases where `lastStoredHeight` is checkpoint
height (i.e. almost all heights if the application does not change
validator set often).

How it was found:

one of our users - @sunboshan reported a bug here
tendermint/tendermint#3537 (comment)

* use last height changed in validator set is empty
* add a changelog entry
Fixes #3457

The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us").
Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely.

Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.
@yutianwu yutianwu requested a review from unclezoro May 17, 2019 13:16
@yutianwu yutianwu changed the title [WIP] Upgrade from 0301 to 0315 [R4R] Upgrade from 0301 to 0315 May 17, 2019
@yutianwu yutianwu merged commit 500b85f into develop May 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.