Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify the default grpc keepalive time parameter #5478

Closed
wants to merge 957 commits into from

Conversation

inolddays
Copy link
Contributor

1.set grpc_keepalive_time to 10 seconds
2.set grpc_keepalive_timeout to 10 seconds
These settings will help to avoid the situation which query will hang there when pod or Bare metal is broken.

@sougou
Copy link
Contributor

sougou commented Dec 1, 2019

This looks safe to me, but let's have @rafael and @tirsen (or @mpawliszyn ) who are sensitive to grpc behavior changes have a look before we merge.

@rafael
Copy link
Member

rafael commented Dec 5, 2019

This particular change shouldn't create issues in our end as we already set values for this.

However, there is something I think it is worth calling out. We've been super conservative in setting default values for external dependencies. The rationale is that introducing this silently could create weird behavior for folks that were relying on default values of the library itself.

For instance, in this case we even add some extra logic in the case where this is zero to not pass the value at all: https://github.com/vitessio/vitess/blob/master/go/vt/grpcclient/client.go#L69

If we would like to change/reconsider this pattern and this we should discuss it more broadly as we use this approach in many other places in the code base.

Personally, for external libraries like grpc/s3/xtrabackup I prefer not setting any defaults from Vitess perspective and let users be explicit about it. We can come up with recommendations, but by default not set it.

@sougou
Copy link
Contributor

sougou commented Dec 10, 2019

I think @rafael has a point. However, VTGate hanging forever on a down tablet is a problem that still needs solving.

In reality, there are already specific changes in behavior that we make based on different use cases within vitess. For example, VTGate dials vttablet with the FailFast option, while the default behavior is not to fail fast. This change is required so that vtgate can quickly retry another tablet if the current one is down.

Along the same lines, I think VTGate should explicitly pass these flags, and I believe these should be the same as the healthcheck interval. In other words, there should be no need to export a separate value.

Based on my understanding, if the existing (global) flag is overridden, it will end up superseding the values supplied by vtgate (later values win). So, it should be a non-breaking change for anyone that's already overriding these values.

In other words, I think we should make the tablet dialer require these values on Dial, and then pass then through to grpcclient.Dial.

@morgo
Copy link
Contributor

morgo commented Feb 6, 2020

@Johnny-Three Can you rebase this PR on master so the testsuite passes? From the feedback, it looks like we should be good to merge this.

deepthi and others added 23 commits February 6, 2020 14:34
Merge Sharding tests in Go migrated from Python
Signed-off-by: Saif Alharthi <[email protected]>
* initial commit for backup_only

Signed-off-by: Ajeet jain <[email protected]>

* changes in package structure

Signed-off-by: Ajeet jain <[email protected]>

* updating package name and fixing a test

Signed-off-by: Ajeet jain <[email protected]>

* removed unrequired teardown code

Signed-off-by: Ajeet jain <[email protected]>

* Removed debug code

Signed-off-by: Ajeet jain <[email protected]>

* inital commit for xtrabackup

Signed-off-by: Ajeet jain <[email protected]>

* fix sequencing of cleanup

Signed-off-by: Ajeet jain <[email protected]>

* fix terminate restore for xtrabackup

Signed-off-by: Ajeet jain <[email protected]>

* updated config for xtrabackup

Signed-off-by: Ajeet jain <[email protected]>

* backup-mysqlctld: mysqlctld setup module created.

Signed-off-by: pradip parmar <[email protected]>

* added xtrabackup stream mode

Signed-off-by: Ajeet jain <[email protected]>

* updated config for xtrabackup stream

Signed-off-by: Ajeet jain <[email protected]>

* minor changes to config

Signed-off-by: Ajeet jain <[email protected]>

* initial commit for backup_only

Signed-off-by: Ajeet jain <[email protected]>

* changes in package structure

Signed-off-by: Ajeet jain <[email protected]>

* updating package name and fixing a test

Signed-off-by: Ajeet jain <[email protected]>

* removed unrequired teardown code

Signed-off-by: Ajeet jain <[email protected]>

* Removed debug code

Signed-off-by: Ajeet jain <[email protected]>

* inital commit for xtrabackup

Signed-off-by: Ajeet jain <[email protected]>

* fix sequencing of cleanup

Signed-off-by: Ajeet jain <[email protected]>

* fix terminate restore for xtrabackup

Signed-off-by: Ajeet jain <[email protected]>

* updated config for xtrabackup

Signed-off-by: Ajeet jain <[email protected]>

* added xtrabackup stream mode

Signed-off-by: Ajeet jain <[email protected]>

* updated config for xtrabackup stream

Signed-off-by: Ajeet jain <[email protected]>

* minor changes to config

Signed-off-by: Ajeet jain <[email protected]>

* rebased to resolve conflict

Signed-off-by: Arindam Nayak <[email protected]>

* backup-mysqlctld: mysqlctld health check changes.

Signed-off-by: pradip parmar <[email protected]>

* backup_mysqlctld: mysqlctld teardown fixes. vttablet restart method created.

Signed-off-by: pradip parmar <[email protected]>

* backup-mysqlctld: mysqlctld restart fixed, backup utils refactor.

Signed-off-by: pradip parmar <[email protected]>

* backup_transform_mysqlctld: backup_transform testing using mysqlctld.

Signed-off-by: pradip parmar <[email protected]>

* Added percona 56 new dependency

Signed-off-by: Ajeet jain <[email protected]>

* Putting the dependency at right place

Signed-off-by: Ajeet jain <[email protected]>

* updated apt tp apt-get

Signed-off-by: Ajeet jain <[email protected]>

* corrected the typo

Signed-off-by: Ajeet jain <[email protected]>

* fixed a comma in config.json

Signed-off-by: Ajeet jain <[email protected]>

* review changes.

Signed-off-by: pradip parmar <[email protected]>

* test added in config.

Signed-off-by: pradip parmar <[email protected]>

* package name changed.

Signed-off-by: pradip parmar <[email protected]>

* mysql_ctld: file name change.

Signed-off-by: pradip parmar <[email protected]>

* mysqlctld: review changes,
code refactor, removed some functions.

Signed-off-by: pradip parmar <[email protected]>

* backup_transform: refator.

Signed-off-by: pradip parmar <[email protected]>

* backup_mysqlctld: config changes.

Signed-off-by: pradip parmar <[email protected]>

* restore vtBackup test

Signed-off-by: Ajeet jain <[email protected]>

* redistribute travis tests

Signed-off-by: Ajeet jain <[email protected]>

* updated shard matrix for transform test

Signed-off-by: Ajeet jain <[email protected]>

* mysqlctld: mysqlctld teardown issue resolved.

Signed-off-by: pradip parmar <[email protected]>

* mysqlctld: mysqlctld teardown issue resolved.

Signed-off-by: pradip parmar <[email protected]>

* mysql-ctld: process teardown changes.

Signed-off-by: pradip parmar <[email protected]>

* review changes and newConnection method modified.

Signed-off-by: pradip parmar <[email protected]>

* mysqlctld: process hang issue resolved.

Signed-off-by: pradip parmar <[email protected]>

Co-authored-by: Ajeet Jain <[email protected]>
Co-authored-by: Arindam Nayak <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]>
Signed-off-by: pradip parmar <[email protected]>
Signed-off-by: Ajeet jain <[email protected]>
Fixes vitessio#5800

The function hasAnother commit is supposed to separate out events
that have to be applied in a transaction from events that are
applied as autocommits. But it was not checking for the OTHER
and JOURNAL events. They should not be batched with regular
transactions.

This caused a bug where a regular transaction occured followed
by an OTHER event. The apply of this event happened assuming
autocommit behavior, but it actually got added to the previous
uncommitted transaction.

If the stop position is reached at this point, vplayer exits
because it thinks the autocommit happened, and the entire
transaction gets rolled back, and the stop position ends
up not being actually reached.

The copier which expects this behavior will then start applying
the next set of rows, but they are not consistent with the current
stop position. This will cause the target to go out of sync.

Signed-off-by: Sugu Sougoumarane <[email protected]>
Signed-off-by: Ajeet jain <[email protected]>
Signed-off-by: Sugu Sougoumarane <[email protected]>
migrating messaging python testcase to go
Signed-off-by: Sugu Sougoumarane <[email protected]>
Fetch MariaDB 10.1 from MariaDB repos (works more consistently)

Signed-off-by: Morgan Tocker <[email protected]>
sougou and others added 22 commits March 8, 2020 21:10
Also fix an incorrect test.

Signed-off-by: Sugu Sougoumarane <[email protected]>
Signed-off-by: Sugu Sougoumarane <[email protected]>
Not all code paths were updating the stats. This was causing
the stats reported in /debug/status to be unreliable. This refactor
keeps the stats and reporting in sync by performing the updates
at the lower level functions that also update the vreplication table.

Signed-off-by: Sugu Sougoumarane <[email protected]>
Signed-off-by: Sugu Sougoumarane <[email protected]>
Do not drop leading zeroes in microsecond timestamps for prepared statements.
Signed-off-by: Sugu Sougoumarane <[email protected]>
Signed-off-by: Sugu Sougoumarane <[email protected]>
vstreamer: send immediate GTID on "current"
vrepl: more documentation and fix stats
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
1.set grpc_keepalive_time to 10 seconds
2.set grpc_keepalive_timeout to 10 seconds
These settings will help to avoid the situation which query will hang there when pod or Bare metal is broken.

Signed-off-by: JohnnyThree <[email protected]>
@inolddays
Copy link
Contributor Author

close this and created 5922 to replace .

@inolddays
Copy link
Contributor Author

close this and created 5922 to replace.

@inolddays inolddays closed this Mar 16, 2020
@morgo morgo mentioned this pull request Mar 16, 2020
@inolddays inolddays deleted the master branch March 17, 2020 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.