Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Enabling the transaction throttler can lead to vttablet crash #12619

Open
ejortegau opened this issue Mar 13, 2023 · 0 comments
Open

Comments

@ejortegau
Copy link
Contributor

Overview of the Issue

Enabling the transaction throttler can lead to vttablet segfaulting:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x100ebf282]

goroutine 496 [running]:
vitess.io/vitess/go/vt/throttler.replicationLagRecord.lag(...)
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/replication_lag_record.go:40
vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).recalculateRate(0xc00039c540, {{0xc0fbf3ac9824f050, 0x11539634, 0x1055b9260}, {{0x0, 0x0}, 0xc0002c01a0, 0xc0006a2360, 0x0, 0x0, ...}})
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:306 +0x122
vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).processRecord(0xc00039c540, {{0xc0fbf3ac9824f050, 0x11539634, 0x1055b9260}, {{0x0, 0x0}, 0xc0002c01a0, 0xc0006a2360, 0x0, 0x0, ...}})
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:272 +0x165
vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).ProcessRecords(0xc00039c540)
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:261 +0x125
created by vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).Start
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:169 +0x8e

Process finished with the exit code 2

This comes from the MaxReplicationLagModule trying to process a lagRecord with nil Stats.

Reproduction Steps

  1. Start the local deployment with ./101_initial_cluster.sh.
  2. Bring down the primary vttablet with CELL=zone1 KEYSPACE=commerce TABLET_UID=100 bash -x ../common/scripts/vttablet-down.sh
  3. Start it up again, adding the TxThrottler arguments shown below:
--enable-tx-throttler --tx-throttler-config "target_replication_lag_sec: 5 max_replication_lag_sec: 20 initial_rate: 10000 max_increase: 1 emergency_decrease: 0.5 min_duration_between_increases_sec: 2 max_duration_between_increases_sec:5 min_duration_between_decreases_sec: 1 spread_backlog_across_sec: 1 age_bad_rate_after_sec: 180 bad_rate_increase: 0.1 max_rate_approach_threshold: 0.9" --tx-throttler-healthcheck-cells zone1
  1. See it crash with the message above.

Binary Version

[ 8060 ] eduardo.ortega@eduardo-ltmnjyh ~/git/vitess (main)% ➜    ./bin/vttablet --version
Version: 17.0.0-SNAPSHOT (Git revision c89fa57b6330a4a878aea82350f8525b50ad4b77 branch 'main') built on Mon Mar 13 18:00:40 CET 2023 by [email protected] using go1.20.2 darwin/amd64

Operating System and Environment details

macOS Ventura 13.2.1
Darwin 22.3.0
x86_64

Log Fragments

No response

@ejortegau ejortegau added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Mar 13, 2023
ejortegau added a commit to ejortegau/vitess that referenced this issue Mar 13, 2023
…be done

This defends against lag records with nil stats which can lead to segfaults.
See vitessio#12619

Signed-off-by: Eduardo J. Ortega U <[email protected]>
@deepthi deepthi changed the title Bug Report: Bug Report: Enabling the transaction throttler can lead to vttablet segfault Mar 13, 2023
@deepthi deepthi changed the title Bug Report: Enabling the transaction throttler can lead to vttablet segfault Bug Report: Enabling the transaction throttler can lead to vttablet crash Mar 13, 2023
@shlomi-noach shlomi-noach added Component: TabletManager and removed Needs Triage This issue needs to be correctly labelled and triaged labels Mar 14, 2023
timvaillancourt pushed a commit to slackhq/vitess that referenced this issue Apr 5, 2023
…be done

This defends against lag records with nil stats which can lead to segfaults.
See vitessio#12619

Signed-off-by: Eduardo J. Ortega U <[email protected]>
timvaillancourt added a commit to slackhq/vitess that referenced this issue Apr 6, 2023
… 2 (#69)

* Skip recalculating the rate in MaxReplicationLagModule when it can't be done

This defends against lag records with nil stats which can lead to segfaults.
See vitessio#12619

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Address PR comments.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Make linter happy

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Add support for criticality query directive, and have TxThrottler respect that

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Remove unused variable

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Fix CI pipeline

Signed-off-by: Eduardo J. Ortega U <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>

* Address PR comments.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Make linter happy & add extra test cases.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Address PR comments.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Fix circular import

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Make linter happy

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Address PR comments:

* Invalid criticality in query directive fails the query.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Fix executor.go

Signed-off-by: Tim Vaillancourt <[email protected]>

* Fix go/vt/vtgate/executor.go again

Signed-off-by: Tim Vaillancourt <[email protected]>

* Fix TestNewMaxReplicationLagModule_recalculateRate

* Fix go/vt/vtgate/executor_test.go

* Regen protos from linux

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Eduardo J. Ortega U <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Eduardo J. Ortega U <[email protected]>
deepthi pushed a commit that referenced this issue Apr 14, 2023
…be done (#12620)

* Skip recalculating the rate in MaxReplicationLagModule when it can't be done

This defends against lag records with nil stats which can lead to segfaults.
See #12619

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Address PR comments.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Make linter happy

Signed-off-by: Eduardo J. Ortega U <[email protected]>

---------

Signed-off-by: Eduardo J. Ortega U <[email protected]>
dbussink pushed a commit to dbussink/vitess that referenced this issue Apr 17, 2023
…be done (vitessio#12620)

* Skip recalculating the rate in MaxReplicationLagModule when it can't be done

This defends against lag records with nil stats which can lead to segfaults.
See vitessio#12619

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Address PR comments.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Make linter happy

Signed-off-by: Eduardo J. Ortega U <[email protected]>

---------

Signed-off-by: Eduardo J. Ortega U <[email protected]>
timvaillancourt pushed a commit to slackhq/vitess that referenced this issue May 27, 2024
…be done (vitessio#12620)

* Skip recalculating the rate in MaxReplicationLagModule when it can't be done

This defends against lag records with nil stats which can lead to segfaults.
See vitessio#12619

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Address PR comments.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Make linter happy

Signed-off-by: Eduardo J. Ortega U <[email protected]>

---------

Signed-off-by: Eduardo J. Ortega U <[email protected]>
timvaillancourt added a commit to slackhq/vitess that referenced this issue May 28, 2024
* Skip recalculating the rate in MaxReplicationLagModule when it can't be done (vitessio#12620)

* Skip recalculating the rate in MaxReplicationLagModule when it can't be done

This defends against lag records with nil stats which can lead to segfaults.
See vitessio#12619

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Address PR comments.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Make linter happy

Signed-off-by: Eduardo J. Ortega U <[email protected]>

---------

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* Throttled transactions return MySQL error code 1041 ER_OUT_OF_RESOURCES (vitessio#12949)

This error code seems better suited to represent the fact that transactions are
being throttled by the server due to some form of resource contention than the
current code 1203 ER_TOO_MANY_USER_CONNECTIONS.

Signed-off-by: Eduardo J. Ortega U <[email protected]>

* MaxReplicationLagModule.recalculateRate no longer fills the log (vitessio#14875)

Signed-off-by: Eduardo J. Ortega U <[email protected]>

---------

Signed-off-by: Eduardo J. Ortega U <[email protected]>
Co-authored-by: Eduardo J. Ortega U <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants