Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race that can lead to panic during seal #23906

Merged
merged 2 commits into from
Oct 30, 2023
Merged

Conversation

banks
Copy link
Member

@banks banks commented Oct 30, 2023

I'm not sure we'll be able to reproduce the panic but it has been observed internally when seal happens to race with a background metrics collection:

 "panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x3c7f811]

goroutine 11485 [running]:
github.com/hashicorp/vault/vault.(*PolicyStore).getACLView(0x0, 0x0?)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/policy_store_util_ent.go:161 +0x31
github.com/hashicorp/vault/vault.(*PolicyStore).policiesByNamespace(0x0?, {0xc149050, 0xc1d222d2c0}, 0x0, 0x0?)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/policy_store.go:687 +0xaf
github.com/hashicorp/vault/vault.(*PolicyStore).policiesByNamespaces(0xc1490f8?, {0xc149050, 0xc1d222d2c0}, 0x7850de80?, {0xc7b9964068, 0x1, 0x16d9d43?})
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/policy_store.go:721 +0xf6
github.com/hashicorp/vault/vault.(*Core).configuredPoliciesGaugeCollector(0xc002dc9200, {0xc1490f8, 0xc2950c8690})
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/core_metrics.go:596 +0x1a5
github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).collectAndFilterGauges(0xc1fc6fabe0)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/helper/metricsutil/gauge_process.go:186 +0x155
github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).Run(0xc1fc6fabe0)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/helper/metricsutil/gauge_process.go:270 +0x8a
created by github.com/hashicorp/vault/vault.(*Core).emitMetricsActiveNode in goroutine 10785
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/core_metrics.go:335 +0xdf6"

After some analysis the only explanation I've been able to fit is around this racey nil check.

	func (c *Core) configuredPoliciesGaugeCollector(ctx context.Context) ([]metricsutil.GaugeLabelValues, error) {
		if c.policyStore == nil {
			return []metricsutil.GaugeLabelValues{}, nil
		}
                // ^^^ Above check is racey as it read c.policyStore without a lock.
                // Assume that seal runs concurrently and `teardownPolicyStore` 
                // sets c.policyStore to nil at this point. The `Rlock` below will 
                // wait for the sealing Goroutine to release stateLock and then will
                // read policyStore := nil and use it blindly below.
                // The first place this is dereferenced is the line indicated in the 
                // stack trace where the panic occurred.
		c.stateLock.RLock()
		policyStore := c.policyStore
		c.stateLock.RUnlock()

@banks banks added this to the 1.16.0-rc1 milestone Oct 30, 2023
@github-actions github-actions bot added the hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed label Oct 30, 2023
Copy link
Contributor

@hghaf099 hghaf099 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

@github-actions
Copy link

Build Results:
All builds succeeded! ✅

@github-actions
Copy link

CI Results:
All Go tests succeeded! ✅

Copy link
Contributor

@raskchanky raskchanky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@banks banks merged commit 0634564 into main Oct 30, 2023
107 checks passed
@banks banks deleted the fix/VAULT-21072-metrics-panic branch October 30, 2023 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants