Assert getRepositoryData only on master node #67780

DaveCTurner
Contributor

A trap for the uninitiated: only the master should be calling
`getRepositoryData()`, but today this isn't checked anywhere so there's
a risk that we inadvertently introduce some code that gets the
repository data on other nodes too. This commit introduces an assertion
to catch that.
@DaveCTurner DaveCTurner added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.12.0 labels Jan 20, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 20, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Member

@original-brownbear original-brownbear left a comment

++ to the idea, but I'm not sure we can do it this easily

// consistency guarantees there, but electedness is too ephemeral to assert. We can say for sure that this node should be
// master-eligible, which is almost as strong since all other snapshot-related activity happens on data nodes whether they be
// master-eligible or not.
assert clusterService.localNode().isMasterNode() : "should only load repository data on master nodes";
Member

I thought about adding it like this, but I'm a little afraid we could see some tripped assertions in tests here. It's conceivable (albeit very improbable) that we failed over to another master just before calling this method, isn't it? (I could definitely see `SnapshotResiliencyTests` trip this pretty easily.)

Contributor Author

That's why we assert we're master-eligible and not the elected master.

The code comment echoes what you said, and justifies why this is almost as strong as the real thing.
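
To make the distinction in this thread concrete, here is a minimal sketch of why an "elected master" check can trip spuriously after a failover while a "master-eligible" check cannot. Everything in it (`MasterEligibilitySketch`, `Node`, `Election`, and the two helper methods) is a hypothetical stand-in invented for illustration and is not the Elasticsearch API; it also assumes a node's roles stay fixed for the lifetime of the process.

```java
/** Illustration only: hypothetical stand-ins, not the Elasticsearch classes. */
final class MasterEligibilitySketch {

    /** Hypothetical node model with a role that is fixed at construction time. */
    static final class Node {
        final String name;
        final boolean masterEligible;

        Node(String name, boolean masterEligible) {
            this.name = name;
            this.masterEligible = masterEligible;
        }
    }

    /** Hypothetical holder for whichever node is the elected master right now. */
    static final class Election {
        volatile Node electedMaster; // may change at any moment during a failover
    }

    /** Stable property: depends only on the node's own fixed role. */
    static boolean isMasterEligible(Node local) {
        return local.masterEligible;
    }

    /** Ephemeral property: depends on the current outcome of the election. */
    static boolean isElectedMaster(Node local, Election election) {
        return election.electedMaster == local;
    }

    public static void main(String[] args) {
        Node a = new Node("a", true);
        Node b = new Node("b", true);
        Node dataOnly = new Node("d", false);

        Election election = new Election();
        election.electedMaster = a; // node a starts some master-only work...
        election.electedMaster = b; // ...and a failover happens before it finishes

        // An "elected master" assertion on node a would now fail even though
        // node a did nothing wrong.
        System.out.println("a is elected master: " + isElectedMaster(a, election));

        // The weaker "master-eligible" assertion still holds for node a...
        System.out.println("a is master-eligible: " + isMasterEligible(a));

        // ...and still flags a data-only node that should never get here.
        System.out.println("d is master-eligible: " + isMasterEligible(dataOnly));
    }
}
```

The weaker check gives up detecting calls from a master-eligible node that is not currently elected, in exchange for never failing because of a race with an election.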

Member

@original-brownbear original-brownbear left a comment

LGTM 🤦 I should learn to read better sorry!

@DaveCTurner DaveCTurner merged commit b467595 into elastic:master Jan 20, 2021
@DaveCTurner DaveCTurner deleted the 2021-01-20-assert-getrepositorydata-on-master branch January 20, 2021 19:21
DaveCTurner added a commit that referenced this pull request Jan 20, 2021
A trap for the uninitiated: only the master should be calling
`getRepositoryData()`, but today this isn't checked anywhere so there's
a risk that we inadvertently introduce some code that gets the
repository data on other nodes too. This commit introduces an assertion
to catch that.
DaveCTurner added a commit that referenced this pull request Jan 20, 2021
This PR caused test failures after merge, one of which was subsequently
muted. It's quite possible that this isn't the only test that's now
broken so this commit reverts it (and the muting) pending further
investigation.

This reverts commit e8ed1b4.
This reverts commit b467595.
DaveCTurner added a commit that referenced this pull request Jan 20, 2021
This PR caused test failures after merge, one of which was subsequently
muted. It's quite possible that this isn't the only test that's now
broken so this commit reverts it (and the muting) pending further
investigation.

This reverts commit bc40524.
This reverts commit 233c7ab.
@DaveCTurner
Contributor Author

I reverted this from master (5e3ad9f) and 7.x (66b328a) pending further investigation into #67797.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jan 21, 2021
A trap for the uninitiated: only the master should be calling
`getRepositoryData()`, but today this isn't checked anywhere so there's
a risk that we inadvertently introduce some code that gets the
repository data on other nodes too. This commit introduces an assertion
to catch that.

Second attempt at elastic#67780 which was reverted due to test failures.
DaveCTurner added a commit that referenced this pull request Jan 21, 2021
A trap for the uninitiated: only the master should be calling
`getRepositoryData()`, but today this isn't checked anywhere so there's
a risk that we inadvertently introduce some code that gets the
repository data on other nodes too. This commit introduces an assertion
to catch that.

Second attempt at #67780 which was reverted due to test failures.
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.12.0 v8.0.0-alpha1