CBG-3742: Allow registry rollbacks based on db config doc rollbacks #6709
…config vbucket rollback/config restore)
…ing by SG. Allows for manual repair of database config.
I can't think of one off the top of my head, but is there a rollback scenario involving more than two databases that we should worry about?
rest/config.go
Outdated
logMessage += " Conflicting collections detected"
} else {
	// Nothing is expected to hit this case, but we might add more invalid sentinel values and forget to update this code.
	logMessage += "Invalid config with no known cause."
Is it reasonable to log the config contents here so we have a chance to update it? I think it might expose information, but we could at least list the name of the document so it's easier for support to pull this from a customer.
I think this is low priority because we shouldn't hit this case.
I wouldn't like to log a full config and I wouldn't expect to require bucket access to the config doc to repair it. Especially thinking forwards to system collections where that won't be possible.
The only reason we'd end up here is if some new code in SG called _handleInvalidDatabaseConfig without updating this bit of code to appropriately handle that case.
I've reworded the log message to be less accusatory about the config itself being invalid. It could be something else that made us consider the database invalid.
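The fallback branch discussed above could be sketched roughly as follows. This is a minimal illustration, not the code in rest/config.go: the sentinel value, the helper name invalidDatabaseReason, and the wording of the fallback message are all assumptions made for the example.

```go
package main

import "fmt"

// Hypothetical sentinel value; the real constant is defined in Sync Gateway
// and its value may differ.
const invalidDatabaseConflictingCollectionsVersion = "invalid-conflicting-collections"

// invalidDatabaseReason (hypothetical helper) maps a sentinel version to the
// suffix appended to the "must repair" warning.
func invalidDatabaseReason(version string) string {
	switch version {
	case invalidDatabaseConflictingCollectionsVersion:
		return "Conflicting collections detected"
	default:
		// Not expected today, but a new sentinel could be added without
		// updating this switch, so avoid blaming the config itself.
		return "Database was marked invalid for an unknown reason"
	}
}

func main() {
	versions := []string{invalidDatabaseConflictingCollectionsVersion, "some-future-sentinel"}
	for _, v := range versions {
		fmt.Printf("Must repair invalid database config for %q for it to be usable! %s\n",
			"c1_db1", invalidDatabaseReason(v))
	}
}
```

Keeping a default branch means a future sentinel still produces a usable warning rather than an empty reason.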
	logArgs = append(logArgs, base.MD(d.dbNames[dbname].configBucketName), base.MD(d.dbNames[dbname].persistedBucketName))
} else if cnf.Version == invalidDatabaseConflictingCollectionsVersion {
	base.SyncGatewayStats.GlobalStats.ConfigStat.DatabaseCollectionCollisions.Add(1)
	logMessage += " Conflicting collections detected"
I think this will be logged upstream, but I think it would be great to print somewhere the name of the registry / config docs and which collections are conflicting.
It's not feasible to do that here on load, but the warning log that occurred when the rollback set this invalid flag logs what collection(s) caused it.
2024-02-29T15:39:54.284Z [WRN] db:c1_db1 db <ud>c1_db1</ud> config rollback would cause collection conflicts (<ud>map[sg_test_0.sg_test_1:c1_db2]</ud>) - marking database as invalid to allow for manual repair -- rest.(*GatewayRegistry).rollbackDatabaseConfig() at config_registry.go:234
2024-02-29T15:39:54.285Z [WRN] Must repair invalid database config for "c1_db1" for it to be usable! Conflicting collections detected -- rest.(*invalidDatabaseConfigs).addInvalidDatabase() at config.go:334
Given the rollback that just happened, the customer probably doesn't want to change the non-rolled back database, so I'm not sure there's value in knowing which database this one conflicts with. The rollback is the problem that needs correcting and we log the collection that needs removing.
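The conflict check behind the first warning above can be pictured with a small sketch. It assumes a simplified, hypothetical view of the registry as a map from "scope.collection" to the owning database name; findCollectionConflicts is an illustrative name, not a function in config_registry.go.

```go
package main

import "fmt"

// findCollectionConflicts returns the collections in a rolled-back config that
// are already owned by a different database. owners maps "scope.collection"
// to the owning database name (a simplified, hypothetical registry view).
func findCollectionConflicts(owners map[string]string, dbName string, rolledBack []string) map[string]string {
	conflicts := make(map[string]string)
	for _, collection := range rolledBack {
		if owner, ok := owners[collection]; ok && owner != dbName {
			conflicts[collection] = owner
		}
	}
	return conflicts
}

func main() {
	owners := map[string]string{
		"sg_test_0.sg_test_1": "c1_db2",
		"sg_test_0.sg_test_2": "c1_db1",
	}
	rolledBack := []string{"sg_test_0.sg_test_1", "sg_test_0.sg_test_2"}

	conflicts := findCollectionConflicts(owners, "c1_db1", rolledBack)
	if len(conflicts) > 0 {
		// In this sketch the database would be marked invalid rather than loaded.
		fmt.Printf("db %s config rollback would cause collection conflicts (%v) - marking database as invalid\n",
			"c1_db1", conflicts)
	}
}
```

The returned map is keyed by the conflicting collection, which matches the shape of the value logged in the warning.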
rest/config_registry.go
Outdated
if registryDatabase.PreviousVersion == nil {
	return fmt.Errorf("Rollback requested but registry did not include previous version for db %s", base.MD(dbName))
	base.InfofCtx(ctx, base.KeyConfig, "Rollback requested but registry did not include previous version for db %s - using config doc as previous version", base.UD(dbName))
I would suggest changing "config doc" to "last read of doc %s" if possible.
I don't think it makes any material difference to change this? Last read is generally implied.
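For readers following the change in this hunk, the fallback can be sketched as below: if the registry entry has no previous version, the version read from the database config doc is used instead of returning an error. The types and the rollbackTarget helper are simplified stand-ins invented for this example, not the actual Sync Gateway definitions.

```go
package main

import "fmt"

// DatabaseVersion and RegistryDatabase are simplified stand-ins for the
// registry types; the field names here are assumptions.
type DatabaseVersion struct {
	Version string
}

type RegistryDatabase struct {
	Current         DatabaseVersion
	PreviousVersion *DatabaseVersion
}

// rollbackTarget (hypothetical helper) picks the version to roll back to.
// The boolean reports whether the config doc was used as the fallback.
func rollbackTarget(reg RegistryDatabase, fromConfigDoc DatabaseVersion) (DatabaseVersion, bool) {
	if reg.PreviousVersion != nil {
		return *reg.PreviousVersion, false
	}
	// No previous version recorded (for example after a config vbucket
	// rollback or a config restore): fall back to the config doc.
	return fromConfigDoc, true
}

func main() {
	reg := RegistryDatabase{Current: DatabaseVersion{Version: "2-def"}}
	target, usedDoc := rollbackTarget(reg, DatabaseVersion{Version: "1-abc"})
	fmt.Printf("rolling back to %s (from config doc: %t)\n", target.Version, usedDoc)
}
```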
LGTM
Added subtest to cover simultaneous db rollback
Changes look good, just a test tweak needed for the CI failures.
…acks based on db config doc rollbacks (#6709)
* Extend startBootstrapServerWithoutConfigPolling polling interval
* tidy return
* Use config from bucket to roll back if no previous version is found (config vbucket rollback/config restore)
* wip
* Mark conflicting rollbacks as invalid in db registry and prevent loading by SG. Allows for manual repair of database config.
* Improve log message for invalid database configurations - handle multiple scenarios
* Handle unknown reasons why a db could be invalid
* Do database repair via normal Admin REST API rather to ensure externally recoverable
* Equal(len())->Len()
* Rename stat
* UD->MD fix
* lower retry timeout for testing
* Require 3 datastores
* Replace registry entry with one built from config
* Reword unexpected InvalidDatabase case
* Push mutliDatabsaeRollback into subtest
* fix ineffassign
* Fix non-deterministic slice ordering from RequireInvalidDatabaseConfigNames helper
#6718)
* [3.1.4 Backport] CBG-3751: Cherry-pick CBG-3742: Allow registry rollbacks based on db config doc rollbacks (#6709)
* Extend startBootstrapServerWithoutConfigPolling polling interval
* tidy return
* Use config from bucket to roll back if no previous version is found (config vbucket rollback/config restore)
* wip
* Mark conflicting rollbacks as invalid in db registry and prevent loading by SG. Allows for manual repair of database config.
* Improve log message for invalid database configurations - handle multiple scenarios
* Handle unknown reasons why a db could be invalid
* Do database repair via normal Admin REST API rather to ensure externally recoverable
* Equal(len())->Len()
* Rename stat
* UD->MD fix
* lower retry timeout for testing
* Require 3 datastores
* Replace registry entry with one built from config
* Reword unexpected InvalidDatabase case
* Push mutliDatabsaeRollback into subtest
* fix ineffassign
* Fix non-deterministic slice ordering from RequireInvalidDatabaseConfigNames helper
* New tests require bootstrap connection
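The last commit item mentions non-deterministic slice ordering in a test helper. One common cause in Go is building a slice by ranging over a map, since map iteration order is unspecified; whether that was the cause here is an assumption. A minimal sketch of the usual fix, with a hypothetical invalidDatabaseNames helper:

```go
package main

import (
	"fmt"
	"sort"
)

// invalidDatabaseNames (hypothetical helper) returns the names of invalid
// databases in sorted order, so callers can compare against a fixed slice.
func invalidDatabaseNames(invalid map[string]struct{}) []string {
	names := make([]string, 0, len(invalid))
	for name := range invalid {
		names = append(names, name)
	}
	// Map iteration order is unspecified in Go, so sort before returning.
	sort.Strings(names)
	return names
}

func main() {
	invalid := map[string]struct{}{"c1_db2": {}, "c1_db1": {}}
	fmt.Println(invalidDatabaseNames(invalid))
}
```

Sorting inside the helper keeps every assertion that uses it stable without each test having to sort its own expectation.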
CBG-3742: Allow registry rollbacks based on db config doc rollbacks.
Example log output for a rollback causing conflicting collections:
TODO
… sc.BootstrapContext in TestPersistentConfigRegistryRollbackCollectionConflictAfterDbConfigRollback)
Pre-review checklist
… fmt.Print, log.Print, ...)
… base.UD(docID), base.MD(dbName))
… docs/api
Integration Tests
GSI=true,xattrs=true https://jenkins.sgwdev.com/job/SyncGateway-Integration/2336/