CBG-3292: Handling for corrupt db config #6377
Conversation
I left comments on the code, and those comments hold true regardless, but there may be a bigger problem.
We key the map by dbName, but if you back up bucketA (containing db1) to bucketB and bucketC, there are actually three buckets with a matching database name. Because we index the invalid names by dbName only, fixing the config in bucketB will also remove the entry for bucketC. This code may behave OK, but if that's the case, I'd prefer a test for it. I'd probably split that test out from the other test cases, or at least parametrize the test for the different cases, to make it easier to understand what the test is doing in the future. A sketch of keying by more than dbName follows.
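A minimal sketch of the composite-key idea, so the bucketB and bucketC entries stay independent. All names here (invalidConfigKey, invalidConfigInfo) are illustrative assumptions, not the PR's actual identifiers:

```go
package main

import "fmt"

// Illustrative only - not the PR's actual identifiers. Keying by dbName
// alone collapses the three copies of db1 into one entry; keying by
// dbName+bucket keeps them independent.
type invalidConfigKey struct {
	dbName string
	bucket string // bucket the config was actually found in
}

type invalidConfigInfo struct {
	configBucketName string // bucket name written inside the config
}

func main() {
	invalid := map[invalidConfigKey]invalidConfigInfo{
		{dbName: "db1", bucket: "bucketB"}: {configBucketName: "bucketA"},
		{dbName: "db1", bucket: "bucketC"}: {configBucketName: "bucketA"},
	}
	// Repairing the config found in bucketB clears only that entry...
	delete(invalid, invalidConfigKey{dbName: "db1", bucket: "bucketB"})
	// ...so the corrupt config in bucketC is still tracked.
	fmt.Println(len(invalid)) // 1
}
```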
Additionally, I'm worried about the frequency of logging here and I'd like other people's input. It's entirely possible this can't be fixed quickly, and I don't know the right level to log at. It might be right to log at warning the first time and then at info for the subsequent errors (just throwing out ideas).
As discussed offline, we need to worry about several different cases and can write tests for them - I'd write each of these as a separate test case so it's clear what they are doing (a rough table-driven skeleton follows the list):
- start RestTester with no buckets, and write a dbconfig with the SDK to a bucket where the config names a different bucket (this is a restore of a bucket with nothing running). Assert that the database doesn't exist, or that we hit the point in time where it doesn't work - make sure that a PUT with a mismatched bucket name in the config will fail (this may already be tested)
- in bucketA and bucketB, write two dbconfigs with the bucket name set to bucketC; then start RestTester and make sure neither of them is picked up (this is the case where we have two buckets that are backed up/restored but no working buckets)
- create bucketA and bucketB with dbconfigs that both list the bucket name as bucketA, then start RestTester and make sure the database for bucketA is picked up and there is logging about bucketB having the wrong bucket name (this is the case where we did a backup/restore of bucketA as bucketB and started SG)
- create a working dbconfig in bucketA in a RestTester and make sure the db is running; write the config into bucketB while the RestTester is running, and make sure the database backed by bucketA is still running and there's appropriate logging (this is the case where SG is running when a backup/restore hits)
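A rough table-driven skeleton for these scenarios - the setup/assert steps and all helper names here are hypothetical placeholders for whatever the real harness (RestTester, SDK writes, log assertions) provides:

```go
package rest

import "testing"

// Hypothetical skeleton, not the PR's actual test code.
func TestCorruptDbConfigScenarios(t *testing.T) {
	testCases := []struct {
		name         string
		buckets      []string          // buckets to create before starting SG
		configs      map[string]string // bucket the config is written to -> bucket name inside the config
		expectLoaded []string          // databases that should come up
	}{
		{
			name:    "restore with nothing running",
			buckets: []string{"bucketA"},
			configs: map[string]string{"bucketA": "bucketB"},
		},
		{
			name:    "two restored buckets, no working bucket",
			buckets: []string{"bucketA", "bucketB"},
			configs: map[string]string{"bucketA": "bucketC", "bucketB": "bucketC"},
		},
		{
			name:         "one valid config plus restored copy",
			buckets:      []string{"bucketA", "bucketB"},
			configs:      map[string]string{"bucketA": "bucketA", "bucketB": "bucketA"},
			expectLoaded: []string{"bucketA"},
		},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			// 1. create tc.buckets and write tc.configs via the SDK
			// 2. start RestTester
			// 3. assert only tc.expectLoaded databases are running and
			//    that mismatched configs produced the expected logging
		})
	}
}
```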
These roughly match the scenarios of what happens if you start an SG when you have multiple buckets containing different information, and what happens when an SG is already running and a bucket appears.
Because an invalid database can never appear, I don't think we need code in handler.go - if there's a bucket name mismatch, we just won't load the database, so you can't query it with /db; it'll just return 404 Not Found because the database wasn't loaded.
The last change we want to make to the bootstrap logic is to log a warning the first time we see a mismatched bucket, and then log at info on subsequent occurrences. This is a change from the timer-based code that I wrote.
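A minimal sketch of the warn-once-then-info idea, assuming a simple seen-map guard; the type and log calls are illustrative, not Sync Gateway's actual logging API:

```go
package main

import (
	"log"
	"sync"
)

// Illustrative warn-once-then-info guard; the PR's real bookkeeping
// and logging levels will differ.
type mismatchLogger struct {
	mu   sync.Mutex
	seen map[string]bool
}

func (l *mismatchLogger) logMismatch(bucket, dbName string) {
	key := bucket + "/" + dbName
	l.mu.Lock()
	first := !l.seen[key]
	l.seen[key] = true
	l.mu.Unlock()
	if first {
		log.Printf("WARN: config for db %q in bucket %q names a different bucket", dbName, bucket)
	} else {
		log.Printf("INFO: config for db %q in bucket %q still names a different bucket", dbName, bucket)
	}
}

func main() {
	l := &mismatchLogger{seen: map[string]bool{}}
	l.logMismatch("bucketB", "db1") // warns on first sighting
	l.logMismatch("bucketB", "db1") // drops to info afterwards
}
```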
Most comments are just nitpicks about readability.
There is one bigger problem which I think is worth writing a new test or two for - what happens if you have:
- dbconfig A, groupID A backed by bucket A in bucket A
- dbconfig B, groupID B backed by bucket A in bucket B
- dbconfig C, groupID A backed by bucket A in bucket B
It's possible most tests need to include groupID, so make sure we are tracking dbName, bucket, and groupID - it seems like our invalid database structure currently drops groupID.
Generally I think groupID is used to run a duplicate configuration while avoiding having certain nodes participate in import.
This is actually not an issue outside of tests because a Sync Gateway node only has a single groupID.
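Building on the earlier sketch, the key would also need groupID; a minimal shape, again an assumption about structure rather than the PR's actual code:

```go
// Including groupID distinguishes dbconfig B (groupID B) from
// dbconfig C (groupID A) even though both live in bucketB and
// name bucketA. Illustrative only.
type invalidConfigKey struct {
	dbName  string
	bucket  string
	groupID string
}
```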
CBG-3292
First, I have changed it so that when Sync Gateway looks for database configs in the bucket on the config update interval, it also checks that the bucket name in the config matches the bucket the config is found in. If it doesn't, we add the db context to a corrupt database map on the server context to keep track of it, and then remove the in-memory representation of the db config and the db context from the server context altogether.
This forces operations on the database to fail, so I have added handling to return a suitable error message to the user in this eventuality, informing them that they need to update the config to correct the bucket name.
Lastly, we needed to be able to correct this corrupt config, which is a challenge because at this stage all requests to the db would fail. I added handling for PUT/POST requests to the db so the config can be updated to correct the corrupt bucket name.
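A runnable sketch of the poll-time bucket-name check described above, with simplified types; the real Sync Gateway config structs and the PR's corrupt-database bookkeeping differ:

```go
package main

import "fmt"

// Simplified, illustrative types - not Sync Gateway's actual structs.
type DatabaseConfig struct {
	Name   string
	Bucket string // bucket name the config claims to belong to
}

type ServerContext struct {
	corruptConfigs map[string]string // dbName -> bucket the config names
}

// validateFetchedConfig sketches the poll-time check: a config whose
// Bucket field doesn't match the bucket it was fetched from is tracked
// as corrupt and not loaded; the caller also unloads any existing
// in-memory db context for it.
func (sc *ServerContext) validateFetchedConfig(fetchedFrom string, cfg DatabaseConfig) bool {
	if cfg.Bucket != fetchedFrom {
		sc.corruptConfigs[cfg.Name] = cfg.Bucket
		return false
	}
	return true
}

func main() {
	sc := &ServerContext{corruptConfigs: map[string]string{}}
	ok := sc.validateFetchedConfig("bucketB", DatabaseConfig{Name: "db1", Bucket: "bucketA"})
	fmt.Println(ok, sc.corruptConfigs) // false map[db1:bucketA]
}
```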
The test panics against rosmar when trying to update the bucket config, so I added a walrus check to skip that step against rosmar.
Pre-review checklist
- Removed debug logging (fmt.Print, log.Print, ...)
- Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
- Updated relevant information in the API specifications in docs/api
Integration Tests
- GSI=true, xattrs=true: https://jenkins.sgwdev.com/job/SyncGateway-Integration/1982/