Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-3292: Handling for corrupt db config #6377

Merged
merged 8 commits into from
Sep 11, 2023
Merged

CBG-3292: Handling for corrupt db config #6377

merged 8 commits into from
Sep 11, 2023

Conversation

gregns1
Copy link
Contributor

@gregns1 gregns1 commented Aug 23, 2023

CBG-3292

First I have changed it so that when sync gateway on the config interval looks for database configs in the bucket, will also check that the config bucket name matches the bucket the config is found in. If it doesn't we add the db context to the server context corrupt database map to keep track of. Then the changes will remove the in memory representation of the db config and the db context off of the server context all together.
This forces operation on the database to fail so I have added some handling for a suitable error message to be returned to the user in this eventuality informing them that they need to update config to correct the bucket name.
Lastly we needed to be able to correct this corrupt config which is a challenge as at this stage all requests to the db would fail. But I added handling to handle a PUT/POST requests to the db to update the config to correct the corrupt bucket name.
Test seems to panic against rosmar when trying to update bucket config so added walrus check to avoid it against rosmar.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

@gregns1 gregns1 self-assigned this Aug 23, 2023
rest/config.go Outdated Show resolved Hide resolved
rest/admin_api.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Show resolved Hide resolved
rest/config.go Outdated Show resolved Hide resolved
rest/config.go Outdated Show resolved Hide resolved
rest/handler.go Outdated Show resolved Hide resolved
rest/server_context.go Outdated Show resolved Hide resolved
@gregns1 gregns1 assigned torcolvin and unassigned gregns1 Aug 31, 2023
@gregns1 gregns1 requested a review from torcolvin August 31, 2023 13:33
Copy link
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left comments on the code, and the code comments will hold true regardless, but there is a bigger problem, maybe.

We store the map by dbName but if you run a backup on bucketA with db1 to bucketB and bucketC, there are actually three buckets with a matching database name, but we index the invalid names by dbname only so fixing in bucketB will remove this from bucketC. This code may behave OK, but if that's the case, I'd prefer a test for this. I'd probably try to break that test from the other test cases, or at least parametrize the test for the different cases to make it easier to understand what the test is doing in the future.

Additionally, I'm worried about the frequency of logging here and I'd like other people's input. It's definitely possible that this isn't possible to fix quickly and I don't know the right level to log on. It might be right to log the first time on warning and then log on info for the subsequent errors. (Just throwing out ideas).

rest/config.go Outdated Show resolved Hide resolved
rest/config.go Outdated Show resolved Hide resolved
rest/config.go Outdated Show resolved Hide resolved
rest/handler.go Outdated Show resolved Hide resolved
rest/utilities_testing.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/handler.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we need to worry about different cases and can write tests for them - I'd each of these as a separate test case so it's clear what they are doing.

  1. start RestTester with no buckets and write a dbconfig to a bucket with the SDK that is an invalid bucket (this is a restore of a bucket with nothing running). Assert that the database doesn't exist or we hit the point in time where it doesn't work
  2. make sure if you do a PUT with a mismatched bucket name in the config it will fail (this may already be tested)
  3. in bucketA and bucketB, write two dbconfigs with bucket name as bucketC; then start RestTester, make sure neither of them are picked up (this is the case where we have two buckets that are backed up/restored but no working buckets)
  4. create bucketA and bucketB with dbconfigs that that both list bucket name as bucketA, then start RestTester make sure database for bucketA is picked up and there is logging for bucketB having the wrong bucket name (this is the case where we did a backup/restore of bucketA as bucketB and started SG)
  5. create a working dbconfig in bucketA in a resttester, make sure db is running. write the config into bucketB while the RestTester is running, make sure the database backed by bucketA is still running and there's appropriate logging. (this is the case where SG is running and backup/restore hits)

These roughly match the scenarios of what happens if you start a SG when you have multiple buckets with different information and also what happens if a SG is already running and a bucket appears.

Because an invalid database can never appear, I don't think we need code in handler.go - if there's an bucket name mismatch, we just won't load the database, so you can't query it with /db -> it'll just return 404 as not found because the database wasn't loaded.

The last change we want to make to the bootstrap logic is to log a warning the first time we see a mismatched bucket, but then after log a warning. This is a change from the code on a timer that I wrote.

rest/server_context.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most comments are just nitpicks about readability.

There is one bigger problem which I think it's worth writing a new test or two for - what happens if you have

  1. dbconfig A, groupID A backed by bucket A in bucket A
  2. dbconfig B, groupID B backed by bucket A in bucket B
  3. dbconfig C, groupID A backed by bucket A in bucket B

It's possible most tests need to do include groupID so make sure that we are tracking dbName, bucket, groupID because it seems like our invalid database structure drops groupID.

Generally I think groupID is used to have a duplicate configuration but avoid having certain nodes participate in import.

rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
base/constants.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/adminapitest/admin_api_test.go Outdated Show resolved Hide resolved
rest/config.go Outdated Show resolved Hide resolved
rest/config.go Outdated Show resolved Hide resolved
rest/config.go Outdated Show resolved Hide resolved
rest/handler.go Outdated Show resolved Hide resolved
rest/server_context.go Outdated Show resolved Hide resolved
@torcolvin torcolvin assigned gregns1 and unassigned torcolvin Sep 5, 2023
@torcolvin
Copy link
Collaborator

Most comments are just nitpicks about readability.

There is one bigger problem which I think it's worth writing a new test or two for - what happens if you have

  1. dbconfig A, groupID A backed by bucket A in bucket A
  2. dbconfig B, groupID B backed by bucket A in bucket B
  3. dbconfig C, groupID A backed by bucket A in bucket B

It's possible most tests need to do include groupID so make sure that we are tracking dbName, bucket, groupID because it seems like our invalid database structure drops groupID.

Generally I think groupID is used to have a duplicate configuration but avoid having certain nodes participate in import.

This is actually not an issue outside of tests because Sync Gateway only has a single ServerContext and it can only have one groupID

@torcolvin torcolvin enabled auto-merge (squash) September 11, 2023 16:21
@torcolvin torcolvin merged commit b35e52e into master Sep 11, 2023
26 checks passed
@torcolvin torcolvin deleted the CBG-3292 branch September 11, 2023 18:28
torcolvin pushed a commit that referenced this pull request Sep 15, 2023
* CBG-3292: force config update on corrupt db config
bbrks pushed a commit that referenced this pull request Oct 3, 2023
bbrks pushed a commit that referenced this pull request Oct 3, 2023
bbrks pushed a commit that referenced this pull request Mar 28, 2024
* CBG-3292: force config update on corrupt db config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants