[JUJU-1950] Use the new lease store in the lease manager #15002

Merged: manadart merged 21 commits into juju:3.0-dqlite from dqlite-use-lease-store on Dec 16, 2022

manadart
Member

Under #14918 we added a new implementation of the lease store indirection, backed by a relational database.

Here we change the dependency graph so that the db-accessor worker becomes a dependency of the lease manager, and is used to create the new store for lease state.

A new db-expiry worker is added that periodically deletes leases that have passed their expiry time. It replaces the old global clock updater worker that was used to "tick" the clock inside the Raft FSM, triggering expired lease deletion.

Testing concerns are aided by a new base suite that provides an in-memory SQLite database primed with the controller schema.

Almost all Raft concerns are deleted now that they are no longer required. Some logic remains in core/raftlease, and is kept as a temporary reference for metrics while we decide what we need to capture with the new method - many of the prior metrics actually informed the performance of the pub/sub-based lease client, which is now gone.

Scale testing will inform refinements around retry strategies in the future.

QA steps

  • Bootstrap and enable HA.
  • Use a logging config that includes juju.worker.lease=TRACE;juju.worker.leaseexpiry=DEBUG
  • Deploy some workload models.
  • Observe the controller logs, which will show lease operations.

Documentation changes

Probably no changes to the official docs, but there are some playbooks around for what to do when Raft leases are at sea.

Bug reference

N/A

store constructor.

The FSM and sundries are no longer required and are removed along with
all Raft concerns from the manifold declarations.
retrieve the controller DB, which is used by the lease store.
in-memory SQLite.

This is used by the lease store tests, which are also fixed for the
corrected lease namespaces.

// IsErrRetryable returns true if the given error might be
// transient and the interaction can be safely retried.
func IsErrRetryable(err error) bool {
Member Author

Lifted this straight from LXD.

Member

This really feels like it should be a part of the dqlite go library.

Member Author

What do you think @MathieuBordere?

@MathieuBordere MathieuBordere Dec 16, 2022

I agree that go-dqlite is probably better suited to determine this, tracking it here.

for the purposes of the removed Raft lease client.

The "dropped" error is removed, as we no longer emit it anywhere.
Member

@SimonRichardson SimonRichardson left a comment

ControllerDDL should be a string, as you can create multiple tables in one exec. You never want to have a partial controller, with partially applied data.

tx, err := s.DB.Begin()
c.Assert(err, jc.ErrorIsNil)

for _, stmt := range schema.ControllerDDL() {
Member

I would love to use the migration Apply here, I don't think it's expensive to run, and it would exercise that code path more.

return nil, errors.Trace(err)
}

db, err := dbGetter.GetDB("controller")
Member

nit: constant "controller"

Comment on lines +65 to +71
q := `
DELETE FROM lease WHERE uuid in (
SELECT l.uuid
FROM lease l LEFT JOIN lease_pin p ON l.uuid = p.lease_uuid
WHERE p.uuid IS NULL
AND l.expiry < datetime('now')
)`[1:]
Member

This feels like the wrong location to have SQL. I'm not sure where we would want to move this or even call this from. For now, whilst we're moving to the new DB, I think it's fine to leave this here until we see more emergent patterns.

the error string.

This is intended as a temporary measure while we work out how to ensure
detection with Dqlite codes.
@manadart manadart merged commit d798238 into juju:3.0-dqlite Dec 16, 2022
@manadart manadart deleted the dqlite-use-lease-store branch December 16, 2022 16:35
jujubot added a commit that referenced this pull request Feb 10, 2023
#15177

The following brings the 3.0-dqlite feature branch into the develop branch.

### Changes

This brings in the dqlite database to sit alongside the mongo database. Currently, only leases are implemented in Juju using dqlite. More controller base configuration and data will be moved over to dqlite once this branch has landed.

#### Leases/Raft

The whole raft implementation has been removed from Juju completely. This includes the following workers:

 - raft backstop
 - raft clusterer
 - raft log
 - raft transport
 - global clock updater

In addition, the raft API implementation has also been removed. Instead, the lease manager now handles the store (dqlite db) directly, improving readability and reducing complexity.

#### Jujud

The `jujud` agent is now built using musl (specifically musl-gcc). This allows `juju` to be built statically, embedding `dqlite` in the same process. There are still some rough edges when building and testing, and when this lands we expect to see some churn while those issues are polished.

Using `go test` is expected to still work as is. This is a last-minute change so that we can use sqlite directly for local tests. If you need to test with dqlite (linux only), then passing `-tags="dqlite"` to builds/tests/installs is required. All CI jobs are required to run with the dqlite tag.

Some notes:

 1. `CGO_ENABLED=1` and `CGO_LDFLAGS_ALLOW="(-Wl,-wrap,pthread_create)|(-Wl,-z,now)"` are required if you're using dqlite directly.
 2. You are expected to install musl directly on your system if you want to build, using `make musl-install`. This will require sudo.
 3. For development purposes we will download dqlite `.a` files from an s3 bucket to facilitate the setup process. The tar file is checked against a sha256 sum to guard against MITM tampering. You can build these locally if you want to bypass s3 using `make dqlite-build-lxd`. This will spin up an lxd container to build. **Do not attempt** to run `make dqlite-build` locally, unless you know what you're doing.
 4. To access dqlite from a controller, use `make repl`. This will open up a pseudo repl with which you can explore the database: run `.open <db name>` and then use SQL from there.
 5. Cross compilation to other architectures can be done using `GOARCH` and `GOOS` before `make install` or `make build`.

There are probably some things I've forgotten. Expect a Discourse post soon, which will highlight the development flow.

----

Two conflicts when merging. The resolution was to bring in the secret backends for the manifold tests and the controller config type changed for `DefaultMigrationMinionWaitMax`.

```
CONFLICT (content): Merge conflict in cmd/jujud/agent/machine/manifolds_test.go
CONFLICT (content): Merge conflict in controller/config.go
```

c141b2e (upstream/3.0-dqlite) Merge pull request #15159 from SimonRichardson/system-install-musl-by-default
83656e2 Merge pull request #15156 from SimonRichardson/change-log-ddl
125c19d Fix static-analysis pipeline (#15168)
5abfa24 Merge pull request #15140 from SimonRichardson/allow-testing-on-mac
1dc60f6 (3.0-dqlite) Merge pull request #15153 from SimonRichardson/content-addressable-deps
5a1cd24 Merge pull request #15150 from jack-w-shaw/JUJU-2615_symlink_sudo
4502d63 Merge pull request #15148 from SimonRichardson/better-install-method
88941dd Merge pull request #15134 from SimonRichardson/bootstrap-dqlite-agent-tests
2551ffc Merge pull request #15130 from SimonRichardson/build-jujud-snap
0180a53 (origin/3.0-dqlite, manadart/3.0-dqlite) Merge pull request #15123 from SimonRichardson/fix-manifold-lease-expiry-tests
fdf9cc7 Merge pull request #15115 from SimonRichardson/remove-jujud-main-test-file
bf58843 Merge pull request #15113 from SimonRichardson/remove-api-raftlease-api-client
f9419c0 Merge pull request #15112 from SimonRichardson/fix-initializing-state-twice
334d557 Merge pull request #15108 from SimonRichardson/github-action-go-build
2ee6e1a Merge pull request #15107 from SimonRichardson/cross-building-jujud
5a93305 Merge pull request #15087 from SimonRichardson/ensure-placement-of-file
da95dc0 Merge pull request #15086 from SimonRichardson/more-sudo-changes
7b86376 Merge pull request #15085 from SimonRichardson/sudo-apt-get
c4d4eb6 Merge pull request #15057 from SimonRichardson/dqlite-local-build
0ac79b3 Merge pull request #15061 from manadart/develop-into-3.0-dqlite
adc20f7 Merge pull request #15043 from SimonRichardson/allow-overriding-arch-machine
8c02f22 Merge pull request #15048 from SimonRichardson/static-analysis-fix
4547c06 Merge pull request #15050 from manadart/dqlite-address-option
d51b324 Merge pull request #15049 from manadart/dqlite-bootstrap-options
3801b78 Merge pull request #15047 from manadart/develop-into-3.0-dqlite
22d5247 Merge pull request #15037 from SimonRichardson/standardise-dqlite-build
25640a2 Merge pull request #15036 from SimonRichardson/remove-batch-fsm-controller-config
dfa4cb1 Merge pull request #15041 from manadart/dqlite-fix-mock
caf9481 Merge pull request #15034 from manadart/develop-into-3.0-dqlite
c91985d Merge pull request #15035 from SimonRichardson/remove-typed-lease-error
42d17be Merge pull request #15009 from SimonRichardson/allow-repl-via-juju-ssh
d798238 Merge pull request #15002 from manadart/dqlite-use-lease-store
e4f0d39 Merge pull request #14918 from manadart/3.0-dqlite-lease-store
8315fb7 Merge pull request #14986 from manadart/dqlite-build-from-tags
a73b947 Merge pull request #14927 from manadart/3.0-dqlite-lease-store-interface
1657a1d Merge pull request #14910 from manadart/3.0-dqlite-db-supply
27b23f3 Merge pull request #14909 from manadart/3.0-into-3.0-dqlite
6adff35 Merge pull request #14756 from manadart/develop-into-3.0-dqlite
37d81ff Merge pull request #14717 from manadart/develop-into-3.0-dqlite
fe2edb8 Merge pull request #14671 from manadart/3.0-simplify-dbaccessor
1a09836 Merge pull request #14604 from manadart/3.0-bootstrap-controller-db
5ad011e Merge pull request #14652 from manadart/develop-into-3.0-dqlite
1c3d250 Merge pull request #14591 from manadart/develop-into-3.0-dqlite
229cd3e Merge pull request #14578 from manadart/3.0-dqlite-simplify
9d715ba Merge pull request #14565 from manadart/develop-into-3.0-dqlite
92ffd88 Merge pull request #14466 from manadart/develop-into-3.0-dqlite
57f67ce Merge pull request #14336 from manadart/develop-into-3.0-dqlite
648d354 Merge pull request #14364 from manadart/update-musl
198621d Merge pull request #14241 from manadart/develop-into-3.0-dqlite
0360db6 Merge pull request #14153 from manadart/develop-into-3.0-dqlite
17950b2 Merge pull request #14053 from manadart/develop-into-3.0-dqlite
9452026 Merge pull request #14016 from manadart/develop-into-3.0-dqlite
741baca Merge pull request #13963 from manadart/develop-into-3.0-dqlite
5449603 Merge pull request #13969 from manadart/dqlite-manifolds
7b612a0 Merge pull request #13944 from SimonRichardson/dqlite-develop