Design
RSConf is a policy-rich configuration management system with a very specific set of goals:
- High-level opinions baked in (CentOS 7, docker started via systemd, etc.)
- Fail-fast, atomically with debuggable context
- Configure in as few YAML files as feasible or as many as you like
- Program in programming languages (not YAML or Jinja)
- Build everything on the master first and download with Curl/Bash
- The master holds a copy of all configuration for the clients
- Manage secrets automatically and persistently
- Default mode is pull from client; master push via ssh (if desired, but not required)
- Just a configurator, not a remote execution engine (that's what ssh is for)
- Single master serving dev/alpha/beta/prod channels (stages)
- Updates to files, container images, etc. cause server restarts in proper order
- Zero-config: client only needs bash, curl, and credentials
- Explicit coupling between dependencies (components import and call each other)
- Server requirements minimal: nginx and python
- Can operate serverless so master can be bootstrapped using curl file://
These goals are different from Ansible, Salt, and other configuration management systems. We discuss our issues with Ansible and Salt in our DevOps Wiki.
The master uses rsconf build
to create a treat of host-specific
files. Each host has its private tree, which mirrors its file
system. For example, /etc/motd
is accessible via
/host/
hostname/etc/motd
. The clients only have access to their
own trees, and symlinks are used to bring in certain files.
In normal operation, clients run the command rsc as root. rsc is a simple wrapper for a curl installer, which consists of two bash libraries: install.sh and rsconf.sh. Once these libraries load, they run a function that downloads /host/<hostname>/000.sh, which is the root configuration script on the master.
Configuration is found in two files: /srv/rsconf/db/000.yml and /srv/rsconf/db/secret/000.yml. The latter is used only for secrets, although either file can contain anything. You can have more files if you like, too; just name them 0*.yml so they get processed in order.
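Since the 0*.yml files are processed in lexical order, later files win on conflicting top-level keys. A minimal sketch of that merge, assuming the files have already been parsed (the function name and the list-of-pairs input are illustrative, not RSConf's actual loader):

```python
def merge_in_order(parsed_files):
    """Merge parsed 0*.yml files in lexical filename order.

    parsed_files: list of (filename, dict) pairs; a key in a
    later file overrides the same key in an earlier one.
    """
    merged = {}
    for _, data in sorted(parsed_files):
        merged.update(data)
    return merged
```

For example, a key defined in 000.yml and redefined in 010.yml takes the 010.yml value.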
The YAML configuration files are structured into three sections in increasing precedence: default, channel, and host. A channel is one of: dev, alpha, beta, and prod. A host resides in only one channel.
A simple configuration looks like:
default: {}
channel:
  dev:
    var1: 1
    var2: 2
host:
  dev:
    v3.radia.run:
      var1: 3
The host v3.radia.run resides in the dev channel. The value of var1 will be 3, because it is overridden in the host section. var2 will be aggregated from the channel section. There are no defaults in this example.
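The default/channel/host precedence can be sketched as three dict updates in increasing precedence (a hedged illustration; host_values is a hypothetical name, not RSConf's API):

```python
def host_values(db, channel, host):
    """Flatten the three YAML sections for one host.

    Later updates win: default, then the host's channel,
    then the host itself.
    """
    merged = dict(db.get("default", {}))
    merged.update(db.get("channel", {}).get(channel, {}))
    merged.update(db.get("host", {}).get(channel, {}).get(host, {}))
    return merged
```

Applied to the example above, var1 comes out as 3 (host section) and var2 as 2 (channel section).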
A channel is a way of promoting software delivery through test systems. The first test stage is alpha. If a software package passes the alpha tests, it is promoted to beta. And, then on to prod. dev is used only on development machines so packages get rebuilt before they are labeled as alpha.
The advantage of channels is that you get binary compatibility between various testing stages. To promote a package, a symbolic link is updated. It's really that simple.
The dev channel is special, because packages are rebuilt for alpha rather than promoted from dev to alpha. The dev channel allows developers to operate within RSConf's ecosystem.
rsconf build generates an entire file tree (/srv/rsconf/srv/host) with all hosts. A build is atomic: either it works or it fails, and nothing is updated. This is probably the major design difference between RSConf and other configuration management systems.
When rsconf build runs, it is debuggable. You don't have to look on both the server and the client; you just have to look on the server. This avoids security issues related to communicating what went wrong to the person running the program. With Salt, for example, when a client asks for configuration, some of the generation happens on the server and other parts happen on the client. This means you have to inspect and synchronize two outputs: the server log and the client log. RSConf decouples generation from execution.
Furthermore, every attempt is made to avoid running complex software on the client. RSConf drops in files and restarts servers; that's what it is designed to do. Sometimes it has to run package managers or special configurators (e.g. sysctl), but for the most part, all the client does is download configuration and restart servers.
RSConf limits the programming to Python on the master and Bash on the client. The configuration is declared (not programmed) in a collection of YAML files. Wherever possible, "options" are avoided in the YAML; that is, if there's a policy decision, it's made in the Python. The client executes the policy decisions made on the master in Python.
Many configuration management systems require you to program in YAML. This can lead to some very cumbersome code that crosses too many language boundaries and requires programmers to "think in YAML" instead of a more convenient programming language like Python.
A component usually corresponds to a service. For example, postfix, nginx, and network are all components. This mirrors what goes on in a modern Linux system, where system startup is organized into a collection of Systemd units or init scripts.
A component can require other components, in which case those components are installed and started before the requiring component. The postfix component requires postgrey and spamd:
self.buildt.require_component('postgrey', 'spamd')
RSConf sorts topologically and checks there are no circular dependencies. An exception is raised if a requirement loop is detected, and the build is aborted. While this is only likely to happen during development, this is an example of RSConf's fail-fast philosophy, which prevents errors being ignored as they are by default with many configuration managers such as Ansible and Salt.
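The topological sort with loop detection might look roughly like this depth-first sketch (names are illustrative; RSConf's actual implementation may differ):

```python
def install_order(requires, root):
    """Return components in dependency-first order; raise on a cycle.

    requires maps a component name to the list of components it
    requires. A component still on the in-progress stack when
    revisited indicates a requirement loop, so the build fails fast.
    """
    order, done, in_progress = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"requirement loop involving {name!r}")
        in_progress.add(name)
        for dep in requires.get(name, []):
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(root)
    return order
```

With the postfix example above, postgrey and spamd are ordered before postfix; a mutual requirement raises immediately instead of being silently ignored.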
RSConf bakes in the decision that Systemd controls how services are managed. Systemd replaced System V init scripts in two major Linux distributions: CentOS 7 and Ubuntu 15. Since RSConf is designed to support CentOS 7, it made sense to bake in support for Systemd.
This choice simplifies both the specification of the service and the code. For example, the configuration to enable the postfix component is:
rsconf_db:
  components: [ postfix ]
The rest of the decisions about when the service is reloaded or started are left up to the component itself. The call to systemctl daemon-reload is therefore implicit and automatic, unlike other system configuration managers, which require the user to manage when systemctl daemon-reload is called.
Service components have a unit configuration and a run directory, which is almost always /srv/<service>. Strict naming is required; that is, the run directory, the service, and the systemd unit files are all named exactly the same. For the majority of services, the component is named the same, too.
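The strict-naming convention can be sketched as a pure function of the service name (a minimal illustration, not RSConf's API; the unit path assumes a CentOS 7 layout):

```python
def service_paths(name):
    """Derive the run directory and unit file from the service name.

    Strict naming: the run directory, the service, and the systemd
    unit file all share the same name.
    """
    return {
        "service": name,
        "run_d": f"/srv/{name}",
        "unit_f": f"/usr/lib/systemd/system/{name}.service",
    }
```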
A service is automatically enabled and (re)started if it is not running or if its configuration has changed. Components tell RSConf to watch particular directories or files, with the run and unit files being watched implicitly. If a watched file changes, the service will be restarted. RSConf restarts the service immediately if the service requests it with a call to rsconf_service_restart, e.g. postgresql has this at the end of its Python module:
self.rsconf_service_restart()
This starts all services that have pending restarts. At most one restart occurs per service per client execution.
To defer the restart of a service with complex dependencies until the end, a component specifies:
self.rsconf_service_restart_at_end()
The postfix and nginx components are restarted at the end, because other components may change files in their watched directories, and their services are not needed during the execution of rsconf.
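The restart bookkeeping described above — at most one restart per service, with some services deferred to the end — can be sketched as a small queue (a hypothetical illustration, not RSConf's internal class):

```python
class RestartQueue:
    """Collect services that need a (re)start during a client run.

    Each service is queued at most once; services flagged
    restart-at-end (e.g. postfix, nginx) drain after all others.
    """

    def __init__(self):
        self._pending = []
        self._at_end = []

    def request(self, service, at_end=False):
        if service in self._pending or service in self._at_end:
            return  # only one restart per service per run
        (self._at_end if at_end else self._pending).append(service)

    def drain(self):
        """Return the restart order and clear the queue."""
        order = self._pending + self._at_end
        self._pending, self._at_end = [], []
        return order
```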
Systemd has support for timers, which are a replacement for cron. The main advantage of timers over cron is that you can easily start a timer's service in a consistent environment. While cron does have a consistent environment, you cannot easily reproduce it from the command line. With Systemd timers, it is as easy as:
systemctl start logrotate
Timers are not a separately configurable component as yet. They are used in various parts of the system. For example, there is a db_bkp component that runs on a timer and executes the db_bkp.sh scripts of services that install them.
All configuration management systems have secrets. RSConf manages secrets in /srv/rsconf/db/secret, which contains YAML files like the db directory, and also files in any format that contain secrets. For example, there is an rsconf_auth file that contains the hashed credentials for all known RSConf client hosts and is installed on the rsconf master.
To help manage the database, secrets have a visibility level: global, channel, or host. When a secret is being accessed, the caller passes the appropriate level. For example, a dovecot password database is visible at the host level so that /etc/dovecot/users is unique for each host that runs the dovecot IMAP/POP service. Note that this visibility is relative only to the build on the master; that is, it is just for name space management during rsconf build. Only the master can see this database, and each client can see only its own files, which are generated on the master from the secrets database.
Sometimes there's a need to have both plain text and hashed forms of a secret. The plain text file for rsconf_auth is rsconf_auth.json, which contains a machine readable version of the secrets in plain text so that RSConf can create /root/.netrc files for the clients.
The need for both versions is subtle. The hashed passwords in rsconf_auth are salted individually. While rsconf_auth could be recreated on every build from rsconf_auth.json, RSConf needs to know if the file has changed (a new client host added) in order to notify the services that depend on it. Therefore, we save a copy in order to avoid churn.
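The churn problem is easy to demonstrate: with individual random salts, hashing the same password twice yields different strings, so a freshly regenerated rsconf_auth would always look changed. A sketch using SHA-512 (RSConf's actual hashing scheme may differ):

```python
import hashlib
import os


def salted_hash(password, salt=None):
    """Hash a password with an individual salt.

    A fresh random salt is drawn when none is given, so repeated
    hashing of the same password produces different results.
    """
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.sha512(salt + password.encode()).hexdigest()
    return salt.hex() + "$" + digest
```

Two builds that rehash the same rsconf_auth.json would therefore differ byte-for-byte, which is exactly the churn the saved copy avoids.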
The same is true for dynamically generated self-signed TLS certificates and other secrets in test and development environments. RSConf could regenerate them every build, but this would create too much churn.
rsconf.component.install_secret_path supports the dynamic generation of secrets explicitly with an existence test.
rsconf build creates the entire host tree for all hosts configured in /srv/rsconf/db. This is surprisingly fast, taking only a couple of seconds.
TODO: overview better:
- read the db
- rsconf_db.components
To help avoid bugs due to cross-contamination between host configurations, the build for each host involves reading all the YAML files to create a nested data structure (the rsconf.db.Host class and its instance hdb) of their merged contents. Precedence is important: secrets override non-secrets, host overrides channel, etc. To aid communication between components, the hdb is modified dynamically, which is why cross-contamination is possible.
The vast majority of host files are generated from Jinja2 templates. In order to further avoid cross-contamination, each component copies hdb to a j2_ctx (Jinja2 context) variable before populating the j2_ctx with generated values.
TODO:
mention jinja in yaml
A component is a Python module in the rsconf.component package with a class called T that subclasses rsconf.component.T and must implement an internal_build method. This method is guaranteed to be called only once per host build.
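Schematically, a component module has this shape (the base class here is a stand-in so the sketch is self-contained; the real one is rsconf.component.T, whose interface may differ):

```python
class ComponentBase:
    """Stand-in for rsconf.component.T in this sketch."""

    def internal_build(self):
        raise NotImplementedError


class T(ComponentBase):
    """A component: subclass the base T and implement internal_build,
    which the build machinery calls exactly once per host build."""

    def internal_build(self):
        # Here a real component would declare requirements,
        # install files, and register watched paths.
        return "built"
```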