Ansible for the docs-rs builder #231

rylev · 2023-02-07T14:46:12Z

This adds Ansible configuration for the docs builder for docs-rs.

In addition to adding the ansible tasks, this PR also:

bumps the boto3 version (needed for sso support)
Adds a aws_iam_instance_profile (and the appropriate role and policy) for the builder so that the ec2 instance the builder runs on can access s3.

ansible/roles/common/tasks/metrics.yml

ansible/playbooks/docs-rs-builder.yml

jyn514 · 2023-02-07T14:53:59Z

ansible/roles/docs-rs-builder/defaults/main.yml

+---
+
+vars_mountpoint_file_path: /builder.fs
+vars_mountpoint_size: 53687091200


How did you determine this number? I'm not sure 50 GB is enough; how often is this builder going to be torn down and recreated? Cargo has several caches that grow without bound, we ended up bumping this all the way to 200 GB in prod because we kept running out of space on the 100 GB disk.

This was completely unscientific. I took what the playground is using (which was my basis for these tasks) and added some zeros until I didn't run out of memory. I think we'll need to think more about this...

ansible/roles/docs-rs-builder/defaults/main.yml

ansible/roles/docs-rs-builder/tasks/main.yml

ansible/roles/docs-rs-builder/templates/sudoers

jyn514 · 2023-02-07T15:23:03Z

ansible/roles/docs-rs-builder/templates/builder.service

+#
+# {{ ansible_managed }}
+#
+
+[Unit]
+Description=The docs.rs Builder
+
+[Service]
+Environment=TMPDIR={{ vars_mountpoint_path }}
+Environment=DOCSRS_PREFIX={{ vars_mountpoint_path }}
+Environment=DOCSRS_RUSTWIDE_WORKSPACE={{ vars_mountpoint_path }}
+Environment=DOCSRS_LOG=debug
+Environment=DOCSRS_DATABASE_URL={{ vars_database_url }}
+Environment=DOCSRS_DOCKER_IMAGE=ghcr.io/rust-lang/crates-build-env/linux-micro
+Environment=DOCSRS_STORAGE_BACKEND=s3
+Environment=S3_REGION=us-east-1
+Environment=DOCSRS_S3_BUCKET={{ vars_s3_bucket }}
+Environment=RUST_BACKTRACE=1
+
+WorkingDirectory={{ vars_working_dir }}
+
+ExecStart={{ vars_executable_path }} daemon --registry-watcher=disabled
+
+[Install]
+WantedBy=multi-user.target


Don't have time right now but I want to compare this to what we're running in prod.

Sounds good. We can also land this now and iterate on it as we go forward too.

ok, here's the existing config:

[Unit] Description=Cratesfyi daemon After=network.target postgresql.service [Service] User=cratesfyi Group=cratesfyi EnvironmentFile=/home/cratesfyi/.docs-rs-env ExecStart=/home/ubuntu/docs.rs/target/release/cratesfyi daemon WorkingDirectory=/home/ubuntu/docs.rs LimitNOFILE=20000 Restart=on-failure [Install] WantedBy=multi-user.target

The file limit I think should be ok to omit because we now upload files as .zips instead of .html, but I'd prefer to keep it around until we have everything else working. Restart=on-failure looks important. Environment file is ok to omit since it's configured directly in the .service file, but ideally we'd have some way to configure DOCSRS_TOOLCHAIN in prod in case we need to lock it to a specific version.

Here's the env file:

DOCSRS_PREFIX=/opt/docs-rs-prefix DOCSRS_DATABASE_URL=postgresql://cratesfyi@%2Fvar%2Frun%2Fpostgresql DOCSRS_GITHUB_ACCESSTOKEN=redacted DOCSRS_RUSTWIDE_WORKSPACE=/home/cratesfyi/workspace DOCSRS_TOOLCHAIN=nightly DOCSRS_BUILD_CPU_LIMIT=3 DOCSRS_STORAGE_BACKEND=s3 DOCSRS_LOG=docs_rs=debug,docs_rs::web::strangler=info,rustwide=info RUST_BACKTRACE=1 SENTRY_DSN=https://[email protected]/redacted CACHE_CONTROL_MAX_AGE=600 CACHE_CONTROL_STALE_WHILE_REVALIDATE=86400 CLOUDFRONT_DISTRIBUTION_ID_WEB=redacted CLOUDFRONT_DISTRIBUTION_ID_STATIC=redacted DOCSRS_CDN_BACKEND=cloudfront

All the CACHE_CONTROL vars are important to keep - cc @syphar @jsha to confirm. DOCSRS_CDN_BACKEND and the CLOUDFRONT vars are important to keep. SENTRY_DSN and the GITHUB_ACCESSTOKEN are important. We need some way of configuring secrets, I'm not sure how you're doing that today.

I think it's ok to drop DOCSRS_BUILD_CPU_LIMIT, that's set so we can run the web server in parallel but this is just the builder without a web server.

CACHE_CONTROL_MAX_AGE=600 can be deleted; we don't use that config field anymore.

CACHE_CONTROL_STALE_WHILE_REVALIDATE=86400 should stay; we still use that.

Would the build-server only get the config that it needs for the builds?

Or should the config be the same across the server / container types?

While it feels cleaner to only have the needed config here,
due to the shared the codebase it could lead to bugs when we do changes.

ah good point, I forgot that CACHE_CONTROL is only used by the web server - ok, those variables are probably unnecessary then. the CLOUDFRONT variables are still necessary since we invalidate the cache after a build, though.

ah good point, I forgot that CACHE_CONTROL is only used by the web server - ok, those variables are probably unnecessary then. the CLOUDFRONT variables are still necessary since we invalidate the cache after a build, though.

That's the kind of questions that made me think if we should have the same config / environment in all server types. At least until we reflect this in our implementation so a developer wouldn't assume locally / in testing that the config is available.

The environment variables exposed to the worker vs the web frontend are configured separately in this repo. The web frontend is given its env variables through the terraform config here while the worker gets its environment variables through the systemd unit file created through Ansible here.

It would be pretty difficult to unify these two. I personally think it's best we start treating these two parts of the system separately as they will only become more and more separate over time.

I've updated the unit config with the suggestions here except for the CLOUDFRONT variables. I need to understand invalidations a bit better before I can make that change. I personally would prefer we not block merging this PR before we tackle that issue. The point of the staging environment is that we don't need to be perfect before deploying it.

ansible/roles/docs-rs-builder/templates/builder.service

jyn514 · 2023-02-07T19:19:18Z

terragrunt/modules/docs-rs/builder.tf

+  policy = jsonencode({
+    Version = "2012-10-17"
+    Statement = [
+      // Access to s3
+      {
+        Effect = "Allow"
+        Action = "s3:*"
+
+        Resource = [
+          aws_s3_bucket.storage.arn,
+          "${aws_s3_bucket.storage.arn}/*"


Is there a way to make this append-only, so it can't delete or read existing files?

Yes, we can set the Action above to only non-delete write operations.

I tried moving to only allowing the following actions:

PutObject, PutObjectTagging, PutObjectAcl, CreateBucket

This failed in a strange way. All artifacts were created and successfully uploaded to the s3. The logs only showed one error which simply stated that the library in question could not be documented (even though it clearly was successfully being done so). There were no other errors in the logs as far as I could see to indicate some additional error. Any idea what might be happening here?

ansible/roles/docs-rs-builder/templates/builder.service

syphar · 2023-02-07T20:07:10Z

ansible/roles/docs-rs-builder/templates/builder.service

+#
+# {{ ansible_managed }}
+#
+
+[Unit]
+Description=The docs.rs Builder
+
+[Service]
+Environment=TMPDIR={{ vars_mountpoint_path }}
+Environment=DOCSRS_PREFIX={{ vars_mountpoint_path }}
+Environment=DOCSRS_RUSTWIDE_WORKSPACE={{ vars_mountpoint_path }}
+Environment=DOCSRS_LOG=debug
+Environment=DOCSRS_DATABASE_URL={{ vars_database_url }}
+Environment=DOCSRS_DOCKER_IMAGE=ghcr.io/rust-lang/crates-build-env/linux-micro
+Environment=DOCSRS_STORAGE_BACKEND=s3
+Environment=S3_REGION=us-east-1
+Environment=DOCSRS_S3_BUCKET={{ vars_s3_bucket }}
+Environment=RUST_BACKTRACE=1
+
+WorkingDirectory={{ vars_working_dir }}
+
+ExecStart={{ vars_executable_path }} daemon --registry-watcher=disabled
+
+[Install]
+WantedBy=multi-user.target


Would the build-server only get the config that it needs for the builds?

Or should the config be the same across the server / container types?

While it feels cleaner to only have the needed config here,
due to the shared the codebase it could lead to bugs when we do changes.

syphar · 2023-02-08T16:54:25Z

One thing that I remembered, but this probably won't block merging this for staging:

start-build-server currently doesn't start the webserver, so doesn't have a metrics endpoint yet.

But we can add this before this goes live.

jdno

The Ansible role looks pretty good. We should break it up into separate roles for Docker, Rust, and the builder, but I'll create an issue so that we can do that later.

rylev · 2023-02-13T13:26:28Z

Going to merge this for now as I think we've reached a good starting point. I will make issues for the additional suggestions for improvements in this PR.

rylev commented Feb 7, 2023

View reviewed changes

ansible/roles/common/tasks/metrics.yml Show resolved Hide resolved

jyn514 reviewed Feb 7, 2023

View reviewed changes

rylev added 12 commits February 7, 2023 16:33

Bootstrap docs-rs builder ansible

5a697c3

Some clean up

f59a6e1

Get builder to store in s3

e489c46

Get AWS communication working

beae5db

Adjust prefix

991f5eb

Grab database url from ssm param store

1bd3549

Fix missing env variable

e472974

Don't get by path

d25df37

Use full linux image

0f01f8a

Allow restarting the service

e3d5286

Use better syntax for chdir

518bfaa

No need to install rustup-toolchain-install-master

bdef738

rylev force-pushed the docs-rs-builder-ansible branch from a0971b1 to bdef738 Compare February 7, 2023 17:41

jyn514 reviewed Feb 7, 2023

View reviewed changes

syphar reviewed Feb 7, 2023

View reviewed changes

rylev added 3 commits February 8, 2023 13:31

Deploy a specific sha

e28aa72

Improvements to the unit config

e5541f2

Use start-build-server command

9882ae1

jdno reviewed Feb 10, 2023

View reviewed changes

rylev merged commit 7951385 into rust-lang:master Feb 13, 2023

rylev deleted the docs-rs-builder-ansible branch February 13, 2023 13:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ansible for the docs-rs builder #231

Ansible for the docs-rs builder #231

rylev commented Feb 7, 2023

jyn514 Feb 7, 2023

rylev Feb 7, 2023

jyn514 Feb 7, 2023

rylev Feb 7, 2023

jyn514 Feb 7, 2023

jsha Feb 7, 2023

syphar Feb 7, 2023

jyn514 Feb 7, 2023

syphar Feb 8, 2023

rylev Feb 8, 2023

rylev Feb 8, 2023

jyn514 Feb 8, 2023

jyn514 Feb 7, 2023

rylev Feb 8, 2023

rylev Feb 8, 2023

syphar Feb 7, 2023

syphar commented Feb 8, 2023

jdno left a comment

rylev commented Feb 13, 2023

Ansible for the docs-rs builder #231

Ansible for the docs-rs builder #231

Conversation

rylev commented Feb 7, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

syphar commented Feb 8, 2023

jdno left a comment

Choose a reason for hiding this comment

rylev commented Feb 13, 2023