Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ansible for the docs-rs builder #231

Merged
merged 15 commits into from
Feb 13, 2023
Merged

Conversation

rylev
Copy link
Member

@rylev rylev commented Feb 7, 2023

This adds Ansible configuration for the docs builder for docs-rs.

In addition to adding the ansible tasks, this PR also:

  • bumps the boto3 version (needed for sso support)
  • Adds a aws_iam_instance_profile (and the appropriate role and policy) for the builder so that the ec2 instance the builder runs on can access s3.

ansible/playbooks/docs-rs-builder.yml Show resolved Hide resolved
---

vars_mountpoint_file_path: /builder.fs
vars_mountpoint_size: 53687091200
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you determine this number? I'm not sure 50 GB is enough; how often is this builder going to be torn down and recreated? Cargo has several caches that grow without bound, we ended up bumping this all the way to 200 GB in prod because we kept running out of space on the 100 GB disk.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was completely unscientific. I took what the playground is using (which was my basis for these tasks) and added some zeros until I didn't run out of memory. I think we'll need to think more about this...

ansible/roles/docs-rs-builder/tasks/main.yml Outdated Show resolved Hide resolved
ansible/roles/docs-rs-builder/tasks/main.yml Outdated Show resolved Hide resolved
Comment on lines 1 to 25
#
# {{ ansible_managed }}
#

[Unit]
Description=The docs.rs Builder

[Service]
Environment=TMPDIR={{ vars_mountpoint_path }}
Environment=DOCSRS_PREFIX={{ vars_mountpoint_path }}
Environment=DOCSRS_RUSTWIDE_WORKSPACE={{ vars_mountpoint_path }}
Environment=DOCSRS_LOG=debug
Environment=DOCSRS_DATABASE_URL={{ vars_database_url }}
Environment=DOCSRS_DOCKER_IMAGE=ghcr.io/rust-lang/crates-build-env/linux-micro
Environment=DOCSRS_STORAGE_BACKEND=s3
Environment=S3_REGION=us-east-1
Environment=DOCSRS_S3_BUCKET={{ vars_s3_bucket }}
Environment=RUST_BACKTRACE=1

WorkingDirectory={{ vars_working_dir }}

ExecStart={{ vars_executable_path }} daemon --registry-watcher=disabled

[Install]
WantedBy=multi-user.target
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have time right now but I want to compare this to what we're running in prod.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. We can also land this now and iterate on it as we go forward too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, here's the existing config:

[Unit]
Description=Cratesfyi daemon
After=network.target postgresql.service

[Service]
User=cratesfyi
Group=cratesfyi
EnvironmentFile=/home/cratesfyi/.docs-rs-env
ExecStart=/home/ubuntu/docs.rs/target/release/cratesfyi daemon
WorkingDirectory=/home/ubuntu/docs.rs
LimitNOFILE=20000
Restart=on-failure

[Install]
WantedBy=multi-user.target

The file limit I think should be ok to omit because we now upload files as .zips instead of .html, but I'd prefer to keep it around until we have everything else working. Restart=on-failure looks important. Environment file is ok to omit since it's configured directly in the .service file, but ideally we'd have some way to configure DOCSRS_TOOLCHAIN in prod in case we need to lock it to a specific version.

Here's the env file:

DOCSRS_PREFIX=/opt/docs-rs-prefix
DOCSRS_DATABASE_URL=postgresql://cratesfyi@%2Fvar%2Frun%2Fpostgresql
DOCSRS_GITHUB_ACCESSTOKEN=redacted
DOCSRS_RUSTWIDE_WORKSPACE=/home/cratesfyi/workspace
DOCSRS_TOOLCHAIN=nightly
DOCSRS_BUILD_CPU_LIMIT=3
DOCSRS_STORAGE_BACKEND=s3
DOCSRS_LOG=docs_rs=debug,docs_rs::web::strangler=info,rustwide=info
RUST_BACKTRACE=1
SENTRY_DSN=https://[email protected]/redacted
CACHE_CONTROL_MAX_AGE=600
CACHE_CONTROL_STALE_WHILE_REVALIDATE=86400
CLOUDFRONT_DISTRIBUTION_ID_WEB=redacted
CLOUDFRONT_DISTRIBUTION_ID_STATIC=redacted
DOCSRS_CDN_BACKEND=cloudfront

All the CACHE_CONTROL vars are important to keep - cc @syphar @jsha to confirm. DOCSRS_CDN_BACKEND and the CLOUDFRONT vars are important to keep. SENTRY_DSN and the GITHUB_ACCESSTOKEN are important. We need some way of configuring secrets, I'm not sure how you're doing that today.

I think it's ok to drop DOCSRS_BUILD_CPU_LIMIT, that's set so we can run the web server in parallel but this is just the builder without a web server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CACHE_CONTROL_MAX_AGE=600 can be deleted; we don't use that config field anymore.

CACHE_CONTROL_STALE_WHILE_REVALIDATE=86400 should stay; we still use that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the build-server only get the config that it needs for the builds?

Or should the config be the same across the server / container types?

While it feels cleaner to only have the needed config here,
due to the shared the codebase it could lead to bugs when we do changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good point, I forgot that CACHE_CONTROL is only used by the web server - ok, those variables are probably unnecessary then. the CLOUDFRONT variables are still necessary since we invalidate the cache after a build, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good point, I forgot that CACHE_CONTROL is only used by the web server - ok, those variables are probably unnecessary then. the CLOUDFRONT variables are still necessary since we invalidate the cache after a build, though.

That's the kind of questions that made me think if we should have the same config / environment in all server types. At least until we reflect this in our implementation so a developer wouldn't assume locally / in testing that the config is available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The environment variables exposed to the worker vs the web frontend are configured separately in this repo. The web frontend is given its env variables through the terraform config here while the worker gets its environment variables through the systemd unit file created through Ansible here.

It would be pretty difficult to unify these two. I personally think it's best we start treating these two parts of the system separately as they will only become more and more separate over time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the unit config with the suggestions here except for the CLOUDFRONT variables. I need to understand invalidations a bit better before I can make that change. I personally would prefer we not block merging this PR before we tackle that issue. The point of the staging environment is that we don't need to be perfect before deploying it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ansible/roles/docs-rs-builder/templates/builder.service Outdated Show resolved Hide resolved
Comment on lines +29 to +39
policy = jsonencode({
Version = "2012-10-17"
Statement = [
// Access to s3
{
Effect = "Allow"
Action = "s3:*"

Resource = [
aws_s3_bucket.storage.arn,
"${aws_s3_bucket.storage.arn}/*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make this append-only, so it can't delete or read existing files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can set the Action above to only non-delete write operations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried moving to only allowing the following actions:

PutObject, PutObjectTagging, PutObjectAcl, CreateBucket

This failed in a strange way. All artifacts were created and successfully uploaded to the s3. The logs only showed one error which simply stated that the library in question could not be documented (even though it clearly was successfully being done so). There were no other errors in the logs as far as I could see to indicate some additional error. Any idea what might be happening here?

ansible/roles/docs-rs-builder/templates/builder.service Outdated Show resolved Hide resolved
Comment on lines 1 to 25
#
# {{ ansible_managed }}
#

[Unit]
Description=The docs.rs Builder

[Service]
Environment=TMPDIR={{ vars_mountpoint_path }}
Environment=DOCSRS_PREFIX={{ vars_mountpoint_path }}
Environment=DOCSRS_RUSTWIDE_WORKSPACE={{ vars_mountpoint_path }}
Environment=DOCSRS_LOG=debug
Environment=DOCSRS_DATABASE_URL={{ vars_database_url }}
Environment=DOCSRS_DOCKER_IMAGE=ghcr.io/rust-lang/crates-build-env/linux-micro
Environment=DOCSRS_STORAGE_BACKEND=s3
Environment=S3_REGION=us-east-1
Environment=DOCSRS_S3_BUCKET={{ vars_s3_bucket }}
Environment=RUST_BACKTRACE=1

WorkingDirectory={{ vars_working_dir }}

ExecStart={{ vars_executable_path }} daemon --registry-watcher=disabled

[Install]
WantedBy=multi-user.target
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the build-server only get the config that it needs for the builds?

Or should the config be the same across the server / container types?

While it feels cleaner to only have the needed config here,
due to the shared the codebase it could lead to bugs when we do changes.

@syphar
Copy link
Member

syphar commented Feb 8, 2023

One thing that I remembered, but this probably won't block merging this for staging:

start-build-server currently doesn't start the webserver, so doesn't have a metrics endpoint yet.

But we can add this before this goes live.

Copy link
Member

@jdno jdno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ansible role looks pretty good. We should break it up into separate roles for Docker, Rust, and the builder, but I'll create an issue so that we can do that later.

@rylev
Copy link
Member Author

rylev commented Feb 13, 2023

Going to merge this for now as I think we've reached a good starting point. I will make issues for the additional suggestions for improvements in this PR.

@rylev rylev merged commit 7951385 into rust-lang:master Feb 13, 2023
@rylev rylev deleted the docs-rs-builder-ansible branch February 13, 2023 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants