Ansible for the docs-rs builder #231
Conversation
vars_mountpoint_file_path: /builder.fs
vars_mountpoint_size: 53687091200
How did you determine this number? I'm not sure 50 GB is enough; how often is this builder going to be torn down and recreated? Cargo has several caches that grow without bound; we ended up bumping this all the way to 200 GB in prod because we kept running out of space on the 100 GB disk.
This was completely unscientific. I took what the playground is using (which was my basis for these tasks) and added some zeros until I didn't run out of memory. I think we'll need to think more about this...
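For reference, the byte counts being discussed map onto the gigabyte figures exactly (53687091200 bytes is 50 GiB). A quick arithmetic sketch:

```python
# Sanity-check the mountpoint sizes discussed above.
GIB = 1024 ** 3  # one binary gigabyte (GiB) in bytes

vars_mountpoint_size = 53_687_091_200  # value from the Ansible vars
assert vars_mountpoint_size == 50 * GIB

# The prod disk mentioned in the review grew from 100 GB to 200 GB;
# the equivalent var values would be:
print(100 * GIB)  # 107374182400
print(200 * GIB)  # 214748364800
```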
#
# {{ ansible_managed }}
#

[Unit]
Description=The docs.rs Builder

[Service]
Environment=TMPDIR={{ vars_mountpoint_path }}
Environment=DOCSRS_PREFIX={{ vars_mountpoint_path }}
Environment=DOCSRS_RUSTWIDE_WORKSPACE={{ vars_mountpoint_path }}
Environment=DOCSRS_LOG=debug
Environment=DOCSRS_DATABASE_URL={{ vars_database_url }}
Environment=DOCSRS_DOCKER_IMAGE=ghcr.io/rust-lang/crates-build-env/linux-micro
Environment=DOCSRS_STORAGE_BACKEND=s3
Environment=S3_REGION=us-east-1
Environment=DOCSRS_S3_BUCKET={{ vars_s3_bucket }}
Environment=RUST_BACKTRACE=1

WorkingDirectory={{ vars_working_dir }}

ExecStart={{ vars_executable_path }} daemon --registry-watcher=disabled

[Install]
WantedBy=multi-user.target
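The `{{ vars_* }}` placeholders in the template above are filled in by Ansible's templating when the unit file is deployed. As a rough illustration of the substitution (purely hypothetical values, and plain string replacement rather than real Jinja2 rendering):

```python
# Toy substitution to illustrate how the template's placeholders are filled.
# These values are hypothetical, not the real staging configuration.
template = (
    "Environment=TMPDIR={{ vars_mountpoint_path }}\n"
    "WorkingDirectory={{ vars_working_dir }}\n"
)

template_vars = {
    "vars_mountpoint_path": "/mnt/builder",
    "vars_working_dir": "/home/ubuntu/docs.rs",
}

rendered = template
for name, value in template_vars.items():
    rendered = rendered.replace("{{ " + name + " }}", value)

print(rendered)
# Environment=TMPDIR=/mnt/builder
# WorkingDirectory=/home/ubuntu/docs.rs
```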
Don't have time right now but I want to compare this to what we're running in prod.
Sounds good. We can also land this now and iterate on it as we go forward too.
ok, here's the existing config:
[Unit]
Description=Cratesfyi daemon
After=network.target postgresql.service
[Service]
User=cratesfyi
Group=cratesfyi
EnvironmentFile=/home/cratesfyi/.docs-rs-env
ExecStart=/home/ubuntu/docs.rs/target/release/cratesfyi daemon
WorkingDirectory=/home/ubuntu/docs.rs
LimitNOFILE=20000
Restart=on-failure
[Install]
WantedBy=multi-user.target
The file limit I think should be ok to omit because we now upload files as .zips instead of .html, but I'd prefer to keep it around until we have everything else working. Restart=on-failure looks important. The environment file is ok to omit since the variables are configured directly in the .service file, but ideally we'd have some way to configure DOCSRS_TOOLCHAIN in prod in case we need to lock it to a specific version.
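One conventional way to make DOCSRS_TOOLCHAIN overridable in prod without editing the templated unit file is a systemd drop-in. A hypothetical sketch (the unit name, drop-in path, and pinned toolchain here are assumptions, not part of this PR):

```ini
# /etc/systemd/system/docsrs-builder.service.d/toolchain.conf
# Hypothetical drop-in; unit name and value are illustrative.
[Service]
Environment=DOCSRS_TOOLCHAIN=nightly-2021-01-01
```

After adding or changing a drop-in, `systemctl daemon-reload` followed by a service restart would pick it up.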
Here's the env file:
DOCSRS_PREFIX=/opt/docs-rs-prefix
DOCSRS_DATABASE_URL=postgresql://cratesfyi@%2Fvar%2Frun%2Fpostgresql
DOCSRS_GITHUB_ACCESSTOKEN=redacted
DOCSRS_RUSTWIDE_WORKSPACE=/home/cratesfyi/workspace
DOCSRS_TOOLCHAIN=nightly
DOCSRS_BUILD_CPU_LIMIT=3
DOCSRS_STORAGE_BACKEND=s3
DOCSRS_LOG=docs_rs=debug,docs_rs::web::strangler=info,rustwide=info
RUST_BACKTRACE=1
SENTRY_DSN=https://[email protected]/redacted
CACHE_CONTROL_MAX_AGE=600
CACHE_CONTROL_STALE_WHILE_REVALIDATE=86400
CLOUDFRONT_DISTRIBUTION_ID_WEB=redacted
CLOUDFRONT_DISTRIBUTION_ID_STATIC=redacted
DOCSRS_CDN_BACKEND=cloudfront
All the CACHE_CONTROL vars are important to keep - cc @syphar @jsha to confirm. DOCSRS_CDN_BACKEND and the CLOUDFRONT vars are important to keep, and so are SENTRY_DSN and the GITHUB_ACCESSTOKEN. We need some way of configuring secrets; I'm not sure how you're doing that today.
I think it's ok to drop DOCSRS_BUILD_CPU_LIMIT; that's set so we can run the web server in parallel, but this is just the builder without a web server.
CACHE_CONTROL_MAX_AGE=600 can be deleted; we don't use that config field anymore. CACHE_CONTROL_STALE_WHILE_REVALIDATE=86400 should stay; we still use that.
Would the build server only get the config that it needs for the builds, or should the config be the same across the server / container types? While it feels cleaner to only have the needed config here, the shared codebase means it could lead to bugs when we make changes.
ah good point, I forgot that CACHE_CONTROL is only used by the web server - ok, those variables are probably unnecessary then. the CLOUDFRONT variables are still necessary since we invalidate the cache after a build, though.
> ah good point, I forgot that CACHE_CONTROL is only used by the web server - ok, those variables are probably unnecessary then. the CLOUDFRONT variables are still necessary since we invalidate the cache after a build, though.
That's the kind of question that made me wonder whether we should have the same config / environment across all server types, at least until we reflect the split in our implementation, so a developer wouldn't assume locally / in testing that the config is available.
The environment variables exposed to the worker vs the web frontend are configured separately in this repo. The web frontend is given its env variables through the terraform config here while the worker gets its environment variables through the systemd unit file created through Ansible here.
It would be pretty difficult to unify these two. I personally think it's best we start treating these two parts of the system separately as they will only become more and more separate over time.
I've updated the unit config with the suggestions here except for the CLOUDFRONT variables. I need to understand invalidations a bit better before I can make that change. I personally would prefer we not block merging this PR before we tackle that issue. The point of the staging environment is that we don't need to be perfect before deploying it.
👍
policy = jsonencode({
  Version = "2012-10-17"
  Statement = [
    // Access to s3
    {
      Effect = "Allow"
      Action = "s3:*"

      Resource = [
        aws_s3_bucket.storage.arn,
        "${aws_s3_bucket.storage.arn}/*"
Is there a way to make this append-only, so it can't delete or read existing files?
Yes, we can set the Action above to only non-delete write operations.
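A narrowed statement along those lines might look like this sketch (the exact action set the builder needs is an assumption and would need testing against a real build):

```hcl
// Hypothetical write-mostly statement; untested, actions are illustrative.
{
  Effect = "Allow"
  Action = [
    "s3:PutObject",
    "s3:PutObjectTagging",
    "s3:PutObjectAcl",
  ]
  Resource = ["${aws_s3_bucket.storage.arn}/*"]
}
```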
I tried moving to only allowing the following actions: PutObject, PutObjectTagging, PutObjectAcl, CreateBucket.
This failed in a strange way. All artifacts were created and successfully uploaded to S3, yet the logs showed one error which simply stated that the library in question could not be documented (even though it clearly had been). There were no other errors in the logs, as far as I could see, to indicate what went wrong. Any idea what might be happening here?
One thing that I remembered, but this probably won't block merging this for staging:
But we can add this before this goes live.
The Ansible role looks pretty good. We should break it up into separate roles for Docker, Rust, and the builder, but I'll create an issue so that we can do that later.
Going to merge this for now as I think we've reached a good starting point. I will make issues for the additional suggestions for improvements in this PR.
This adds Ansible configuration for the docs builder for docs-rs.
In addition to adding the Ansible tasks, this PR also adds an aws_iam_instance_profile (and the appropriate role and policy) for the builder so that the EC2 instance the builder runs on can access S3.
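For context, the usual shape of that instance-profile wiring in Terraform looks roughly like the following sketch (resource and role names here are illustrative, not the actual names in this repo):

```hcl
// Illustrative sketch of an EC2 instance profile; names are assumptions.
resource "aws_iam_role" "builder" {
  name = "docsrs-builder"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_instance_profile" "builder" {
  name = "docsrs-builder"
  role = aws_iam_role.builder.name
}
```

The S3 policy would then be attached to the role, and the instance profile referenced from the EC2 instance resource.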