[IND-369] TFE FDO on Nomad #168

Merged
merged 8 commits into from
Jul 1, 2024
3 changes: 3 additions & 0 deletions packs/tfe_fdo_nomad/CHANGELOG.md
@@ -0,0 +1,3 @@
# 0.1.0

- Initial release
156 changes: 156 additions & 0 deletions packs/tfe_fdo_nomad/README.md
@@ -0,0 +1,156 @@
# Nomad pack for Terraform Enterprise FDO

<!-- Include a brief description of your pack -->

This pack deploys Terraform Enterprise (TFE) on Nomad. It runs a TFE service job and a parameterized TFE agent batch job.

## Pack Usage

The pack expects the following prerequisites to be in place before it runs.

To interact with the Nomad server, set these environment variables:
1. `NOMAD_ADDR` - The address of the Nomad server.
1. `NOMAD_TOKEN` - The SecretID of an ACL token used to authenticate API requests.
1. `NOMAD_CACERT` - Path to a PEM-encoded CA certificate file used to verify the Nomad server's TLS certificate.
1. `NOMAD_CLIENT_CERT` - Path to a PEM-encoded client certificate for TLS authentication to the Nomad server. Requires `NOMAD_CLIENT_KEY` to be set as well.
1. `NOMAD_CLIENT_KEY` - Path to an unencrypted PEM-encoded private key matching the client certificate in `NOMAD_CLIENT_CERT`.
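
For example, the variables can be exported in a shell session as below. This is only a sketch: the address, token, and certificate paths are placeholders, not values defined by this pack.

```bash
# Placeholder values: replace with your cluster's address, ACL token, and TLS files.
export NOMAD_ADDR="https://nomad.example.com:4646"
export NOMAD_TOKEN="<acl-token-secret-id>"
export NOMAD_CACERT="$HOME/nomad-tls/ca.pem"
export NOMAD_CLIENT_CERT="$HOME/nomad-tls/client.pem"
export NOMAD_CLIENT_KEY="$HOME/nomad-tls/client-key.pem"
```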

After setting the environment variables, set up the pack with the following steps:

1. Create namespaces for the TFE job and the TFE agent job.

   1. Run `nomad namespace apply terraform-enterprise` to create the `terraform-enterprise` namespace. This is the default namespace for the TFE job.
   1. Run `nomad namespace apply tfe-agents` to create the `tfe-agents` namespace. This is the default namespace for the TFE agent job.


2. Create a Nomad ACL policy file `terraform_enterprise_policy.hcl` with the content below:
```hcl
namespace "tfe-agents" {
  capabilities = ["submit-job", "dispatch-job", "list-jobs", "read-job", "read-logs"]
}
```

3. Apply the policy and attach it to the TFE task's workload identity:
```bash
$ nomad acl policy apply \
-namespace terraform-enterprise -job tfe-job \
-group tfe-group -task tfe-task \
terraform-enterprise-policy ./terraform_enterprise_policy.hcl
```
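
Optionally, a quick check like the one below can confirm that the namespaces and the ACL policy from steps 1-3 exist before continuing. It is a sketch: `check_tfe_nomad_prereqs` is a hypothetical helper, and it assumes the default namespace and policy names used above.

```bash
# Hypothetical helper: verifies the objects created in steps 1-3.
# Each nomad command should exit non-zero if the object is missing
# or the current token is not allowed to read it.
check_tfe_nomad_prereqs() {
  nomad namespace status terraform-enterprise > /dev/null || return 1
  nomad namespace status tfe-agents > /dev/null || return 1
  nomad acl policy info terraform-enterprise-policy > /dev/null || return 1
  echo "Nomad prerequisites for the TFE pack are in place"
}
```

After defining the function, run `check_tfe_nomad_prereqs`; a non-zero exit status means one of the steps above still needs attention.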

4. Create the Nomad Variables required by the TFE job.

These hold sensitive values such as certificates, the license, and passwords.
Create a variable specification file:

```hcl
# spec.nv.hcl
path = "nomad/jobs/tfe-job"
namespace = "terraform-enterprise"

items {
  # TFE database password. Mapped to the TFE_DATABASE_PASSWORD environment variable.
  db_password = ""

  # Base64-encoded value of the TLS certificate. Mapped to the TFE_TLS_CERT_FILE environment variable.
  cert = ""

  # Base64-encoded value of the CA bundle. Mapped to the TFE_TLS_CA_BUNDLE_FILE environment variable.
  bundle = ""

  # Base64-encoded value of the TLS private key. Mapped to the TFE_TLS_KEY_FILE environment variable.
  key = ""

  # A valid TFE license. Mapped to the TFE_LICENSE environment variable.
  tfe_license = ""

  # Object storage secret access key. Mapped to the TFE_OBJECT_STORAGE_S3_SECRET_ACCESS_KEY environment variable.
  s3_secret_key = ""

  # Base64-encoded value of the Nomad CA certificate. Mapped to the TFE_RUN_PIPELINE_NOMAD_TLS_CONFIG_CA_CERT environment variable.
  nomad_ca_cert = ""

  # Base64-encoded value of the Nomad client certificate. Mapped to the TFE_RUN_PIPELINE_NOMAD_TLS_CONFIG_CLIENT_CERT environment variable.
  nomad_cert = ""

  # Base64-encoded value of the Nomad client certificate's key. Mapped to the TFE_RUN_PIPELINE_NOMAD_TLS_CONFIG_CLIENT_KEY environment variable.
  nomad_cert_key = ""
Comment on lines +78 to +85
Member

This is so that TFE agent can talk to Nomad, right?

There's no NOMAD_TOKEN being set anywhere, which makes me think we should be using the Task API socket and not TLS certificates for having TFE talk to Nomad anyways. But the Pack doesn't include what Nomad ACL policies need to be written for the TFE job either.

Contributor Author

kkavish, Jun 27, 2024

> This is so that TFE agent can talk to Nomad, right?

TFE Agent is not supposed to talk to Nomad. Only the TFE container will.

> There's no NOMAD_TOKEN being set anywhere, which makes me think we should be using the Task API socket and not TLS certificates for having TFE talk to Nomad anyways. But the Pack doesn't include what Nomad ACL policies need to be written for the TFE job either.

NOMAD_TOKEN will be injected through identity stanza in TFE job file (not the TFE agent job file) if I'm not wrong. For versions not supporting identity we can put NOMAD_TOKEN in the job file.

Member

tgross, Jun 27, 2024

Ok, good to hear that you're using Workload Identity, but then you don't need the certs to use the Task API socket.

> We have also provided the sample permissions for users inside Readme.md.

Oops, missed that! Ok.

> If you meant to update the job definition to use unix.sock we can do a spike on it before GA and update. Note, that this is beta.

I was told this was urgently shipping to production and that's why we needed it to be at the top of our priority list to review. Let's just finish the job so that the Nomad Engineering team doesn't have to service another unscheduled interrupt for this Pack.

Contributor Author

@tgross if I am not wrong, Task APIs are not a special type of job, right? It's an HTTP API which we can call to start a new job on Nomad.

I'm not sure how we can update the job specs here to accommodate Task API.

The agent job here is started by TFE task worker (https://github.com/hashicorp/tfe-task-worker/blob/main/driver/nomad/nomad.go).

Member

The Task API provides a Unix domain socket at ${NOMAD_SECRETS_DIR}/api.sock which you'd use instead of the external-facing HTTP endpoint. So no TLS required, just the Workload Identity token.

Contributor Author

yes, correct, so essentially we will have to modify the backend HTTP call to trigger TFE agent and not the job file.

Member

Why would you need to change the backend? You just need to swap out the NOMAD_ADDR environment variable. See hashicorp/nomad#16872 for an example.

Contributor Author

oh ok, now it makes sense! thanks @tgross

Contributor Author

@tgross this won't work with TFE because of this code - https://github.com/hashicorp/terraform-enterprise/blob/b9f00dc90468660cd9ab99ff5334b9504d020974/config/config.go#L476

I'll have to pick this up, change, test and merge before the next release cycle.

Member

Very well. I'll approve.


  # TFE Redis password. Mapped to the TFE_REDIS_PASSWORD environment variable.
  redis_password = ""

  # TFE Vault encryption key. Mapped to the TFE_ENCRYPTION_PASSWORD environment variable.
  tfe_encryption_password = ""

  # Password for the registry where the TFE image is hosted. Mapped to the TFE_IMAGE_REGISTRY_PASSWORD environment variable.
  tfe_image_registry_password = ""

}
```
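
Several items above expect base64-encoded file contents. One way to produce a single-line value is a small shell function such as the hypothetical `b64_oneline` below. This is a sketch; the `tr` step removes the line wrapping that some `base64` implementations add.

```bash
# Hypothetical helper: print a file's contents as one line of base64, the form
# expected by the cert, bundle, key, nomad_ca_cert, nomad_cert and nomad_cert_key items.
b64_oneline() {
  base64 < "$1" | tr -d '\n'
}
```

For example, `b64_oneline ./tfe.crt` prints the value to paste into `cert`; the file name is a placeholder.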

Create the variables by passing the `spec.nv.hcl` file created above to `nomad var put`:

```bash
$ nomad var put @spec.nv.hcl
```


<!-- Include information about how to use your pack -->

## Pack Information

After completing the prerequisites, run the pack with the following command:
```bash
$ nomad-pack run tfe_fdo_nomad -f var.hcl
```

The `var.hcl` file must set the variables the pack needs to run. The variables are listed below.

## Variables

Set these variables to change the behavior of the TFE deployment. Some variables have default values; the rest must be provided for the pack deployment to succeed.
<!-- Include information on the variables from your pack -->

## Configuration

| Name | Required | Default | Comments |
|----------------------------------------------|----------|----------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| `job_name` | no | `"tfe-job"` | Override the Nomad job name. |
| `datacenters` | no | `"dc1"` | Nomad datacenters in which the jobs' tasks can be placed. |
| `tfe_namespace` | no | `terraform-enterprise` | Nomad namespace where TFE image will be run as a Nomad task. |
| `tfe_port` | no | `8443` | HTTPS port to expose for TFE task. |
| `tfe_group_count` | no | `1` | Number of instances of the TFE task group to run. |
| `tfe_http_port` | no | `8080` | HTTP port to expose for TFE task. |
| `tfe_service_name` | no | `tfe-service` | Name of the service to register in Nomad DNS. |
| `tfe_database_user` | no | `hashicorp` | TFE database user. |
| `tfe_database_host` | yes | `""` | The hostname or IP address of the PostgreSQL database. |
| `tfe_database_name` | no | `"tfe"` | TFE database name. |
| `tfe_database_parameters` | no | `sslmode=require` | TFE database server parameters for the connection URI. |
| `tfe_object_storage_type` | no | `s3` | Type of object storage to use. Must be one of `s3`, `azure`, or `google`. |
| `tfe_run_pipeline_nomad_address` | yes | `""` | The server address of Nomad where TFE is being deployed. |
| `tfe_object_storage_s3_bucket` | yes | `""` | The bucket name of the S3 compatible object storage being used. |
| `tfe_object_storage_s3_region` | no | `us-west-2` | S3 region. |
| `tfe_object_storage_s3_use_instance_profile` | no | `false` | Whether to use the instance profile for authentication. |
| `tfe_object_storage_s3_endpoint` | yes | `""` | The endpoint of the S3 compatible object storage being used. |
| `tfe_object_storage_s3_access_key_id` | yes | `""` | The access key ID used to access the S3 object storage bucket. |
| `tfe_redis_host` | yes | `""` | The Redis host name being used. |
| `tfe_redis_user` | no | `""` | Redis server user. |
| `tfe_redis_use_tls` | no | `false` | Whether to use TLS to connect to Redis. |
| `tfe_redis_use_auth` | no | `false` | Whether the Redis server requires authentication with TFE_REDIS_PASSWORD and, optionally, TFE_REDIS_USER. |
| `tfe_hostname` | yes | `""` | The host name of the TFE instance to be used while deploying. |
| `tfe_tls_cert_mount_path` | no | `"/etc/ssl/private/terraform-enterprise"` | Path inside the TFE container where the certificates and other files are mounted. |
| `tfe_iact_subnets` | no | `""` | Comma-separated list of subnets in CIDR notation that are allowed to retrieve the initial admin creation token via the API. |
| `tfe_iact_time_limit` | no | `"60"` | Number of minutes that the initial admin creation token can be retrieved via the API after the application starts. |
| `tfe_vault_disable_mlock` | no | `"false"` | Disable mlock for internal Vault. |
| `tfe_resource_cpu` | no | `"750"` | CPU in MHz for TFE container. |
| `tfe_resource_memory` | no | `"1024"` | Memory in MB for TFE container. |
| `tfe_image` | no | `"images.releases.hashicorp.com/hashicorp/terraform-enterprise:v202401-2"` | TFE image and tag to download and run. |
| `tfe_image_registry_username` | no | `"terraform"` | The user name for the registry where the TFE image is hosted. |
| `tfe_image_server_address` | yes | `""` | The server address of the registry where TFE image is hosted. |
| `tfe_run_pipeline_nomad_tls_config_insecure` | no | `"false"` | When `false`, TFE and Nomad communicate over mTLS. |
| `tfe_agent_namespace` | no | `"tfe-agents"` | Nomad namespace for TFE Agents to run. |
| `tfe_agent_image` | no | `"hashicorp/tfc-agent:latest"` | TFE Agent image and tag to download and run. |
| `tfe_vault_cluster_port` | no | `"8201"` | Vault cluster port to expose from the TFE container. |
| `tfe_vault_cluster_address` | no | `"http://$${NOMAD_HOST_ADDR_vault}"` | Cluster URL of the internal Vault server on this node (e.g., http://192.168.0.1:8201). Must be reachable across nodes. |
| `tfe_agent_resource_cpu` | no | `"750"` | CPU in MHz for TFE Agent container. |
| `tfe_agent_resource_memory` | no | `"1024"` | Memory in MB for TFE Agent container. |
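
As a sketch, a `var.hcl` that sets only the variables marked as required in the table could be created as below. Every value is a placeholder, and the file assumes the root pack accepts unprefixed variable names, as the `nomad-pack run ... -f var.hcl` command implies; check this against your nomad-pack version.

```bash
# Placeholder values only; replace each one with your own.
cat > var.hcl <<'EOF'
tfe_hostname                        = "tfe.example.com"
tfe_database_host                   = "postgres.example.com"
tfe_redis_host                      = "redis.example.com"
tfe_object_storage_s3_bucket        = "tfe-object-storage"
tfe_object_storage_s3_endpoint      = "https://s3.us-west-2.amazonaws.com"
tfe_object_storage_s3_access_key_id = "EXAMPLE-ACCESS-KEY-ID"
tfe_run_pipeline_nomad_address      = "https://nomad.example.com:4646"
tfe_image_server_address            = "images.releases.hashicorp.com"
EOF
```
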
12 changes: 12 additions & 0 deletions packs/tfe_fdo_nomad/metadata.hcl
@@ -0,0 +1,12 @@
# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: MPL-2.0

app {
  url = "https://developer.hashicorp.com/terraform/enterprise"
}
pack {
  name        = "tfe_fdo_nomad"
  url         = "https://github.com/hashicorp/nomad-pack-community-registry/tfe_fdo_nomad"
  description = "Terraform Enterprise"
  version     = "0.1.0"
}
6 changes: 6 additions & 0 deletions packs/tfe_fdo_nomad/outputs.tpl
@@ -0,0 +1,6 @@
# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: MPL-2.0

Congrats! You deployed the tfe_fdo_nomad pack on Nomad.

You can view your TFE instances in the Nomad UI and reach TFE at the hostname you configured.
57 changes: 57 additions & 0 deletions packs/tfe_fdo_nomad/templates/tfe.agent.nomad.tpl
@@ -0,0 +1,57 @@
# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: MPL-2.0

job "tfe-agent-job" {
  type      = "batch"
  namespace = [[ .tfe_fdo_nomad.tfe_agent_namespace | quote ]]
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }
  parameterized {
    payload       = "forbidden"
    meta_required = [
      "TFC_AGENT_TOKEN",
      "TFC_ADDRESS"
    ]
    meta_optional = [
      "TFE_RUN_PIPELINE_IMAGE",
      "TFC_AGENT_AUTO_UPDATE",
      "TFC_AGENT_CACHE_DIR",
      "TFC_AGENT_SINGLE",
      "HTTPS_PROXY",
      "HTTP_PROXY",
      "NO_PROXY"
    ]
  }

  group "tfe-agent-group" {

    task "tfc-agent-task" {
      driver = "docker"

      config {
        image = [[ .tfe_fdo_nomad.tfe_agent_image | quote ]]
      }

      env {
        TFC_ADDRESS           = "${NOMAD_META_TFC_ADDRESS}"
        TFC_AGENT_TOKEN       = "${NOMAD_META_TFC_AGENT_TOKEN}"
        TFC_AGENT_AUTO_UPDATE = "${NOMAD_META_TFC_AGENT_AUTO_UPDATE}"
        TFC_AGENT_CACHE_DIR   = "${NOMAD_META_TFC_AGENT_CACHE_DIR}"
        TFC_AGENT_SINGLE      = "${NOMAD_META_TFC_AGENT_SINGLE}"
        HTTPS_PROXY           = "${NOMAD_META_HTTPS_PROXY}"
        https_proxy           = "${NOMAD_META_HTTPS_PROXY}"
        HTTP_PROXY            = "${NOMAD_META_HTTP_PROXY}"
        http_proxy            = "${NOMAD_META_HTTP_PROXY}"
        NO_PROXY              = "${NOMAD_META_NO_PROXY}"
        no_proxy              = "${NOMAD_META_NO_PROXY}"
      }

      resources {
        cpu    = [[ .tfe_fdo_nomad.tfe_agent_resource_cpu ]]
        memory = [[ .tfe_fdo_nomad.tfe_agent_resource_memory ]]
      }
    }
  }
}
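
TFE's task worker normally dispatches `tfe-agent-job` itself, passing the required `TFC_ADDRESS` and `TFC_AGENT_TOKEN` metadata. To smoke-test the parameterized job by hand, a dispatch could look like the sketch below; `dispatch_tfe_agent` is a hypothetical helper, and it assumes the default `tfe-agents` namespace.

```bash
# Hypothetical helper: dispatch one instance of the parameterized agent job.
# $1 = TFE address (TFC_ADDRESS), $2 = agent token (TFC_AGENT_TOKEN).
# Optional meta keys such as TFC_AGENT_SINGLE could be passed with more -meta flags.
dispatch_tfe_agent() {
  nomad job dispatch \
    -namespace tfe-agents \
    -meta "TFC_ADDRESS=$1" \
    -meta "TFC_AGENT_TOKEN=$2" \
    tfe-agent-job
}
```

For example: `dispatch_tfe_agent https://tfe.example.com "<agent-token>"`, where both arguments are placeholders.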