Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus] Add HTTPS support, plumbing x509 certificates #1500

Merged
merged 16 commits into from
Aug 8, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/buildomat/jobs/package.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ cargo --version
rustc --version

ptime -m ./tools/install_builder_prerequisites.sh -yp
ptime -m ./tools/create_self_signed_cert.sh -yp

ptime -m cargo run --locked --release --bin omicron-package -- package

Expand Down
1 change: 1 addition & 0 deletions README.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ Supported config properties include:
|
|Yes
|Dropshot configuration for the external server (i.e., the one that operators and developers using the Oxide rack will use). Specific properties are documented below, but see the Dropshot README for details.
| Note that this is an array of external address configurations; multiple may be supplied.

|`dropshot_external.bind_address`
|`"127.0.0.1:12220"`
Expand Down
9 changes: 6 additions & 3 deletions common/src/nexus_config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -104,9 +104,12 @@ pub struct DeploymentConfig {
pub id: Uuid,
/// Uuid of the Rack where Nexus is executing.
pub rack_id: Uuid,
/// Dropshot configuration for external API server
pub dropshot_external: ConfigDropshot,
/// Dropshot configuration for internal API server
/// Dropshot configurations for external API server.
///
/// Multiple configurations may be supplied to request
/// combinations of HTTP / HTTPS servers.
pub dropshot_external: Vec<ConfigDropshot>,
/// Dropshot configuration for internal API server.
pub dropshot_internal: ConfigDropshot,
/// Portion of the IP space to be managed by the Rack.
pub subnet: Ipv6Subnet<RACK_PREFIX>,
Expand Down
7 changes: 7 additions & 0 deletions docs/how-to-run.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,13 @@ This script requires Omicron be uninstalled, e.g., with `pfexec
that is not the case. The script will then remove the file-based vdevs and the
VNICs created by `create_virtual_hardware.sh`.

=== Make me a certificate!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an analogous change for how-to-run-simulated? In general, the test suite matches what how-to-run-simulated does so if you've updated that then things should work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not, because the implementation within Nexus checks for the existence of the certificate files before deciding whether or not to launch the HTTPS server.

For the simulated server, they won't exist, so only the HTTP server will be launched.

I definitely think this would be required before moving to "HTTPS only", but considered this PR an intermediate step.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned in the other comment, but that doesn't seem like the right behavior. It seems like in production, if they don't exist, that should be a fatal error. And in development, if they do exist, we should start the HTTP server. If this were part of configuration, then whoever's running Nexus can express their intent. (I'm not saying we have to do all that in this PR, but if we decide that is what we want, then I think this behavior is not really an intermediate step.)


Nexus's external interface will typically be served using public-facing x.509
certificate. While we are still configuring the mechanism to integrate this real
certificate into the package system, `./tools/create_self_signed_cert.sh` can be
used to generate an equivalent self-signed certificate.
Comment on lines +58 to +61
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(me): I wonder if I can fix #1398 and make this easier at the same time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: #1528. We're gonna need to grab / insert certs from somewhere.

I've updating the packaging hints to make this more obvious if an error is encountered.


== Deploying Omicron

The control plane repository contains a packaging tool which bundles binaries
Expand Down
6 changes: 3 additions & 3 deletions nexus/examples/config.toml
Original file line number Diff line number Diff line change
Expand Up @@ -38,15 +38,15 @@ address = "[::1]:8123"
id = "e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c"
rack_id = "c19a698f-c6f9-4a17-ae30-20d711b8f7dc"

[deployment.dropshot_external]
# IP address and TCP port on which to listen for the external API
[[deployment.dropshot_external]]
# IP Address and TCP port on which to listen for the external API
bind_address = "127.0.0.1:12220"
# Allow larger request bodies (1MiB) to accomodate firewall endpoints (one
# rule is ~500 bytes)
request_body_max_bytes = 1048576

[deployment.dropshot_internal]
# IP address and TCP port on which to listen for the internal API
# IP Address and TCP port on which to listen for the internal API
bind_address = "127.0.0.1:12221"

[deployment.subnet]
Expand Down
12 changes: 6 additions & 6 deletions nexus/src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -336,7 +336,7 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
[[deployment.dropshot_external]]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
Expand All @@ -358,12 +358,12 @@ mod test {
rack_id: "38b90dc4-c22a-65ba-f49a-f051fe01208f"
.parse()
.unwrap(),
dropshot_external: ConfigDropshot {
dropshot_external: vec![ConfigDropshot {
bind_address: "10.1.2.3:4567"
.parse::<SocketAddr>()
.unwrap(),
..Default::default()
},
},],
dropshot_internal: ConfigDropshot {
bind_address: "10.1.2.3:4568"
.parse::<SocketAddr>()
Expand Down Expand Up @@ -418,7 +418,7 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
[[deployment.dropshot_external]]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
Expand Down Expand Up @@ -460,7 +460,7 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
[[deployment.dropshot_external]]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
Expand Down Expand Up @@ -516,7 +516,7 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
[[deployment.dropshot_external]]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
Expand Down
62 changes: 37 additions & 25 deletions nexus/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -71,8 +71,8 @@ pub fn run_openapi_internal() -> Result<(), String> {
pub struct Server {
/// shared state used by API request handlers
pub apictx: Arc<ServerContext>,
/// dropshot server for external API
pub http_server_external: dropshot::HttpServer<Arc<ServerContext>>,
/// dropshot servers for external API
pub http_servers_external: Vec<dropshot::HttpServer<Arc<ServerContext>>>,
/// dropshot server for internal API
pub http_server_internal: dropshot::HttpServer<Arc<ServerContext>>,
}
Expand All @@ -92,26 +92,36 @@ impl Server {
ServerContext::new(config.deployment.rack_id, ctxlog, &config)
.await?;

let http_server_starter_external = dropshot::HttpServerStarter::new(
&config.deployment.dropshot_external,
external_api(),
Arc::clone(&apictx),
&log.new(o!("component" => "dropshot_external")),
)
.map_err(|error| format!("initializing external server: {}", error))?;

let http_server_starter_internal = dropshot::HttpServerStarter::new(
// Launch the internal server.
let server_starter_internal = dropshot::HttpServerStarter::new(
&config.deployment.dropshot_internal,
internal_api(),
Arc::clone(&apictx),
&log.new(o!("component" => "dropshot_internal")),
)
.map_err(|error| format!("initializing internal server: {}", error))?;

let http_server_external = http_server_starter_external.start();
let http_server_internal = http_server_starter_internal.start();

Ok(Server { apictx, http_server_external, http_server_internal })
let http_server_internal = server_starter_internal.start();

// Launch the external server(s).
let http_servers_external = config
.deployment
.dropshot_external
.iter()
.map(|cfg| {
let server_starter_external = dropshot::HttpServerStarter::new(
&cfg,
external_api(),
Arc::clone(&apictx),
&log.new(o!("component" => "dropshot_external")),
)
.map_err(|error| {
format!("initializing external server: {}", error)
})?;
Ok(server_starter_external.start())
})
.collect::<Result<Vec<dropshot::HttpServer<_>>, String>>()?;

Ok(Server { apictx, http_servers_external, http_server_internal })
}

/// Wait for the given server to shut down
Expand All @@ -120,18 +130,20 @@ impl Server {
/// immediately after calling `start()`, the program will block indefinitely
/// or until something else initiates a graceful shutdown.
pub async fn wait_for_finish(self) -> Result<(), String> {
let errors = vec![
self.http_server_external
.await
.map_err(|e| format!("external: {}", e)),
let mut errors = vec![];
for server in self.http_servers_external {
errors.push(server.await.map_err(|e| format!("external: {}", e)));
}
errors.push(
self.http_server_internal
.await
.map_err(|e| format!("internal: {}", e)),
]
.into_iter()
.filter(Result::is_err)
.map(|r| r.unwrap_err())
.collect::<Vec<String>>();
);
let errors = errors
.into_iter()
.filter(Result::is_err)
.map(|r| r.unwrap_err())
.collect::<Vec<String>>();

if errors.len() > 0 {
let msg = format!("errors shutting down: ({})", errors.join(", "));
Expand Down
6 changes: 4 additions & 2 deletions nexus/test-utils/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,9 @@ pub struct ControlPlaneTestContext {

impl ControlPlaneTestContext {
pub async fn teardown(mut self) {
self.server.http_server_external.close().await.unwrap();
for server in self.server.http_servers_external {
server.close().await.unwrap();
}
self.server.http_server_internal.close().await.unwrap();
self.database.cleanup().await.unwrap();
self.clickhouse.cleanup().await.unwrap();
Expand Down Expand Up @@ -119,7 +121,7 @@ pub async fn test_setup_with_config(
.expect("Nexus never loaded users");

let testctx_external = ClientTestContext::new(
server.http_server_external.local_addr(),
server.http_servers_external[0].local_addr(),
logctx.log.new(o!("component" => "external client test context")),
);
let testctx_internal = ClientTestContext::new(
Expand Down
5 changes: 1 addition & 4 deletions nexus/tests/config.test.toml
Original file line number Diff line number Diff line change
Expand Up @@ -41,16 +41,13 @@ max_vpc_ipv4_subnet_prefix = 29
id = "e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c"
rack_id = "c19a698f-c6f9-4a17-ae30-20d711b8f7dc"

#
smklein marked this conversation as resolved.
Show resolved Hide resolved
[[deployment.dropshot_external]]
# NOTE: for the test suite, the port MUST be 0 (in order to bind to any
# available port) because the test suite will be running many servers
# concurrently.
#
[deployment.dropshot_external]
bind_address = "127.0.0.1:0"
request_body_max_bytes = 1048576

# port must be 0. see above
[deployment.dropshot_internal]
bind_address = "127.0.0.1:0"
request_body_max_bytes = 1048576
Expand Down
2 changes: 1 addition & 1 deletion nexus/tests/integration_tests/authn_http.rs
Original file line number Diff line number Diff line change
Expand Up @@ -299,7 +299,7 @@ async fn start_whoami_server(
TestContext::new(
whoami_api,
server_state,
&config.deployment.dropshot_external,
&config.deployment.dropshot_external[0],
Some(logctx),
log,
)
Expand Down
14 changes: 8 additions & 6 deletions openapi/sled-agent.json
Original file line number Diff line number Diff line change
Expand Up @@ -1136,11 +1136,13 @@
{
"type": "object",
"properties": {
"external_address": {
"type": "string"
"external_ip": {
"type": "string",
"format": "ip"
},
"internal_address": {
"type": "string"
"internal_ip": {
"type": "string",
"format": "ipv6"
},
"type": {
"type": "string",
Expand All @@ -1150,8 +1152,8 @@
}
},
"required": [
"external_address",
"internal_address",
"external_ip",
"internal_ip",
"type"
]
},
Expand Down
9 changes: 9 additions & 0 deletions package-manifest.toml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ zone = true
setup_hint = """
- Run `./tools/ci_download_console` to download the web console assets
- Run `pkg install library/postgresql-13` to download Postgres libraries
- Run `./tools/create_self_signed_cert.sh` to generate a certificate
"""

[[package.omicron-nexus.paths]]
Expand All @@ -30,6 +31,14 @@ to = "/var/svc/manifest/site/nexus"
[[package.omicron-nexus.paths]]
from = "out/console-assets"
to = "/var/nexus/static"
# Note, we could just map the whole "out/certs" directory, but this ensures
# both files exist.
[[package.omicron-nexus.paths]]
from = "out/certs/cert.pem"
to = "/var/nexus/certs/cert.pem"
[[package.omicron-nexus.paths]]
from = "out/certs/key.pem"
to = "/var/nexus/certs/key.pem"
Comment on lines +34 to +41
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a longer-term issue to resolve, but I'd like for this to not necessarily be bundled with the package.

However, this may require a different mechanism for acquiring / rotation of the certificate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed up today and met with Rick to discuss this.

My takeaways:

  • RSS must provide a way for users to input the Nexus IP address + initial certificates
    • Later, an interface through Nexus will support modifying these values
  • A key storage mechanism like https://github.com/hashicorp/vault - not necessarily this particular technology, but something highly available - should be used to store keys for both internal and externally facing services.
    • For internal services, we could concoct a scheme where:
      • When booted, services generate a public/private key pair
      • The private key stays private to the service while it exists in-memory
      • The public key gets published to "vault" - or whatever key management facility we use - and signed by the internal root CA. This should rely on trusting "who is launching the service" (namely, sled agent) to extend this trust.
    • For external services, we rely on user input for these certificates. We could store them in this "vault", but cannot generate them ourselves.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tracked by #1528


[package.oximeter-collector]
rust.binary_names = ["oximeter"]
Expand Down
17 changes: 10 additions & 7 deletions sled-agent/src/params.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use internal_dns_client::names::{BackendName, ServiceName, AAAA, SRV};
use omicron_common::address::{DENDRITE_PORT, OXIMETER_PORT};
use omicron_common::address::{
DENDRITE_PORT, NEXUS_INTERNAL_PORT, OXIMETER_PORT,
};
use omicron_common::api::external;
use omicron_common::api::internal::nexus::{
DiskRuntimeState, InstanceRuntimeState,
Expand Down Expand Up @@ -342,7 +344,7 @@ impl From<DatasetEnsureBody> for sled_agent_client::types::DatasetEnsureBody {
)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ServiceType {
Nexus { internal_address: SocketAddrV6, external_address: SocketAddr },
Nexus { internal_ip: Ipv6Addr, external_ip: IpAddr },
InternalDns { server_address: SocketAddrV6, dns_address: SocketAddrV6 },
Oximeter,
Dendrite { asic: DendriteAsic },
Expand All @@ -354,10 +356,9 @@ impl From<ServiceType> for sled_agent_client::types::ServiceType {
use ServiceType as St;

match s {
St::Nexus { internal_address, external_address } => AutoSt::Nexus {
internal_address: internal_address.to_string(),
external_address: external_address.to_string(),
},
St::Nexus { internal_ip, external_ip } => {
AutoSt::Nexus { internal_ip, external_ip }
}
St::InternalDns { server_address, dns_address } => {
AutoSt::InternalDns {
server_address: server_address.to_string(),
Expand Down Expand Up @@ -415,7 +416,9 @@ impl ServiceRequest {
pub fn address(&self) -> SocketAddrV6 {
match self.service_type {
ServiceType::InternalDns { server_address, .. } => server_address,
ServiceType::Nexus { internal_address, .. } => internal_address,
ServiceType::Nexus { internal_ip, .. } => {
SocketAddrV6::new(internal_ip, NEXUS_INTERNAL_PORT, 0, 0)
}
ServiceType::Oximeter => {
SocketAddrV6::new(self.addresses[0], OXIMETER_PORT, 0, 0)
}
Expand Down
Loading