[nexus] Add HTTPS support, plumbing x509 certificates #1500

Merged
merged 16 commits into from
Aug 8, 2022
Changes from 9 commits
32 changes: 6 additions & 26 deletions README.adoc
@@ -77,35 +77,15 @@ Supported config properties include:
|Yes
|URL identifying the CockroachDB instance(s) to connect to. CockroachDB is used for all persistent data.

|`dropshot_external`
|
|Yes
|Dropshot configuration for the external server (i.e., the one that operators and developers using the Oxide rack will use). Specific properties are documented below, but see the Dropshot README for details.

|`dropshot_external.bind_address`
|`"127.0.0.1:12220"`
|Yes
|Specifies that the server should bind to the given IP address and TCP port for the **external** API (i.e., the one that operators and developers using the Oxide rack will use). In general, servers can bind to more than one IP address and port, but this is not (yet?) supported.

|`dropshot_external.request_body_max_bytes`
|`1000`
|Yes
|Specifies the maximum request body size for the **external** API.

|`dropshot_internal`
|
|Yes
|Dropshot configuration for the internal server (i.e., the one used by the sled agent). Specific properties are documented below, but see the Dropshot README for details.

|`dropshot_internal.bind_address`
|`"127.0.0.1:12220"`
|`external_ip`
|`"127.0.0.1"`
|Yes
|Specifies that the server should bind to the given IP address and TCP port for the **internal** API (i.e., the one used by the sled agent). In general, servers can bind to more than one IP address and port, but this is not (yet?) supported.
|Specifies that the server should bind to the given IP address for the **external** API (i.e., the one that operators and developers using the Oxide rack will use).

|`dropshot_internal.request_body_max_bytes`
|`1000`
|`internal_ip`
|`"127.0.0.1"`
|Yes
|Specifies the maximum request body size for the **internal** API.
|Specifies that the server should bind to the given IP address for the **internal** API (i.e., the one used by the sled agent).

|`id`
|`"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c"`
30 changes: 25 additions & 5 deletions common/src/nexus_config.rs
@@ -7,11 +7,11 @@

use super::address::{Ipv6Subnet, RACK_PREFIX};
use super::postgres_config::PostgresConfigWithUrl;
use dropshot::ConfigDropshot;
use serde::{Deserialize, Serialize};
use serde_with::serde_as;
use serde_with::DisplayFromStr;
use std::fmt;
use std::net::IpAddr;
use std::path::{Path, PathBuf};
use uuid::Uuid;

@@ -98,16 +98,36 @@ pub enum Database {
},
}

/// Describes how ports are selected for dropshot's HTTP servers.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum PortPicker {
/// Use default values for ports, defined by Nexus.
NexusChoice,
/// Use port zero. This avoids conflicts during tests
/// by letting the OS pick free ports.
Zero,
}

impl Default for PortPicker {
fn default() -> Self {
PortPicker::NexusChoice
}
}

#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct DeploymentConfig {
/// Uuid of the Nexus instance
pub id: Uuid,
/// Uuid of the Rack where Nexus is executing.
pub rack_id: Uuid,
/// Dropshot configuration for external API server
pub dropshot_external: ConfigDropshot,
/// Dropshot configuration for internal API server
pub dropshot_internal: ConfigDropshot,
/// External address of Nexus.
pub external_ip: IpAddr,
/// Internal address of Nexus.
pub internal_ip: IpAddr,
/// Decides how ports are selected
#[serde(default)]
pub port_picker: PortPicker,
/// Portion of the IP space to be managed by the Rack.
pub subnet: Ipv6Subnet<RACK_PREFIX>,
/// DB configuration.
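
The `port_picker` field above is optional in the config file (`#[serde(default)]`) and uses snake_case names. A minimal std-only sketch of that mapping, without serde, might look like the following. The `FromStr` impl and the `port_picker_from_config` helper are hypothetical and exist only for illustration; the PR itself relies on serde's derive.

```rust
use std::str::FromStr;

/// Mirrors the `PortPicker` enum from the diff; `NexusChoice` is the default.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub enum PortPicker {
    #[default]
    NexusChoice,
    Zero,
}

impl FromStr for PortPicker {
    type Err = String;

    /// Accepts the snake_case spellings implied by
    /// `#[serde(rename_all = "snake_case")]`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "nexus_choice" => Ok(PortPicker::NexusChoice),
            "zero" => Ok(PortPicker::Zero),
            other => Err(format!("unknown port_picker value: {other:?}")),
        }
    }
}

/// Hypothetical helper: `None` models a config file that omits `port_picker`,
/// in which case the default variant is used.
pub fn port_picker_from_config(raw: Option<&str>) -> Result<PortPicker, String> {
    match raw {
        Some(s) => s.parse(),
        None => Ok(PortPicker::default()),
    }
}
```

With this sketch, an omitted key and `"nexus_choice"` both yield `PortPicker::NexusChoice`, while `"zero"` yields `PortPicker::Zero`; any other spelling is rejected.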
7 changes: 7 additions & 0 deletions docs/how-to-run.adoc
@@ -53,6 +53,13 @@ This script requires Omicron be uninstalled, e.g., with `pfexec
that is not the case. The script will then remove the file-based vdevs and the
VNICs created by `create_virtual_hardware.sh`.

=== Make me a certificate!
Collaborator

Is there an analogous change for how-to-run-simulated? In general, the test suite matches what how-to-run-simulated does so if you've updated that then things should work.

Collaborator Author

There is not, because the implementation within Nexus checks for the existence of the certificate files before deciding whether or not to launch the HTTPS server.

For the simulated server, they won't exist, so only the HTTP server will be launched.

I definitely think this would be required before moving to "HTTPS only", but considered this PR an intermediate step.
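
The behavior described in this reply (launch HTTPS only when both certificate files exist) can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: `TlsFiles` and `tls_files_if_present` are hypothetical names, and the real change builds a `dropshot::ConfigTls` instead.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the certificate/key pair handed to the server.
#[derive(Debug, PartialEq)]
pub struct TlsFiles {
    pub cert_file: PathBuf,
    pub key_file: PathBuf,
}

/// Returns the pair only if BOTH files exist; otherwise `None`, which the
/// caller would treat as "start the HTTP server only".
pub fn tls_files_if_present(cert_file: &Path, key_file: &Path) -> Option<TlsFiles> {
    if cert_file.exists() && key_file.exists() {
        Some(TlsFiles {
            cert_file: cert_file.to_path_buf(),
            key_file: key_file.to_path_buf(),
        })
    } else {
        None
    }
}
```

Note that a missing key file with a present certificate is silently treated the same as "no TLS", which is part of what the following review comment objects to.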

Collaborator

Mentioned in the other comment, but that doesn't seem like the right behavior. It seems like in production, if they don't exist, that should be a fatal error. And in development, if they do exist, we should start the HTTPS server. If this were part of configuration, then whoever's running Nexus can express their intent. (I'm not saying we have to do all that in this PR, but if we decide that is what we want, then I think this behavior is not really an intermediate step.)
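
One way to read the suggestion above is that TLS intent becomes an explicit config value, and a missing file is an error whenever TLS was requested. The sketch below is hypothetical (the `ExternalTls` enum and `check_tls_intent` function are not part of the PR or of Dropshot) and only illustrates the shape of that check.

```rust
use std::path::PathBuf;

/// Hypothetical config value expressing the operator's intent.
#[derive(Clone, Debug, PartialEq)]
pub enum ExternalTls {
    /// Serve plain HTTP only (e.g., simulated or development setups).
    Disabled,
    /// Serve HTTPS; missing files are a fatal configuration error.
    Required { cert_file: PathBuf, key_file: PathBuf },
}

/// Returns `Ok(None)` when TLS is disabled, `Ok(Some(..))` when both files
/// exist, and `Err(..)` when TLS was requested but a file is missing.
pub fn check_tls_intent(tls: &ExternalTls) -> Result<Option<(PathBuf, PathBuf)>, String> {
    match tls {
        ExternalTls::Disabled => Ok(None),
        ExternalTls::Required { cert_file, key_file } => {
            for path in [cert_file, key_file] {
                if !path.exists() {
                    return Err(format!(
                        "TLS is required but {} does not exist",
                        path.display()
                    ));
                }
            }
            Ok(Some((cert_file.clone(), key_file.clone())))
        }
    }
}
```

Compared with probing for files at fixed paths, this makes the failure mode explicit: a production config that asks for TLS cannot silently fall back to HTTP.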


Nexus's external interface will typically be served using a public-facing X.509
certificate. While we are still configuring the mechanism to integrate this real
certificate into the package system, `./tools/create_self_signed_cert.sh` can be
used to generate an equivalent self-signed certificate.
Comment on lines +58 to +61
Collaborator Author

TODO(me): I wonder if I can fix #1398 and make this easier at the same time?

Collaborator Author

See: #1528. We're gonna need to grab / insert certs from somewhere.

I've updated the packaging hints to make this more obvious if an error is encountered.


== Deploying Omicron

The control plane repository contains a packaging tool which bundles binaries
13 changes: 2 additions & 11 deletions nexus/examples/config.toml
@@ -37,17 +37,8 @@ address = "[::1]:8123"
# Identifier for this instance of Nexus
id = "e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c"
rack_id = "c19a698f-c6f9-4a17-ae30-20d711b8f7dc"

[deployment.dropshot_external]
# IP address and TCP port on which to listen for the external API
bind_address = "127.0.0.1:12220"
# Allow larger request bodies (1MiB) to accommodate firewall endpoints (one
# rule is ~500 bytes)
request_body_max_bytes = 1048576

[deployment.dropshot_internal]
# IP address and TCP port on which to listen for the internal API
bind_address = "127.0.0.1:12221"
external_ip = "127.0.0.1"
internal_ip = "127.0.0.1"

[deployment.subnet]
net = "fd00:1122:3344:0100::/56"
52 changes: 13 additions & 39 deletions nexus/src/config.rs
@@ -215,17 +215,16 @@ mod test {
AuthnConfig, Config, ConsoleConfig, LoadError, PackageConfig,
SchemeName, TimeseriesDbConfig, UpdatesConfig,
};
use dropshot::ConfigDropshot;
use dropshot::ConfigLogging;
use dropshot::ConfigLoggingIfExists;
use dropshot::ConfigLoggingLevel;
use libc;
use omicron_common::address::{Ipv6Subnet, RACK_PREFIX};
use omicron_common::nexus_config::{
Database, DeploymentConfig, LoadErrorKind,
Database, DeploymentConfig, LoadErrorKind, PortPicker,
};
use std::fs;
use std::net::{Ipv6Addr, SocketAddr};
use std::net::{IpAddr, Ipv6Addr};
use std::path::Path;
use std::path::PathBuf;

@@ -336,12 +335,8 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
bind_address = "10.1.2.3:4568"
request_body_max_bytes = 1024
external_ip = "10.1.2.3"
internal_ip = "10.1.2.4"
[deployment.subnet]
net = "::/56"
[deployment.database]
@@ -358,18 +353,9 @@ mod test {
rack_id: "38b90dc4-c22a-65ba-f49a-f051fe01208f"
.parse()
.unwrap(),
dropshot_external: ConfigDropshot {
bind_address: "10.1.2.3:4567"
.parse::<SocketAddr>()
.unwrap(),
..Default::default()
},
dropshot_internal: ConfigDropshot {
bind_address: "10.1.2.3:4568"
.parse::<SocketAddr>()
.unwrap(),
..Default::default()
},
external_ip: "10.1.2.3".parse::<IpAddr>().unwrap(),
internal_ip: "10.1.2.4".parse::<IpAddr>().unwrap(),
port_picker: PortPicker::default(),
subnet: Ipv6Subnet::<RACK_PREFIX>::new(Ipv6Addr::LOCALHOST),
database: Database::FromDns,
},
@@ -418,12 +404,8 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
bind_address = "10.1.2.3:4568"
request_body_max_bytes = 1024
external_ip = "10.1.2.3"
internal_ip = "10.1.2.4"
[deployment.subnet]
net = "::/56"
[deployment.database]
@@ -460,12 +442,8 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
bind_address = "10.1.2.3:4568"
request_body_max_bytes = 1024
external_ip = "10.1.2.3"
internal_ip = "10.1.2.4"
[deployment.subnet]
net = "::/56"
[deployment.database]
@@ -516,12 +494,8 @@ mod test {
[deployment]
id = "28b90dc4-c22a-65ba-f49a-f051fe01208f"
rack_id = "38b90dc4-c22a-65ba-f49a-f051fe01208f"
[deployment.dropshot_external]
bind_address = "10.1.2.3:4567"
request_body_max_bytes = 1024
[deployment.dropshot_internal]
bind_address = "10.1.2.3:4568"
request_body_max_bytes = 1024
external_ip = "10.1.2.3"
internal_ip = "10.1.2.4"
[deployment.subnet]
net = "::/56"
[deployment.database]
100 changes: 88 additions & 12 deletions nexus/src/lib.rs
@@ -34,6 +34,8 @@ pub use crucible_agent_client;
use external_api::http_entrypoints::external_api;
use internal_api::http_entrypoints::internal_api;
use slog::Logger;
use std::net::SocketAddr;
use std::path::PathBuf;
use std::sync::Arc;

#[macro_use]
@@ -71,7 +73,9 @@ pub fn run_openapi_internal() -> Result<(), String> {
pub struct Server {
/// shared state used by API request handlers
pub apictx: Arc<ServerContext>,
/// dropshot server for external API
/// dropshot server for external API (encrypted)
pub https_server_external: Option<dropshot::HttpServer<Arc<ServerContext>>>,
/// dropshot server for external API (unencrypted)
pub http_server_external: dropshot::HttpServer<Arc<ServerContext>>,
/// dropshot server for internal API
pub http_server_internal: dropshot::HttpServer<Arc<ServerContext>>,
@@ -92,26 +96,98 @@ impl Server {
ServerContext::new(config.deployment.rack_id, ctxlog, &config)
.await?;

let http_server_starter_external = dropshot::HttpServerStarter::new(
&config.deployment.dropshot_external,
external_api(),
Arc::clone(&apictx),
&log.new(o!("component" => "dropshot_external")),
)
.map_err(|error| format!("initializing external server: {}", error))?;

// Determine port choices

let (external_http_port, external_https_port, internal_http_port) =
match config.deployment.port_picker {
omicron_common::nexus_config::PortPicker::NexusChoice => {
(80, 443, omicron_common::address::NEXUS_INTERNAL_PORT)
}
omicron_common::nexus_config::PortPicker::Zero => (0, 0, 0),
};

// Launch the internal server.

let dropshot_internal_config = dropshot::ConfigDropshot {
bind_address: SocketAddr::new(
config.deployment.internal_ip,
internal_http_port,
),
request_body_max_bytes: 1048576,
..Default::default()
};
let http_server_starter_internal = dropshot::HttpServerStarter::new(
&config.deployment.dropshot_internal,
&dropshot_internal_config,
internal_api(),
Arc::clone(&apictx),
&log.new(o!("component" => "dropshot_internal")),
)
.map_err(|error| format!("initializing internal server: {}", error))?;
let http_server_internal = http_server_starter_internal.start();

// Launch the external server(s).
//
// - The HTTP server is unconditionally started.
// - The HTTPS server is started if the necessary certificate files
// exist.
//
// TODO: Consider changing this disposition, making "HTTPS" the default,
// and returning an error if the certificates don't exist. Doing so
// would be the more secure long-term plan, but would make gradual
// deployment of this feature more difficult.

let cert_file = PathBuf::from("/var/nexus/certs/cert.pem");
Collaborator

This seems unlikely to work on development machines.

Why not make this part of dropshot_external, with /var/nexus/certs/cert.pem the value in the SMF config file and something like "out/certs" the value in nexus/examples/config.toml?

Collaborator Author

FWIW, this file is only used if cert_file.exists() && key_file.exists().

I was kinda opposed to using the config-file-supplied certificate, because longer-term, it seemed like this value would be:

So although we are currently plumbing the certificates through the package system, I don't expect that to stay the case for very long.

Collaborator

> FWIW, this file is only used if cert_file.exists() && key_file.exists().

What happens if the path is missing on a production system? It seems like that should produce a fatal error.

> I was kinda opposed to using the config-file-supplied certificate

I can see where you're coming from. At the same time, it seems like an essential developer or demo tool to be able to start Nexus with a specific certificate?

Is that a good idea for the production config? I'm not sure. It might depend on the expected behavior in the longer term. Is it that Nexus reaches out to CockroachDB/Vault to get the certificate, then loads that into memory, and then starts the HTTPS server with it? If that's true, then what do we do if Vault is down? I could see: (1) not coming up at all until we can get the cert, (2) storing the last-known-good certificate to a file and trying to use that until we can get the cert, (3) generating a self-signed certificate and using that until we're able to access the real one. These all have different tradeoffs. I've been assuming Nexus could come up (albeit not very usefully) even if everything else in the world was down so that we could at least ask it why it wasn't able to come up. Anyway, if we go with (2) or (3), then it seems like accepting a file path in the config file would be a reasonable stepping stone.

(Along these lines, I guess hardcoded, deployment-specific absolute paths in source files seem like a code smell to me, even if we gracefully handle them being missing.)
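
The three fallback options listed in this comment can be written down as a small decision function, which makes their tradeoffs easier to compare. Everything here is hypothetical: the names `FallbackPolicy`, `CertDecision`, and `decide_cert_source` are illustrative, and the choice to wait when no last-known-good file exists is an assumption, not something the thread settles.

```rust
/// Hypothetical policies matching options (1), (2), and (3) in the comment.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FallbackPolicy {
    /// (1) Do not come up until the real certificate is available.
    WaitForCert,
    /// (2) Use the last-known-good certificate stored in a file.
    LastKnownGood,
    /// (3) Generate a self-signed certificate in the meantime.
    SelfSigned,
}

#[derive(Debug, PartialEq)]
pub enum CertDecision {
    UseFetched,
    UseLastKnownGood,
    UseSelfSigned,
    Wait,
}

/// Decides which certificate source to use at startup.
/// `fetched_ok`: the real certificate was retrieved successfully.
/// `cached_exists`: a last-known-good file is present on disk.
pub fn decide_cert_source(
    policy: FallbackPolicy,
    fetched_ok: bool,
    cached_exists: bool,
) -> CertDecision {
    if fetched_ok {
        return CertDecision::UseFetched;
    }
    match policy {
        FallbackPolicy::WaitForCert => CertDecision::Wait,
        FallbackPolicy::LastKnownGood if cached_exists => CertDecision::UseLastKnownGood,
        // Assumption: with no cached file there is nothing to fall back to.
        FallbackPolicy::LastKnownGood => CertDecision::Wait,
        FallbackPolicy::SelfSigned => CertDecision::UseSelfSigned,
    }
}
```

Options (2) and (3) both end with Nexus reading certificate material from somewhere local, which is why the comment argues that a file path in the config file is a reasonable stepping stone.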

let key_file = PathBuf::from("/var/nexus/certs/key.pem");

let https_server_external = if cert_file.exists() && key_file.exists() {
let dropshot_external_https_config = dropshot::ConfigDropshot {
bind_address: SocketAddr::new(
config.deployment.external_ip,
external_https_port,
),
request_body_max_bytes: 1048576,
tls: Some(dropshot::ConfigTls { cert_file, key_file }),
};
let https_server_starter_external =
dropshot::HttpServerStarter::new(
&dropshot_external_https_config,
external_api(),
Arc::clone(&apictx),
&log.new(
o!("component" => "dropshot_external (encrypted)"),
),
)
.map_err(|error| {
format!("initializing external server: {}", error)
})?;
Some(https_server_starter_external.start())
} else {
None
};

let dropshot_external_http_config = dropshot::ConfigDropshot {
bind_address: SocketAddr::new(
config.deployment.external_ip,
external_http_port,
),
request_body_max_bytes: 1048576,
tls: None,
};
let http_server_starter_external = dropshot::HttpServerStarter::new(
&dropshot_external_http_config,
external_api(),
Arc::clone(&apictx),
&log.new(o!("component" => "dropshot_external (unencrypted)")),
)
.map_err(|error| format!("initializing external server: {}", error))?;
let http_server_external = http_server_starter_external.start();
let http_server_internal = http_server_starter_internal.start();

Ok(Server { apictx, http_server_external, http_server_internal })
Ok(Server {
apictx,
https_server_external,
http_server_external,
http_server_internal,
})
}

/// Wait for the given server to shut down
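
The port selection and address construction in this hunk can be isolated into a small std-only sketch. `BindAddresses` and `bind_addresses` are hypothetical names, and `ASSUMED_INTERNAL_PORT` is a placeholder: the real value comes from `omicron_common::address::NEXUS_INTERNAL_PORT`, which this diff does not show.

```rust
use std::net::{IpAddr, SocketAddr};

/// Placeholder for `NEXUS_INTERNAL_PORT`; the actual value is an assumption.
pub const ASSUMED_INTERNAL_PORT: u16 = 12221;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PortPicker {
    NexusChoice,
    Zero,
}

/// Hypothetical bundle of the three bind addresses computed in `Server::start`.
#[derive(Debug, PartialEq)]
pub struct BindAddresses {
    pub external_http: SocketAddr,
    pub external_https: SocketAddr,
    pub internal_http: SocketAddr,
}

/// Combines the configured IP addresses with ports chosen by the picker.
pub fn bind_addresses(
    external_ip: IpAddr,
    internal_ip: IpAddr,
    picker: PortPicker,
) -> BindAddresses {
    let (http, https, internal) = match picker {
        PortPicker::NexusChoice => (80, 443, ASSUMED_INTERNAL_PORT),
        // Port 0 asks the OS for any free port, which suits concurrent tests.
        PortPicker::Zero => (0, 0, 0),
    };
    BindAddresses {
        external_http: SocketAddr::new(external_ip, http),
        external_https: SocketAddr::new(external_ip, https),
        internal_http: SocketAddr::new(internal_ip, internal),
    }
}
```

Keeping this computation in one pure function would also make it straightforward to unit test without starting any servers.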
13 changes: 3 additions & 10 deletions nexus/tests/config.test.toml
@@ -41,19 +41,12 @@ max_vpc_ipv4_subnet_prefix = 29
id = "e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c"
rack_id = "c19a698f-c6f9-4a17-ae30-20d711b8f7dc"

#
external_ip = "127.0.0.1"
internal_ip = "127.0.0.1"
# NOTE: for the test suite, the port MUST be 0 (in order to bind to any
# available port) because the test suite will be running many servers
# concurrently.
#
[deployment.dropshot_external]
bind_address = "127.0.0.1:0"
request_body_max_bytes = 1048576

# port must be 0. see above
[deployment.dropshot_internal]
bind_address = "127.0.0.1:0"
request_body_max_bytes = 1048576
port_picker = "zero"

[deployment.subnet]
net = "fd00:1122:3344:0100::/56"
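
The note in this test config relies on the OS assigning a free port when a socket binds to port 0. A minimal sketch of that behavior using only the standard library follows; `bind_ephemeral` is a hypothetical helper, not part of the test suite.

```rust
use std::net::{SocketAddr, TcpListener};

/// Binds to 127.0.0.1 on port 0 and reports the address the OS assigned.
/// The listener is returned so the port stays reserved while it is alive.
pub fn bind_ephemeral() -> std::io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

Two listeners created this way and held at the same time receive different, nonzero ports, which is why many test servers can run concurrently without conflicts.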