Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus] improve external messages and make more available to clients #4573

Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
163 changes: 143 additions & 20 deletions common/src/api/external/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -53,8 +53,11 @@ pub enum Error {
#[error("Internal Error: {internal_message}")]
InternalError { internal_message: String },
/// The system (or part of it) is unavailable.
#[error("Service Unavailable: {internal_message}")]
ServiceUnavailable { internal_message: String },
#[error(
"Service Unavailable {}",
message.display_internal()
)]
ServiceUnavailable { message: MessageVariant },
/// Method Not Allowed
#[error("Method Not Allowed: {internal_message}")]
MethodNotAllowed { internal_message: String },
Expand All @@ -66,6 +69,88 @@ pub enum Error {
Conflict { internal_message: String },
}

/// Represents either a message that is internal to the system or a message that
/// is intended to be returned to the client.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub enum MessageVariant {
Internal(String),
External {
message: String,
/// Internal context, or an empty string if there's no context yet.
internal_context: String,
},
}

impl MessageVariant {
fn internal(message: String) -> Self {
MessageVariant::Internal(message)
}

fn external(message: String) -> Self {
MessageVariant::External {
message: message,
internal_context: String::new(),
}
}

fn internal_context<C>(self, context: C) -> Self
where
C: Display + Send + Sync + 'static,
{
match self {
MessageVariant::Internal(msg) => {
MessageVariant::Internal(format!("{}: {}", context, msg))
}
MessageVariant::External { message, internal_context } => {
let internal_context = if internal_context.is_empty() {
context.to_string()
} else {
format!("{}: {}", context, internal_context)
};
MessageVariant::External { message, internal_context }
}
}
}

fn into_internal_external(self) -> (String, Option<String>) {
match self {
MessageVariant::Internal(msg) => (msg, None),
MessageVariant::External { message, internal_context } => {
let internal = if internal_context.is_empty() {
message.clone()
} else {
format!("{}: {}", message, internal_context)
};
(internal, Some(message))
}
}
}

// Do not implement `fmt::Display` for this enum because we don't want users to
// accidentally display the internal message to the client. Instead, use a
// private formatter.
fn display_internal(&self) -> MessageVariantDisplay<'_> {
MessageVariantDisplay(self)
}
}

struct MessageVariantDisplay<'a>(&'a MessageVariant);

impl<'a> Display for MessageVariantDisplay<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self.0 {
MessageVariant::Internal(msg) => write!(f, "(internal: {})", msg),
MessageVariant::External { message, internal_context } => {
write!(f, ": {message}")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would move the : back to the message in the ServiceUnavailable variant on line 57, in part to keep all those variants uniform, and in part so that if we ever, say, display_internal into a log line we don't get the leading colon. This is pretty nitpicky, though, and I don't feel especially strongly about it, especially if there's a reason to do the formatting here that I've totally overlooked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this significantly, and followed this approach in the new version.

if !internal_context.is_empty() {
write!(f, " (with internal context: {internal_context})")?;
}
Ok(())
}
}
}
}

/// Indicates how an object was looked up (for an `ObjectNotFound` error)
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub enum LookupType {
Expand Down Expand Up @@ -155,15 +240,28 @@ impl Error {
Error::InvalidRequest { message: message.to_owned() }
}

/// Generates an [`Error::ServiceUnavailable`] error with the specific
/// message
/// Generates an [`Error::ServiceUnavailable`] error with a specific
/// message that is sent back to the client.
///
/// This should be used for transient failures where the caller might be
/// expected to retry. Logic errors or other problems indicating that a
/// retry would not work should probably be an InternalError (if it's a
/// server problem) or InvalidRequest (if it's a client problem) instead.
pub fn unavail(message: &str) -> Error {
Error::ServiceUnavailable { internal_message: message.to_owned() }
pub fn unavail_external(external_message: impl Into<String>) -> Error {
Error::ServiceUnavailable {
message: MessageVariant::external(external_message.into()),
}
}

/// Generates an [`Error::ServiceUnavailable`] error with a specific
/// internal-only message.
///
/// This should be used for transient failures where the caller might be
/// expected to retry.
pub fn unavail_internal(internal_message: impl Into<String>) -> Error {
Error::ServiceUnavailable {
message: MessageVariant::internal(internal_message.into()),
}
}

/// Generates an [`Error::TypeVersionMismatch`] with a specific message.
Expand Down Expand Up @@ -215,12 +313,9 @@ impl Error {
Error::InternalError { internal_message } => Error::InternalError {
internal_message: format!("{}: {}", context, internal_message),
},
Error::ServiceUnavailable { internal_message } => {
Error::ServiceUnavailable { message } => {
Error::ServiceUnavailable {
internal_message: format!(
"{}: {}",
context, internal_message
),
message: message.internal_context(context),
}
}
Error::MethodNotAllowed { internal_message } => {
Expand Down Expand Up @@ -326,11 +421,16 @@ impl From<Error> for HttpError {
HttpError::for_internal_error(internal_message)
}

Error::ServiceUnavailable { internal_message } => {
HttpError::for_unavail(
Some(String::from("ServiceNotAvailable")),
Error::ServiceUnavailable { message } => {
let (internal_message, external) =
message.into_internal_external();
HttpError {
status_code: http::StatusCode::SERVICE_UNAVAILABLE,
error_code: Some(String::from("ServiceNotAvailable")),
external_message: external
.unwrap_or_else(|| "Service Unavailable".to_owned()),
internal_message,
)
}
}

Error::TypeVersionMismatch { internal_message } => {
Expand Down Expand Up @@ -381,7 +481,10 @@ impl<T: ClientError> From<progenitor::progenitor_client::Error<T>> for Error {

match rv.status() {
http::StatusCode::SERVICE_UNAVAILABLE => {
Error::unavail(&message)
// TODO: Out of an abundance of caution we treat this
// message as internal, though it's possible some such
// messages should be displayed.
Error::unavail_internal(&message)
}
status if status.is_client_error() => {
Error::invalid_request(&message)
Expand Down Expand Up @@ -493,6 +596,8 @@ impl<T> InternalContext<T> for Result<T, Error> {

#[cfg(test)]
mod test {
use crate::api::external::error::MessageVariant;

use super::Error;
use super::InternalContext;

Expand Down Expand Up @@ -553,11 +658,29 @@ mod test {
};

// test `with_internal_context()` and (separately) `ServiceUnavailable`
// variant
let error: Result<(), Error> = Err(Error::unavail("boom"));
// internal variant
let error: Result<(), Error> = Err(Error::unavail_internal("boom"));
match error.with_internal_context(|| format!("uh-oh (#{:2})", 2)) {
Err(Error::ServiceUnavailable { message }) => {
assert_eq!(
message,
MessageVariant::internal("uh-oh (# 2): boom".to_owned())
);
}
_ => panic!("returned wrong type"),
};

// test `ServiceUnavailable` external variant
let error: Result<(), Error> = Err(Error::unavail_external("boom"));
match error.with_internal_context(|| format!("uh-oh (#{:2})", 2)) {
Err(Error::ServiceUnavailable { internal_message }) => {
assert_eq!(internal_message, "uh-oh (# 2): boom");
Err(Error::ServiceUnavailable { message }) => {
assert_eq!(
message,
MessageVariant::External {
message: "boom".to_owned(),
internal_context: "uh-oh (# 2)".to_owned(),
}
);
}
_ => panic!("returned wrong type"),
};
Expand Down
8 changes: 6 additions & 2 deletions nexus/db-queries/src/db/datastore/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -220,7 +220,9 @@ impl DataStore {
opctx.authorize(authz::Action::Query, &authz::DATABASE).await?;
let pool = self.pool.pool();
let connection = pool.get().await.map_err(|err| {
Error::unavail(&format!("Failed to access DB connection: {err}"))
Error::unavail_internal(format!(
"Failed to access DB connection: {err}"
))
})?;
Ok(connection)
}
Expand All @@ -234,7 +236,9 @@ impl DataStore {
&self,
) -> Result<DataStoreConnection, Error> {
let connection = self.pool.pool().get().await.map_err(|err| {
Error::unavail(&format!("Failed to access DB connection: {err}"))
Error::unavail_internal(format!(
"Failed to access DB connection: {err}"
))
})?;
Ok(connection)
}
Expand Down
2 changes: 1 addition & 1 deletion nexus/db-queries/src/db/datastore/sled.rs
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,7 @@ impl DataStore {
.await
.map_err(|e| match e {
TxnError::CustomError(SledReservationError::NotFound) => {
external::Error::unavail(
external::Error::unavail_external(
"No sleds can fit the requested instance",
)
}
Expand Down
2 changes: 1 addition & 1 deletion nexus/db-queries/src/db/datastore/vpc.rs
Original file line number Diff line number Diff line change
Expand Up @@ -340,7 +340,7 @@ impl DataStore {
opctx.log,
"failed to find a VNI after searching entire range";
);
Err(Error::unavail("Failed to find a free VNI for this VPC"))
Err(Error::unavail_external("Failed to find a free VNI for this VPC"))
}

// Internal implementation for creating a VPC.
Expand Down
6 changes: 3 additions & 3 deletions nexus/db-queries/src/db/queries/region_allocation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -48,17 +48,17 @@ pub fn from_diesel(e: DieselError) -> external::Error {
if let Some(sentinel) = matches_sentinel(&e, &sentinels) {
match sentinel {
NOT_ENOUGH_DATASETS_SENTINEL => {
return external::Error::unavail(
return external::Error::unavail_external(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one feels like the sort of thing I would not expect an operator to know about, let alone an end user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've turned this and the other two below into a generic "Not enough disk space" message.

"Not enough datasets to allocate disks",
);
}
NOT_ENOUGH_ZPOOL_SPACE_SENTINEL => {
return external::Error::unavail(
return external::Error::unavail_external(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one feels like a mix. Some of this seems useful to an operator but much of it (terms like "zpool" and "region") describes implementation details that feel like a bad UX to expose to operators and end users.

"Not enough zpool space to allocate disks. There may not be enough disks with space for the requested region. You may also see this if your rack is in a degraded state, or you're running the default multi-rack topology configuration in a 1-sled development environment.",
);
}
NOT_ENOUGH_UNIQUE_ZPOOLS_SENTINEL => {
return external::Error::unavail(
return external::Error::unavail_external(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly this feels like a call generator more than it's helpful. (I'm imagining a customer wondering "What's a zpool? Do I need more physical disks? Is it somehow allocating something from an API Disk?)

"Not enough unique zpools selected while allocating disks",
);
}
Expand Down
13 changes: 10 additions & 3 deletions nexus/src/app/external_endpoints.rs
Original file line number Diff line number Diff line change
Expand Up @@ -737,7 +737,7 @@ fn endpoint_for_authority(
let endpoint_config = config_rx.borrow();
let endpoints = endpoint_config.as_ref().ok_or_else(|| {
error!(&log, "received request with no endpoints loaded");
Error::unavail("endpoints not loaded")
Error::unavail_external("endpoints not loaded")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels confusing and unactionable for end users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this internal-only.

})?;

// See if there's an endpoint for the requested name. If so, use it.
Expand Down Expand Up @@ -810,6 +810,7 @@ mod test {
use nexus_types::identity::Resource;
use omicron_common::api::external::Error;
use omicron_common::api::external::IdentityMetadataCreateParams;
use omicron_common::api::external::MessageVariant;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
Expand Down Expand Up @@ -1532,9 +1533,15 @@ mod test {
let result =
endpoint_for_authority(&log, &authority, rx_channel);
match result {
Err(Error::ServiceUnavailable { internal_message }) => {
Err(Error::ServiceUnavailable { message }) => {
assert_eq!(rx_label, "none");
assert_eq!(internal_message, "endpoints not loaded");
assert_eq!(
message,
MessageVariant::External {
message: "endpoints not loaded".to_owned(),
internal_context: String::new(),
}
);
}
Err(Error::InvalidRequest { message }) => {
assert_eq!(rx_label, "empty");
Expand Down
36 changes: 18 additions & 18 deletions nexus/src/app/instance.rs
Original file line number Diff line number Diff line change
Expand Up @@ -362,7 +362,9 @@ impl super::Nexus {
}

if instance.runtime().migration_id.is_some() {
return Err(Error::unavail("instance is already migrating"));
return Err(Error::unavail_external(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a 409 and not a 503.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, turned this into a 409.

"instance is already migrating",
));
}

// Kick off the migration saga
Expand Down Expand Up @@ -1583,28 +1585,26 @@ impl super::Nexus {
| InstanceState::Starting
| InstanceState::Stopping
| InstanceState::Stopped
| InstanceState::Failed => Err(Error::ServiceUnavailable {
internal_message: format!(
"cannot connect to serial console of instance in state \
{:?}",
vmm.runtime.state.0
),
}),
InstanceState::Destroyed => Err(Error::ServiceUnavailable {
internal_message: format!(
| InstanceState::Failed => {
Err(Error::unavail_external(format!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like some kind of 400, not a 503.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned this into a 400 and cleaned up the message.

"cannot connect to serial console of instance in state \
{:?}",
vmm.runtime.state.0
)))
}
InstanceState::Destroyed => {
Err(Error::unavail_external(format!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same -- 400 or 409, I'd think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

400 as well.

"cannot connect to serial console of instance in state \
{:?}",
InstanceState::Stopped),
}),
InstanceState::Stopped)))
}
}
} else {
Err(Error::ServiceUnavailable {
internal_message: format!(
"instance is in state {:?} and has no active serial console \
Err(Error::unavail_external(format!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

400 as well.

"instance is in state {:?} and has no active serial console \
server",
instance.runtime().nexus_state
)
})
instance.runtime().nexus_state
)))
}
}

Expand Down
7 changes: 4 additions & 3 deletions nexus/src/app/oximeter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -390,8 +390,8 @@ impl super::Nexus {
limit: std::num::NonZeroU32::new(1).unwrap(),
};
let oxs = self.db_datastore.oximeter_list(&page_params).await?;
let info = oxs.first().ok_or_else(|| Error::ServiceUnavailable {
internal_message: String::from("no oximeter collectors available"),
let info = oxs.first().ok_or_else(|| {
Error::unavail_external("no oximeter collectors available")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this makes sense for an external user, though I suspect this is only produced in calls to the internal API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted this change.

})?;
let address =
SocketAddr::from((info.ip.ip(), info.port.try_into().unwrap()));
Expand All @@ -403,7 +403,8 @@ impl super::Nexus {
fn map_oximeter_err(error: oximeter_db::Error) -> Error {
match error {
oximeter_db::Error::DatabaseUnavailable(_) => {
Error::ServiceUnavailable { internal_message: error.to_string() }
// XXX: should this be unavail_external?
Error::unavail_internal(error.to_string())
}
_ => Error::InternalError { internal_message: error.to_string() },
}
Expand Down
Loading
Loading