Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use pagination to support more than 200 silos and TLS certs #7289

Closed
davepacheco opened this issue Dec 20, 2024 · 0 comments · Fixed by #7291
Closed

use pagination to support more than 200 silos and TLS certs #7289

davepacheco opened this issue Dec 20, 2024 · 0 comments · Fixed by #7291
Labels
good first issue Issues that are good for learning the codebase
Milestone

Comments

@davepacheco
Copy link
Collaborator

@augustuswm reported seeing this error in colo:

13:21:12.456Z ERRO 1cfdb5b6-e568-436a-a85f-7fecf1b8eef2 (ServerContext): reading endpoints: expected at most 200 certificates, but found at least 200.  TLS may not work on some Silos' external endpoints.
    background_task = external_endpoints
    file = nexus/src/app/external_endpoints.rs:546

The code gives a lot of the context. This is what Nexus is trying to do:

/// Read the lists of all Silos, external DNS zones, and external TLS
/// certificates from the database and assemble an `ExternalEndpoints` structure
/// that describes what DNS names exist, which Silos they correspond to, and
/// what TLS certificates can be used for them
// This structure is used to determine what TLS certificates are used for
// incoming connections to the external console/API endpoints. As such, it's
// critical that we produce a usable result if at all possible, even if it's
// incomplete. Otherwise, we won't be able to serve _any_ incoming connections
// to _any_ of our external endpoints! If data from the database is invalid or
// inconsistent, that data is discarded and a warning is produced, but we'll
// still return a usable object.

This code predates Paginator, which makes it easy to do client-side database query pagination. So we (I think it was me) picked what seemed like an astronomical limit for the number of silos and had the code use that and produce this error message if we ever hit it:

// We will not look for more than this number of external DNS zones, Silos,
// or certificates. We do not expect very many of any of these objects.
const MAX: u32 = 200;

I believe it'd be pretty straightforward to switch this to using Paginator and remove these limits here. There's not much risk to this code path to allowing this to be quite a bit larger. It's a background task and not latency-sensitive. (We still don't want to be doing huge database queries, though, hence the pagination.)

@davepacheco davepacheco added the good first issue Issues that are good for learning the codebase label Dec 20, 2024
@davepacheco davepacheco added this to the 13 milestone Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Issues that are good for learning the codebase
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant