-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kitsune2 Bootstrap Server--Core #10
base: main
Are you sure you want to change the base?
Conversation
crates/bootstrap_srv/src/server.rs
Outdated
|
||
// validate expires_at is not more than 30 min after created_at | ||
if info.expires_at - info.created_at | ||
> (std::time::Duration::from_secs(60 * 30).as_micros() as i64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move this value to a const in the config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 263d5ab
crates/bootstrap_srv/src/server.rs
Outdated
|
||
// validate created at is less than 3 min in the future | ||
if info.created_at | ||
> now + (std::time::Duration::from_secs(60 * 3).as_micros() as i64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move this value to a const in the config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 263d5ab
crates/bootstrap_srv/src/server.rs
Outdated
|
||
// validate created at is not older than 3 min ago | ||
if info.created_at | ||
< now - (std::time::Duration::from_secs(60 * 3).as_micros() as i64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move this value to a const in the config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 263d5ab
crates/bootstrap_srv/src/server.rs
Outdated
} | ||
} | ||
|
||
fn maint_worker( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read this and thought it was a typo on "main_worker" -- maybe spelling it fully would be cleaer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'll change it to prune_worker
? I hate typing maintenance, I always get the e's and a's backwards the first time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 263d5ab
crates/bootstrap_srv/src/space.rs
Outdated
|
||
/// A map of spaces. | ||
#[derive(Clone)] | ||
pub struct SpaceMap(Arc<Mutex<HashMap<bytes::Bytes, Space>>>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use bytes::Bytes
instead of the wrapper type you made SpaceId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I see this server / protocol being useful for WAN discovery outside of projects using kitsune itself. Perhaps it would actually be better to exist outside this monorepo, but that is more overhead for us. At the very least, I didn't want this to have any dependencies on other kitsune crates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case do you want to define a SpaceId or similar wrapper type in this crate just for readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can start with just a typedef for code clarity. I don't think there's any need for a newtype yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 263d5ab
crates/bootstrap_srv/src/space.rs
Outdated
|
||
impl SpaceMap { | ||
/// Read the content of a space. | ||
pub fn read(&self, space: &bytes::Bytes) -> std::io::Result<Vec<u8>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here - should this be a SpaceId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #10 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 263d5ab
Co-authored-by: mattyg <[email protected]>
crates/bootstrap_srv/src/server.rs
Outdated
fn drop(&mut self) { | ||
self.cont.store(false, std::sync::atomic::Ordering::SeqCst); | ||
for worker in self.workers.drain(..) { | ||
let _ = worker.join().expect("Failure shutting down worker thread"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any value in passing the error through here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you like to pass this through? This is a drop impl, we don't have many options. Technically you're not supposed to panic in drop impls, you'll get a stack dump that is not very useful... I'm open to other suggestions... but rust's handling of panics in other threads is pretty poor, I haven't really heard of a better pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the main thread = server thread panic here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needn't be the main thread if the server were passed to a different thread. But yes, whichever thread the server instance variable is dropped on will panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't that stop the whole server when it's only about dropping workers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest that this doesn't have to be a drop implementation here. The main
program is doing the drop explicitly but you only need to create one instance of this per program, so it could equally be a shutdown
method that can return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A drop implementation that calls shutdown().unwrap()
would be good, to make sure that the user gets some attempt at shutdown if they forget to call it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: ebac6ff
/// and having too many file handles open. | ||
/// | ||
/// Start with 10MiB? | ||
const MAX_PER_TEMPFILE: u64 = 1024 * 1024 * 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be in the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pressure leading to wanting this larger is the inefficiency of having multiple tempfiles open. This lets us store more than 10_000 entries in one file (because infos are often much less than 1024 bytes long). It doesn't feel like we should need this to be much larger.
The pressure leading to wanting this smaller is the constraints of disk space. On a 16 core system (which cloud servers will likely not have anywhere near that) the default production config will have 64 active write tempfiles, giving 64MiB. Even if we have 10 older generations still around, that's only 640 MiB. Do we think servers will ever have that little disk space? Or to look at it from the opposite perspective, let's say the cloud server provides 10GiB of disk space. We can store 10,485,760 agent records at once, even if many of those are defunct, waiting to be cleaned up.
I don't feel like dev-ops folks are going to understand the consequences of making changes to this number. Providing the option to may just be confusing, and not practically useful. What do you think?
P.S. Re-reading this post, I'm not sure I even described it without being confusing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Pruning is tested in the other PR - so just a couple of questions.
crates/bootstrap_srv/src/server.rs
Outdated
fn drop(&mut self) { | ||
self.cont.store(false, std::sync::atomic::Ordering::SeqCst); | ||
for worker in self.workers.drain(..) { | ||
let _ = worker.join().expect("Failure shutting down worker thread"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the main thread = server thread panic here?
Co-authored-by: Jost Schulte <[email protected]>
/// | ||
/// Defaults: | ||
/// - `testing = 32` | ||
/// - `production = 32` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be higher in production? Like 10k?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is just for bootstrap. I'd prefer to handle more spaces and less agents per space than the other way around. Once they are in the space, they can get additional peers by gossip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense!
crates/bootstrap_srv/src/parse.rs
Outdated
@@ -0,0 +1,83 @@ | |||
/// An entry with known content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to point to where this is documented from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 85dffc7
crates/bootstrap_srv/src/server.rs
Outdated
use crate::*; | ||
use tiny_http::*; | ||
|
||
/// Don't allow created_at to be greater or less than this far away from now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Don't allow created_at to be greater or less than this far away from now. | |
/// Don't allow created_at to be greater than this far away from now. |
I think the "absolute" is implied by the sentence and the less is a little confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: b407784
crates/bootstrap_srv/src/server.rs
Outdated
fn drop(&mut self) { | ||
self.cont.store(false, std::sync::atomic::Ordering::SeqCst); | ||
for worker in self.workers.drain(..) { | ||
let _ = worker.join().expect("Failure shutting down worker thread"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest that this doesn't have to be a drop implementation here. The main
program is doing the drop explicitly but you only need to create one instance of this per program, so it could equally be a shutdown
method that can return an error?
crates/bootstrap_srv/src/server.rs
Outdated
fn drop(&mut self) { | ||
self.cont.store(false, std::sync::atomic::Ordering::SeqCst); | ||
for worker in self.workers.drain(..) { | ||
let _ = worker.join().expect("Failure shutting down worker thread"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A drop implementation that calls shutdown().unwrap()
would be good, to make sure that the user gets some attempt at shutdown if they forget to call it
let prune_cont = cont.clone(); | ||
let prune_space_map = space_map.clone(); | ||
workers.push(std::thread::spawn(move || { | ||
prune_worker(config, prune_cont, prune_space_map) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be bad if this thread died and we didn't know. Doesn't necessarily seem like a problem for this issue but would that be a good thing to track somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we haven't adopted any tracing in this binary yet (it is less needed with server code, because we probably just want it all going to systemd log anyways) I've just eprintln-d it for now. We can do something different easily in the future: 7923234
crates/bootstrap_srv/src/server.rs
Outdated
return Err(std::io::Error::other("InvalidExpiresAt")); | ||
} | ||
|
||
// validate signature (do this at the end because it's more expensive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// validate signature (do this at the end because it's more expensive | |
// validate signature (do this at the end because it's more expensive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 68c1004
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module is a little scary to me :) The documentation makes sense, and I can follow the code, I'm not sure how much I can asses the behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It would be nice to find a way to test it, at least a little bit. Perhaps if I made the max file size configurable at least internally, and then wrote some tests that explicitly dropped read handles and checked if the files still exist on-disk? I don't know if that would be flaky or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been through this in detail now and I'm much happier. Thank you for letting me try and fail to pick it apart so I could make sense of it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working through the store, will pick up after meetings
crates/bootstrap_srv/src/store.rs
Outdated
//! open a new tempfile for writing. | ||
//! - The older read handles will persist the existence of the older tempfiles | ||
//! until the last read reference is dropped, at which point the tempfile | ||
//! will be cleaned up by the os. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the OS guaranteed to do this right away? I thought this would only happen on reboot unless the tempfile implementation in the code has logic to cleanup when dropped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tempfile drop implementation indeed cleans up the tempfile. If that fails for any reason (panic on the thread) then it is up to the os to clean it up. That's completely system specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd optionally include that in the documentation here but maybe it's too much detail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some details: ea028b0
This is the core functionality of the kitsune2 bootstrap server code.
To make the PRs more manageable, I've broken them out into some preparatory work before this, and all the endpoint test after this.