-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Actually provision read-only downstairs! #728
Changes from 2 commits
30e7565
3db67ec
6f7e168
03086e6
bc0ba2d
473cf73
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -259,7 +259,7 @@ impl DataFile { | |
id: request.id.clone(), | ||
name: request.name.clone(), | ||
port_number, | ||
state: State::Created, // no work to do, just run smf | ||
state: State::Requested, | ||
}; | ||
|
||
info!( | ||
|
@@ -302,26 +302,26 @@ impl DataFile { | |
let running_snapshots = | ||
inner.running_snapshots.get_mut(&request.id).unwrap(); | ||
|
||
if running_snapshots.get(&request.name).is_none() { | ||
info!( | ||
self.log, | ||
"not running for region {} snapshot {}, returning Ok", | ||
request.id.0, | ||
request.name | ||
); | ||
|
||
return Ok(()); | ||
} | ||
match running_snapshots.get_mut(&request.name) { | ||
None => { | ||
info!( | ||
self.log, | ||
"not running for region {} snapshot {}, returning Ok", | ||
request.id.0, | ||
request.name | ||
); | ||
|
||
info!( | ||
self.log, | ||
"removing snapshot {}-{}", request.id.0, request.name | ||
); | ||
return Ok(()); | ||
} | ||
|
||
running_snapshots.remove(&request.name); | ||
Some(request) => { | ||
info!( | ||
self.log, | ||
"removing snapshot {}-{}", request.id.0, request.name | ||
); | ||
|
||
if running_snapshots.is_empty() { | ||
inner.running_snapshots.remove(&request.id); | ||
request.state = State::Tombstoned; | ||
} | ||
} | ||
|
||
/* | ||
|
@@ -435,6 +435,36 @@ impl DataFile { | |
self.store(inner); | ||
} | ||
|
||
/** | ||
* Mark a particular running snapshot as failed to provision. | ||
*/ | ||
pub fn fail_rs(&self, region_id: &RegionId, snapshot_name: &str) { | ||
let mut inner = self.inner.lock().unwrap(); | ||
|
||
let mut rs = inner | ||
.running_snapshots | ||
.get_mut(region_id) | ||
.unwrap() | ||
.get_mut(snapshot_name) | ||
.unwrap(); | ||
|
||
let nstate = State::Failed; | ||
if rs.state == nstate { | ||
return; | ||
} | ||
|
||
info!( | ||
self.log, | ||
"running snapshot {} state: {:?} -> {:?}", | ||
rs.id.0, | ||
rs.state, | ||
nstate, | ||
); | ||
rs.state = nstate; | ||
|
||
self.store(inner); | ||
} | ||
|
||
/** | ||
* Mark a particular region as provisioned. | ||
*/ | ||
|
@@ -465,6 +495,51 @@ impl DataFile { | |
Ok(()) | ||
} | ||
|
||
/** | ||
* Mark a particular running snapshot as provisioned. | ||
*/ | ||
leftwo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
pub fn created_rs( | ||
&self, | ||
region_id: &RegionId, | ||
snapshot_name: &str, | ||
) -> Result<()> { | ||
let mut inner = self.inner.lock().unwrap(); | ||
|
||
let mut rs = inner | ||
.running_snapshots | ||
.get_mut(region_id) | ||
.unwrap() | ||
.get_mut(snapshot_name) | ||
.unwrap(); | ||
|
||
let nstate = State::Created; | ||
|
||
match &rs.state { | ||
State::Requested => (), | ||
State::Tombstoned => { | ||
/* | ||
* Nexus requested that we destroy this running snapshot before | ||
* we finished provisioning it. | ||
*/ | ||
return Ok(()); | ||
} | ||
|
||
x => bail!("created running_snapshot in weird state {:?}", x), | ||
} | ||
|
||
info!( | ||
self.log, | ||
"running snapshot {} state: {:?} -> {:?}", | ||
rs.id.0, | ||
rs.state, | ||
nstate, | ||
); | ||
rs.state = nstate; | ||
|
||
self.store(inner); | ||
Ok(()) | ||
} | ||
|
||
/** | ||
* Mark a particular region as destroyed. | ||
*/ | ||
|
@@ -490,6 +565,40 @@ impl DataFile { | |
Ok(()) | ||
} | ||
|
||
/** | ||
* Mark a particular running snapshot as destroyed. | ||
*/ | ||
pub fn destroyed_rs( | ||
&self, | ||
region_id: &RegionId, | ||
snapshot_name: &str, | ||
) -> Result<()> { | ||
let mut inner = self.inner.lock().unwrap(); | ||
|
||
let mut rs = inner | ||
.running_snapshots | ||
.get_mut(region_id) | ||
.unwrap() | ||
.get_mut(snapshot_name) | ||
.unwrap(); | ||
|
||
let nstate = State::Destroyed; | ||
|
||
info!( | ||
self.log, | ||
"running snapshot {} state: {:?} -> {:?}", | ||
rs.id.0, | ||
rs.state, | ||
nstate, | ||
); | ||
rs.state = nstate; | ||
|
||
// XXX shouldn't this remove the record? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. XXX How can we tell? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we don't remove the record, then any attempt by Nexus to provision a running snapshot with the same name for the same region wouldn't work, it would return the existing record with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess I'm confused about what the XXX was for. I think removing the record is what we want, right? Otherwise we are leaking resources. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think leaving the records there is important to keep Nexus sagas idempotent. If they request that a region is to be destroyed, then the saga node reruns, we'll want the "destroy" command to not return a 404 (because the record is gone), but instead a 204 because the region is already destroyed. I've removed the XXX comments and added a note about sagas in 03086e6. |
||
|
||
self.store(inner); | ||
Ok(()) | ||
} | ||
|
||
/** | ||
* Nexus has requested that we destroy this particular region. Returns | ||
* true if the region is already destroyed, false if not destroyed | ||
|
@@ -547,11 +656,11 @@ impl DataFile { | |
} | ||
|
||
/** | ||
* The worker thread will request the first region that is in a | ||
* particular state. If there are no tasks in the provided state, | ||
* we sleep waiting for work to do. | ||
* The worker thread will request the first resource that is in a | ||
* particular state. If there are no resources in the provided state, | ||
* wait on the condition variable. | ||
*/ | ||
pub fn first_in_states(&self, states: &[State]) -> Region { | ||
pub fn first_in_states(&self, states: &[State]) -> Resource { | ||
let mut inner = self.inner.lock().unwrap(); | ||
|
||
loop { | ||
|
@@ -565,7 +674,19 @@ impl DataFile { | |
for s in states { | ||
for r in inner.regions.values() { | ||
if &r.state == s { | ||
return r.clone(); | ||
return Resource::Region(r.clone()); | ||
} | ||
} | ||
|
||
for (rid, r) in &inner.running_snapshots { | ||
for (name, rs) in r { | ||
if &rs.state == s { | ||
return Resource::RunningSnapshot( | ||
rid.clone(), | ||
name.clone(), | ||
rs.clone(), | ||
); | ||
} | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if, and when would these unwraps possibly fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agent would panic, and the service would restart, scan the data file, and attempt to do the same operation. It may then succeed, or it may crash again. Too many crashes would put the service in a maintenance state and it remain disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know for sure that a panic'd agent will restart and retry? I know there are places where we don't
correctly handle panics, but I was not sure what happens to the agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll verify this but I'm pretty sure.