Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actually provision read-only downstairs! #728

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
171 changes: 146 additions & 25 deletions agent/src/datafile.rs
Original file line number Diff line number Diff line change
Expand Up @@ -259,7 +259,7 @@ impl DataFile {
id: request.id.clone(),
name: request.name.clone(),
port_number,
state: State::Created, // no work to do, just run smf
state: State::Requested,
};

info!(
Expand Down Expand Up @@ -302,26 +302,26 @@ impl DataFile {
let running_snapshots =
inner.running_snapshots.get_mut(&request.id).unwrap();

if running_snapshots.get(&request.name).is_none() {
info!(
self.log,
"not running for region {} snapshot {}, returning Ok",
request.id.0,
request.name
);

return Ok(());
}
match running_snapshots.get_mut(&request.name) {
None => {
info!(
self.log,
"not running for region {} snapshot {}, returning Ok",
request.id.0,
request.name
);

info!(
self.log,
"removing snapshot {}-{}", request.id.0, request.name
);
return Ok(());
}

running_snapshots.remove(&request.name);
Some(request) => {
info!(
self.log,
"removing snapshot {}-{}", request.id.0, request.name
);

if running_snapshots.is_empty() {
inner.running_snapshots.remove(&request.id);
request.state = State::Tombstoned;
}
}

/*
Expand Down Expand Up @@ -435,6 +435,36 @@ impl DataFile {
self.store(inner);
}

/**
* Mark a particular running snapshot as failed to provision.
*/
pub fn fail_rs(&self, region_id: &RegionId, snapshot_name: &str) {
let mut inner = self.inner.lock().unwrap();

let mut rs = inner
.running_snapshots
.get_mut(region_id)
.unwrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if, and when would these unwraps possibly fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent would panic, and the service would restart, scan the data file, and attempt to do the same operation. It may then succeed, or it may crash again. Too many crashes would put the service in a maintenance state and it remain disabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know for sure that a panic'd agent will restart and retry? I know there are places where we don't
correctly handle panics, but I was not sure what happens to the agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll verify this but I'm pretty sure.

.get_mut(snapshot_name)
.unwrap();

let nstate = State::Failed;
if rs.state == nstate {
return;
}

info!(
self.log,
"running snapshot {} state: {:?} -> {:?}",
rs.id.0,
rs.state,
nstate,
);
rs.state = nstate;

self.store(inner);
}

/**
* Mark a particular region as provisioned.
*/
Expand Down Expand Up @@ -466,7 +496,54 @@ impl DataFile {
}

/**
* Mark a particular region as destroyed.
* Mark a particular running snapshot as created.
*/
leftwo marked this conversation as resolved.
Show resolved Hide resolved
pub fn created_rs(
&self,
region_id: &RegionId,
snapshot_name: &str,
) -> Result<()> {
let mut inner = self.inner.lock().unwrap();

let mut rs = inner
.running_snapshots
.get_mut(region_id)
.unwrap()
.get_mut(snapshot_name)
.unwrap();

let nstate = State::Created;

match &rs.state {
State::Requested => (),
State::Tombstoned => {
/*
* Nexus requested that we destroy this running snapshot before
* we finished provisioning it.
*/
return Ok(());
}

x => bail!("created running_snapshot in weird state {:?}", x),
}

info!(
self.log,
"running snapshot {} state: {:?} -> {:?}",
rs.id.0,
rs.state,
nstate,
);
rs.state = nstate;

self.store(inner);
Ok(())
}

/**
* Mark a particular region as destroyed. Do not remove the record: it's
* important for calls to the agent to be idempotent in order to safely be
* used in a saga.
*/
pub fn destroyed(&self, id: &RegionId) -> Result<()> {
let mut inner = self.inner.lock().unwrap();
Expand All @@ -484,7 +561,39 @@ impl DataFile {
);
r.state = nstate;

// XXX shouldn't this remove the record?
self.store(inner);
Ok(())
}

/**
* Mark a particular running snapshot as destroyed. Do not remove the
* record: it's important for calls to the agent to be idempotent in order
* to safely be used in a saga.
*/
pub fn destroyed_rs(
&self,
region_id: &RegionId,
snapshot_name: &str,
) -> Result<()> {
let mut inner = self.inner.lock().unwrap();

let mut rs = inner
.running_snapshots
.get_mut(region_id)
.unwrap()
.get_mut(snapshot_name)
.unwrap();

let nstate = State::Destroyed;

info!(
self.log,
"running snapshot {} state: {:?} -> {:?}",
rs.id.0,
rs.state,
nstate,
);
rs.state = nstate;

self.store(inner);
Ok(())
Expand Down Expand Up @@ -547,11 +656,11 @@ impl DataFile {
}

/**
* The worker thread will request the first region that is in a
* particular state. If there are no tasks in the provided state,
* we sleep waiting for work to do.
* The worker thread will request the first resource that is in a
* particular state. If there are no resources in the provided state,
* wait on the condition variable.
*/
pub fn first_in_states(&self, states: &[State]) -> Region {
pub fn first_in_states(&self, states: &[State]) -> Resource {
let mut inner = self.inner.lock().unwrap();

loop {
Expand All @@ -565,7 +674,19 @@ impl DataFile {
for s in states {
for r in inner.regions.values() {
if &r.state == s {
return r.clone();
return Resource::Region(r.clone());
}
}

for (rid, r) in &inner.running_snapshots {
for (name, rs) in r {
if &rs.state == s {
return Resource::RunningSnapshot(
rid.clone(),
name.clone(),
rs.clone(),
);
}
}
}
}
Expand Down
Loading