Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts on making Step and ProtocolContext cheaper #139

Open
martinthomson opened this issue Oct 19, 2022 · 5 comments
Open

Thoughts on making Step and ProtocolContext cheaper #139

martinthomson opened this issue Oct 19, 2022 · 5 comments

Comments

@martinthomson
Copy link
Member

We don't really know whether these structures are going to impose significant overheads just yet, but we have identified reasons for both needing some attention.

Warning: lots of rust code that I just typed into a text editor. No guarantees that any of this works.

ProtocolContext

I'm advocating for ProtocolContext being cheap so that we can make copies for every step and every record. This should improve API ergonomics. The structure (as of writing) is:

pub struct ProtocolContext<'a, N> {
    role: Identity,
    step: UniqueStepId, // or Step
    prss: &'a PrssEndpoint,
    gateway: &'a Gateway<N>,
}

That's a very small enum for role (1 bit possible, 8 bits likely), an owned string (yikes), and two references (64 bits each maybe, though this might be compressed on some platforms).

The primary offender here is step, which we'll get back to, but it seems like there is an obvious fix here. Moving the shared state into another object could make this nicer. One way to do that is to do something like this:

trait HelperState {  // name TBD
  type Network: crate::protocol::Network;
  fn role(&self) -> Identity;
  fn gateway(&self) -> &Gateway<Self::Network>;
  fn prss(&self) -> &PrssEndpoint;
}

pub struct ProtocolContext<'a, H> {
  global: &'a H,
  step: UniqueStepId, // or Step
}

impl<H: HelperState> for ProtocolContext<'_, H> {
  // This can access self.global.prss() rather than self.prss
}

This style of indirection might be necessary to allow the in-memory networking to refer to a single global state and a real networking setup to refer to a single helper-wide object. Either way, the overall cost is then dominated by the step identifier. The interface is no more complex. The only cost is following a pointer when we want something.

Step

In debugging and testing, we want something that is descriptive of the process. Our current approach describes steps with strings and then composes those together into something like a path.

We can use that process happily in debugging, but step identifiers need to be passed around a LOT and that might hurt performance when we need to shift a lot of data. What we need is a process that produces an object that can be represented in memory as a simple integer.

One way to do that is to map the path-like structure we naturally produce onto an integer, assigning each node in the path (including "directories") an integer value.

protocol => 0
protocol/a => 1
protocol/a/x => 2
protocol/a/y => 3
protocol/b => 4
protocol/c => 5
protocol/c/x => 6

Then we have a way to represent any step in that specific process with a simple and small number1. The obvious way to do this is to collect all of the steps that are used (including intermediate nodes), printing them out, (maybe) sorting them, and assigning each a number in sequence.

We can't do the number thing until we have a complete understanding of all of the steps involved in a protocol execution. For that, we can simply assemble a complete protocol, run it using generic steps, and collect the unique steps that are created during execution. From that, we can develop a complete set of states and a mapping to a number. This might be turned into a distinct type using either code generation (in build.rs say), a macro, or some combination of the two2.

From this we can produce a wrapper around an integer type that encodes this information efficiently based on calls to the Step::narrow() function. Recall that each use of narrow() takes an arbitrary enum type as input. We can then define a type that captures this information.

Taking the example above, there are two types involved in narrowing, which we might say are enums SubstepABCD and SubstepXY.

First, we establish a general interface that allows the new type we are defining to accept sub steps:

trait StepNarrow<SS> {
  fn narrow(self, sub_step: SS) -> Self;
}

Then, using whatever code generation method we decide, we define a new type of step.

trait Step: AsRef<str> {}
struct ExampleStep(u8);
impl Step for ExampleStep {}

Then we define the state transitions that are valid as follows:

impl StepNarrow<SubstepXY> for ExampleStep {
  fn narrow(self, ss: Self::SS) -> Self {
    Self(match (self.0, ss) {
      (1, SubstepXY::X) => 2,
      (1, SubstepXY::Y) => 3,
      (5, SubstepXY::X) => 6,
      _ => panic!("cannot narrow with {ss.as_ref()} from state {self.as_ref()}");
    })
  }
}

impl StepNarrow<SubstepABCD> for ExampleStep {
  fn narrow(self, ss: Self::SS) -> Self {
    assert_eq!(self.0, 0, "cannot narrow with {ss.as_ref()} from state {self.as_ref()}");
    Self(match ss {
      SubstepABCD::A => 1,
      SubStepABCD::B => 4,
      SubStepABCD::C => 5,
    }) // this form might be more common, but it might not be worth implementing this way
       // because a single form is probably easier; I only did this because I'm hand-rolling code.
  }
}

For debugging purposes we would still implement AsRef<str> for the new type (plus ensuring that it is implemented for each sub-step). For the step type, this can use a macro that just maps back to the debug-friendly strings that we use.

This requires that we make the entire protocol context generic on the protocol again, which is a bit annoying, but it should only manifest in the type definition for the protocol context. For this, we define a Step like so:

pub trait Step: AsRef<str> + Default {}
pub trait Substep: AsRef<str> {
}

In place of the struct we currently have. Then we take that implementation and we can define a blanket implementation for StepNarrow.

#[derive(Default)] // or implement it, whatever suits best
pub struct GenericStep {
  s: String,
  // + all the debug stuff
}
impl<SS: Substep + ?Sized> StepNarrow<SS> for GenericStep {
  fn narrow(&self, substep: SS) -> Self {
    Self { s: self.s + '/' + substep.as_str(), /* etc... */ }
  }
}

We can also retain the implementation of Substep for str and String to make testing simpler, though this will necessarily only work for GenericStep.

When we instantiate a version of the protocol in tests we can do something clean like this:

  let world = make_world();
  let (c0, c1, c2) = make_contexts(&world);
  // cN has a type of `ProtocolContext<'a, Arc<InMemoryEndpoint>, GenericStep>`,
  // or something along those lines.  The type won't need to be exposed in the interface.
  // This works because the test code always uses `GenericStep`

And for a real helper where we care about performance, we can instantiate a ProtocolContext using ExampleStep or the FullIpaStep or whatever we call it3. Our performance tests can define a full step mapping so that they aren't burdened with strings.

Then the protocol a) won't compile if narrow() is passed the wrong types, b) will crash if we pass the wrong values in the wrong places, and c) will not have much overhead.

Footnotes

  1. Note that we could also define an enum for this, but that might be complicated as you effectively need nodes to be Option<T> so that you can represent the parent node in the tree. Also, it's not clear that Rust will happily collapse the representation of that enum without extensive coaxing. That might be something we can explore later. That said, defining a flat enum is a distinct possibility.

  2. I tend to think that a macro is better because we can tune the list to remove nodes that are never used. Rust macros let you define a very clean interface in whatever format you prefer. The compiler provides some amount of type checking here as well, which is nice.

  3. We'll probably need a turbofish for that as we don't need to pass the starting value for the step if we require Default.

@thurstonsand
Copy link

what are you proposing is the process to assign globally unique ints to each step? would there be some kind of code generation that occurs on execution? or would you just run the protocol, have it print out all the steps, then manually assign int values to them? or something else?

@martinthomson
Copy link
Member Author

I'm proposing that we run the entire process in "debug" mode using a generic implementation that can report on what steps were created (that can be behind a feature flag as it will likely require globally-consistent, shared mutable state and really affect performance). That will produce a list of steps in a form that we can then copy into the code. We might use macros or some codegen to turn that output into the code I described.

We might automate this further, but that's where I would start.

@thurstonsand
Copy link

@akoshelev mentioned one possibility where we implement some proc macro that we would add to all steps that would accomplish the same thing at compile time. I like the idea of that since it'd be less maintenance (aside from having to maintain a proc macro)

@akoshelev
Copy link
Collaborator

Another thing we may want to implement with macro/compile time approach is to measure the depth of the circuit. MP-SPDZ seems to be using topological sort for it, we may benefit from doing it as well.

@martinthomson
Copy link
Member Author

martinthomson commented Apr 25, 2023

Another thing to consider: We might want to consider renaming Step to Gate so that we can use G for the generic parameter, rather than S.

(Edit: We can maybe s/Substep/Step/ in this case. You step from gate to gate...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants