
CPI Bidirectional Input / Output #19838

Closed
Lichtso opened this issue Sep 13, 2021 · 16 comments
Labels: locked issue, research (Open question / topic of discussion)

Comments


Lichtso commented Sep 13, 2021

So, @seanyoung and I discussed #19318 (comment) on a call and came to the conclusion that the input data slice and the returned data slice could both be one bidirectional channel.

Problem

The current implementation (#19548) uses two syscalls for returning data through a global area in the InvokeContext: One syscall in the callee to write it and one syscall in the caller to read it. It is completely independent of the input data at the moment.
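For reference, this is roughly how the pair of syscalls surfaces through the solana_program SDK wrappers today; a minimal sketch, with the instruction plumbing elided and the payload made up:

use solana_program::{
    account_info::AccountInfo,
    entrypoint::ProgramResult,
    instruction::Instruction,
    program::{get_return_data, invoke, set_return_data},
};

// Callee side: one syscall copies the data into the global area.
fn callee_side() {
    set_return_data(b"some result");
}

// Caller side: a second syscall copies the data back out after the CPI.
fn caller_side(ix: &Instruction, accounts: &[AccountInfo]) -> ProgramResult {
    invoke(ix, accounts)?;
    if let Some((returning_program, data)) = get_return_data() {
        // `data` is a freshly allocated Vec, i.e. yet another copy
        // out of the InvokeContext's global area.
        let _ = (returning_program, data);
    }
    Ok(())
}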

There are three problems with that:

  • We want to avoid syscalls because of their context-switch cost. A syscall is also unnecessary here because it does not need to trigger anything: the first syscall could be replaced by padding the serialization and writing the data there directly, and the second by modifying the existing CPI syscall, which has to change anyway.
  • We want to avoid global data in the InvokeContext because it needs to be copied from and into the invocation stack frames.
  • A program can implicitly let the returned data fall through it (without passing it on explicitly) and thus incur no compute-meter cost for it. This becomes a problem once the runtime has to pass the data explicitly instead, which is what addressing the second problem (global data in the InvokeContext) would require.

The first problem can be ignored, as this feature will be "born old" and replaced by the ABI v2 implementation soon. The second, however, is a lot trickier because it requires interoperation between both ABI versions, and probably copies of the returned data between them. The third is up to us, depending on how we choose to model the cost function: should implicit passing cost compute units as well (because it would in the new ABI)?

Proposed Solution

In ABI v2 (#19191) this would be implemented by shared memory in each invocation stack frame (the same used for input data), which both caller and callee have access to.

Let the current implementation sit as is, but don't activate the feature gate until we have decided how to address the second and third problems.

Lichtso added the research (Open question / topic of discussion) label Sep 13, 2021
@jackcmay

@Lichtso

Can you elaborate on your #2 concern: what copy are you referring to, the calls to get and set?
On #3, what additional work is required that is not accounted for?

@seanyoung When do you expect to need the return data released?


Lichtso commented Sep 13, 2021

@jackcmay Well, 2 and 3 are a bit blended, that is true. Both come from the differences in ABI v1 and v2, and their inter-operation.

ABI v1:
The "returned data" is passed through a global area in the InvokeContext. Two copies: One in the callee set and one in the caller get. Implicit pass / fall through is free / does not cost anything (as long as there is only ABI v1).

ABI v2:
The "returned data" is passed through an in/out area, one per invocation stack frame. No copies: The callee writes directly to that shared area and the callee reads it. There is no implicit pass / fall through as that would be a different stack frame, an explicit copy is required. Thus it costs extra to pass thing further.

Inter-operation between v1 and v2:
The "returned data" additionally needs to be passed between the global area in the InvokeContext and the shared memory on the invocation stack, depending on how it is used and implicitly passed. So under the hood this might create more copy steps which wouldn't exist if there were only one ABI version. But I am not sure yet, how exactly the v1-to-v2 or vice versa would look like.


jackcmay commented Sep 13, 2021

I think I need more detail on the invocation stack frame. Is this a memory range that caller and callee both have access to?

@seanyoung

As I understand it, @Lichtso's design for return data in ABI v2 is that a new return data memory range is allocated for each stack frame. It is not clear to me why this cannot be shared, in a similar vein to ABI v1.

@seanyoung

@seanyoung When do you expect to need the return data released?

I think you mean when it should be freed/cleared. It should be cleared after each instruction.


Lichtso commented Sep 14, 2021

Well, in ABI v1 nothing was shared memory, everything was copied (twice) across each boundary.

The idea is for ABI v2 to allow a program to see the entire invocation stack and modify its own frame and the one above it (of the program it calls). This way programs can write data into the programs they call as input and write out data into their caller as output. Also, they can read data from the programs they call as output and their own input data from their caller. No additional copies required.
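A minimal sketch of what such a frame could look like; the name and layout are hypothetical and only meant to make the shared in/out idea concrete:

// Hypothetical ABI v2 invocation stack frame, not the actual design.
// The caller serializes input into `io` before the call; the callee
// overwrites the same region with its output before returning, so no
// copy crosses the call boundary in either direction.
struct InvocationFrame<'a> {
    program_id: [u8; 32],
    io: &'a mut [u8], // shared in/out region, visible to caller and callee
}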

@seanyoung

Well, in ABI v1 nothing was shared memory, everything was copied (twice) across each boundary.

Except that return data can be passed up the stack without copies in ABI v1.

The idea is for ABI v2 to allow a program to see the entire invocation stack and modify its own frame and the one above it (of the program it calls). This way programs can write data into the programs they call as input and write out data into their caller as output. Also, they can read data from the programs they call as output and their own input data from their caller. No additional copies required.

Why can't the return buffer be shared between callee/caller?


Lichtso commented Sep 14, 2021

Except that return data can be passed up the stack without copies in ABI v1.

Yes, except for the implicit passing.

Why can't the return buffer be shared between callee/caller?

Because if it is not in shared memory, then it needs to be copied (twice).
And we want to avoid all copies in ABI v2, as copies make up a significant part of time spent in transaction processing.
Also, we can avoid additional allocations by reusing the space reserved for input data on the invocation stack.
And the entire thing becomes more symmetric and thus less code to maintain.


seanyoung commented Sep 14, 2021

On second thought, I'm not sure about re-using the input section as output. Consider this function:

contract c {
   function foo(bytes b) public returns (int, bytes) {
        return (1,  b);
   }
}

Now the b argument is read-only (as in, it is never written to), so it can be optimized to a slice rather than a fully fledged vector. The slice contains a pointer into the input section plus a length, whereas the vector has a size, a length, and a copy of the data. Solang has a special pass for this: https://github.com/hyperledger-labs/solang/blob/main/src/codegen/vector_to_slice.rs

Now with your proposed ABI v2, when we eth-abi-encode into the return buffer we are overwriting the input section: encoding the int256 overwrites the input bytes, and we have just clobbered the b data. We get garbage.

The vector-to-slice pass above could be modified to skip the conversion if a variable is used for return-buffer abi encoding, but do we really want this?

Equally, when writing Rust, someone may take a slice of the input and pass that slice into something that serializes into the return buffer. Again this will produce garbage.
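A minimal Rust sketch of that hazard, assuming a single shared in/out buffer; the offsets are made up, the point is only that the output encoding and the still-referenced input bytes occupy the same memory:

// Pretend the input placed the bytes of `b` at 4..12 of the shared region.
fn encode_return(io: &mut [u8]) {
    assert!(io.len() >= 24);
    let b = 4..12;

    // Encode the int return value at offset 0: this writes bytes 0..8 and
    // clobbers the first half of `b` before it has been copied out.
    io[0..8].copy_from_slice(&1u64.to_le_bytes());

    // Encoding `b` into its output slot now reads partially overwritten
    // data: bytes 4..8 already hold pieces of the int encoding.
    io.copy_within(b, 16);
}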

@jackcmay

@Lichtso Are you thinking the return region will be separate from the input region?

For ABI v2, a set call could either write to the shared memory or return a mut pointer/slice to it to enable a direct write. A get call could return a slice into the shared memory. Then ABI v1 could also have shared region(s) that set/get copy into and out of?

Or are you thinking that the program's entrypoint actually returns an optional data buffer that the runtime/helpers write into the shared region?
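Hypothetical signatures for the first style; none of these methods exist in the SDK, they only make the access patterns concrete:

// Sketch of zero-copy accessors for a shared return region; illustrative only.
pub trait ReturnRegion {
    /// Hand the program a writable window into the shared region so it can
    /// serialize its output in place (direct write, no copy, no syscall).
    fn out_buf(&mut self, len: usize) -> &mut [u8];

    /// Expose the callee's output as a borrowed slice (zero-copy get).
    fn returned(&self) -> &[u8];
}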

Also, not all programs will be using the return data and over-allocating space for it for each call might be wasteful.


Lichtso commented Sep 15, 2021

Also, not all programs will be using the return data and over-allocating space for it for each call might be wasteful.
Are you thinking the return region will be separate from the input region?

That is the reason I wanted to reuse the input data area (as in not separate).
But I get that it would be less user-friendly if the program needed to do in-place shuffling / memmove inside one in/out data area.

For ABI v2, a set call could either write to the shared memory or return a mut pointer/slice to it to enable a direct write. A get call could return a slice into the shared memory. Then ABI v1 could also have shared region(s) that set/get copy into and out of?
Or are you thinking that the program's entrypoint actually returns an optional data buffer that the runtime/helpers write into the shared region?

The first option (the entrypoint provides a slice reference for the program to write to itself) seems more performant, as it requires no extra copy step inside the helper method and no syscall.


ABI v2 could also use one global shared memory area instead of one per invocation stack frame. However, what I dislike about that is that it makes the input and output data passing artificially asymmetric.
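A hypothetical entrypoint shape for the in-place variant preferred above; not a real SDK signature:

// The runtime maps the frame's shared region and hands the program its
// input plus a writable output window; the return value is the number of
// output bytes produced. Purely illustrative.
pub type EntrypointV2 = fn(input: &[u8], output: &mut [u8]) -> u64;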


seanyoung commented Sep 16, 2021

Let me summarize:

  • We do not want to create a new return data memory range for each stack frame; this is wasteful.
  • We can either re-use the input buffer for output or share a single return buffer across all call frames.
  • Re-using the input buffer will require some tricks in BPF code as the buffer is shared. This can be solved with double buffering (see the sketch after this list), which is what we're trying to avoid in the first place.
  • Re-using the input buffer means the implicit passing of return data up the stack is no longer possible.
  • Using a single return buffer for all call frames makes input and output "artificially asymmetric". In any ABI I've seen (for a physical CPU or a VM), return data and input data are just handled differently. I'm not sure this is something to aim for, although I can see the appeal.
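For concreteness, a minimal sketch of the double-buffering workaround mentioned in the third bullet; `encode` stands in for whatever abi encoding the program does:

// Stage the output in a scratch buffer, then copy it over the input region.
// This is safe against clobbering, but reintroduces exactly the copy that
// re-using the input buffer was supposed to avoid.
fn encode_with_scratch(io: &mut [u8], encode: impl Fn(&[u8], &mut [u8]) -> usize) -> usize {
    let mut scratch = vec![0u8; io.len()]; // the second buffer
    let n = encode(&io[..], &mut scratch); // read input, write output separately
    io[..n].copy_from_slice(&scratch[..n]); // the extra copy
    n
}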


Lichtso commented Sep 22, 2021

In any ABI I've seen (for a physical CPU or a VM), return data and input data are just handled differently.

https://millcomputing.com/ is an (in my opinion elegant) counterexample.

But I agree with your summary, and that we should leave it as it is and just map it as transaction-global shared memory in ABI v2. The only thing I am still pondering is whether there is a way to make how the interface works configurable / transparent to the program.

@jackcmay

@Lichtso what are you aiming for when you say "configurable/transparent"?


Lichtso commented Sep 22, 2021

We are still hard-coding lots of mechanisms and parameters.
And maybe there is a way to leave things open to be implemented in different ways without breaking the interface.

Lichtso closed this as completed Sep 24, 2021
@github-actions

This issue has been automatically locked since there has not been any activity in past 7 days after it was closed. Please open a new issue for related bugs.

github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022