Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get_processed_inner_instruction syscall #22515

Closed
wants to merge 3 commits into from

Conversation

jackcmay
Copy link
Contributor

Problem

There are scenarios where instructions (top-level or via CPI) would like to know what the last instruction processed was. Some examples are verification of assert instructions or to check for memo instructions. Programs can already check tx level instructions via the Sysvar1nstructions1111111111111111111111111 but there is no way to get information about what inner instructions have been processed.

Summary of Changes

Add a syscall that allows a program to query what inner instructions have been successfully processed.

Fixes #22437

programs/bpf/tests/programs.rs Outdated Show resolved Hide resolved
program-runtime/src/invoke_context.rs Outdated Show resolved Hide resolved
programs/bpf/rust/sibling_instruction/src/lib.rs Outdated Show resolved Hide resolved
@mvines mvines added the v1.9 label Jan 14, 2022
@jackcmay jackcmay force-pushed the sibling-instructions branch from 86b997e to 07553d2 Compare January 14, 2022 21:36
CriesofCarrots
CriesofCarrots previously approved these changes Jan 14, 2022
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my part, looks good once CI is happy

@mergify mergify bot dismissed CriesofCarrots’s stale review January 14, 2022 22:59

Pull request has been modified.

@@ -2636,6 +2659,8 @@ fn call<'a, 'b: 'a>(
)
.map_err(SyscallError::InstructionError)?;

invoke_context.add_sibling_instruction(instruction);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  • Can we use CompiledInstruction which has account indices, instead of Pubkeys? (better for ABIv2).
  • How about referencing the recorded instructions by index / a range?

Copy link
Contributor Author

@jackcmay jackcmay Jan 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instructions aren't always recorded, are you thinking we should always turn on full instruction recording?

I'll take a look at using CompiledInstruction, are you referring to what the syscall returns, or what is stored internally in the invoke_context?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring should have made it cheap enough to always run it. And this PR here comes close to "always on" as well. So why not have them both use an unified underlying mechanism?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll take a look at that, this implementation is pretty cheap as is, but we can probably do better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to keep in mind is that the instruction recorder records instructions before they are invoked, so it includes instructions that haven't been fully processed yet which also results in a different order

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recorded inner instructions are defined being invoked, not successfully processed:
https://docs.solana.com/developing/clients/jsonrpc-api#inner-instructions-structure

@jackcmay jackcmay force-pushed the sibling-instructions branch from c2b15a0 to 1db88a8 Compare January 15, 2022 01:22
@codecov
Copy link

codecov bot commented Jan 15, 2022

Codecov Report

Merging #22515 (b1aab0d) into master (6edeed8) will increase coverage by 0.4%.
The diff coverage is 81.8%.

@@            Coverage Diff            @@
##           master   #22515     +/-   ##
=========================================
+ Coverage    81.1%    81.5%   +0.4%     
=========================================
  Files         560      553      -7     
  Lines      151206   149771   -1435     
=========================================
- Hits       122633   122127    -506     
+ Misses      28573    27644    -929     

@jackcmay jackcmay force-pushed the sibling-instructions branch from 1db88a8 to b58222f Compare January 15, 2022 20:20
@jackcmay jackcmay changed the title Add get_sibling_instruction syscall Add get_processed_inner_instruction syscall Jan 15, 2022
@jackcmay jackcmay force-pushed the sibling-instructions branch 2 times, most recently from 9f79099 to 6d7366c Compare January 18, 2022 01:19
@jackcmay jackcmay force-pushed the sibling-instructions branch 3 times, most recently from 72812a3 to d3441b9 Compare January 19, 2022 06:14
/// when calling the `sol_get_processed_inner_instruction` syscall.
#[repr(C)]
#[derive(Default, Debug, Clone, Copy)]
pub struct InnerMeta {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: InnerMeta seems a little too generic. ProcessedInnerInstructionMeta?

mvines
mvines previously approved these changes Jan 19, 2022
@@ -197,6 +197,7 @@ pub struct InvokeContext<'a> {
pub timings: ExecuteDetailsTimings,
pub blockhash: Hash,
pub lamports_per_signature: u64,
pub processed_inner_instructions: Vec<(usize, Instruction)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go in TransactionContext, because it is not runtime internal and should be exposed to ABIv2 programs as well. I am preparing a PR that moves the InstructionRecorder there, on which you could rebase this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this today, feel free to move it after

),
result
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets convert from CompiledInstruction to Instruction on demand here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, we can make those changes after this pr lands.

@mergify mergify bot dismissed stale reviews from Lichtso and mvines January 19, 2022 16:50

Pull request has been modified.

@mvines
Copy link
Member

mvines commented Jan 19, 2022

I think there is a "B" missing in that example like so: [(1, F), (1, E), (2, D), (1, C), (0, B), (2, Z), (1, Y), (1, X), (0, A)] , right?

right, i missed (0,B) there :)

@Lichtso
Copy link
Contributor

Lichtso commented Jan 19, 2022

As I said, we should always include all instructions. It is easy to filter out the unfinished ones if a program does not want that.

@mvines
Copy link
Member

mvines commented Jan 19, 2022

Could pass a max depth:

👍🏼

@mvines
Copy link
Member

mvines commented Jan 19, 2022

As I said, we should always include all instructions. It is easy to filter out the unfinished ones if a program does not want that.

The future inner instructions are not known though, so the runtime could only return the unfinished transaction instructions and we already have the instructions sysvar for can be used for that purpose.

@jackcmay
Copy link
Contributor Author

I'm a little uneasy about [(1, F), (1, E), (2, D), (1, C), (0, B), (2, Z), (1, Y), (1, X), (0, A)] where all the instructions are processed but B, and the caller has to distinguish that B is the not-yet-fully-processed instruction in the list.

@mvines
Copy link
Member

mvines commented Jan 19, 2022

I'm a little uneasy about [(1, F), (1, E), (2, D), (1, C), (0, B), (2, Z), (1, Y), (1, X), (0, A)] where all the instructions are processed but B, and the caller has to distinguish that B is the not-yet-fully-processed instruction in the list.

🤔 yeah

/// A
/// A -> X
/// A -> Y -> Z
///                ** Z calls `get_processed_instruction()` **
/// B

At this call site, I think the result would currently be [(1, Y), (1, X), (0, A)] or [(2, Z), (1, X), (0, A)]? If only fully processed instructions were returned then Z would see a result of [(1,X)]. The former seems neat as it enables an invoked program to figure out who's calling it in all cases

@jackcmay
Copy link
Contributor Author

jackcmay commented Jan 19, 2022

At this call site, I think the result would currently be [(1, Y), (1, X), (0, A)] or [(2, Z), (1, X), (0, A)]?

Current result would be [(1, X)], proposed result would be: [(1, Y), (1, X), (0, A)] where both (1, Y), and (0, A) have not been fully processed

@mvines
Copy link
Member

mvines commented Jan 19, 2022

Result would be empty if the syscall was called before A invoked X too?

@jackcmay
Copy link
Contributor Author

Result would be empty if the syscall was called before A invoked X too?

The current implementation yes, the proposed would return [(0, A)]

@jackcmay
Copy link
Contributor Author

@mvines @CriesofCarrots , given:

/// A
/// A -> X
/// A -> Y -> Z
///                ** Z calls `get_processed_instruction()` **
/// B

If SPL token is Z, will returning [] be expected, or does SPL token need X? And if needs X, does it also need A?

@jackcmay
Copy link
Contributor Author

@Lichtso One of the impedments to relying directly on InstructionRecorder and something we need to fix before ABIv2 is this: #19467

@jackcmay jackcmay force-pushed the sibling-instructions branch from 0d1bbe5 to b1aab0d Compare January 19, 2022 23:04
@mvines
Copy link
Member

mvines commented Jan 19, 2022

@mvines @CriesofCarrots , given:

/// A
/// A -> X
/// A -> Y -> Z
///                ** Z calls `get_processed_instruction()` **
/// B

If SPL token is Z, will returning [] be expected, or does SPL token need X? And if needs X, does it also need A?

The case we'd need is when SPL Token is Y and it needs to see X.

But for the sake of discussion, if SPL token was Z (or Y) then I'd expect it to see X since X has completed. I think what we're trying to determine is whether Z would also see Y and A since those instructions haven't completed yet.

I'm leaning towards including Y and A, even though they are in progress, since this allows Z to figure out who's calling it and I think that enables some interesting use cases (programs changing their behaviour based on caller). But perhaps that's actually an anti-pattern/footgun that we shouldn't enable?

@jackcmay
Copy link
Contributor Author

The reason I ask is that X isn't a direct sibling of Z within Y.

Seems like a separate syscall should be added, something like get_invoke_stack to get the callers, and later get_all_instructions to get all the instructions called so far. Some of these could be utility routines over the updated InstructionRecorder that may be exposed in ABIv2

@mvines
Copy link
Member

mvines commented Jan 20, 2022

Got it. Yeah not opposed to deferring a more general solution to later. SPL Token just needs to inspect the previous sibling instruction at its depth.

@CriesofCarrots
Copy link
Contributor

Hey, all. Sorry I was offline for most of this dialog. Is the PR up to date with the currently proposed implementation? If so, I'll catch up with a close read.
Based on above, it sounds like it might be possible to have this syscall work for top-level instructions after all? 🙏 I hope so. I still think it will be better to have the api consistent for instructions at any depth in the call stack.

@jackcmay
Copy link
Contributor Author

@CriesofCarrots Yeah, I see the desire for this api to include top-level instructions and that becomes non-trivial to define the desired behavior. For example, if we include top-level instructions do we also include their inner instructions? Do we include all instructions a depth minus? Do we include instructions currently being processed and let the program figure out via depth if they are in-progress or not? A lot of info was shared in this thread, reading through that would be helpful. My current thinking is for this API to include all inner instructions of the current top-level and take a max depth. Then we could provide a separate syscall to get the call stack. I want to make sure we cover the use case for SPL but also provide a clear API for other use cases without clouding or returning the world. I think there is also an argument to include all instructions at depth or below that have been processed which would include processed top-level instructions.

@CriesofCarrots
Copy link
Contributor

CriesofCarrots commented Jan 25, 2022

Okay, I've read through everything carefully now.
If we take the general approach, I think that we do need to include instructions currently being processed. Otherwise, it may not be clear which lower level ix called a higher one. eg, in the example way above, the output [(1, F), (1, E), (2, D), (1, C), (2, Z), (1, Y), (1, X), (0, A)] could be from either of these call trees (or one of many other combinations):

/// A
/// A -> X
/// A -> Y -> Z
/// B -> C -> D
/// B -> E
/// B -> F
/// ** B calls `get_processed_instruction()` here **
/// G

/// A
/// A -> X
/// A -> Y -> Z
/// A -> C -> D
/// A -> E
/// A -> F
/// B
/// ** B calls `get_processed_instruction()` here **
/// G

I think that ambiguity could be a big footgun.

Yeah not opposed to deferring a more general solution to later. SPL Token just needs to inspect the previous sibling instruction at its depth.

That said, this ^ . It does appeal to me to implement the simpler approach now, and offer the more general utility later if needed. That simpler implementation being:

/// A
/// A -> X
/// A -> Y -> Z
/// B -> C -> D
/// B -> E
/// B -> F
/// ** F calls `get_processed_instruction()` here **
/// G
/// ** G calls `get_processed_instruction()` here **

F sees: [(1, E), (1, C)] or [E, C]
B G sees: [(0, B), (0, A)] or [B, A]

@mvines
Copy link
Member

mvines commented Jan 26, 2022

Maybe we need

/// A
/// A -> X
/// A -> Y -> Z
/// B -> C -> D
/// B -> E
/// B -> F
/// ** F calls `get_processed_instruction()` here **
/// G
/// ** G calls `get_processed_instruction()` here **

F sees: [(1, E), (1, C)] or [E, C]
B sees: [(0, B), (0, A)] or [B, A]

"B sees" -> "G sees" right? Sounds good to me if so!

But maybe we need a call with everybody on the PR to work out any remaining design discussion with our vocal folds?

@CriesofCarrots
Copy link
Contributor

"B sees" -> "G sees" right? Sounds good to me if so!

Oops, yep, edited in my comment

But maybe we need a call with everybody on the PR to work out any remaining design discussion with our vocal folds?

Happy to oscillate some vocal folds. I think Jack is unavailable the rest of this week, so maybe Monday?

@jackcmay
Copy link
Contributor Author

In the examples above I'm inclined to return [] from call depth 0 since there are no sibling instructions (just top-level instructions). The main reason for doing that is I think it is confusing to return F sees: [E,C] and not include the top-level successfully processed instruction A ([E, C, A]). A has been processed and so it come down to what is the definition of this API, is it "sibling instructions" or "processed instructions", or maybe going with @CriesofCarrots 's suggestion "processed instructions at my call depth"?

@mvines
Copy link
Member

mvines commented Jan 31, 2022

The previously (successfully -- implicit) processed instructions at the caller's depth is what we're immediately after.
At depth 0, this set would be the previously processed top-level instructions.

@jackcmay
Copy link
Contributor Author

jackcmay commented Feb 1, 2022

Replaced by #22859

@jackcmay
Copy link
Contributor Author

jackcmay commented Feb 4, 2022

Replaced by #22859

@jackcmay jackcmay closed this Feb 4, 2022
@jackcmay jackcmay deleted the sibling-instructions branch April 20, 2022 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability for programs to check the processed instruction stack
5 participants