-
Notifications
You must be signed in to change notification settings - Fork 973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expose transaction queue information to downstream clients #2920
Comments
I'm open to suggestions on how to make this better. Exposing more complexity to clients is not necessarily the right way to do this. I think at the end of the day, clients don't (should not) care to know why things fail at the local node level: they just want to be able to make sane assumptions on what should they do (including nothing). Lets first start to try to answer the easier question, which is from the network point of view, and this is a tri state: The problem is that the cases where we can answer "yes" or "no" at the network level (as in "I can see inside the memory of all computers on that network") is fairly limited because transitions (each node's state a second later) are randomized by a bunch of factors (new transactions submited to the network, how network traffic flows, when nodes receive messages, etc). List of cases where we know "for sure" that a transaction is not going to be included:
List of cases where we know "for sure" that a transaction is going to be included:
I think we'd have to settle for some probabilistic approach already at the network level. For example if a transaction transaction (and its predecessors) is one of the most expensive transactions in the validator that will introduce the tx set to the network (as defined by the nomination protocol) and we're close to nomination time, we're pretty confident that the transaction will be included in the next ledger. Obviously all clients can't just talk to validators like this, and unless we can get high probability answers to those questions we're back to "maybe" type of answers that are not super useful to tease out. In any case, I'll close this comment with how I started: I'm open to suggestions on how to make this better. |
@MonsieurNicolas thanks for this reply. I think this is an important discussion to have.
I somewhat agree with the sentiment of not giving devs too much rope to hang themselves with, but in this case we're giving them close to nothing. Yes, Stellar is a non-trivial distributed system - but trying to obfuscate that with an overly simplified tx submission contract is backfiring - tx submitters are at loss when attempting to debug. I argue that simplification should happen in the upper levels of the stack (horizon/sdk/wallets). Core should output known facts - for example, if a tx was dropped from the queue because it became invalid - that's a useful fact that clients have no access to right now.
Fully agreed. I'm not advocating for core to try and guesstimate network level behavior - just to output it's own local node knowledge. Downstream clients should then build their own heuristics. We can keep horizon tx submission opinionatedly simple but give developers that are interested access to more fine grained information As mentioned before, there's a significant amount of prior art in the blockchain space of queryable transactions queues (or "mempools"). Stellar is "cursed" by being much simpler and faster so clients usually don't need these capabilities. But when they do - there is very little. As a bonus, I think that exposing this information will allow ecosystem devs to build better network analysis tools. |
The way that transaction submission is implemented in Horizon and promoted in guides/docs it is described like any other HTTP call to a non-distributed application where a result is determined. In some ways this is a lie, because if you cover all edge cases that is never sufficient and Horizon provides very little help past the happy path. It would be great to keep it the way it is if we can improve the handling of edge cases.
Do submitters of transactions want their transactions to hang around in the queue? Why do transactions get kept in the transaction queue? What problem is that solving? Is it creating more problems than it is solving? I think some of the pain comes from transaction being submitted, failing or timing out, then finding out later it succeeded. Obviously there's no way to 100% prevent that because anyone who saw the transaction could resubmit it even if it didn't make it into a ledger, but we could limit the problem under normal operation by not keeping transactions in the queue. If transactions are not kept in the queue this also limits the questions @tomerweller raised. Is it in the queue? No. If submission failed, its not in the queue. We could also maybe limit these cases by encouraging small single ledger sized timebounds in our guides and SDKs, but even after conversation with @jonjove and others I don't feel like I completely understand how to set a small enough timebounds to guarantee it will only get into the 1 or 2 ledgers.
It would be great if Horizon knew this and could immediately respond to the submitter saying it is rejected because it is banned.
What does this mean? We should probably work our way backwards and propose the changes we want to the Horizon transaction submission API, and then identify what information we need from Core. cc @ire-and-curses @stellar/horizon-committers |
Thanks @leighmcculloch. I agree that the immediate goal should be improving the Horizon tx-submission API. The low hanging fruit that I see
With that said, I would consider taking a bottom up approach and just expose all the information that stellar-core has about the queue. This can allow alternative tx submission APIs to have their own heuristics and facilitate external analysis tools. Interested to hear what @jonjove, @ire-and-curses and @bartekn think. |
@tomerweller I think "low hanging fruit" 1 and 2 are interesting ideas.
I kind of see this from the exact opposite perspective. Let's get a concrete vision of what we want Horizon to implement (as a reference and test case, not because there can't be other transaction submission APIs) and then figure out how to expose the information. I don't like the idea of exposing everything with the hope that people will eventually consume it, because it's hard to evaluate the merit of that work relative to other work (one could argue that it's useless since no one is planning to consume it immediately, or one could argue that the upside is huge because people might make great use of the information) and prioritize it. Perhaps let's start from what @ire-and-curses @bartekn would actually want to implement in Horizon in the short term? Then we can make a plan for exposing the right information in the right way. |
I realized the same thing while discussing with @Shaptic in parallel to this issue. Right now clients are not doing the right thing when hitting much simpler (and more likely to happen) glitches than surge pricing (like what can happen when calling Horizon from a mobile device). What I suspect will happen is that dealing with "surge pricing" related issues will be much simpler to integrate safely after that iteration. |
We spent some time discussing this in the Horizon team. We identified some problems with the Horizon API contract, as well as places where Core's non-determinism remains a problem. The way I think about it, a user should always be able to make clear decisions if they always use sequence numbers as the source of truth, bounded in time. To do that safely they need confirmation on the ledger. When Horizon responds to a client's tx submission, they get a kind of "optimised" response - they are told earlier than is strictly correct about what's going on, and they can sometimes infer the wrong things. The problem is, that response hides a lot of complexity, and when there is no response (i.e. timeout), people sometimes are at a loss to proceed. Behind the scenes many things could be happening. The other problem is that in non-surge pricing times, dealing exclusively with the response (and not checking the ledger) can get you a performant solution. But that strategy doesn't hold up during surge pricing. This is compounded by certain historical decisions in the Horizon API, such as queuing of submissions (so that you can submit N, N+1, N+2 seqnums even out of order, and they can be submitted and succeed), and a lot of latitude in how you craft submissions (e.g. allowing txs with no timebounds). Taking this view, I think what we want is
I realise that's pretty hand-wavy but I think it's a reasonable way to describe the abstract problem. I'll let @bartekn post a specific detailed proposal that we can use as a starting point for implementation discussions. Then I think we should meet. We'll need expertise from both Core and Horizon to work through edge cases and understand where we can make tradeoffs on the design, and Partner Engineering to keep our use cases honest. |
Thanks @ire-and-curses! I was thinking about how we can improve the txsub experience for clients. I started with listing common mistakes users are making when submitting transactions:
With this in mind I came up with an idea for the extra txsub endpoint designed for very simple apps like web/mobile wallets (ex. our Account Viewer) with built-in checks that would prevent many common mistakes in transaction submission. In general:
An extra table to temporarily store tx hashes
How does it protect from common mistakes?
There’s one change on the client side that must be added for this to work: clients should keep track of the last txsub by the user (they need to save tx envelope somewhere, can be even a cookie) and disallow sending another tx until the one last is resolved. |
@MonsieurNicolas and I discussed this offline. We think the next step is to lay out a flow diagram that captures the different core, Horizon and SDK paths. @MonsieurNicolas is going to take a first pass at that diagram from the core perspective, aiming for something by the end of the month. |
OK, so I think we can discuss this some more. I put together a diagram that describes more or less the existing flow for submitting transactions in a safe way. This version is supposed to deal with the following failures correctly:
Here is a slightly scary dot file with my findings (there are comments both in the diagram and in the source code). I had to add a new field returned by core's tx endpoint and by Horizon (called I also don't know if clients have access to "other errors" returned by core's A few observations:
I think that the client side code can be simplified a lot if the transaction submission flow was "first classed" from an API flow point of view. Here are some ideas:
|
This is already the case for simple transactions where only the state of a single account is required. Anything beyond a single account or list of accounts the Horizon API doesn't support querying in a single call. Given that the state of the ledger is consistent for a single ledger it would be interesting if Horizon supported querying any data at specific ledgers, but that's way out-of-scope for what Horizon does today I think, and I'm not sure it is required for the majority of cases.
This could complicate Horizon's submission process, and I think tx submission through Horizon is already more complex than it should be. We need to find a way to simplify it. Applications that need to deal with tx submission at any non-trivial scale will probably want to submit using an asynchronous submission pattern, e.g. something akin to stellar/go#3564, and use patterns such as:
After considering the above, I'm not sure visibility into the queue would really help me write an application to handle these edge cases. While a transaction is valid, even if it isn't in the queue, and even if I haven't resubmitted it, there is technically nothing stopping it showing up again due to some other actor on the network. |
This is not the case: today's |
Are you meaning that the information might be delayed due to the time it takes to ingest, or do you mean there isn't information like the ledger seq number in the account response? |
there's a |
Ah cool @tamirms : Also, we would need to add this information to core's responses note that this is especially important on failure to propagate core's ledger number not Horizon's ingested number (this is when transaction submission errors out). |
@MonsieurNicolas I don't think the header is documented anywhere. We added the header for testing purposes. When we have a release candidate we deploy it to staging and compare responses between staging and production to ensure they have the same response if both responses have the same |
Problem
Transaction submission is currently a black box. From the moment a transaction is submitted until it makes it's way to consensus the client can not reason about it's status. If it doesn't make it to consensus then things get even more complicated.
Core to the rescue
However, core can answer some questions to help clients reason about the status of transactions in flight. For example:
Implementation Proposal
TxQueue
events which Horizon ingestsTxQueue
events for a specific transactionNotes
bitcoin-core
andgeth
come to mind)The text was updated successfully, but these errors were encountered: