wallet2, RPC: Optimize RPC calls for periodic refresh from 3 down to 1 call #8076
Conversation
Force-pushed from afc5761 to 15f99c8.
I've not finished reviewing, but reading your comments and the zeroed timestamps makes me think that maybe it'd be both simpler and work in every case to keep, say, a circular queue (fixed max size) containing three fields: a counter, a txid and a "removed or added" flag.
Every time the txpool adds/removes a tx, it's pushed to that, with counter++. That counter is used instead of timestamps to determine where the "branch" point is. Faster and less memory. Less complex. If the client sends a counter <= the first counter in the queue (since old ones can get bumped if the queue reaches capacity), send an "overflow, request full list", or just send back the full list outright. On daemon startup, init the list with a set of all txes in the txpool, from counter 0. If the client sends a counter > the last counter in the queue, the daemon was restarted and you need to send everything (or fail). Do you think this could work?
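A minimal sketch of this proposal, assuming invented names (`PoolEvent`, `PoolEventRing`) and a `std::deque` standing in for a true ring buffer; this is illustration only, not code from the Monero tree:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

struct PoolEvent
{
  uint64_t counter;                       // incremented on every add/remove
  std::array<unsigned char, 32> txid;
  bool removed;                           // false = added, true = removed
};

class PoolEventRing
{
  static constexpr size_t CAPACITY = 1024;
  std::deque<PoolEvent> m_events;         // oldest entry at the front
  uint64_t m_counter = 0;

public:
  // Called by the txpool on every add/remove; on daemon startup the ring
  // would be seeded with one "added" event per tx in the pool, from counter 0.
  void push(const std::array<unsigned char, 32> &txid, bool removed)
  {
    if (m_events.size() == CAPACITY)
      m_events.pop_front();               // oldest entry gets bumped
    m_events.push_back({m_counter++, txid, removed});
  }

  // The client sends the last counter it saw. nullopt means "overflow,
  // request full list": the counter is older than the oldest retained entry
  // (it may have been bumped) or newer than the last one (daemon restarted).
  std::optional<std::vector<PoolEvent>> since(uint64_t c) const
  {
    if (m_events.empty() || c <= m_events.front().counter || c >= m_counter)
      return std::nullopt;
    std::vector<PoolEvent> out;
    for (const PoolEvent &e : m_events)
      if (e.counter > c)
        out.push_back(e);
    return out;
  }
};
```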
@moneromooo-monero wrote:
I thought this through in the meantime and agree that using a single queue this way is indeed quite a bit simpler than the solution as implemented now, using two maps. I will therefore switch to this. However, after looking at implementations of circular queues in C++ I decided to use `std::deque`.
Today I started to implement @moneromooo-monero 's approach, with a single `std::deque`. One thing is the handling of added txs when they leave the pool, becoming removed txs. There is `std::deque::erase`, but I guess it's no performance wonder. Another thing: for finding those added txs that become removed, scanning over the queue is necessary. With both added and removed txs in a single queue, the queue will be longer and scanning thus slower. Thus, after much thinking back and forth, I think I'll stick with the currently implemented approach, and would be glad for more review of the code as-is.
Force-pushed from 15f99c8 to e097af9.
I made code changes today for the issues pointed out by @moneromooo-monero. I became aware that several PRs currently on the way will change the minor RPC version number like this one does, but as we probably don't know in which order they will get merged, I don't think it makes sense to do something preemptively; it's best for me to just wait for eventual conflicts and then resolve them in a timely manner.
I tried to reproduce the functional-test error that GitHub documents here after building, but failed, i.e. the "mining" functional tests repeatedly ran through on my machine without any problem. Because of this, and because of a comment by @moneromooo-monero on IRC that this might be a false alarm of a kind that already occurred in the past, I propose to just ignore this error.
I'm going through this pretty slowly (not because of your code, the code is looking solid -- it's me), but wanted to point out a couple of things just to get stuff going. Will do a complete review next pass.
EDIT: also 100% feel free to wait until I do a full review to jump back into this. I'm still working through it
EDIT2: I was mistaken in this initial comment; there isn't a repeating "Output found in pool" issue. I mistakenly thought I was using your code, but it turns out I was using my own buggy code to try to reduce the round trips to the daemon (which I think is still a valid comment, you just need to set req.include_pool_info = true).
First thing, when I have a tx in the pool and type "refresh" or "show_transfers" in monero-cli, it keeps repeating "Output found in pool", even after it has already found the output. Behavior prior to this was only showing this message once. Didn't dive deep into the source of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok some small comments, and then following up on the discussion with @moneromooo-monero from above^
One thing is the handling of added txs when they leave the pool, becoming removed txs. There is std::deque::erase but I guess it's no performance wonder.
I think his suggestion was a little different. IIUC you could use a queue and never have to actually search for and dequeue the added txs; you just need to take action when it reaches capacity and handle that case properly. It seems like something like this might work (you can just use an array, I think; a sketch follows the steps below):
[Step 1]
Add tx's A, B, C
Queue: [add A, add B, add C]
Counter: 3
*client requests from Counter 2, daemon responds with [add B, add C]*
[Step 2]
Remove tx B
Queue: [add A, add B, add C, remove B]
Counter: 4
*client requests from Counter 4, daemon responds with [remove B]*
Then once the array reaches capacity (let's say capacity is 4), you do something like:
[Step 3]
Remove tx A
Queue: [remove A]
Counter: 5
*some client requests from Counter 4, daemon responds with "overflow, request full list", or just send back full list outright (I like the latter)*
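For illustration, here is a standalone sketch of this array variant under the counter semantics used in the steps above ("request from c" returns every op with counter >= c); all names (`PoolOp`, `PoolOpLog`) are invented:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct PoolOp { uint64_t counter; std::string txid; bool removed; };

struct PoolOpLog
{
  static constexpr size_t CAPACITY = 4;      // tiny, to mirror the steps
  std::vector<PoolOp> ops;
  uint64_t counter = 0;

  void record(const std::string &txid, bool removed)
  {
    if (ops.size() == CAPACITY)
      ops.clear();                           // step 3: wipe and start over
    ops.push_back({++counter, txid, removed});
  }

  // "Request from c" = send all ops with counter >= c; if the oldest
  // retained op is newer than c, the client must request the full list.
  std::vector<PoolOp> since(uint64_t c, bool &full_list_needed) const
  {
    std::vector<PoolOp> out;
    full_list_needed = ops.empty() || c < ops.front().counter;
    if (!full_list_needed)
      for (const PoolOp &op : ops)
        if (op.counter >= c)
          out.push_back(op);
    return out;
  }
};
```

With this, steps 1 and 2 return [add B, add C] and [remove B] as described, and after the wipe in step 3 a request from counter 4 sees `full_list_needed == true`.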
Thanks @j-berman for reviewing! Can you tell me when you are through so I can look at everything and improve in one go? TIA.

One advance general comment: there are quite a number of code pieces that I just moved around or extracted as separate methods, but where I took care not to change them. The basic idea behind this: make it easier for people who look at my changes to see the changes themselves, and make it easier to arrive at reasonable certainty that everything is still ok, by seeing that code that worked before is probably still working, because it was merely moved, but unchanged. I chose this strategy being conscious that I absolutely must not introduce errors, because that code is so important: wallets scanning for transactions is a "mission critical" activity :)

I mention this because at first glance I think I saw proposals for improvements in such "old" code that I neither wrote nor changed myself. But anyway, I am not really sure whether my strategy has real merit, and I see that for you as a reviewer it's of course not easy at all to see which is which; for me, as the one making the changes, it was much easier. If after these comments you find it worthwhile to take the chance and also improve "old" code, I certainly won't oppose.
Done reviewing @rbrunner7 :) The 2 main things in my review that I don't think had anything to do with the location of code were:
About your re-visit of @moneromooo-monero 's alternative storage proposal: presented this way, that array looks like a protocol of adds and removes to the pool. This looks indeed very simple to implement: just log the "Add" and "Remove" calls in the pool into the array and care somehow about overflow. I see a drawback however that manifests itself here:
Say a client is still at 1 and requests "Give increment since counter 1". That client will get tx B twice, and without an easy way to recognize that circumstance it will probably also process B twice, which is unfortunate. See what I mean? But anyway, there are so many different containers in the STL, all with different advantages, drawbacks and trade-offs, plus many possible ways to implement those increments with them. No problem to search through "problem space" for quite some time ... Maybe I am not yet at the optimum, but I don't think my code does anything really dumb, and with usual traffic the pool will contain 100 transactions or so, which should not tax those containers too much anyway.
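One hedged sketch of how a client could protect itself against this: collapse an increment into its net effect before processing, so a tx that was both added and removed inside the increment is not scanned twice (illustrative names, not wallet2 code):

```cpp
#include <map>
#include <string>
#include <vector>

struct PoolOp { std::string txid; bool removed; };

// Returns txid -> still-in-pool? after replaying the increment in order,
// so "add B ... remove B" nets out instead of being processed twice.
std::map<std::string, bool> net_effect(const std::vector<PoolOp> &ops)
{
  std::map<std::string, bool> net;
  for (const PoolOp &op : ops)
    net[op.txid] = !op.removed;           // later ops override earlier ones
  return net;
}
```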
Another problem with my proposed approach (that I believe moo's circular queue would solve) is that on the reset in step 3, a bunch of connected clients would then end up getting all pool tx's, whereas with moo's circular queue this is much less likely to happen. Once the queue fills up you do a sort of wrap-around: dequeue FIFO and enqueue to take its spot, and store the counter along with each operation in the queue. So you're not getting rid of all the older operations in the container once it fills up, just replacing one at a time. Granted, this is also not the simplest to implement, so I'm gonna re-review to see if the case is compelling enough. The reason I thought so at first was because this is hard to reason through + it can avoid the O(log n) search, but this is in memory and you're right, not that many txs anyway. So probably not a huge deal :) (edit: although, gotta consider that in the worst case of the pool being massive, it could be not so great)
The crux is that the longer you look, the more aspects come to light :) When I first mentioned on IRC that I wanted to implement this, @hyc voted to implement the necessary "incremental" data structures as new LMDB tables. This would be clearly in line with the rest of the class, which also uses a permanent store for the pool, and we wouldn't have to worry about gigantic pools at certain times, as the database would handle those without breaking a sweat. But still, for the normal mode of operation with maybe 50 transactions in the pool until the next block arrives, implementing anything as database tables looks a bit like overkill, and those incremental lists even more so ... I did look at how to implement circular buffers after @moneromooo-monero proposed to use one, and they of course perform very well, but those maps ordered by time are so much easier to understand (and also debug) ...
I'd imagine implementing this in LMDB + using a circular queue where elements are stored in the db instead of in memory is a reasonable solution, albeit fairly complex (no need for the table to grow in size along with the chain). It still wouldn't need a time complexity of O(log n) for each operation moving added txs to removed txs. We saw issues with nodes crashing when the network was spammed back in, I think, October, and the pool ballooned. I think it's worth trying to make this more optimal for that circumstance alone. But if you're still not convinced, I will go back to reviewing :)
Actually, the issue in October I believe was large transactions with many inputs, not a large volume of transactions. So I don't think this PR as-is would really have affected that circumstance in particular anyway. I imagine you're going to want to keep it as-is now :)
I try to find a suitable balance between my laziness and the undisputed right of the dev community to insist on clean and solid code from me before this gets merged :)
Force-pushed from a5913bb to 5f77e4b.
I rebased to current master, incremented the minor daemon RPC version from 9 to 10, and implemented a proposal resulting from @j-berman 's review to re-use a method after making it a little more flexible with an additional parameter.
Force-pushed from 5f77e4b to 703f87e.
I just pushed an update with significant improvements of the code based on valuable review suggestions from @j-berman. I hope they will find time to re-review this update soon to give this PR a chance of inclusion in the next hardfork. Inclusion would be quite fortunate IMHO because after the hardfork all daemons would naturally support the new capabilities of the `getblocks.bin` call.
Force-pushed from 703f87e to 580bd64.
It's looking good structurally -- thank you for the changes :)
Apologies for taking a while on this. Took me a while to identify and reason through the duplicate entry issue when adding to the transient list
src/cryptonote_core/tx_pool.cpp (outdated):
```cpp
{
  if (include_sensitive || !details.sensitive)
  {
    added_txs.push_back(std::move(details));
```
It's possible for a duplicate to be added here, because m_added_txs_by_time can have the same txid added twice, as noted in the other comment. That is a leak on sensitive txs and messes up the API. See the other comment, "When you submit a tx to a node..."
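As an illustration of one possible guard (not the actual tx_pool.cpp fix), deduplicating by txid while draining the time-ordered map would avoid reporting the same tx twice; the types and names below are invented:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

struct TxDetails { std::string txid; bool sensitive; };

std::vector<TxDetails> collect_added(
    const std::multimap<uint64_t, TxDetails> &added_txs_by_time,
    bool include_sensitive)
{
  std::vector<TxDetails> added_txs;
  std::unordered_set<std::string> seen;
  for (const auto &entry : added_txs_by_time)
  {
    const TxDetails &details = entry.second;
    if (!seen.insert(details.txid).second)
      continue;                        // same txid entered the map twice
    if (include_sensitive || !details.sensitive)
      added_txs.push_back(details);
  }
  return added_txs;
}
```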
Force-pushed from 6729cb0 to 9781bac.
@jeffro256, quick question: is your review still ongoing?
@rbrunner7 Sorry, I got sidetracked by other things. I'm going to go deep into the review these next couple of days. I'm mostly ready to sign off; however, I believe I have a more performant incremental transaction pool scheme. Don't worry! I don't plan on introducing it in this PR, but I want to see if I could make it backwards-compatible with this PR. I'm also testing some spam requests against the tx pool scheme as-is.
I certainly don't want to hold it up, though. But seeing as the new release just got tagged, we probably have at least a couple weeks.
- `/getblocks.bin` respects the `RESTRICTED_TX_COUNT` (=100) when returning pool txs via a restricted RPC daemon.
- A restricted RPC daemon includes a max of `RESTRICTED_TX_COUNT` txs in the `added_pool_txs` field, and returns any remaining pool hashes in the `remaining_added_pool_txids` field. The client then requests the remaining txs via `/gettransactions` in chunks.
- `/gettransactions` no longer does expensive no-ops for ALL pool txs if the client requests a subset of pool txs. Instead it searches for the txs the client explicitly requests.
- Reset `m_pool_info_query_time` when a user: (1) rescans the chain (so the wallet re-requests the whole pool); (2) changes the daemon their wallet points to (a new daemon would have a different view of the pool).
- `/getblocks.bin` respects the `req.prune` field when returning pool txs.
- Pool extension fields in the response to `/getblocks.bin` are optional with default 0'd values.
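A sketch of the client-side chunking described in the second bullet, with invented names; `rpc_get_transactions` stands in for one `/gettransactions` call, and `CHUNK` mirrors `RESTRICTED_TX_COUNT`:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

void fetch_remaining_pool_txs(
    const std::vector<std::string> &remaining_txids,
    const std::function<void(const std::vector<std::string>&)> &rpc_get_transactions)
{
  static constexpr size_t CHUNK = 100;       // mirrors RESTRICTED_TX_COUNT
  for (size_t i = 0; i < remaining_txids.size(); i += CHUNK)
  {
    const size_t n = std::min(CHUNK, remaining_txids.size() - i);
    const std::vector<std::string> batch(remaining_txids.begin() + i,
                                         remaining_txids.begin() + i + n);
    rpc_get_transactions(batch);             // one /gettransactions call
  }
}
```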
```cpp
uint64_t m_pool_info_query_time;
std::vector<std::tuple<cryptonote::transaction, crypto::hash, bool>> m_process_pool_txs;
```
These fields cache RPC request information. Put these fields inside NodeRPCProxy and invalidate them inside the NodeRPCProxy::invalidate method. Different nodes may have different clock times (or steady clocks in future PRs); when the daemon is switched out, this info should be invalidated too.
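A sketch of what that move could look like; the real NodeRPCProxy caches much more state, and the element type of m_process_pool_txs is abbreviated here:

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

class NodeRPCProxy
{
  // cached pool-refresh state (currently sitting on wallet2 in this PR)
  uint64_t m_pool_info_query_time = 0;
  std::vector<std::tuple<std::string, std::string, bool>> m_process_pool_txs;

public:
  // called whenever the wallet switches daemons
  void invalidate()
  {
    m_pool_info_query_time = 0;  // forces a full (non-incremental) pool query
    m_process_pool_txs.clear();
    // ... plus the heights, versions, etc. the proxy already caches ...
  }
};
```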
People, I want to declare the party, this my PR #8076, as finished. It was much fun, and thanks to everybody, but now the guests are kindly and respectfully asked to leave.

This PR was ready to merge about 2 months ago, but @selsta expressed some uneasiness about the small chance that performance might be worse than before instead of better. Alright, sounded reasonable to check. @j-berman thankfully took on the job to stress-test and benchmark. Then we said, well, that's only a single source of info. Ok, thankfully @plowsof took the job to verify and replicate the results. The results were confirmed.

Now it looks like I have yet again a proposal to modify on the table, one that seems to me mostly important not in the frame of this PR, but of a future PR of @jeffro256 's. But, well, for me that's now the straw that breaks the camel's back. It should be no problem at all to start improving my code and building on top of it immediately after it's merged.

The code of my PR was reviewed, tested, benchmarked, stress-tested, and reviewed again. Now, after more than 1 year, let's merge that darned thing. Thank you.
Same comment as on IRC: "I'm cool with adding #8076 to master under the condition that the changes don't make it to the release branch without another PR adding code to invalidate the wallet refresh cache when switching daemons, ..."
Thanks @rbrunner7 !!!
Benchmark for refreshing wallet with 4600 small transactions in the mempool:
Is anyone working on invalidating the wallet refresh cache when switching daemons, so this PR can come alive in the release branch?
@jeffro256 was planning to; in fact, they even wanted to make another PR against my PR, but that was the point where I said that is something for another day, merge that poor PR already.
I can see that …
If I remember correctly, I was just too lazy to add that to the parameter lists of several methods and pass it 2 or so levels down, and took the easy way of a class field. It might be that after some reorganization this doesn't even make sense anymore. Feel free to improve :)
Okay, I opened #8798, which is a step in the right direction.
After further review, it appears that …
@rbrunner7 you can open it against release |
This PR improves/optimizes wallet refresh, especially keeping a wallet synced continuously, by reducing the number of RPC calls to the daemon that `wallet2` needs to issue periodically from 3 down to 1. This implements the Monero issue #7571, duplicated as the Haveno issue haveno-dex/haveno#110.

To achieve this, the daemon RPC call `getblocks.bin` can now optionally give back info about the tx pool that the wallet uses for early detection of incoming payments and for tracking whether transaction sending was successful. The extension to this call works in an incremental way: it does not simply return all the transactions that are currently in the pool, but a list of the transactions that entered the pool since the last call, plus a list of the ids of all transactions that were removed from the pool since that time.

The optimization is fully downward-compatible: `wallet2` handles the case that it requests pool info through `getblocks.bin` but does not get it because the daemon does not contain the code of this PR yet. There are other situations where the call is not able to give back pool info and a fallback is necessary: producing the increments is based on data that the tx pool keeps in RAM and which is therefore lost should the daemon get restarted; after a restart it will take some minutes until the daemon can again serve clients that ask something like "give me what happened in the pool since 2 minutes ago, if possible".

`simplewallet` currently refreshes with an average interval of 90 seconds and will profit from the incremental system occasionally, if it queries the pool more than once in the time between two blocks and the second query does not return all the transactions in the pool but only new ones. (If a new block arrived between two calls it does not help much, because the pool transactions will then all be different, of course.) With this PR, shorter average refresh intervals would be possible for `simplewallet` if wanted, with hardly any increase in the amount of data transferred. `monero-wallet-rpc` seems to do auto-refresh with a surprisingly low default interval of 20 seconds and should therefore profit more from the incremental system.

If you test this code on testnet, be aware that incremental refresh only starts after the daemon has seen at least 1 added and 1 removed transaction. Because so few transactions happen on testnet, you will probably have to make a transaction yourself to get things going.
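For illustration, a hedged sketch of this request/fallback flow with invented names (`PoolInfoRequest`, `PoolInfoResponse`, `handle_pool_info`); the actual wallet2/RPC field names differ:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct PoolInfoRequest
{
  bool include_pool_info = true;
  uint64_t pool_info_since = 0;          // 0 = "send the whole pool"
};

struct PoolInfoResponse
{
  bool pool_info_present = false;        // false: daemon predates this PR
  bool incremental = false;              // false: full pool snapshot returned
  std::vector<std::string> added_pool_txs;
  std::vector<std::string> removed_pool_txids;
  uint64_t pool_info_query_time = 0;     // echo back on the next request
};

void handle_pool_info(const PoolInfoResponse &res, uint64_t &next_since)
{
  if (!res.pool_info_present)
  {
    // fallback: old daemon, keep using the separate pool RPC calls
    return;
  }
  if (!res.incremental)
  {
    // full snapshot (first call, or daemon restarted): rebuild the local
    // pool view from added_pool_txs
  }
  else
  {
    // increment: scan added_pool_txs, forget removed_pool_txids
  }
  next_since = res.pool_info_query_time; // use the daemon's clock, not ours
}
```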