exp/orderbook: Improve path finding algorithm #4096
@tamirms Nice work! In addition, the number of allocations still seems to be pretty high, which probably slows down the algorithm quite a bit. It may be possible to bring them further down without fundamental changes to the algorithm.
Yeah, there is one more change that I would like to try in a different PR. Currently we represent all assets as strings in all the graph data structures (e.g. the adjacency list). I have a feeling we can improve the performance significantly if we assign all assets integer ids and represent the assets in the graph data structures as 32-bit ints.
I would suggest getting a memory allocations profile first to confirm it's the strings that produce so many allocations.
LGTM! A couple of thoughts:
1. Regarding the complexity, I think it's close to the worst-case `O(k*e^2)` not only for dense graphs but for all paths starting/ending with XLM, given that there are markets to XLM from almost every other asset, so the second loop will have `e` elements after the first iteration (I haven't checked the actual data so I can be wrong). But this is fine; it looks like it will still be faster than the current one.
2. Regarding testing, `horizon-cmp` will not be enough. It would be great to have a small tool to compare with paths found by the previous version to see if they aren't worse (great if new paths are better!).
3. I think having an array `arr[a][b][i]` that is `true` if there's any path (no matter what the final amounts are) from `a` to `b` in `i` jumps will make this even faster because it will remove all iterations in which reaching a final asset is not possible. Such an array could be calculated on graph init and then updated on graph update.
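A minimal sketch of the reachability table suggested above, assuming assets are already mapped to integer ids `0..n-1` and the graph is available as an adjacency list. The names `adj` and `reachability` are hypothetical and not taken from the Horizon code. It counts walks rather than simple paths, so it can only over-approximate reachability, which is still safe for pruning:

```go
package main

// reachability returns a table reach where reach[a][b][i] is true if some
// walk of exactly i edges leads from asset a to asset b.
// adj[a] lists the assets that a can be traded into directly.
// Hypothetical helper: amounts are ignored, only connectivity matters.
func reachability(adj [][]int, k int) [][][]bool {
	n := len(adj)
	reach := make([][][]bool, n)
	for a := range reach {
		reach[a] = make([][]bool, n)
		for b := range reach[a] {
			reach[a][b] = make([]bool, k+1)
		}
		// Every asset reaches itself in zero jumps.
		reach[a][a][0] = true
	}
	for i := 1; i <= k; i++ {
		for a := 0; a < n; a++ {
			for _, mid := range adj[a] {
				// If mid reaches b in i-1 jumps, a reaches b in i jumps.
				for b := 0; b < n; b++ {
					if reach[mid][b][i-1] {
						reach[a][b][i] = true
					}
				}
			}
		}
	}
	return reach
}
```

With such a table, the search could skip an asset `x` when `j` trades remain and `reach[x][d][i]` is false for every `i <= j`. Building it this way costs roughly `O(k * e * n)` time and `O(k * n^2)` memory, so whether it pays off would need measuring.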
@bartekn Regarding (2), I think the paths returned by this algorithm will only be better if there is an arbitrage opportunity (which I think might be rare); otherwise it should return the same paths. However, there will be more paths in the response for the previous version. Regardless, I agree it would be worth writing a tool to verify the paths on some more data beyond the unit tests.
I think it should be simple to use a replace directive to import horizon/master and run a path search for random pairs with both implementations.
Improve path finding algorithm
What
In the Stellar protocol it is possible to perform a path payment where some amount of an asset is converted into a different asset through a series of trades and then delivered to a recipient. In order to submit a path payment you need to specify the source asset, the source amount, the destination asset, and, crucially, the list of intermediate assets to reach your destination asset.
Given a source asset, source amount, and destination asset, Horizon provides an endpoint to determine the paths which would maximize the amount of the destination asset delivered to the recipient. The algorithm used to implement this endpoint models path payments as a graph problem.
The nodes in the graph represent assets. A directed edge between nodes represents a trade to sell the source asset in exchange for the destination asset. These trades can occur on the orderbook or against liquidity pools. We also add the restriction that we can perform at most `k` trades in a path payment. In other words, we only consider paths which have `k` or fewer edges.

The current implementation of the path finding algorithm does a depth first search which examines all possible simple paths (a simple path does not contain any repeated nodes) of length up to `k` from the source asset to the destination asset. Once all the paths are discovered, the algorithm picks paths which maximize the destination amount.

The algorithm recurses up to a depth of `k`. At a depth of `i`, the dfs function calls itself on at most `n-i-1` other nodes as a possibility for the next node in the path (where `n` is the total number of nodes in the graph). The time complexity of this algorithm is approximately `O(n^(k+2))` because the total number of function calls is bounded by `n^(k+1)` and each function call will examine at most `n` other nodes.

This PR implements a more efficient algorithm using dynamic programming which is described below:
Given a source asset `s`, a source amount `a`, and a destination asset `d`, we want to transform `a` units of `s` into the maximum amount of `d` using at most `k` trades.

`dp[i][j]` is the maximum amount of asset `i` that can be obtained starting with `a` units of asset `s` using at most `j` trades.

When we initialize `dp` we set `dp[s][0] = a` and for every other cell `dp[i][j] = 0`.

Once we have computed `dp` for all assets `i` using at most `j` trades, we can compute `dp[i][j+1]` by examining all edges in the graph to see which trades would produce a higher destination amount.

The algorithm runs in `O(k*e)` time where `e` is the number of edges (in the worst case `O(k * n^2)` because a dense graph can have at most `n^2` edges). The correctness of this algorithm can be proven by induction (intuitively, the optimal path of length `k+1` must first begin with an optimal path of length `k` to some other node and from that node we can take an edge to the destination).

However, there is one problem with this algorithm. If there are arbitrage opportunities then it is possible that edges are reused in the optimal path. For example, if we have an arbitrage opportunity where we can convert USD to EUR and from EUR back to USD where we make a profit, then the optimal path is to iterate over the cycle as many times as possible and then convert into our destination asset.

This is problematic because when we execute a trade the offers that we used are no longer there. So if we try to execute the same trade again the calculation will be different. The algorithm above does not take into account that the offers can change. So we need to modify the algorithm so that we don't consider edges that would be visited more than once in a given path.

With that modification, the algorithm runs in `O(k^2 * e)` because the duplicate edge check takes at most `O(k)` time.

Benchmarks on dfs:
Benchmark on new code:
It looks to be substantially better in both runtime and memory.
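For illustration, the dynamic programming approach described above, including the duplicate edge check, can be sketched as follows. This is a simplified sketch and not the code in this PR: it assumes assets are integer ids and models each edge as a fixed exchange `rate`, whereas real order books and liquidity pools are not linear in the amount. All names (`edge`, `state`, `usesEdge`, `bestAmounts`) are hypothetical.

```go
package main

// edge is a hypothetical trade selling asset from for asset to.
// Assumption: a fixed rate stands in for consuming real offers.
type edge struct {
	from, to int
	rate     float64
}

// state is one dp cell: the best amount found for an asset and the
// indices of the edges used to obtain it.
type state struct {
	amount float64
	path   []int
}

// usesEdge reports whether idx already appears in path (O(k) check).
func usesEdge(path []int, idx int) bool {
	for _, p := range path {
		if p == idx {
			return true
		}
	}
	return false
}

// bestAmounts returns, for every asset, the best amount reachable from
// amount units of src using at most k trades without reusing an edge.
func bestAmounts(n int, edges []edge, src int, amount float64, k int) []state {
	prev := make([]state, n)
	prev[src] = state{amount: amount}
	for j := 0; j < k; j++ {
		// "At most j+1 trades" includes every result for j trades.
		next := make([]state, n)
		copy(next, prev)
		for idx, e := range edges {
			from := prev[e.from]
			if from.amount == 0 || usesEdge(from.path, idx) {
				continue
			}
			if out := from.amount * e.rate; out > next[e.to].amount {
				// Copy the path so cells never share a backing array.
				path := append(append([]int{}, from.path...), idx)
				next[e.to] = state{amount: out, path: path}
			}
		}
		prev = next
	}
	return prev
}
```

Each of the `k` rounds visits every edge once and spends at most `O(k)` on the duplicate edge check, which matches the `O(k^2 * e)` bound above. In an arbitrage cycle such as USD to EUR and back, the check stops the sketch from going around the cycle a second time.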
Known limitations
The previous dfs algorithm included many more paths in the result. This algorithm will include the optimal path using 1 edge, the optimal path using 2 edges, ..., the optimal path using `k` edges.