exp/orderbook: Improve path finding algorithm #4096

tamirms · 2021-11-23T10:08:20Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md files, etc.) affected by this change.
Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

In the Stellar protocol it is possible to perform a path payment where some amount of an asset is converted into a different asset through a series of trades and then delivered to a recipient. In order to submit a path payment you need to specify the source asset, the source amount, the destination asset, and, crucially, the list of intermediate assets to reach your destination asset.

Given a source asset, source amount, and destination asset, Horizon provides an endpoint to determine the paths which would maximize the amount of the destination asset delivered to the recipient. The algorithm used to implement this endpoint models path payments as a graph problem.

The nodes in the graph represent assets. A directed edge between two nodes represents a trade that sells the source asset in exchange for the destination asset. These trades can occur on the orderbook or against liquidity pools. We also add the restriction that a path payment can perform at most k trades. In other words, we only consider paths which have k or fewer edges.

The current implementation of the path finding algorithm performs a depth-first search which examines all possible simple paths (a simple path does not contain any repeated nodes) of length up to k from the source asset to the destination asset. Once all the paths are discovered, the algorithm picks the paths which maximize the destination amount.
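To make the search described above concrete, here is a minimal sketch of a depth-first search over simple paths. It is not the code in exp/orderbook: the `edge` and `graph` types, the `bestByDFS` function, and the fixed exchange rates are all assumptions made for illustration (real offers and liquidity pools do not trade at a fixed rate).

```go
package main

import "fmt"

// edge is a hypothetical trade that sells the current asset for `to`.
// A fixed rate is a simplification used only for this sketch.
type edge struct {
	to   string
	rate float64
}

// graph maps an asset to the trades that sell it.
type graph map[string][]edge

// bestByDFS explores every simple path of at most k edges from src to dst
// and returns the largest destination amount together with its path.
func bestByDFS(g graph, src, dst string, amount float64, k int) (float64, []string) {
	var best float64
	var bestPath []string
	visited := map[string]bool{src: true}
	path := []string{src}

	var dfs func(node string, amt float64, depth int)
	dfs = func(node string, amt float64, depth int) {
		if node == dst {
			if amt > best {
				best = amt
				bestPath = append([]string(nil), path...)
			}
			return
		}
		if depth == k {
			return // the path already uses k trades
		}
		for _, e := range g[node] {
			if visited[e.to] {
				continue // simple paths never repeat a node
			}
			visited[e.to] = true
			path = append(path, e.to)
			dfs(e.to, amt*e.rate, depth+1)
			path = path[:len(path)-1]
			visited[e.to] = false
		}
	}
	dfs(src, amount, 0)
	return best, bestPath
}

func main() {
	g := graph{
		"USD": {{"EUR", 2}, {"XLM", 3}},
		"EUR": {{"BTC", 4}},
		"XLM": {{"BTC", 2}},
	}
	amt, path := bestByDFS(g, "USD", "BTC", 10, 3)
	fmt.Println(amt, path)
}
```

Every branch is explored in full before the best result is chosen, which is where the exponential cost in k comes from.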

The algorithm recurses up to a depth of k. At a depth of i, the dfs function calls itself on at most n-i-1 other nodes as candidates for the next node in the path (where n is the total number of nodes in the graph). The time complexity of this algorithm is approximately O(n^(k+2)) because the total number of function calls is bounded by n^(k+1) and each function call examines at most n other nodes.

This PR implements a more efficient algorithm using dynamic programming which is described below:

Given a source asset s, a source amount a, and a destination asset d, we want to transform a units of s into the maximum amount of d using at most k trades.

dp[i][j] is the maximum amount of asset i that can be obtained starting with a units of asset s using at most j trades.

When we initialize dp we set dp[s][0] = a and for every other cell dp[i][j] = 0.

Once we have computed dp for all assets i using at most j trades, we can compute dp[i][j+1] by examining all edges in the graph to see which trades would produce a higher destination amount.

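// dp[s][0] = a and every other cell is 0 before this loop runs.
for j := 1; j <= k; j++ {
     for _, edge := range graph {
          src, dst := edge[0], edge[1]
          // best amount of src obtainable with at most j-1 trades
          srcAmount := dp[src][j-1]
          dstAmount := executeTrade(src, srcAmount, dst)
          // keep the best of: fewer trades, an earlier edge in this round, or this trade
          dp[dst][j] = max(dp[dst][j-1], dp[dst][j], dstAmount)
     }
}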

The algorithm runs in O(k*e) time where e is the number of edges (in the worst case O(k * n^2) because a dense graph can have at most n^2 edges). The correctness of this algorithm can be proven by induction (intuitively, the optimal path of length k+1 must first begin with an optimal path of length k to some other node and from that node we can take an edge to the destination).
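The recurrence above can be turned into a small self-contained program. This is only a sketch under simplifying assumptions: assets are numbered 0..n-1, the `trade` type and `maxDestAmount` function are hypothetical names, and multiplying by a fixed `rate` stands in for `executeTrade`, which in reality depends on the offers and pools available.

```go
package main

import "fmt"

// trade is a hypothetical edge that converts asset src into asset dst.
// A fixed rate replaces executeTrade for illustration only.
type trade struct {
	src, dst int
	rate     float64
}

// maxDestAmount returns dp[d][k]: the maximum amount of asset d obtainable
// from a units of asset s using at most k trades.
func maxDestAmount(n int, trades []trade, s, d int, a float64, k int) float64 {
	dp := make([][]float64, n)
	for i := range dp {
		dp[i] = make([]float64, k+1)
	}
	dp[s][0] = a
	for j := 1; j <= k; j++ {
		// Using at most j trades is never worse than using at most j-1.
		for i := 0; i < n; i++ {
			dp[i][j] = dp[i][j-1]
		}
		for _, t := range trades {
			if got := dp[t.src][j-1] * t.rate; got > dp[t.dst][j] {
				dp[t.dst][j] = got
			}
		}
	}
	return dp[d][k]
}

func main() {
	// 0 = USD, 1 = EUR, 2 = XLM, 3 = BTC (illustrative assets and rates)
	trades := []trade{{0, 1, 2}, {0, 2, 3}, {1, 3, 4}, {2, 3, 2}, {0, 3, 5}}
	for k := 0; k <= 2; k++ {
		fmt.Println(k, maxDestAmount(4, trades, 0, 3, 10, k))
	}
}
```

Each round touches every edge exactly once, which matches the O(k*e) bound stated above.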

However, there is one problem with this algorithm. If there are arbitrage opportunities then it is possible that edges are reused in the optimal path. For example, if we have an arbitrage opportunity where we can convert USD to EUR and from EUR back to USD where we make a profit, then the optimal path is to iterate over the cycle as many times as possible and then convert into our destination asset.

This is problematic because when we execute a trade the offers that we used are no longer there. So if we try to execute the same trade again the calculation will be different. The algorithm above does not take into account that the offers can change. So we need to modify the algorithm so that we don't consider edges that would be visited more than once in a given path.

With that modification, the algorithm runs in O(k^2 * e) time because the duplicate edge check takes at most O(k) time per edge.
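One way to picture the modification is to store, next to each amount, the edges already used to reach it, and to skip an edge that is already on that path. The sketch below is an assumption about how this could look, not the implementation in this PR: `state`, `maxWithoutReuse`, and the fixed rates are hypothetical.

```go
package main

import "fmt"

// trade is a hypothetical edge with a fixed rate (illustration only).
type trade struct {
	src, dst int
	rate     float64
}

// state is the best known amount of an asset and the edges used to get it.
type state struct {
	amount float64
	edges  []int // indices into trades, in the order they were taken
}

func contains(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// maxWithoutReuse runs the DP but never extends a path with an edge that the
// path already contains. The contains check costs O(k) per edge.
func maxWithoutReuse(n int, trades []trade, s, d int, a float64, k int) (float64, []int) {
	prev := make([]state, n)
	prev[s].amount = a
	for j := 1; j <= k; j++ {
		cur := make([]state, n)
		copy(cur, prev)
		for idx, t := range trades {
			from := prev[t.src]
			if from.amount == 0 || contains(from.edges, idx) {
				continue
			}
			if got := from.amount * t.rate; got > cur[t.dst].amount {
				used := append(append([]int(nil), from.edges...), idx)
				cur[t.dst] = state{amount: got, edges: used}
			}
		}
		prev = cur
	}
	return prev[d].amount, prev[d].edges
}

func main() {
	// 0 = USD, 1 = EUR, 2 = BTC. USD -> EUR -> USD doubles the amount,
	// so an unrestricted search would loop through that cycle.
	trades := []trade{{0, 1, 2}, {1, 0, 1}, {1, 2, 1}}
	fmt.Println(maxWithoutReuse(3, trades, 0, 2, 1, 5))
}
```

In this example the arbitrage cycle is not repeated: the returned path takes USD to EUR and then EUR to BTC, using each edge once.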

Benchmarks on dfs:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	       1	2663075250 ns/op	61439896 B/op	 2817685 allocs/op
PASS

Benchmark on new code:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	      16	  74524903 ns/op	17888031 B/op	   65145 allocs/op
PASS

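The new code is substantially better in both runtime and memory: roughly 36x faster per operation (about 75 ms versus 2.66 s), about 3.4x fewer bytes allocated, and about 43x fewer allocations.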

Known limitations

The previous dfs algorithm included many more paths in the result. This algorithm will include the optimal path using 1 edge, the optimal path using 2 edges, ..., the optimal path using k edges.

2opremio · 2021-11-23T12:55:52Z

@tamirms Nice work!

In addition, the number of allocations still seems to be pretty high, which probably slows down the algorithm quite a bit. It may be possible to bring them further down without fundamental changes to the algorithm.

tamirms · 2021-11-23T13:18:07Z

> In addition, the number of allocations still seems to be pretty high, which probably slows down the algorithm quite a bit.

Yeah, there is one more change that I would like to try in a different PR. Currently we represent all assets as strings in all the graph data structures (e.g. the adjacency list). I have a feeling we can improve the performance significantly if we assign every asset an integer id and represent the assets in the graph data structures as 32-bit ints.
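A rough sketch of that idea, assuming a simple interning table: `assetTable`, `intern`, and `name` are hypothetical names, and the asset strings are placeholders rather than the format the graph actually uses.

```go
package main

import "fmt"

// assetTable assigns each distinct asset string a small integer id.
type assetTable struct {
	ids   map[string]int32
	names []string
}

func newAssetTable() *assetTable {
	return &assetTable{ids: make(map[string]int32)}
}

// intern returns the id for asset, allocating a new id on first use.
func (t *assetTable) intern(asset string) int32 {
	if id, ok := t.ids[asset]; ok {
		return id
	}
	id := int32(len(t.names))
	t.ids[asset] = id
	t.names = append(t.names, asset)
	return id
}

// name maps an id back to the original asset string.
func (t *assetTable) name(id int32) string {
	return t.names[id]
}

func main() {
	t := newAssetTable()
	xlm := t.intern("native")
	usd := t.intern("USD:ISSUER") // placeholder asset string

	// The adjacency list is indexed by id and stores ids, not strings.
	adjacency := make([][]int32, 2)
	adjacency[usd] = append(adjacency[usd], xlm)
	fmt.Println(t.name(adjacency[usd][0]))
}
```

With ids in place, per-asset state such as the dp table can live in slices indexed by id instead of maps keyed by string.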

2opremio · 2021-11-23T14:44:21Z

I would suggest getting a memory allocations profile first to confirm it's the strings that produce so many allocations.

bartekn

LGTM! A couple of thoughts:

1. Regarding the complexity, I think it's close to the worst case of O(k*e^2), not only for dense graphs but for all paths starting or ending with XLM, given that there are markets to XLM from almost every other asset, so the second loop will have e elements after the first iteration (I haven't checked the actual data, so I may be wrong). But this is fine; it looks like it will still be faster than the current one.
2. Regarding testing, horizon-cmp will not be enough. It would be great to have a small tool that compares the results with the paths found by the previous version to check that they aren't worse (great if the new paths are better!).
3. I think having an array arr[a][b][i] that is true if there is any path from a to b in i jumps (regardless of the final amounts) would make this even faster, because it would remove all iterations in which reaching the final asset is not possible. Such an array could be calculated on graph init and then updated on graph update.
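The reachability array suggested in the last point could be built roughly as follows. This is a sketch under assumptions: the comment does not say whether "in i jumps" means exactly i or at most i, so the code picks "at most i"; assets are numbered 0..n-1, and `reachability` is a hypothetical name.

```go
package main

import "fmt"

// reachability returns reach[a][b][i], which is true when asset b can be
// reached from asset a using at most i edges, for 0 <= i <= k.
func reachability(n int, edges [][2]int, k int) [][][]bool {
	reach := make([][][]bool, n)
	for a := range reach {
		reach[a] = make([][]bool, n)
		for b := range reach[a] {
			reach[a][b] = make([]bool, k+1)
		}
		reach[a][a][0] = true // every asset reaches itself with zero edges
	}
	for i := 1; i <= k; i++ {
		for a := 0; a < n; a++ {
			// Anything reachable in at most i-1 edges is reachable in at most i.
			for b := 0; b < n; b++ {
				reach[a][b][i] = reach[a][b][i-1]
			}
			// Extend every path of at most i-1 edges by one more edge.
			for _, e := range edges {
				if reach[a][e[0]][i-1] {
					reach[a][e[1]][i] = true
				}
			}
		}
	}
	return reach
}

func main() {
	// 0 -> 1 -> 2
	reach := reachability(3, [][2]int{{0, 1}, {1, 2}}, 2)
	fmt.Println(reach[0][2][1], reach[0][2][2])
}
```

In round j of the dynamic program with destination d, an edge into asset x could then be skipped whenever reach[x][d][k-j] is false, since no remaining sequence of trades leads from x to d. Building the table this way costs O(k * n * e) time and O(k * n^2) space, which would have to be weighed against the savings.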

tamirms · 2021-11-23T18:21:17Z

@bartekn I think you mean O(k * n^2), not O(k * e^2) since if the graph is dense e = n^2, right? If so then I agree. Actually it would be O(k^2 * n^2) considering the duplicate edge check.

regarding (2) the paths returned by this algorithm will only be better if there is an arbitrage opportunity (which I think might be rare) otherwise it should return the same. However, there will be more paths in the response for the previous version. But regardless, I agree it would be worth writing a tool to verify the paths on some more data beyond the unit tests.

2opremio · 2021-11-23T19:18:04Z

I think it should be simple to use a replace directive to import horizon/master and run a path search for random pairs with both implementations.

https://www.percona.com/blog/2020/03/09/using-different-versions-of-a-package-in-an-application-via-go-modules/

Improve path finding algorithm

Improve pathfinding algorithm

1725e8c

tamirms requested review from jonjove and a team November 23, 2021 10:11

Remove unused function

265af10

bartekn approved these changes Nov 23, 2021

Merge branch 'master' into improved-pathfinding

04e76e1

tamirms merged commit 8791961 into stellar:master Nov 24, 2021

tamirms deleted the improved-pathfinding branch November 24, 2021 11:02

tamirms mentioned this pull request Nov 29, 2021

services/horizon: Improve performance of path finding endpoint #4106

Closed

5 tasks

tamirms linked an issue Nov 29, 2021 that may be closed by this pull request

services/horizon: Improve performance of path finding endpoint #4106

Closed

5 tasks

tamirms mentioned this pull request Nov 30, 2021

exp/orderbook: Exclude arbitrage paths from path finding search #4109

Merged

7 tasks

tamirms added a commit that referenced this pull request Dec 1, 2021

exp/orderbook: Improve path finding algorithm (#4096)

65fda34

Improve path finding algorithm

erika-sdf pushed a commit to erika-sdf/go that referenced this pull request Dec 3, 2021

exp/orderbook: Improve path finding algorithm (stellar#4096)

f205ac4

Improve path finding algorithm
