exp/orderbook: Improve path finding algorithm #4096

Merged
merged 3 commits into from
Nov 24, 2021

tamirms
Contributor

@tamirms tamirms commented Nov 23, 2021

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

In the Stellar protocol it is possible to perform a path payment where some amount of an asset is converted into a different asset through a series of trades and then delivered to a recipient. In order to submit a path payment you need to specify the source asset, the source amount, the destination asset, and, crucially, the list of intermediate assets to reach your destination asset.

Given a source asset, source amount, and destination asset, Horizon provides an endpoint to determine the paths which would maximize the amount of the destination asset delivered to the recipient. The algorithm used to implement this endpoint models path payments as a graph problem.

The nodes in the graph represent assets. A directed edge between nodes represents a trade to sell the source asset in exchange for the destination asset. These trades can occur on the orderbook or against liquidity pools. We also add the restriction that we can perform at most k trades in a path payment. In other words, we only consider paths which have k or fewer edges.

The current implementation of the path finding algorithm performs a depth-first search which examines all possible simple paths (a simple path does not contain any repeated nodes) of length up to k from the source asset to the destination asset. Once all the paths are discovered, the algorithm picks the paths which maximize the destination amount.

The algorithm recurses up to a depth of k. At a depth of i, the dfs function calls itself on at most n-i-1 other nodes as candidates for the next node in the path (where n is the total number of nodes in the graph). The time complexity of this algorithm is approximately O(n^(k+2)) because the total number of function calls is bounded by n^(k+1) and each function call examines at most n other nodes.
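For illustration, the exhaustive search described above can be sketched as follows. This is a minimal sketch, not the code in exp/orderbook: the `graph` type, the `simplePaths` name, and the asset names are assumptions made for the example, and amounts are left out so that only the path enumeration is shown.

```go
package main

import "fmt"

// graph maps an asset to the assets it can be traded into.
// Hypothetical shape, chosen only for this example.
type graph map[string][]string

// simplePaths returns every simple path from src to dst that uses at most k edges.
func simplePaths(g graph, src, dst string, k int) [][]string {
	var out [][]string
	visited := map[string]bool{src: true}
	path := []string{src}

	var dfs func(node string)
	dfs = func(node string) {
		if node == dst {
			// Copy the current path, because the slice is reused while backtracking.
			out = append(out, append([]string(nil), path...))
			return
		}
		if len(path)-1 == k { // k edges already used
			return
		}
		for _, next := range g[node] {
			if visited[next] {
				continue // keep the path simple: no repeated nodes
			}
			visited[next] = true
			path = append(path, next)
			dfs(next)
			path = path[:len(path)-1]
			visited[next] = false
		}
	}
	dfs(src)
	return out
}

func main() {
	g := graph{
		"USD": {"EUR", "XLM"},
		"EUR": {"XLM", "BTC"},
		"XLM": {"BTC"},
	}
	for _, p := range simplePaths(g, "USD", "BTC", 3) {
		fmt.Println(p)
	}
}
```

Every branch of the recursion is explored, which is where the exponential dependence on k comes from.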

This PR implements a more efficient algorithm using dynamic programming which is described below:

Given a source asset s, a source amount a, and a destination asset d, we want to transform a units of s into the maximum amount of d using at most k trades.

dp[i][j] is the maximum amount of asset i that can be obtained starting with a units of asset s using at most j trades.

When we initialize dp we set dp[s][0] = a and for every other cell dp[i][j] = 0.

Once we have computed dp for all assets i using at most j trades, we can compute dp[i][j+1] by examining all edges in the graph to see which trades would produce a higher destination amount.

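for j := 1; j <= k; j++ {
     // "at most j trades" includes everything reachable with at most j-1 trades,
     // so carry the previous column forward, including for assets with no incoming edge
     for asset := range dp {
          dp[asset][j] = dp[asset][j-1]
     }
     for _, edge := range graph {
          src, dst := edge[0], edge[1]
          srcAmount := dp[src][j-1]
          dstAmount := executeTrade(src, srcAmount, dst)
          dp[dst][j] = max(dp[dst][j], dstAmount) // keep the better of the two
     }
}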

The algorithm runs in O(k*e) time where e is the number of edges (in the worst case O(k * n^2) because a dense graph can have at most n^2 edges). The correctness of this algorithm can be proven by induction (intuitively, the optimal path of length k+1 must first begin with an optimal path of length k to some other node and from that node we can take an edge to the destination).
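A self-contained version of the recurrence may make the table easier to follow. This is a sketch under simplifying assumptions: assets are integer indices, and each edge has a fixed `rate` instead of executing against real offers or liquidity pools. The `edge` type and the `bestAmounts` name are made up for the example.

```go
package main

import "fmt"

// edge is a hypothetical trade with a fixed price: rate is the number of
// dst units received per src unit. Real trades depend on orderbook depth.
type edge struct {
	src, dst int
	rate     float64
}

// bestAmounts returns dp, where dp[i][j] is the most of asset i obtainable
// from `amount` units of asset s using at most j trades.
func bestAmounts(n int, edges []edge, s int, amount float64, k int) [][]float64 {
	dp := make([][]float64, n)
	for i := range dp {
		dp[i] = make([]float64, k+1)
	}
	dp[s][0] = amount
	for j := 1; j <= k; j++ {
		// Using at most j trades can never be worse than using at most j-1.
		for i := 0; i < n; i++ {
			dp[i][j] = dp[i][j-1]
		}
		// Relax every edge once: O(e) work per value of j.
		for _, e := range edges {
			if got := dp[e.src][j-1] * e.rate; got > dp[e.dst][j] {
				dp[e.dst][j] = got
			}
		}
	}
	return dp
}

func main() {
	// Assets: 0 = USD, 1 = EUR, 2 = XLM.
	edges := []edge{{0, 2, 4}, {0, 1, 0.5}, {1, 2, 10}}
	dp := bestAmounts(3, edges, 0, 100, 2)
	fmt.Println("XLM with at most 1 trade:", dp[2][1])
	fmt.Println("XLM with at most 2 trades:", dp[2][2])
}
```

In this example the direct USD to XLM trade yields 400, while going through EUR yields 500, so the second column improves on the first.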

However, there is one problem with this algorithm. If there are arbitrage opportunities then it is possible that edges are reused in the optimal path. For example, if we have an arbitrage opportunity where we can convert USD to EUR and from EUR back to USD where we make a profit, then the optimal path is to iterate over the cycle as many times as possible and then convert into our destination asset.

This is problematic because when we execute a trade the offers that we used are no longer there. So if we try to execute the same trade again the calculation will be different. The algorithm above does not take into account that the offers can change. So we need to modify the algorithm so that we don't consider edges that would be visited more than once in a given path.
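To make the point concrete, here is a small sketch of why the same trade cannot simply be evaluated twice. The `offer` type and the `sell` function are hypothetical and far simpler than the real orderbook, but they show that the second execution sees a different book.

```go
package main

import "fmt"

// offer is a hypothetical resting order: up to `available` units of the
// destination asset, each costing `price` units of the source asset.
type offer struct {
	price     int64
	available int64
}

// sell spends up to srcAmount against the book, best price first, and returns
// the destination amount received. It mutates the book: consumed offers are gone.
func sell(book []offer, srcAmount int64) int64 {
	var received int64
	for i := range book {
		take := srcAmount / book[i].price
		if take > book[i].available {
			take = book[i].available
		}
		book[i].available -= take
		srcAmount -= take * book[i].price
		received += take
	}
	return received
}

func main() {
	book := []offer{{price: 1, available: 100}, {price: 2, available: 100}}
	first := sell(book, 100)  // fills entirely at price 1
	second := sell(book, 100) // only the price 2 offer is left
	fmt.Println(first, second)
}
```

A calculation that evaluates the edge once and reuses the result would overestimate the second trade.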

With that modification, the algorithm runs in O(k^2 * e) because the duplicate edge check takes at most O(k) time.
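One way to picture the modification is to store, next to each dp cell, the list of edges used to reach it, and to skip an edge that is already on that list. The sketch below assumes fixed-rate edges and a linked list of edge indices; the `pathNode` type, the `best` function, and the `skipReused` flag are illustrative names, not the ones used in the PR.

```go
package main

import "fmt"

type edge struct {
	src, dst int
	rate     float64 // hypothetical fixed price, dst units per src unit
}

// pathNode is a linked list of the edge indices used to reach a dp cell.
type pathNode struct {
	edge int
	prev *pathNode
}

// uses walks the list, which holds at most k edges, so the check is O(k).
func (p *pathNode) uses(e int) bool {
	for ; p != nil; p = p.prev {
		if p.edge == e {
			return true
		}
	}
	return false
}

// best returns the most of asset d found from `amount` of asset s in at most k trades.
func best(n int, edges []edge, s, d int, amount float64, k int, skipReused bool) float64 {
	amt := make([][]float64, n)
	path := make([][]*pathNode, n)
	for i := range amt {
		amt[i] = make([]float64, k+1)
		path[i] = make([]*pathNode, k+1)
	}
	amt[s][0] = amount
	for j := 1; j <= k; j++ {
		for i := 0; i < n; i++ {
			amt[i][j], path[i][j] = amt[i][j-1], path[i][j-1]
		}
		for idx, e := range edges {
			prev := path[e.src][j-1]
			if amt[e.src][j-1] == 0 || (skipReused && prev.uses(idx)) {
				continue // unreachable source, or edge already on this path
			}
			if got := amt[e.src][j-1] * e.rate; got > amt[e.dst][j] {
				amt[e.dst][j], path[e.dst][j] = got, &pathNode{edge: idx, prev: prev}
			}
		}
	}
	return amt[d][k]
}

func main() {
	// Assets: 0 = USD, 1 = EUR, 2 = XLM. USD -> EUR -> USD is an arbitrage cycle.
	edges := []edge{{0, 1, 2}, {1, 0, 1}, {1, 2, 1}}
	fmt.Println("edges may repeat:", best(3, edges, 0, 2, 100, 4, false))
	fmt.Println("edges used once: ", best(3, edges, 0, 2, 100, 4, true))
}
```

Without the check the table reports 400 XLM by going around the cycle and reusing the USD to EUR edge; with the check it reports 200.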


Benchmarks on dfs:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	       1	2663075250 ns/op	61439896 B/op	 2817685 allocs/op
PASS

Benchmark on new code:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	      16	  74524903 ns/op	17888031 B/op	   65145 allocs/op
PASS

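It looks to be substantially better in both runtime and memory: about 36x faster per operation (2.66 s vs. 74.5 ms), with about 3.4x fewer bytes allocated (61.4 MB vs. 17.9 MB) and about 43x fewer allocations per operation.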

Known limitations

The previous dfs algorithm included many more paths in the result. This algorithm will include only the optimal path using 1 edge, the optimal path using 2 edges, ..., the optimal path using k edges.

@tamirms tamirms requested review from jonjove and a team November 23, 2021 10:11
@2opremio
Contributor

@tamirms Nice work!

In addition, the number of allocations still seems to be pretty high, which probably slows down the algorithm quite a bit. It may be possible to bring them further down without fundamental changes to the algorithm.

@tamirms
Contributor Author

tamirms commented Nov 23, 2021

In addition, the number of allocations still seems to be pretty high, which probably slows down the algorithm quite a bit.

yeah, there is one more change that I would like to try in a different PR. Currently we represent all assets as strings in all the graph data structures (e.g. the adjacency list). I have a feeling we can improve the performance significantly if we assign all assets integer ids and represent the assets in the graph data structures as 32 bit ints.

@2opremio
Contributor

I would suggest getting a memory allocations profile first to confirm it's the strings that produce so many allocations.

Contributor

@bartekn bartekn left a comment

LGTM! A couple of thoughts:

  1. Regarding the complexity, I think it's close to the worst case O(k*e^2) not only for dense graphs but also for all paths starting or ending with XLM, given that there are markets to XLM from almost every other asset, so the second loop will have e elements after the first iteration (I haven't checked the actual data, so I could be wrong). But this is fine; it looks like it will still be faster than the current one.
  2. Regarding testing, horizon-cmp will not be enough. It would be great to have a small tool to compare against the paths found by the previous version to check that the new ones aren't worse (great if new paths are better!).
  3. I think having an array arr[a][b][i] that is true if there's any path from a to b in i jumps (regardless of the final amounts) will make this even faster because it will skip all iterations in which reaching the final asset is not possible. Such an array could be calculated on graph init and then updated on graph update.
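The reachability array from point 3 could be computed along these lines. This is only a sketch: it interprets "in i jumps" as "in at most i jumps", uses integer asset indices, and recomputes the whole table rather than updating it incrementally. The `reachability` name is made up for the example.

```go
package main

import "fmt"

// reachability returns reach, where reach[a][b][i] is true if asset b can be
// reached from asset a using at most i edges, regardless of amounts.
func reachability(n int, edges [][2]int, k int) [][][]bool {
	reach := make([][][]bool, n)
	for a := range reach {
		reach[a] = make([][]bool, n)
		for b := range reach[a] {
			reach[a][b] = make([]bool, k+1)
		}
		reach[a][a][0] = true // every asset reaches itself with zero trades
	}
	for i := 1; i <= k; i++ {
		for a := 0; a < n; a++ {
			// Anything reachable in at most i-1 edges is reachable in at most i.
			for b := 0; b < n; b++ {
				reach[a][b][i] = reach[a][b][i-1]
			}
			// Extend every path of at most i-1 edges by one more edge.
			for _, e := range edges {
				if reach[a][e[0]][i-1] {
					reach[a][e[1]][i] = true
				}
			}
		}
	}
	return reach
}

func main() {
	// Assets: 0 = USD, 1 = EUR, 2 = XLM, with edges USD -> EUR and EUR -> XLM.
	reach := reachability(3, [][2]int{{0, 1}, {1, 2}}, 2)
	fmt.Println(reach[0][2][1], reach[0][2][2], reach[2][0][2])
}
```

During the search, an edge into asset x with r trades remaining could then be skipped whenever reach[x][d][r] is false for the destination d. The table costs O(n^2 * k) memory, which may matter for a large number of assets.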

@tamirms
Contributor Author

tamirms commented Nov 23, 2021

@bartekn I think you mean O(k * n^2), not O(k * e^2) since if the graph is dense e = n^2, right? If so then I agree. Actually it would be O(k^2 * n^2) considering the duplicate edge check.

regarding (2) the paths returned by this algorithm will only be better if there is an arbitrage opportunity (which I think might be rare) otherwise it should return the same. However, there will be more paths in the response for the previous version. But regardless, I agree it would be worth writing a tool to verify the paths on some more data beyond the unit tests.

@2opremio
Contributor

I think it should be simple to use a replace directive to import horizon/master and run a path search for random pairs with both implementations.

https://www.percona.com/blog/2020/03/09/using-different-versions-of-a-package-in-an-application-via-go-modules/

@tamirms tamirms merged commit 8791961 into stellar:master Nov 24, 2021
@tamirms tamirms deleted the improved-pathfinding branch November 24, 2021 11:02
@tamirms tamirms linked an issue Nov 29, 2021 that may be closed by this pull request
5 tasks
tamirms added a commit that referenced this pull request Dec 1, 2021
erika-sdf pushed a commit to erika-sdf/go that referenced this pull request Dec 3, 2021
Development

Successfully merging this pull request may close these issues.

services/horizon: Improve performance of path finding endpoint
3 participants