exp/orderbook: Represent assets in orderbook graph as int32 instead of strings #4102

Merged
merged 4 commits into from
Nov 30, 2021

Conversation

tamirms
Contributor

@tamirms tamirms commented Nov 29, 2021

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Previously we were representing the adjacency list for the graph as a mapping from asset string to the list of offers / pools which buy / sell that asset.

Now, every asset is assigned an integer id and the adjacency list is represented as an array where the integer id is used to index into the array. This new data structure is much more compact because the asset strings were very lengthy. The new data structure is also much faster because array indexing is significantly faster than looking up keys in a map.
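
A minimal sketch of the idea, not the code from this PR: the `venue` type, the `sliceGraph` type, the helper names, and the asset strings are all made up for illustration. It shows each asset string being interned to an `int32` id once, after which the adjacency list is reached by plain slice indexing.

```go
package main

import "fmt"

// venue is a hypothetical stand-in for an offer or a liquidity pool.
type venue struct{ name string }

// sliceGraph is an illustrative model of the new layout: asset strings are
// mapped to int32 ids once, and the id indexes directly into the slice of
// adjacency lists.
type sliceGraph struct {
	assetStringToID       map[string]int32
	idToAssetString       []string
	venuesForSellingAsset [][]venue
}

func newSliceGraph() *sliceGraph {
	return &sliceGraph{assetStringToID: map[string]int32{}}
}

// idFor returns the id of an asset, assigning the next index if it is new.
func (g *sliceGraph) idFor(asset string) int32 {
	if id, ok := g.assetStringToID[asset]; ok {
		return id
	}
	id := int32(len(g.idToAssetString))
	g.assetStringToID[asset] = id
	g.idToAssetString = append(g.idToAssetString, asset)
	g.venuesForSellingAsset = append(g.venuesForSellingAsset, nil)
	return id
}

// addVenue records a venue that sells the given asset and returns the asset id.
func (g *sliceGraph) addVenue(asset string, v venue) int32 {
	id := g.idFor(asset)
	g.venuesForSellingAsset[id] = append(g.venuesForSellingAsset[id], v)
	return id
}

func main() {
	g := newSliceGraph()
	usd := g.addVenue("credit_alphanum4/USD/EXAMPLEISSUER", venue{"offer-1"})
	g.addVenue("credit_alphanum4/EUR/EXAMPLEISSUER", venue{"pool-1"})
	g.addVenue("credit_alphanum4/USD/EXAMPLEISSUER", venue{"offer-2"})

	// Once the id is known, no long string needs to be hashed or compared.
	fmt.Println(usd, len(g.venuesForSellingAsset[usd]))
}
```

The string-to-id map is still consulted when an asset enters the graph, but a search that works purely on ids never touches it.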

Why

New benchmark:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	     100	  12064900 ns/op	 2811558 B/op	   67441 allocs/op
PASS

Old benchmark

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	      16	  74524903 ns/op	17888031 B/op	   65145 allocs/op
PASS

The new code reduces latency from 74.5 ms per call to about 12.1 ms per call (roughly 6x faster). It also reduces the memory allocated from about 17.9 MB per call to 2.8 MB per call.
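
As a quick check of the arithmetic, the ratios can be computed directly from the two benchmark lines above. The `benchResult` type and `ratio` helper are hypothetical names used only for this calculation.

```go
package main

import "fmt"

// benchResult holds the three per-operation figures from a benchmark line.
type benchResult struct {
	nsPerOp, bytesPerOp, allocsPerOp float64
}

// ratio reports how many times larger the old value is than the new one.
func ratio(oldValue, newValue float64) float64 {
	return oldValue / newValue
}

func main() {
	oldRun := benchResult{nsPerOp: 74524903, bytesPerOp: 17888031, allocsPerOp: 65145}
	newRun := benchResult{nsPerOp: 12064900, bytesPerOp: 2811558, allocsPerOp: 67441}

	fmt.Printf("latency: %.1f ms -> %.1f ms (%.1fx)\n",
		oldRun.nsPerOp/1e6, newRun.nsPerOp/1e6, ratio(oldRun.nsPerOp, newRun.nsPerOp))
	fmt.Printf("memory:  %.1f MB -> %.1f MB (%.1fx)\n",
		oldRun.bytesPerOp/1e6, newRun.bytesPerOp/1e6, ratio(oldRun.bytesPerOp, newRun.bytesPerOp))
	fmt.Printf("allocs:  %.0f -> %.0f per op\n", oldRun.allocsPerOp, newRun.allocsPerOp)
}
```

The allocation count per call rises slightly (65145 to 67441) even though the bytes allocated drop by a factor of about six.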

Known limitations

[N/A]

Comment on lines 203 to 212
if len(graph.vacantIDs) > 0 {
	id = graph.vacantIDs[len(graph.vacantIDs)-1]
	graph.vacantIDs = graph.vacantIDs[:len(graph.vacantIDs)-1]
	graph.idToAssetString[id] = assetString
} else {
	id = int32(len(graph.idToAssetString))
	graph.idToAssetString = append(graph.idToAssetString, assetString)
	graph.venuesForBuyingAsset = append(graph.venuesForBuyingAsset, nil)
	graph.venuesForSellingAsset = append(graph.venuesForSellingAsset, nil)
}
Contributor

I think this requires further clarification. Particularly as to why there are multiple ways to obtain the ID.

Contributor

Also, do we really need the vacantIDs mechanism? It seems to complicate things and it's not obvious (to me at least) what gain it brings.

Contributor Author

the purpose of vacantIDs is to make sure the assets array does not waste any cells.

let's say we start out with an empty graph. the assets array will be empty in that case.
then we add the following assets:
0 -> 'usd'
1 -> 'eur'
2 -> 'chf'
3 -> 'sek'

now, we remove all offers and pools which have the chf asset so we can remove it. at this point the array looks like:

0 -> 'usd'
1 -> 'eur'
2 -> ''
3 -> 'sek'

the cell at index 2 is vacant and we can reuse it the next time we add a new asset, for example 'yen'

0 -> 'usd'
1 -> 'eur'
2 -> 'yen'
3 -> 'sek'

without the vacantIDs mechanism we would either have to add 'yen' at index 4 and leave cell 2 empty forever, or reshuffle the mapping whenever we remove an asset such as 'chf' so that no empty cells remain, but that would be an expensive operation
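
The walkthrough above can be sketched as a small standalone program. This is a simplified model, not the PR's code: `assetRegistry` and its methods are hypothetical names, and the per-asset venue slices are left out so only the id bookkeeping remains.

```go
package main

import "fmt"

// assetRegistry is a hypothetical, trimmed-down model of the id bookkeeping.
type assetRegistry struct {
	assetStringToID map[string]int32
	idToAssetString []string
	vacantIDs       []int32 // stack of ids whose cells are currently empty
}

func newAssetRegistry() *assetRegistry {
	return &assetRegistry{assetStringToID: map[string]int32{}}
}

// add returns the id for an asset, reusing a vacant id when one exists and
// growing the array only when every cell is occupied.
func (r *assetRegistry) add(asset string) int32 {
	if id, ok := r.assetStringToID[asset]; ok {
		return id
	}
	var id int32
	if n := len(r.vacantIDs); n > 0 {
		id = r.vacantIDs[n-1]
		r.vacantIDs = r.vacantIDs[:n-1]
		r.idToAssetString[id] = asset
	} else {
		id = int32(len(r.idToAssetString))
		r.idToAssetString = append(r.idToAssetString, asset)
	}
	r.assetStringToID[asset] = id
	return id
}

// remove empties the asset's cell and records its id as vacant.
func (r *assetRegistry) remove(asset string) {
	id, ok := r.assetStringToID[asset]
	if !ok {
		return
	}
	delete(r.assetStringToID, asset)
	r.idToAssetString[id] = ""
	r.vacantIDs = append(r.vacantIDs, id)
}

func main() {
	r := newAssetRegistry()
	for _, asset := range []string{"usd", "eur", "chf", "sek"} {
		r.add(asset)
	}
	r.remove("chf")
	yen := r.add("yen")
	fmt.Println(yen, len(r.idToAssetString)) // prints "2 4": cell 2 is reused
}
```

In this model the array only grows when the number of live assets exceeds its previous maximum, so its length is bounded by the peak asset count rather than by the total number of assets ever seen.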

Contributor

Ah, I see. Does wasting cells have a big performance impact? I presume we would be consuming a bit more memory, but maybe that's acceptable?

Contributor Author

@tamirms tamirms Nov 29, 2021

I think there is a small performance impact because we iterate over all the cells in the search algorithm here:

for currentAsset := int32(0); currentAsset < totalAssets; currentAsset++ {

For the empty cells we skip them through this check:

			currentAmount := bestAmount[currentAsset]
			if currentAmount == 0 {
				continue
			}

I think a few empty cells wouldn't matter, but if we never filled them in, performance would slowly get worse over time as more and more empty cells accrue until Horizon restarts.
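
To illustrate the cost described here, the following sketch mimics only the shape of the quoted loop. `scanCells` is a hypothetical helper and the `int64` element type is an assumption; in the real search a zero amount can also mean an asset has not been reached yet, not only that its cell is vacant.

```go
package main

import "fmt"

// scanCells visits every cell, skipping those with a zero amount, and
// returns how many cells did useful work and how many were skipped.
func scanCells(bestAmount []int64) (active, skipped int) {
	totalAssets := int32(len(bestAmount))
	for currentAsset := int32(0); currentAsset < totalAssets; currentAsset++ {
		currentAmount := bestAmount[currentAsset]
		if currentAmount == 0 {
			skipped++
			continue
		}
		active++
	}
	return active, skipped
}

func main() {
	// Same three live assets, with and without accumulated empty cells.
	compact := []int64{5, 7, 3}
	sparse := []int64{5, 0, 0, 7, 0, 0, 0, 3}

	a1, s1 := scanCells(compact)
	a2, s2 := scanCells(sparse)
	fmt.Println(a1, s1) // prints "3 0"
	fmt.Println(a2, s2) // prints "3 5"
}
```

Each skipped cell is cheap, but the loop length tracks the array length rather than the number of live assets, which is why unbounded growth would matter over a long-running process.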

exp/orderbook/graph.go (outdated)
// we assign id to asset
graph.idToAssetString = append(graph.idToAssetString, assetString)
graph.venuesForBuyingAsset = append(graph.venuesForBuyingAsset, nil)
graph.venuesForSellingAsset = append(graph.venuesForSellingAsset, nil)
Contributor

Shouldn't we clear venuesForBuyingAsset and venuesForSellingAsset when assigning to a vacant id? It seems we don't do that in maybeDeleteAsset either.

Contributor Author

in order to get included in the vacant id list it is a necessary condition that graph.venuesForBuyingAsset[asset] and graph.venuesForSellingAsset[asset] are empty:

func (graph *OrderBookGraph) maybeDeleteAsset(asset int32) {
	buyingEdgesEmpty := len(graph.venuesForBuyingAsset[asset]) == 0
	sellingEdgesEmpty := len(graph.venuesForSellingAsset[asset]) == 0

	if buyingEdgesEmpty && sellingEdgesEmpty {
		delete(graph.assetStringToID, graph.idToAssetString[asset])
		// When removing an asset we do not resize the idToAssetString array.
		// Instead, we allow the cell occupied by the id to be empty.
		// The next time we will add an asset to the graph we will allocate the
		// id to the new asset.
		graph.idToAssetString[asset] = ""
		graph.vacantIDs = append(graph.vacantIDs, asset)
	}
}

return id
}
// before creating a new int32 asset id we will try to use
// a vacant id so that we can plug any empty cells in the
Contributor

great design. just curious: would storing nil in idToAssetString give the same result as maintaining separate vacancy state, i.e., scanning idToAssetString for a nil cell instead? perhaps it would be less code, but I'm just wondering.

Contributor Author

yeah, that's true we could avoid having a vacantIDs list entirely if we scan through idToAssetString to find the first empty cell. in the worst case if there are no empty cells we have to scan through the entire array before realizing we have to append to the end. Having vacantIDs makes the operation of adding a new asset faster
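
The trade-off can be sketched with two hypothetical helpers (neither name comes from the PR): a linear scan for the first empty cell, which walks the whole array when no cell is empty, and a constant-time pop from a stack of vacant ids.

```go
package main

import "fmt"

// firstEmptyByScan is the scan-based alternative: it returns the index of
// the first empty cell, or -1 after walking the entire slice in vain.
func firstEmptyByScan(idToAssetString []string) int32 {
	for i, s := range idToAssetString {
		if s == "" {
			return int32(i)
		}
	}
	return -1
}

// popVacant takes the most recently vacated id off the stack in constant
// time. The boolean is false when the stack is empty.
func popVacant(vacantIDs []int32) (int32, []int32, bool) {
	n := len(vacantIDs)
	if n == 0 {
		return -1, vacantIDs, false
	}
	return vacantIDs[n-1], vacantIDs[:n-1], true
}

func main() {
	cells := []string{"usd", "eur", "", "sek"}
	vacant := []int32{2}

	fmt.Println(firstEmptyByScan(cells)) // prints "2"

	id, vacant, ok := popVacant(vacant)
	fmt.Println(id, ok, len(vacant)) // prints "2 true 0"
}
```

The two approaches can pick different cells when several are empty (lowest index versus most recently vacated), but either choice fills a hole. The stack costs a little extra memory and code in exchange for not scanning on every insertion.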

Contributor

@sreuland sreuland left a comment

great insight and improvement.

@tamirms tamirms merged commit b5d2058 into stellar:master Nov 30, 2021
@tamirms tamirms deleted the int32-nodes branch November 30, 2021 09:05
tamirms added a commit that referenced this pull request Dec 1, 2021
…f strings (#4102)

Represent assets in orderbook graph as int32 instead of strings
erika-sdf pushed a commit to erika-sdf/go that referenced this pull request Dec 3, 2021
…f strings (stellar#4102)

Represent assets in orderbook graph as int32 instead of strings
Development

Successfully merging this pull request may close these issues.

services/horizon: Improve performance of path finding endpoint
4 participants