
Ingestion nodes cannot be separated from request serving horizon nodes #2250

Closed
tamirms opened this issue Feb 10, 2020 · 4 comments · Fixed by #2299, #2617 or #2630

Comments

@tamirms
Contributor

tamirms commented Feb 10, 2020

We would like to support the following deployment topology for horizon:

There are two pools of horizon instances. The first pool consists of horizon nodes which solely focus on ingestion. The nodes in this pool do not serve API requests. The other pool consists of horizon nodes which do not participate in ingestion and only serve API requests.

We discussed this issue previously in #1529 and #1519 (comment). At the time we concluded that the simplest solution was to make all horizon instances participate in ingestion AND serve API requests.

Now that we have several months of experience running horizon in this scheme (where all nodes both serve API requests and ingest), we've observed that ingestion can be quite demanding in terms of CPU and memory. We would like to separate ingestion operations from request-serving operations to reduce resource contention between the two.

The difficulty in separating ingestion nodes from API nodes is that the /order_book and /paths endpoints rely on an in-memory orderbook which is populated by ingestion. As a workaround we have chosen to still maintain the two pools of horizon instances, but the ingestion nodes also serve /order_book and /paths requests while the API nodes serve all other requests.

Ideally, we would like a solution that does not require special routing to handle the in memory orderbook case.

Here are some ideas:

For the /order_book endpoint we actually don't need to use the in memory graph. We have all the offers recorded in the horizon database:

https://github.com/stellar/go/blob/release-horizon-v0.25.0/services/horizon/internal/db2/schema/migrations/19_offers.sql

It should be possible to obtain all the data we need to fulfill the /order_book requests with sql queries on the offers table. However, I don't know how quickly those queries would run and we may need to add some additional indexes to improve performance.


For the path finding endpoints, I think we still need to maintain an in memory graph. But, I think it should be possible to decouple maintaining the in memory graph from ingestion. Here's the idea which was originally proposed in #1519 (comment) :

When the API horizon instances start up, they build an in memory order book graph by reading all rows from the offers table. The API horizon instances will periodically poll the offers table for new updates and apply them to the in memory order book graph. We should be able to find updates from the offers table using the last_modified_ledger column.

However, when an offer is removed from the order book, the corresponding row in the offers table is deleted. When polling for updates we would not be able to observe changes where offers are removed from the orderbook. To fix this issue we could add a deleted column on the offers table which acts as a tombstone. Instead of deleting a row, we would set the deleted flag to true. To limit the size of the offers table, we can periodically delete tombstone rows from the offers table which are older than 10,000 ledgers (or some other large cutoff).

@bartekn
Contributor

bartekn commented Feb 10, 2020

Thanks for creating the issue!

For the /order_book endpoint we actually don't need to use the in memory graph. We have all the offers recorded in the horizon database:

I think we actually found that serving it from the graph is slightly slower on average than from a DB (#1963). It's strange because the graph keeps the offers sorted.

For the path finding endpoints, I think we still need to maintain an in memory graph. But, I think it should be possible to decouple maintaining the in memory graph from ingestion. Here's the idea which was originally proposed in #1519 (comment).

What we could do is keep the code as it is, but, if a special config value is set, have the node wait instead of ingesting into the DB itself. Ingesting into the DB is left to the other (backend) nodes that actually ingest; the frontend node then only updates its in-memory graph.

@bartekn
Contributor

bartekn commented Feb 21, 2020

Closed in #2299.

@bartekn bartekn closed this as completed Feb 21, 2020
@bartekn
Contributor

bartekn commented May 20, 2020

Reopening for a discussion connected to FSC.

If Horizon-Core communication via pipe stays after the prototyping stage, ingesting into memory in frontend instances can be a problem. It would require N core processes for N Horizon instances (N-N) instead of N-M (one core process for each of M backend ingesting instances). Because of this:

  • CPU and RAM assigned for serving HTTP responses can be used by a Stellar-Core process, potentially slowing down the responses and affecting HTTP metrics.
  • It may generate higher data transfer/bandwidth costs connected to running a stellar-core instance.

Even outside FSC, in-memory ingestion can be a problem:

  • It requires an extra code path for in-memory ingestion, complicating the code.
  • It's an extra ingestion type which can be hard to understand.

@tamirms
Contributor Author

tamirms commented May 30, 2020

Another benefit of fixing this issue is that only ingesting nodes will require write access to the Horizon DB. Request serving Horizon instances will be able to operate with read only access to the DB.

Currently, any Horizon nodes which ingest into the in memory orderbook graph still require write access to the Horizon DB because it is not possible to do a SELECT FOR UPDATE query using a read only connection.
