
Ingestion nodes cannot be separated from request serving horizon nodes #2250

Closed
tamirms opened this issue Feb 10, 2020 · 4 comments · Fixed by #2299, #2617 or #2630

Comments

@tamirms
Contributor

tamirms commented Feb 10, 2020

We would like to support the following deployment topology for horizon:

There are two pools of horizon instances. The first pool consists of horizon nodes which solely focus on ingestion. The nodes in this pool do not serve API requests. The other pool consists of horizon nodes which do not participate in ingestion and only serve API requests.

We discussed this issue previously in #1529 and #1519 (comment). At the time we concluded that the simplest solution was to make all horizon instances participate in ingestion AND serve API requests.

Now that we have several months of experience running horizon in this scheme (where all nodes both serve API requests and ingest), we've observed that ingestion can be quite demanding in terms of CPU and memory. We would like to separate ingestion operations from request-serving operations to reduce resource contention between the two.

The difficulty in separating ingestion nodes from API nodes is that the /order_book and /paths endpoints rely on an in-memory orderbook which is populated by ingestion. As a workaround we have chosen to still maintain the two pools of horizon instances, but the ingestion nodes also serve /order_book and /paths requests while the API nodes serve all other requests.

Ideally, we would like a solution that does not require special routing to handle the in memory orderbook case.

Here are some ideas:

For the /order_book endpoint we actually don't need to use the in memory graph. We have all the offers recorded in the horizon database:

https://github.com/stellar/go/blob/release-horizon-v0.25.0/services/horizon/internal/db2/schema/migrations/19_offers.sql

It should be possible to obtain all the data we need to fulfill the /order_book requests with sql queries on the offers table. However, I don't know how quickly those queries would run and we may need to add some additional indexes to improve performance.


For the path finding endpoints, I think we still need to maintain an in memory graph. But, I think it should be possible to decouple maintaining the in memory graph from ingestion. Here's the idea which was originally proposed in #1519 (comment) :

When the API horizon instances start up, they build an in memory order book graph by reading all rows from the offers table. The API horizon instances will periodically poll the offers table for new updates and apply them to the in memory order book graph. We should be able to find updates from the offers table using the last_modified_ledger column.

However, when an offer is removed from the order book, the corresponding row in the offers table is deleted. When polling for updates we would not be able to observe changes where offers are removed from the orderbook. To fix this issue we could add a deleted column on the offers table which acts as a tombstone. Instead of deleting a row, we would set the deleted flag to true. To limit the size of the offers table, we can periodically delete tombstone rows from the offers table which are older than 10,000 ledgers (or some other large cutoff).

@bartekn
Contributor

bartekn commented Feb 10, 2020

Thanks for creating the issue!

For the /order_book endpoint we actually don't need to use the in memory graph. We have all the offers recorded in the horizon database:

I think we actually found that serving it from the graph is slightly slower on average than from a DB (#1963). It's strange because the graph keeps the offers sorted.

For the path finding endpoints, I think we still need to maintain an in memory graph. But, I think it should be possible to decouple maintaining the in memory graph from ingestion. Here's the idea which was originally proposed in #1519 (comment).

What we could do is keep the code as it is, but, if a special config value is set, have the node wait instead of ingesting into the DB itself. Ingesting into the DB is left to the other (backend) nodes that actually ingest; the frontend node then only updates its in-memory graph.

@bartekn
Contributor

bartekn commented Feb 21, 2020

Closed in #2299.

@bartekn bartekn closed this as completed Feb 21, 2020
@bartekn
Contributor

bartekn commented May 20, 2020

Reopening for a discussion connected to FSC.

If Horizon-Core communication via pipe stays after the prototyping stage, ingesting into memory in frontend instances can be a problem. It would require N core processes for N Horizon instances (N-N) instead of N-M (one core process for each of M backend ingesting instances). Because of this:

  • CPU and RAM assigned for serving HTTP responses can be used by a Stellar-Core process, potentially slowing down the responses and affecting HTTP metrics.
  • It may generate higher data transfer/bandwidth costs connected to running a stellar-core instance.

Even outside FSC, in-memory ingestion can be a problem:

  • It requires an extra code path for in-memory ingestion, complicating the code.
  • It's an extra ingestion type which can be hard to understand.

@tamirms
Contributor Author

tamirms commented May 30, 2020

Another benefit of fixing this issue is that only ingesting nodes will require write access to the Horizon DB. Request serving Horizon instances will be able to operate with read only access to the DB.

Currently, any Horizon nodes which ingest into the in memory orderbook graph still require write access to the Horizon DB because it is not possible to do a SELECT FOR UPDATE query using a read only connection.
