Design backend for serving TxMeta #4910

Closed
tamirms opened this issue Jun 15, 2023 · 4 comments
Labels
horizon, performance (issues aimed at improving performance)

Comments

tamirms (Contributor) commented Jun 15, 2023

The results from https://docs.google.com/document/d/1YETNALx5EzqZDNSVWzTfaK5Ogw84PsBlrt64nOr-Njg/edit?usp=sharing show that precomputing TxMeta is very effective in speeding up ingestion. We need to design a solution for distributing TxMeta so Horizon operators (along with Hubble) can ingest precomputed TxMeta instead of relying on captive core.

Using a blobstore like S3 or GCS (Google Cloud Storage) is appealing because then we don't need to build any infrastructure to serve requests for TxMeta. However, there are concerns that download latency may be too high. Perhaps these concerns can be mitigated by using Cloudflare for caching and by batching several ledgers together in one file.
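As a rough sketch of the batching idea, the snippet below maps a ledger sequence number to the object key of the file that would contain it. Everything here is an assumption for illustration: the `BatchLayout` name, the `txmeta` prefix, the `.xdr` suffix, the key format, and a batch size of 64 are hypothetical, and ledger sequences are assumed to start at 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BatchLayout:
    """Hypothetical layout: each object holds a fixed run of consecutive ledgers."""

    ledgers_per_file: int = 64  # assumed batch size, not a measured optimum
    prefix: str = "txmeta"      # assumed key prefix

    def _batch_start(self, ledger_seq: int) -> int:
        # First ledger sequence of the batch containing ledger_seq.
        return ((ledger_seq - 1) // self.ledgers_per_file) * self.ledgers_per_file + 1

    def object_key(self, ledger_seq: int) -> str:
        """Return the object key of the batch file containing ledger_seq."""
        if ledger_seq < 1:
            raise ValueError("ledger sequences are assumed to start at 1")
        start = self._batch_start(ledger_seq)
        end = start + self.ledgers_per_file - 1
        return f"{self.prefix}/{start:010d}-{end:010d}.xdr"

    def keys_for_range(self, first: int, last: int) -> List[str]:
        """Return the distinct object keys needed to cover ledgers first..last."""
        if first > last:
            raise ValueError("first must not exceed last")
        keys = []
        seq = first
        while seq <= last:
            keys.append(self.object_key(seq))
            # Jump to the first ledger of the next batch.
            seq = self._batch_start(seq) + self.ledgers_per_file
        return keys


layout = BatchLayout()
needed = layout.keys_for_range(60, 130)
```

With this layout a client needs three downloads to cover ledgers 60 through 130 instead of 71, and the fixed, predictable keys are the kind of URL a cache in front of the blobstore could serve repeatedly.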

To complete this issue we need a design document which proposes a solution for distributing TxMeta and analyzes the cost and performance of the solution.

tamirms moved this from Backlog to Next Sprint Proposal in Platform Scrum Jun 15, 2023
mollykarcher added the performance label (issues aimed at improving performance) and removed the snapshots label Jun 15, 2023
mollykarcher moved this from Next Sprint Proposal to Current Sprint in Platform Scrum Jun 20, 2023
Shaptic moved this from Current Sprint to Next Sprint Proposal in Platform Scrum Jun 20, 2023
sreuland (Contributor) commented Jun 22, 2023

I'd throw a Kafka broker in for consideration as a 'blobstore' candidate and an alternative to proprietary cloud options. It is available on most cloud infrastructure providers as a managed deployment, or it can be deployed internally (but that requires substantial ops support). Kafka has been used in terabyte-scale situations and has a client-broker transport built for throughput, high availability, and message delivery. One idea for the model would be message_id: <ledger_id> and message_payload: <base64_ledger_txmeta>. Additional Kafka message headers could be added to let clients do proactive filtering/routing based on other attributes of the ledger.
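To make the message model concrete, here is a minimal sketch that builds such a record as a plain dict, without any Kafka client library. The dict shape (`key`, `value`, `headers`), the function names, and the `protocol_version` header are hypothetical; only the ledger id as key and the base64-encoded TxMeta as payload come from the proposal above.

```python
import base64
from typing import Dict, Optional, Tuple


def encode_ledger_message(
    ledger_seq: int,
    txmeta_xdr: bytes,
    headers: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a record: key = ledger id, value = base64 of the raw TxMeta bytes."""
    return {
        "key": str(ledger_seq).encode(),
        "value": base64.b64encode(txmeta_xdr),
        # Headers carry ledger attributes so clients can filter cheaply.
        "headers": [(k, v.encode()) for k, v in sorted((headers or {}).items())],
    }


def decode_ledger_message(msg: dict) -> Tuple[int, bytes]:
    """Recover (ledger sequence, raw TxMeta bytes) from a record."""
    return int(msg["key"].decode()), base64.b64decode(msg["value"])


def wants(msg: dict, required: Dict[str, str]) -> bool:
    """Header-only filter: decide whether to process a record without decoding its payload."""
    present = {k: v.decode() for k, v in msg["headers"]}
    return all(present.get(k) == v for k, v in required.items())


# Placeholder bytes stand in for real TxMeta XDR.
msg = encode_ledger_message(100, b"\x00\x01\x02", {"protocol_version": "20"})
seq, payload = decode_ledger_message(msg)
```

A consumer could call `wants(msg, {"protocol_version": "20"})` before touching the payload, which is the proactive filtering the headers are meant to enable.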

The notion of a consumer offset in the protocol could be an interesting way to enable random access to ledgers by sequence number, thereby allowing clients to consume historical ledgers in a custom ranged-replay use case, analogous to reingest range <from> <to>.

Using the Kafka message offset for random access to ledgers would entail a single-partition topic strategy and initially publishing messages to the topic starting with the genesis ledger at offset=0. That way the insertion order of messages is preserved and the offset mirrors the ledger sequence number, such that ledger N will be at offset N-1.
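The offset arithmetic described above can be sketched as follows. The function names are hypothetical, and the mapping assumes the genesis ledger has sequence 1 and that the partition holds exactly one record per ledger, in order, with nothing else interleaved.

```python
GENESIS_LEDGER = 1  # assumption: the genesis ledger has sequence number 1


def ledger_to_offset(ledger_seq: int) -> int:
    """Ledger N lives at offset N-1 when the genesis ledger is published at offset 0."""
    if ledger_seq < GENESIS_LEDGER:
        raise ValueError("ledger sequence precedes the genesis ledger")
    return ledger_seq - GENESIS_LEDGER


def offset_to_ledger(offset: int) -> int:
    """Inverse mapping: the record at offset K holds ledger K+1."""
    if offset < 0:
        raise ValueError("offsets are non-negative")
    return offset + GENESIS_LEDGER


def replay_offsets(from_ledger: int, to_ledger: int) -> range:
    """Offsets to read for an inclusive ledger range, like reingest range <from> <to>."""
    if from_ledger > to_ledger:
        raise ValueError("from_ledger must not exceed to_ledger")
    return range(ledger_to_offset(from_ledger), ledger_to_offset(to_ledger) + 1)


offsets = replay_offsets(100, 105)
```

A consumer would seek to `offsets.start` and stop once it has read the record at `offsets.stop - 1`. Any extra or missing record in the partition would shift the mapping, so a real design would likely still verify the ledger id carried in each message.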

mollykarcher moved this from Next Sprint Proposal to Current Sprint in Platform Scrum Jul 19, 2023
mollykarcher moved this from Current Sprint to Next Sprint Proposal in Platform Scrum Aug 29, 2023
mollykarcher moved this from Next Sprint Proposal to Current Sprint in Platform Scrum Aug 29, 2023
sreuland (Contributor) commented Oct 16, 2023

We may want to include, as part of the HLD (high-level design), if/how the existing monorepo ingestion SDK will enable access to the new TxMeta source, for example through a new ledgerbackend. Identify which programming languages are highly desired for an ingestion SDK with this new TxMeta capability but are not present yet. Is the Go monorepo the only one currently, and are any other languages minimally required as part of the new TxMeta solution?

Also, have the ingestion SDK provide an outbound path from the ledgerbackend interface, so that apps can build publishers to remote TxMeta sources as well; the same SDK can then serve both the pub and sub sides.
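One way to picture an SDK that covers both the pub and sub sides is a pair of small interfaces plus a helper that copies ledgers from one to the other. This is only an illustrative sketch: `LedgerSource`, `LedgerSink`, `InMemoryStore`, and `export_range` are hypothetical names, not the existing ledgerbackend interface, and the in-memory store stands in for whatever remote storage the design settles on.

```python
from abc import ABC, abstractmethod
from typing import Dict


class LedgerSource(ABC):
    """Sub side: something ledgers' TxMeta can be read from."""

    @abstractmethod
    def get_ledger(self, seq: int) -> bytes:
        ...


class LedgerSink(ABC):
    """Pub side: something TxMeta can be published to."""

    @abstractmethod
    def put_ledger(self, seq: int, txmeta: bytes) -> None:
        ...


class InMemoryStore(LedgerSource, LedgerSink):
    """Stand-in for a remote TxMeta store; implements both sides."""

    def __init__(self) -> None:
        self._ledgers: Dict[int, bytes] = {}

    def put_ledger(self, seq: int, txmeta: bytes) -> None:
        self._ledgers[seq] = txmeta

    def get_ledger(self, seq: int) -> bytes:
        if seq not in self._ledgers:
            raise KeyError(f"ledger {seq} not available")
        return self._ledgers[seq]


def export_range(source: LedgerSource, sink: LedgerSink, first: int, last: int) -> int:
    """Copy ledgers first..last (inclusive) from source to sink; return the count copied."""
    count = 0
    for seq in range(first, last + 1):
        sink.put_ledger(seq, source.get_ledger(seq))
        count += 1
    return count


# Placeholder payloads stand in for real TxMeta.
origin = InMemoryStore()
for n in (1, 2, 3):
    origin.put_ledger(n, f"ledger-{n}".encode())

remote = InMemoryStore()
copied = export_range(origin, remote, 1, 3)
```

Keeping the read and write interfaces separate means a publisher app depends only on a sink and a consumer only on a source, while a single package can still ship both.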

mollykarcher (Contributor) commented Oct 19, 2023

> We may want to include, as part of the HLD (high-level design), if/how the existing monorepo ingestion SDK will enable access to the new TxMeta source, for example through a new ledgerbackend. Identify which programming languages are highly desired for an ingestion SDK with this new TxMeta capability but are not present yet. Is the Go monorepo the only one currently, and are any other languages minimally required as part of the new TxMeta solution?
>
> Also, have the ingestion SDK provide an outbound path from the ledgerbackend interface, so that apps can build publishers to remote TxMeta sources as well; the same SDK can then serve both the pub and sub sides.

+1 to this. I think it makes sense to productize/productionalize both the publish and consumption side. That is, there is an SDK/package that produces this new TxMeta ledger backend, and there is also one that consumes from it. I'm not particularly opinionated on whether those should be the same package or different packages, but I think keeping them both in the ingest package could definitely make sense.

sreuland (Contributor) commented Nov 1, 2023

@chowbao, just wanted to reference this design ticket, which is earmarked for remote TxMeta storage. As your design proposal overlaps with it, we'll want to incorporate your summary here. Thanks!

github-project-automation (bot) moved this from Current Sprint to Done in Platform Scrum Nov 21, 2023