exp/lighthorizon: Refactor archive interface and support parallel ledger downloads. #4548
How's the test coverage on this? I can't tell if it's exercised in other tests or just mocked out. Maybe at some point a `parallel_ingester_test.go` would be worthwhile, to unit test an instance and its `PrepareRange`/`GetLedger` methods.
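A rough sketch of what such a test could look like; the interface and fake below are hypothetical stand-ins, since the real `parallelIngester` constructor and signatures may differ:

```go
// parallel_ingester_test.go (sketch): everything here is a hypothetical
// stand-in for the real lighthorizon types, just to show the test shape.
package ingester

import "testing"

// fakeParallelIngester is a trivial in-memory stand-in so the sketch compiles
// on its own; the real test would construct the actual parallelIngester
// against a mocked ledger source.
type fakeParallelIngester struct {
	feed map[uint32][]byte
}

func (f *fakeParallelIngester) PrepareRange(start, end uint32) {
	for seq := start; seq <= end; seq++ {
		f.feed[seq] = []byte{byte(seq)} // pretend these were downloaded
	}
}

func (f *fakeParallelIngester) GetLedger(seq uint32) ([]byte, error) {
	return f.feed[seq], nil
}

func TestPrepareRangeThenGetLedger(t *testing.T) {
	ing := &fakeParallelIngester{feed: make(map[uint32][]byte)}
	ing.PrepareRange(100, 110)

	// Every ledger in the prepared range should be retrievable afterwards.
	for seq := uint32(100); seq <= 110; seq++ {
		if _, err := ing.GetLedger(seq); err != nil {
			t.Fatalf("GetLedger(%d): %v", seq, err)
		}
	}
}
```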
The test coverage is pretty non-existent besides my manual testing, unfortunately. I definitely want to mock out a parallel ingestion simulation but I'm worried about the sprint time crunch.
Ok, we can circle back to it as tech debt. We should also try to include the effort of writing tests in our point estimates during poker, since it's an integral part of the feature. It's ok if the story doesn't close by end of sprint due to test coverage; that just reflects the accurate velocity of feature work.
Yeah, that's on me for sure. I didn't factor in the challenge of testing the implementation during poker (unfortunately it was also already in progress when we did poker, so I had to point it myself).
Is it necessary to prune the feed entries per seqNum? What if, for simplification, we remove the escalation from the read lock to the write lock and just return the found feed state? The feed then gets GC'd once, when the caller is done and its `parallelIngester` falls out of scope. We could init the queue to empty at the top of `PrepareRange()` to support the caller doing multiple ranges on the same instance of `ParallelIngester`.

This would also prevent a re-entrant call to `GetLedger` with the same ledgerSeq from falling into the non-parallel path at ln 123.
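For context, a rough sketch of the two shapes being contrasted here; the type, field, and method names are guesses rather than the actual code in `parallel_ingester.go`:

```go
package ingester

import "sync"

// Sketch only: downloadState, ledgerFeed, and feedLock are hypothetical
// stand-ins for whatever parallel_ingester.go actually uses.
type downloadState struct {
	ledger []byte
	err    error
}

type parallelIngester struct {
	feedLock   sync.RWMutex
	ledgerFeed map[uint32]downloadState
}

// Current shape (as I read the discussion): find the entry under a read
// lock, then escalate to a write lock so the entry can be pruned per seqNum.
func (i *parallelIngester) takeFromFeed(seq uint32) (downloadState, bool) {
	i.feedLock.RLock()
	state, ok := i.ledgerFeed[seq]
	i.feedLock.RUnlock()
	if !ok {
		return downloadState{}, false
	}
	i.feedLock.Lock() // the escalation this comment suggests dropping
	delete(i.ledgerFeed, seq)
	i.feedLock.Unlock()
	return state, true
}

// Suggested simplification: read lock only, return the found state, and let
// the whole feed be garbage-collected when the per-range state (or the
// ingester itself) falls out of scope.
func (i *parallelIngester) readFromFeed(seq uint32) (downloadState, bool) {
	i.feedLock.RLock()
	defer i.feedLock.RUnlock()
	state, ok := i.ledgerFeed[seq]
	return state, ok
}
```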
There already is support for multiple ranges! They will all get added to the queue. The only caveat is that each ledger will only be accessible one time (noteworthy in the case of overlapping ranges).

As for GC, isn't it the case that the `parallelIngester` will exist essentially throughout the duration of the program? It gets created once and passed around by reference everywhere. This would mean that the `ledgerFeed` would continue to grow and grow, essentially acting as an in-memory cache on top of the on-disk cache on top of the S3 ledger source. I'm not opposed to it necessarily (maybe we can actually use an LRU cache of `uint32` -> `downloadState` instead of a `map[uint32]downloadState`), but I want to make sure that such a design doesn't just eventually OOM, because I don't think the GC will ever kick in.
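A rough sketch of what that bounded feed could look like, using `github.com/hashicorp/golang-lru` as one possible LRU implementation; the cache size and the `downloadState` shape here are placeholders, not the PR's actual types:

```go
package ingester

import (
	lru "github.com/hashicorp/golang-lru"
)

// Hypothetical stand-in for whatever parallel_ingester.go stores per ledger.
type downloadState struct {
	ledger []byte
	err    error
}

// ledgerFeed bounds the in-memory feed: old entries are evicted instead of
// the map growing for the lifetime of the process.
type ledgerFeed struct {
	cache *lru.Cache // uint32 -> downloadState
}

func newLedgerFeed(size int) (*ledgerFeed, error) {
	c, err := lru.New(size) // e.g. a few hundred ledgers
	if err != nil {
		return nil, err
	}
	return &ledgerFeed{cache: c}, nil
}

func (f *ledgerFeed) put(seq uint32, st downloadState) {
	f.cache.Add(seq, st)
}

func (f *ledgerFeed) get(seq uint32) (downloadState, bool) {
	v, ok := f.cache.Get(seq)
	if !ok {
		return downloadState{}, false
	}
	return v.(downloadState), true
}
```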
Yes, I'm wondering if you think it's worthwhile, for simplification, to have the instance of `parallelIngester` and its queue/feed state scoped per invocation of `PrepareRange()` (e.g., in a closure) rather than as a singleton (same instance for the lifetime of the program). Ideally this could avoid additional locking around shared state, like here, where we wouldn't need the escalation from the read lock up to the write lock to purge; instead we know this queue instance gets GC'd as a whole when the `PrepareRange()` invocation ends and the vars in its closure lose their references.

Would that also safely allow parallel invocations of `PrepareRange()`, since each isolates its own queue/feed, and they all converge into the single LRU cache, which already handles concurrency, right?
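Roughly the shape I have in mind; everything here (names, signatures, the per-ledger goroutines) is a hypothetical sketch rather than the actual `PrepareRange`/`GetLedger` code:

```go
package ingester

import "sync"

// Hypothetical per-ledger result; stands in for the real downloadState.
type downloadState struct {
	ledger []byte
	err    error
}

// PrepareRange returns a getter whose feed lives only as long as the caller
// holds the returned closure: no shared singleton state, no read-to-write
// lock escalation, and the whole map is GC'd once the closure is dropped.
func PrepareRange(start, end uint32, download func(uint32) downloadState) func(uint32) (downloadState, bool) {
	feed := make(map[uint32]downloadState)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for seq := start; seq <= end; seq++ {
		wg.Add(1)
		go func(seq uint32) {
			defer wg.Done()
			st := download(seq) // e.g. converges on the shared LRU/on-disk cache
			mu.Lock()
			feed[seq] = st
			mu.Unlock()
		}(seq)
	}

	return func(seq uint32) (downloadState, bool) {
		wg.Wait() // sketch: in practice you'd signal per-ledger readiness
		mu.Lock()
		defer mu.Unlock()
		st, ok := feed[seq]
		return st, ok
	}
}
```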
I think I see what you're getting at. The model right now is something like: a single `parallelIngester` is created on launch and passed around by reference to every request.

You note that this probably will cause high contention for the workers and the ledger feed (`sync.Map` still has to handle concurrency, after all). You propose that each request get its own dedicated worker pool, ledger feed, etc. in order to minimize contention. I think that's a great point, but I have some follow-up concerns: won't giving each request its own `--parallel-downloads` workers actually lead to too much saturation and churn on the network? This is my biggest concern. Imagine `workerCount = 4` and we serve 100 requests per second. That's 400 concurrent downloads, open ports, etc. all to the same destination. Or should we keep the download worker pool global while the feeds/queues stay local, as you suggest?
Yes, the concurrent web requests end up time-slicing over x workers; there might be an inflection point in web response times where it's fast under low traffic but slows down under higher traffic. But I would avoid trying to perf-tune this any further and instead get the functionality out as-is first, and tune around it later if needed. You already have a good algorithm here with some nice locking optimizations, so it should provide a good starting point. Thanks for considering the idea and for the discussion; I don't want to hold up the merge, thanks!