Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make small retrieval 200x faster #7693

Merged
merged 9 commits into from
Nov 29, 2021
Merged

Conversation

magik6k
Copy link
Contributor

@magik6k magik6k commented Nov 26, 2021

This PR adds direct random-access capability to the lotus DAGStore mount.

Right now, when any retrieval is performed, DAGStore will read the entire piece and store it in a temporary file for random access. This is acceptable when retrieving the entire piece, but incredibly inefficient when performing retrievals for small % of the original piece.

By avoiding this copy step, small retrievals in many cases become sub-second, and time-to-first-byte in all retrievals is drastically reduced.

Example timings

Running current master, retrieving a few kb from a 16G piece

 ./lotus client ls --miner f0... --maxPrice 0 bafy...
Recv 0 B, Paid 0 FIL, Open (New), 0s
Recv 0 B, Paid 0 FIL, DealProposed (WaitForAcceptance), 9ms
Recv 0 B, Paid 0 FIL, DealAccepted (Accepted), 45ms
Recv 0 B, Paid 0 FIL, PaymentChannelSkip (Ongoing), 45ms
Recv 0 B, Paid 0 FIL, Complete (CheckComplete), 1m6.321s
Recv 0 B, Paid 0 FIL, WaitForLastBlocks (WaitingForLastBlocks), 1m6.338s
Recv 79 KiB, Paid 0 FIL, BlocksReceived (WaitingForLastBlocks), 1m6.34s
Recv 79 KiB, Paid 0 FIL, AllBlocksReceived (FinalizingBlockstore), 1m6.354s
Recv 79 KiB, Paid 0 FIL, BlockstoreFinalized (Completed), 1m6.354s

Running this PR, also retrieving a few kb from a different 16G piece (different piece to avoid any potential caching)

./lotus client ls --miner f02620 --maxPrice 0 QmebwyN5z8CiczuTFsfr3rio3ttHao8CoZwMeFHVu4Xhmd
Recv 0 B, Paid 0 FIL, Open (New), 0s
Recv 0 B, Paid 0 FIL, DealProposed (WaitForAcceptance), 10ms
Recv 0 B, Paid 0 FIL, DealAccepted (Accepted), 67ms
Recv 0 B, Paid 0 FIL, PaymentChannelSkip (Ongoing), 67ms
Recv 0 B, Paid 0 FIL, Complete (CheckComplete), 267ms
Recv 0 B, Paid 0 FIL, WaitForLastBlocks (WaitingForLastBlocks), 280ms
Recv 2.025 KiB, Paid 0 FIL, BlocksReceived (WaitingForLastBlocks), 284ms
Recv 2.025 KiB, Paid 0 FIL, AllBlocksReceived (FinalizingBlockstore), 284ms
Recv 2.025 KiB, Paid 0 FIL, BlockstoreFinalized (Completed), 285ms

@magik6k magik6k requested a review from a team as a code owner November 26, 2021 18:10
@jennijuju jennijuju requested review from aarshkshah1992, nonsense, raulk, masih and dirkmc and removed request for aarshkshah1992 November 26, 2021 18:22
Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a cursory look, and the approach generally LGTM. However, something to call out is that efficiency will be (a) DAG-shape dependent and (b) request-dependent.

Re: (a), a DAG composed of tiny blocks will lead to substantial read amplification and roundtripping, and I expect this approach to be inefficient (comparatively how inefficient with the baseline approach will depend on (b)).

Re: (b), if the caller is fetching the root CID of the deal with the default selector, this equates to fetching the entire deal DAG. At that point it could be more efficient to copy the shard data and serve from local, like we were doing before.

@magik6k -- what do you think about introducing a "planner" component that decides which strategy to use based on (b)?

For (a) we should eventually store metadata about a DAG, something like a DESCRIBE DAG, as we receive it, along with some stats (average min/max/avg node size, etc). This would allow us to strategise around (a).

@jennijuju jennijuju removed the request for review from masih November 26, 2021 19:07
@jennijuju jennijuju modified the milestones: v1.13.1, v1.13.2 Nov 26, 2021
@magik6k
Copy link
Contributor Author

magik6k commented Nov 26, 2021

@raulk From my testing, when retrieving the whole piece, reads tend to be sequential because graphsync requests data from the DAGStore in the same order it was written to the CAR file, and the pieceReader is exploiting this property.

For some, bigger partial retrievals most accesses should still be mostly sequential, but possibly with multiple "heads", but here more data/experimentation is needed.

(basically, in my testing, I didn't come up with an example in which copying data would be more efficient)

@magik6k magik6k force-pushed the feat/randacc-dagstore-mount branch from 6be6dc4 to 9110e6f Compare November 26, 2021 19:25
@raulk
Copy link
Member

raulk commented Nov 26, 2021

@magik6k Oh yeah, that's a good point. Both the layout of data in disk and the retrieval order are depth-first. That fact, along with the "burn" leniency, should solve both challenges in majority of cases 👍 Maybe we can think of some selectors and DAG combinations whose pattern of access would be different, and test those?

@codecov
Copy link

codecov bot commented Nov 26, 2021

Codecov Report

Merging #7693 (4bcde2f) into master (3b1d86b) will decrease coverage by 0.03%.
The diff coverage is 68.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7693      +/-   ##
==========================================
- Coverage   39.56%   39.53%   -0.04%     
==========================================
  Files         637      638       +1     
  Lines       67924    67990      +66     
==========================================
+ Hits        26872    26877       +5     
- Misses      36446    36488      +42     
- Partials     4606     4625      +19     
Impacted Files Coverage Δ
markets/dagstore/piecereader.go 56.66% <56.66%> (ø)
extern/sector-storage/mock/mock.go 65.37% <66.66%> (ø)
extern/sector-storage/piece_provider.go 55.69% <68.42%> (-1.07%) ⬇️
markets/dagstore/miner_api.go 63.09% <70.00%> (ø)
markets/sectoraccessor/sectoraccessor.go 72.30% <71.42%> (-2.30%) ⬇️
markets/dagstore/mocks/mock_lotus_accessor.go 80.95% <82.60%> (+0.46%) ⬆️
markets/dagstore/mount.go 85.71% <100.00%> (+12.38%) ⬆️
node/builder_miner.go 94.89% <100.00%> (+0.03%) ⬆️
node/modules/storageminer_dagstore.go 61.53% <100.00%> (ø)
journal/types.go 86.66% <0.00%> (-13.34%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b1d86b...4bcde2f. Read the comment docs.

Copy link
Contributor

@dirkmc dirkmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question but in general LGTM 👍

markets/dagstore/piecereader.go Outdated Show resolved Hide resolved
@magik6k magik6k merged commit 26c3752 into master Nov 29, 2021
@magik6k magik6k deleted the feat/randacc-dagstore-mount branch November 29, 2021 15:10
@magik6k magik6k mentioned this pull request Dec 8, 2021
6 tasks
@hunjixin
Copy link
Contributor

hunjixin commented Jan 5, 2022

is this pr was rollback?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants