Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shared: improve interface for more efficient reading #1

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

RaduBerinde
Copy link
Owner

The current shared.Storage interface forces the implementation to perform a metadata operation to look up the object on each read.

This commit improves the interface to allow opening the object once and then performing many reads.

@RaduBerinde RaduBerinde force-pushed the better-shared-iface branch from d23bc21 to 376886d Compare May 4, 2023 16:57
bananabrick and others added 7 commits May 5, 2023 20:21
keyOutOfBounds is called deep in the iterator function call stack
and should be inlined.
Remove the arm-specific VFS logic, extending the previously non-arm linux file
to apply to all linux builds. This separation originated in cockroachdb#171, due to the
SyncFileRange syscall being undefined in arm builds, but the unix package now
provides it.

Fix cockroachdb#2514.
AWS (and probably some other blob storage providers) use a prefix of
the object names to partition the buckets (the length of the prefix is
chosen dynamically to get a desired number of partitions). We get best
performance when IO load is distributed evenly across the partitions.
To achieve this we prepend a hash value to the object name,
consisting of 4 hex digits. The hash value is derived from the other
parts of the name - the creator ID and creator file num.

In this change we also improve the functions that generate names to
avoid unnecessary allocations.
Remove leading zeros from the creator ID; it makes the object names
unwieldy and it's not very useful (unlike filenums, the IDs aren't
ordered in any significant way).
Previously, ingestions would trigger obsolete file deletions during the ingest
application step. This has the potential to exacerbate the write "hiccup"/stall
induced by sstable ingestion. The sequence numbers assigned to the ingestion
are not published until after ingestion application. This prevents any
concurrent writes that obtained later sequence numbers from publishing their
own sequence numbers.

When Options.Experimental.MinDeletionRate is configured, this step to delete
obsolete files should have minimal impact. However, when
Experimental.MinDeletionRate is not configured, the ingesting goroutine would
synchronously perform file deletions, extending the length of the commit stall.

Now ingestions don't trigger deletion of obsolete files. The only file deletion
that an ingestion operation itself could trigger is deletion of a manifest that
was made obsolete during a manifest rotation while committing the version edit.
This is very rare, and delaying the deletion until a flush or compaction
completes is okay.
@RaduBerinde RaduBerinde force-pushed the better-shared-iface branch from 376886d to 6928c72 Compare May 12, 2023 21:05
jbowens and others added 6 commits May 16, 2023 16:36
Add a new Metric recording the database's uptime since Open. This can be used
in combination with other metrics to integrate over time.
Add a new Compact.Duration metric that measures the cumulative time spent in
compactions since Open. This may be used in combination with Metrics.Uptime to
compute the effective compaction concurrency (cockroachdb#1934). Callers may compute
effective compaction concurrency over an interval by measuring both Uptime and
Compact.Duration at the beginning and end of the interval and subtracting.
When I extracted out the readahead logic from the sstable Reader, I
accidentally omitted some initialization code - setting
`readaheadState.size` to `initialReadaheadSize`. Fixing.
Remove the `genericFileReadable` implementation; since we moved
`Prealloc` to be a method on `File`, the regular implementation should
always work. Using the same implementation for all cases helps testing
- e.g. we can test readahead logic with files in MemFS.
This change improves the objstorage test to allow reading arbitrary
parts of objects. We add a test that verifies that the read-ahead
implementation does the Prefetch and the reopen that is expected.

We also improve logging FS to log `ReadAt` and `Prefetch` calls.

Informs cockroachdb#2531.
The current `shared.Storage` interface forces the implementation to
perform a metadata operation to look up the object on each read. Some
experimental backends have a separate internal open step that doesn't
need to be performed every time.

This commit improves the interface to allow opening the object once
and then performing many reads.
@RaduBerinde RaduBerinde force-pushed the better-shared-iface branch from 6928c72 to 52aefa8 Compare May 17, 2023 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants