support/storage: Make the on-disk cache thread-safe. #4575

Merged 2 commits on Sep 7, 2022
29 changes: 23 additions & 6 deletions support/storage/ondisk_cache.go
@@ -65,7 +65,10 @@ func (b *OnDiskCache) GetFile(filepath string) (io.ReadCloser, error) {
 	L := b.log.WithField("key", filepath)
 	localPath := path.Join(b.dir, filepath)
 
-	if _, ok := b.lru.Get(localPath); !ok {
+	// If the lockfile exists, we should defer to the remote source.
+	_, statErr := os.Stat(nameLockfile(localPath))
+
+	if _, ok := b.lru.Get(localPath); !ok || statErr == nil {
 		// If it doesn't exist in the cache, it might still exist on the disk if
 		// we've restarted from an existing directory.
 		local, err := os.Open(localPath)
Contributor:
Should this attempt to access the local cache file be guarded by statErr != nil? These local disk cache files could be getting rewritten by the async preload process at the same time.

Contributor Author:
b.lru.Get only checks the in-memory LRU cache, which tells us whether localPath might be on disk, so it's a safe check. It doesn't touch the on-disk file.

@@ -94,7 +97,9 @@ func (b *OnDiskCache) GetFile(filepath string) (io.ReadCloser, error) {
 			return remote, nil
 		}
 
-		return teeReadCloser(remote, local), nil
+		return teeReadCloser(remote, local, func() error {
+			return os.Remove(nameLockfile(localPath))
+		}), nil
 	}
 
 	// The cache claims it exists, so just give it a read and send it.
@@ -143,7 +148,8 @@ func (b *OnDiskCache) Size(filepath string) (int64, error) {
 		}
 
 		L.WithError(err).Debug("retrieving size of cached ledger failed")
-		b.lru.Remove(localPath) // stale cache?
+		b.lru.Remove(localPath)            // stale cache?
+		os.Remove(nameLockfile(localPath)) // errors don't matter
 	}
 
 	return b.Storage.Size(filepath)
@@ -162,7 +168,9 @@ func (b *OnDiskCache) PutFile(filepath string, in io.ReadCloser) error {
 		L.WithError(err).Error("failed to put file locally")
 	} else {
 		// tee upload data into our local file
-		in = teeReadCloser(in, local)
+		in = teeReadCloser(in, local, func() error {
+			return os.Remove(nameLockfile(path.Join(b.dir, filepath)))
+		})
 	}
 
 	return b.Storage.PutFile(filepath, in)
@@ -202,11 +210,19 @@ func (b *OnDiskCache) createLocal(filepath string) (*os.File, error) {
 	if err != nil {
 		return nil, err
 	}
+	_, err = os.Create(nameLockfile(localPath))
+	if err != nil {
+		return nil, err
+	}
 
 	b.lru.Add(localPath, struct{}{}) // just use the cache as an array
 	return local, nil
 }
 
+func nameLockfile(file string) string {
Contributor:
Rather than a lock file per cache file, wdyt about a coarser, intra-process lock scope for the preload exec? It would cover loading of the entire cache as one atomic thing and let the caller coordinate it by passing the lock:

func MakeOnDiskCache(upstream Storage, dir string, maxFiles uint, preloadSemaphore *bool) (Storage, error) 

Then we could replace os.Stat(nameLockfile(localPath)) with atomic.LoadBool(preloadSemaphore).

I was suggesting intra-process for the preload because externalizing the preload as a separate OS process (which is what would require a file lock) doesn't seem to add value; instead, the preload can be a goroutine launched in the web server process at startup, which passes the semaphore from there to here. That works in bare-metal or containerized environments with less config (i.e. no need to define a second container in the k8s deployment pod to run the preload).
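A minimal sketch of what that coarse-grained alternative might look like. Everything here is illustrative: coarseCache and preloading are invented names, the Storage interface is reduced to the one method the sketch needs, and since the standard library has no atomic.LoadBool, the sketch uses sync/atomic.Bool (Go 1.19+) as the closest equivalent.

	package storage

	import (
		"io"
		"sync/atomic"
	)

	// Stand-in for the real support/storage interface, reduced to the one
	// method this sketch needs.
	type Storage interface {
		GetFile(filepath string) (io.ReadCloser, error)
	}

	// coarseCache guards the whole cache with a single process-wide flag
	// instead of one lockfile per cached file.
	type coarseCache struct {
		Storage                 // upstream source
		preloading *atomic.Bool // true while the preload goroutine runs
	}

	func (c *coarseCache) GetFile(filepath string) (io.ReadCloser, error) {
		if c.preloading.Load() {
			// The entire cache is being (re)built, so defer to the remote source.
			return c.Storage.GetFile(filepath)
		}
		// Otherwise fall through to the existing local-disk path; shown here
		// as a plain upstream read to keep the sketch self-contained.
		return c.Storage.GetFile(filepath)
	}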

Contributor Author:
I'm a little hazy on the details of what you're proposing here, but let me offer a justification for the fine-grained locking I did.

First off, this part is intentionally separate from the additional code that will be necessary for the preload task, because (a) I wanted to keep things decoupled, and (b) the cache needs thread safety for #4468 (parallel fetches) as well. I'm seeing issues in my draft PR with reads interfering with parallel fetches because of the caching layer (i.e. parallel fetches are still writing to the cache, but GetLedger() notices a cache hit and tries to read). That's why the locking is fine-grained on a per-file basis: with parallel downloads, some files are safe to read while others may not be.

I agree that the preload itself can be a goroutine, but that's distinct from needing the cache to be thread-safe for simultaneous reads and writes. Maybe the semaphore aspect can come in the next PR and act as a layer on top of the per-file locking here?
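To make that per-file protocol concrete, here is a condensed sketch of what the diff above does. The helper names beginWrite, endWrite, and beingWritten are invented for illustration; only nameLockfile matches the real code, and error handling is trimmed.

	package storage

	import "os"

	// nameLockfile mirrors the helper added in this diff.
	func nameLockfile(file string) string { return file + ".lock" }

	// A writer drops <file>.lock next to <file> before streaming data into it...
	func beginWrite(localPath string) error {
		f, err := os.Create(nameLockfile(localPath))
		if err != nil {
			return err
		}
		return f.Close()
	}

	// ...and removes the lockfile once the tee'd stream is closed.
	func endWrite(localPath string) error {
		return os.Remove(nameLockfile(localPath))
	}

	// A reader treats a present lockfile as "this file is still being written"
	// and falls back to the remote source for that one file only.
	func beingWritten(localPath string) bool {
		_, err := os.Stat(nameLockfile(localPath))
		return err == nil
	}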

Contributor (@sreuland, Sep 3, 2022):
@Shaptic, got it: the cache thread-safety design needs granularity that works at both extremes, i.e. the whole cache for preload and per cache key/entry concurrently for web requests. I see how the file lock mechanism encapsulated in here will work for both of those extremes.

I played around with a reader/writer pattern using mutexes as an alternative to per-file locks for concurrency. In this approach, both the web request and preload threads synchronize through concurrent read locks and exclusive write locks on each cache key/entry, all from the same singleton instance of OnDiskCache.

I drew it out in this sequence diagram.

The net effect on sync behavior should be the same; it's just a question of how much thrash we think is avoided by doing in-memory locks vs. filesystem locks. Let me know if you think it's worth exploring.

The OnDiskCache interface here stays the same, but if this approach is used, the follow-on PR I do as part of #4526 would have the web server's main create one instance of OnDiskCache and pass it to Archive for all web request threads, with the same instance passed to the preload BuildCache thread, rather than letting each of them create their own instance.
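A rough sketch of that reader/writer alternative, assuming one sync.RWMutex per cache key held inside a shared singleton. All names here are illustrative, not from the PR.

	package storage

	import "sync"

	// keyLocks hands out one RWMutex per cache key, shared by the web
	// request threads and the preload goroutine.
	type keyLocks struct {
		mu    sync.Mutex
		locks map[string]*sync.RWMutex
	}

	func (k *keyLocks) get(key string) *sync.RWMutex {
		k.mu.Lock()
		defer k.mu.Unlock()
		if k.locks == nil {
			k.locks = make(map[string]*sync.RWMutex)
		}
		l, ok := k.locks[key]
		if !ok {
			l = &sync.RWMutex{}
			k.locks[key] = l
		}
		return l
	}

	// Readers (GetFile) would take a shared lock on the entry...
	func (k *keyLocks) read(key string, fn func() error) error {
		l := k.get(key)
		l.RLock()
		defer l.RUnlock()
		return fn()
	}

	// ...while writers (PutFile, preload) take it exclusively.
	func (k *keyLocks) write(key string, fn func() error) error {
		l := k.get(key)
		l.Lock()
		defer l.Unlock()
		return fn()
	}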

Contributor Author:
The benefit of file-based locks is that there's no need for read locks: we get that "for free" from the OS (concurrent reads of a file are always safe). Similarly, there's no need for a singleton, because it's all on disk and visible to everyone.

The biggest benefit, though, is that the work is already done 😆 IMO we should punt on exploring memory-based synchronization until later, unless we see issues with this approach. The latency is going to be dominated by ledger and index downloads anyway.

+	return file + ".lock"
+}
+
 // The below is a helper interface so that we can use io.TeeReader to write
 // data locally immediately as we read it remotely.

@@ -219,12 +235,13 @@ func (t trc) Close() error {
 	return t.close()
 }
 
-func teeReadCloser(r io.ReadCloser, w io.WriteCloser) io.ReadCloser {
+func teeReadCloser(r io.ReadCloser, w io.WriteCloser, onClose func() error) io.ReadCloser {
 	return trc{
 		Reader: io.TeeReader(r, w),
 		close: func() error {
 			r.Close()
-			return w.Close()
+			w.Close()
+			return onClose()
 		},
 	}
 }