utils/cached_file: reduce latency (and increase overhead) of partially-cached reads

Currently, `cached_file::stream` (used only by index_reader, to read index pages) works as follows. Assume the caller requested a read of the range [pos, pos + size). Then:

- If the first page of the requested range is uncached, the entire [pos, pos + size) range is read from disk (even if some later pieces of it are cached), the resulting pages are added to the cache, and the read completes (most likely) from the cached pages.
- If the first page of the read is cached, the rest of the read is handled page-by-page, in a sequential loop, serving each page either from cache (if present) or from disk.

For example, assume that pages 0, 1, 2, 3, 4 are requested. If exactly pages 1 and 2 are cached, `stream` will read the entire [0, 4] range from disk and insert the missing pages 0, 3, 4, then continue serving the read from cache. If exactly pages 0 and 3 are cached, it will serve 0 from cache, then read 1 from disk and insert it into cache, then read 2 from disk and insert it into cache, then serve 3 from cache, then read 4 from disk and insert it into cache. If exactly the first page is cached, a 128 kiB read turns into 31 sequential one-page I/O ops.

This is weird, and doesn't look intended. In one case we read even pages we already have, just to avoid fragmenting the read; in the other we read pages one-by-one (sequentially!) even when they are neighbours. I'm not sure whether cached_file should minimize IOPS or byte throughput, but the current state is surely suboptimal. Even if its read strategy is somehow optimal, it should at least coalesce contiguous reads and perform the non-contiguous reads in parallel.

This patch leans into minimizing IOPS. After the patch, we serve as many front pages from the cache as we can, but when we see an uncached page, we read the entire remainder of the request from disk.
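The pre-patch dispatch logic described above can be modeled as follows. This is a minimal illustrative sketch, not the actual C++ implementation: page numbers stand in for 4 kiB pages, and `old_read_plan` is a hypothetical name that just computes which disk reads (as inclusive page ranges) the old strategy would issue.

```python
def old_read_plan(pages, cached):
    """Model of the pre-patch strategy: return the list of disk reads,
    as inclusive (first_page, last_page) ranges, for a request covering
    `pages`, given the set of `cached` page numbers."""
    ops = []
    if pages and pages[0] not in cached:
        # First page uncached: one big read of the whole range,
        # even over pages that are already cached.
        ops.append((pages[0], pages[-1]))
        return ops
    # First page cached: sequential page-by-page loop, one disk
    # read per uncached page.
    for p in pages:
        if p not in cached:
            ops.append((p, p))
    return ops

# Pages 1, 2 cached: the whole [0, 4] range is re-read from disk.
print(old_read_plan([0, 1, 2, 3, 4], {1, 2}))   # [(0, 4)]
# Pages 0, 3 cached: three separate one-page reads.
print(old_read_plan([0, 1, 2, 3, 4], {0, 3}))   # [(1, 1), (2, 2), (4, 4)]
# Only page 0 cached, 32-page (128 kiB) request: 31 one-page reads.
print(len(old_read_plan(list(range(32)), {0}))) # 31
```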
In effect, we trim the read request by the longest cached prefix, then perform the rest using the logic from before the patch. For example, if exactly pages 0 and 3 are cached, we serve 0 from cache, then read [1, 4] from disk and insert everything into the cache.

For partially-cached files, this results in more bytes read from disk but fewer IOPS. That might be a bad thing; but if so, we should lean the other way in a more explicit and efficient way than we currently do.

Closes scylladb#20935
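The post-patch strategy can be sketched the same way. Again a hedged illustrative model, not the real C++ code: `new_read_plan` is a hypothetical helper that serves the longest cached prefix from cache and issues a single disk read covering everything after it.

```python
def new_read_plan(pages, cached):
    """Model of the post-patch strategy: skip the longest cached
    prefix, then issue one disk read (first_page, last_page) covering
    the entire remainder, cached pages included."""
    i = 0
    while i < len(pages) and pages[i] in cached:
        i += 1  # served from cache, no I/O
    if i == len(pages):
        return []  # fully cached: no disk reads at all
    return [(pages[i], pages[-1])]

# Pages 0, 3 cached: serve 0 from cache, then one read of [1, 4].
print(new_read_plan([0, 1, 2, 3, 4], {0, 3}))  # [(1, 4)]
# Only page 0 cached, 32-page request: one read instead of 31.
print(new_read_plan(list(range(32)), {0}))     # [(1, 31)]
```

Note the trade-off visible in the second example: page 3 (already cached) is read again from disk, but the 31 sequential ops collapse into one.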