feat: Implement read coalescing algorithm #1198
The removal of the stateful handle is motivated by #1157 (comment)
Only in Python 3.11+ do we see
looking into it... edit: because in other versions the test is skipped:
disabled in versions less than 3.11 in #1012
The problem is the signature of
import fsspec
fs, path = fsspec.url_to_fs("s3://pivarski-princeton/pythia_ppZee_run17emb.picoDst.root", anon=True)
If I create a file-object handle and read the first 4 bytes, I get:
fh = fs.open(path)
fh.seek(0)
assert fh.read(4) == b"root"
If I instead use
data = fs.cat_file(path, 0, 4)
assert len(data) == 4  # fails, len(data) is 978299156
assert data == b"root"  # also fails, data[:4] is b'\x00\x00\xf1l'
So there is some mismatch in the behavior that I was initially puzzled by. Then I realized that, depending on the implementation,
As I understand it, src/uproot/source/coalesce.py is a suite of utility functions used only by FSSpecSource to merge nearby but not necessarily adjacent byte requests—download a bit too much data, but have (far?) fewer requests and responses in flight, which is a net savings, depending on how a network's throughput and latency compare.
Presumably, it would need to be tuned for each network. How are the default settings? (It looks to me like the coalesce functions are always used by FSSpecSource, so this will affect the default case.) The most conservative default is to only coalesce adjacent byte ranges, but the most common network scenario (grad student in a basement, trying to get data from CERN) is more latency-bound than throughput-bound. Maybe the default could accept 25% over-read or so.
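To make the trade-off between over-read and request count concrete, here is a minimal sketch, not the code in coalesce.py. The function name `coalescing_cost` and the `max_gap` parameter are hypothetical, ranges are assumed to be non-overlapping, non-empty `(start, stop)` pairs with `stop` exclusive, and the merge rule (join ranges separated by at most `max_gap` bytes) is an assumption for illustration.

```python
def coalescing_cost(ranges, max_gap):
    """Return (number of requests, over-read fraction) after merging
    ranges that are separated by at most max_gap bytes.

    Hypothetical helper: assumes non-overlapping, non-empty
    (start, stop) ranges with stop exclusive.
    """
    wanted = sum(stop - start for start, stop in ranges)
    merged = []
    for start, stop in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it and
            # accept downloading the bytes in the gap.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    fetched = sum(stop - start for start, stop in merged)
    return len(merged), (fetched - wanted) / wanted
```

With the three ranges `[(0, 100), (120, 200), (1000, 1100)]` and `max_gap=32`, this gives two requests instead of three at roughly 7% over-read; with `max_gap=0` only adjacent ranges merge, so nothing changes.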
I see that this also only affects FSSpecSource. It could, in principle, be applied to the pre-FSSpecSources, but there's less reason to do that. FSSpecSource has the many-requests-in-flight model built into it, and the old sources attempt to do vector reads (only one request in flight).
I'm in favor of this. If @lobis is available, it would be great to hear what you think! If we don't hear from you by Thursday, I think we should go ahead with this PR.
The defaults are similar in spirit to the CMSSW defaults, though the use case is certainly different. I was hoping to gain some experience in the wild to see how different these are. As for implementing it in pre-fsspec sources, that is indeed an option, and one I think may be useful. For example, even in the case of vector reads (which fsspec is now doing if the protocol supports it, through the
It's Thursday; I think we should go ahead with this PR. I'll update to |
This PR implements two read optimization strategies:
Coalesce nearby byte ranges into one read range
For example, two ranges in a request, [(100, 140), (142, 160)], become one range, [(100, 160)], that is then split on the receiving side. The motivation is that, on the server side, experience from CMSSW with xrootd shows that servers sometimes handle the two ranges separately in backend calls in a way that lowers overall throughput.
Split very large requests into smaller chunks
Since #1162 we went from a situation where there was one request per range using fs.cat_file to one request for several ranges using fs.cat_ranges. This was motivated by the fact that we observed a large overhead with xrootd for separate read requests due to needing to open and close a (stateful 😮) xrootd file handle. Arguably, this was fixed in a different way with CoffeaTeam/fsspec-xrootd#54, but one might argue that we still have reason to prefer a vector read call (used in the fsspec-xrootd implementation of cat_ranges) over individual read calls in xrootd.
One consequence of this change is that the future that notifies the downstream work waits until all data is received. In the case where we are requesting several TBaskets for several branches, this can become a large read request (megabytes). If we separate a bulk multi-range request into a few smaller requests (each still substantial compared to the ranges in the request), we can do work on (e.g. decompress) a subset of the expected data while the rest is in flight. For example, a 20 MB request might be composed of two hundred 100 kB ranges, and waiting for the whole thing adds overall latency.
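The two strategies can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the implementation in this PR: the function names and the `max_gap` parameter are hypothetical, `max_request_bytes` is borrowed from the tunables discussed here only as a name, and ranges are assumed to be `(start, stop)` pairs with `stop` exclusive.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, stop), stop exclusive (assumption)


def coalesce(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge ranges separated by at most max_gap bytes (hypothetical rule)."""
    merged: List[Range] = []
    for start, stop in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged


def slice_back(merged: Range, data: bytes, wanted: List[Range]) -> List[bytes]:
    """Recover the originally requested ranges from one coalesced read."""
    base = merged[0]
    return [data[a - base:b - base] for a, b in wanted]


def split_requests(ranges: List[Range], max_request_bytes: int) -> List[List[Range]]:
    """Group ranges into requests of at most max_request_bytes each.

    A single range larger than the limit still gets a request of its own.
    """
    requests: List[List[Range]] = []
    current: List[Range] = []
    current_bytes = 0
    for start, stop in ranges:
        size = stop - start
        if current and current_bytes + size > max_request_bytes:
            requests.append(current)
            current, current_bytes = [], 0
        current.append((start, stop))
        current_bytes += size
    if current:
        requests.append(current)
    return requests
```

In this sketch, `coalesce([(100, 140), (142, 160)], max_gap=2)` yields the single range from the example above, and splitting two hundred 100 kB ranges with a 2 MB limit yields ten requests of twenty ranges each, so downstream work could start after the first request completes instead of after all 20 MB arrive.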
Tuning
The PR introduces a configuration tunable via, e.g.
The defaults are not tuned at the moment. Further studies will be needed to choose good defaults.
edit: now that I think about it, min_first_request_bytes is probably not super important. We probably want to set it the same as max_request_bytes, and both should be something like a factor of 10 smaller than the total size of the data to read in a single source.chunks() call.
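As a rough illustration of that rule of thumb, the sketch below derives both values from the byte ranges of one read. The helper name `suggest_request_sizes` and the `factor` argument are hypothetical; only the names `min_first_request_bytes` and `max_request_bytes` come from the discussion, and the factor of 10 is the untuned guess stated above.

```python
def suggest_request_sizes(ranges, factor=10):
    """Suggest equal min_first_request_bytes and max_request_bytes,
    roughly `factor` times smaller than the total bytes requested.

    Hypothetical helper; ranges are (start, stop) with stop exclusive.
    """
    total = sum(stop - start for start, stop in ranges)
    size = max(1, total // factor)  # never suggest a zero-byte limit
    return {"min_first_request_bytes": size, "max_request_bytes": size}
```

For two hundred 100 kB ranges (20 MB in total), this suggests 2 MB for both settings.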