-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Fix File.ReadAllBytes{Async} for virtual file system files #28388
Conversation
1a6538c
to
8a5bfc2
Compare
} | ||
|
||
[Theory] | ||
[PlatformSpecific(TestPlatforms.Linux)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar test for OSX maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know of any file paths that present this behavior on macOS?
Some file systems, like procfs, can return 0 for a file's length, even though it actually contains on-demand computed contents. This breaks File.ReadAllBytes{Async}, which currently assumes that a length of 0 means the file is empty. This commit fixes it by changing the assumption to mean that a length of 0 means we can't depend on the Length and instead need to read until EOF.
8a5bfc2
to
c40fa44
Compare
Thanks @stephentoub! |
@@ -341,6 +350,50 @@ public static byte[] ReadAllBytes(string path) | |||
} | |||
} | |||
|
|||
private static byte[] ReadAllBytesUnknownLength(FileStream fs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed something but it doesn't seem to throw new IOException(SR.IO_FileTooLong2GB)
in case of newLength > int.MaxValue
like it is done in ReadAllBytes
for the case where length is known. And it seems like DefaultArrayPool.Rent
doesn't throw either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this stops when Rent throws out-of-memory which will happen when you ask it to allocate more than MaxByteArrayLength (or when there is no more memory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see such limit in DefaultArrayPool
:
buffer = new T[minimumLength]; |
So it will throw only on some runtime limit or out of memory which seem to be inconsistent with known length case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MaxByteArrayLength
is the max array length supported by .NET.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks! That's what I actually didn't realize
uint newLength = (uint)buffer.Length * 2; | ||
if (newLength > MaxByteArrayLength) | ||
{ | ||
newLength = (uint)Math.Max(MaxByteArrayLength, buffer.Length + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't there be signed integer overflow in buffer.Length + 1
at some point? We don't seem to have any hard limit for this whole loop. Maybe you actually meant Math.Min
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For buffer.Length + 1 > MaxByteArrayLength
case, Rent will end up throwing (I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand it correctly - when buffer.Length becomes MaxByteArrayLength on all further iterations it grows by one byte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmds That's the thing - DefaultArrayPool.Rent implementation doesn't seem have any limits (or maybe I missed one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I see this is how it behaves:
On each iteration size doubles;
Then it tries MaxByteArrayLength;
And on the next iteration (buffer.Length + 1), Rent throws.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm looking into DefaultArrayPool.Rent implementation and in case requested length is greater than supported by pool it just creates an array of requested length:
buffer = new T[minimumLength]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So newermind. I didn't realize that MaxArrayLength is the actual maximum supported by .NET
Q. Didn't we suppose to limit this fix only for Unixes? |
I considered that but decided not to special case because a) this only applies to the minor case where the file length is 0, b) it adds minimal cost to that case (one read operation), c) it actually saves some cost in that case as well by using If others disagree, though, it's trivial to change it to just be for non-Windows. @JeremyKuhne? |
Strictly speaking when it runs under WSL it recognizes itself running on Linux so Windows-only code is not executed there |
I understand that. I'm saying that this forward-proofs it against a potential future where you can access those paths from outside of the subsystem. Or against other arbitrary file system drivers that behave similarly. |
I see. Thanks |
I'm fine with it being included on Windows for the purposes you mentioned. |
…refx#28388) Some file systems, like procfs, can return 0 for a file's length, even though it actually contains on-demand computed contents. This breaks File.ReadAllBytes{Async}, which currently assumes that a length of 0 means the file is empty. This commit fixes it by changing the assumption to mean that a length of 0 means we can't depend on the Length and instead need to read until EOF. Commit migrated from dotnet/corefx@77bf407
Some file systems, like procfs, can return 0 for a file's length, even though it actually contains on-demand computed contents. This breaks File.ReadAllBytes{Async}, which currently assumes that a length of 0 means the file is empty. This commit fixes it by changing the assumption to mean that a length of 0 means we can't depend on the Length and instead need to read until EOF.
This brings File.ReadAllBytes inline with the behavior of File.ReadAllText. Note that this does mean that it'll also start OOM'ing in similar situations, e.g. if you try to do
File.ReadAllBytes("/dev/urandom")
, it'll eventually OOM whereas today it returns an empty array.Fixes https://github.com/dotnet/corefx/issues/28343
cc: @JeremyKuhne, @tmds, @VasiliyNovikov