FileHelpers: Add FileDescriptor.read(filling buffer:) #84

simonjbeaumont · 2022-05-05T14:27:15Z

Because read(into:) and write(into:) return the number of bytes read or written which may be smaller than desired, users will often want to call these functions in a loop until they have read or written all the bytes they require. Such loops require keeping track of the index and will be repeated toil in each application.

Swift System already provides an extensions writeAll(_:) and writeAll(toAbsoluteOffset:_:) which operates on a sequence of bytes and writes all the bytes in the sequence.

This patch adds an analogous helper function for reading, read(filling buffer:) which takes an UnsafeMutableRawBufferPointer and reads until the buffer is full, or until EOF is reached.

milseman

IIUC this is ABI breaking.

CC @lorentey

lorentey · 2022-05-05T21:26:21Z

Yep, sadly we cannot remove the existing, throwing entry point.

We could introduce a new non-throwing one while keeping the original around, but unfortunately it would need to have a more recent availability declaration, and we could only call it after an if #available check. Still, if the throwing variant is causing performance issues (does it?), then it may be a good idea to do that.

simonjbeaumont · 2022-05-06T11:40:57Z

Makes sense, well this was in favour of me adding readAll(into:) and readAll(fromAbsoluteOffset:into:) to mirror the existing writeAll(_:) and writeAll(toAbsoluteOffset:_:) helpers.

Maybe I'll repurpose this PR for that since it provides a real motivation for the non-throwing, internal _read.

milseman · 2022-05-06T12:15:57Z

We couldn't do readAll at the time. With opaque return types with associated type bounds we should soon be able to do a read all that returns some Collection<UInt8>. When we get RawArray or some universal appendable untyped view, we can do readAll(into:).

I wonder if there's anything similar to readAll(into:) we could do that would make it easier for other types like ByteBuffer to wrap

simonjbeaumont · 2022-05-06T13:40:41Z

Do I assume that it's not appealing to have a wrapper that still deals with UnsafeMutableRawBufferPointer, one that just handles the while loop over the underlying read(into:)?

milseman · 2022-05-06T14:02:56Z

Do I assume that it's not appealing to have a wrapper that still deals with UnsafeMutableRawBufferPointer, one that just handles the while loop over the underlying read(into:)?

read(into:) wouldn't be able to resize the UMRBP. We could have something that attempts to fill it and returns how much it filled and whether there might be more input left.

simonjbeaumont · 2022-05-06T14:25:36Z

read(into:) wouldn't be able to resize the UMRBP. We could have something that attempts to fill it and returns how much it filled and whether there might be more input left.

Ah, right, I meant one that would read until the UMRBP was filled. Clearly readAll is the wrong name here. It's fill this buffer with read bytes, please, I don't know how it should be spelled exactly, maybe read(filling buffer:)? It's an analog to the fact that we have a helper on the write path which does the while-loop-index dance, but nothing on the read path.

milseman · 2022-05-06T19:44:45Z

read(filling:) makes sense to me if it will avoid the loop dance for use cases that want to read up to a fixed length from a file. E.g. they're doing a segmented or buffering scheme anyways.

@lorentey, thoughts?

Because `read(into:)` and `write(into:)` return the number of bytes read or written which may be smaller than desired, users will often want to call these functions in a loop until they have read or written all the bytes they require. Such loops require keeping track of the index and will be repeated toil in each application. Swift System already provides an extensions `writeAll(_:)` and `writeAll(toAbsoluteOffset:_:)` which operates on a sequence of bytes and writes all the bytes in the sequence. This patch adds an analogous helper function for reading, `read(filling buffer:)` which takes an `UnsafeMutableRawBufferPointer` and reads until the buffer is full, or until EOF is reached.

simonjbeaumont · 2022-05-16T08:30:19Z

@milseman wrote:

read(filling:) makes sense to me if it will avoid the loop dance for use cases that want to read up to a fixed length from a file. E.g. they're doing a segmented or buffering scheme anyways.

@lorentey, thoughts?

OK, well I repurposed this PR for read(filling buffer:) and made sure to reinstate the (not actually) throwing internal _read.

milseman

@lorentey can you take a look? What do you think the behavior should be?

milseman · 2022-05-16T14:30:22Z

Sources/System/FileOperations.swift

+    _readNoThrow(into: buffer, retryOnInterrupt: retryOnInterrupt)
+  }
+
+  @usableFromInline


This needs availability, right @lorentey ?

Unfortunately yes, and on ABI stable platforms, read will need to dispatch to either _read or _readNoThrow, depending on the platform & version we're running on.

Given the importance of this particular entry point, I think it's worth considering marking the new non-throwing variant @_alwaysEmitIntoClient @inline(never) to prevent the availability check.

Otherwise we'll need to mess with #if SWIFT_PACKAGE to enable/disable availability checking based on whether or not we're building the ABI stable variant.

I think it's okay to land this without resolving the availability mess -- ABI stable builds aren't possible to do within this repo, and I wouldn't want to force Simon to have to deal with a tricky problem based on partial info and no tooling. It's tricky enough to get right when we have the benefit of the ABI checker and a working build system. (Unless he really really wants to do it, of course!)

I wouldn't want to force Simon to have to deal with a tricky problem based on partial info and no tooling. It's tricky enough to get right when we have the benefit of the ABI checker and a working build system. (Unless he really really wants to do it, of course!)

😅 sounds like I might be opening a can of worms here. It sounds like merging this PR without addressing the ABI issue might be the quickest solution. After merging though, I'm happy to collaborate on how to please the ABI checker, but I confess this isn't something I've done before.

milseman · 2022-05-16T14:31:15Z

Sources/System/FileHelpers.swift

@@ -79,6 +79,75 @@ extension FileDescriptor {
    }
  }

+  /// Reads bytes at the current file offset into a buffer until the buffer is filled.


What happens if the file runs out of bytes?

Well, I guess that's up for grabs. When you added this comment it would block. But this doesn't seem right. I added an update which checks for EOF and returns (which makes returning an Int more relevant again).

There's another option to consider. We could have read(filling:) throw an error if it encounters EOF before the buffer is filled. What do you think?

I've used libraries where the fill-this-buffer-by-reading-bytes operation would signal an error if it runs out of bytes and I used ones where it would return a partial buffer. Generally I found the erroring variant more convenient, as in the other case I'd usually just end up having to signal an error myself. (This is supposing that the regular partial read is available as a separate operation, like is the case with System.) So from a usability standpoint I'd prefer this threw an error on EOF.

Unfortunately, there is no errno value that represents the EOF condition -- the original i/o interfaces do not treat this as an error. So we'll have to define a custom Error type if we want to do this.

(We'll need to step carefully here, as I'd prefer if we did not start a slippery slope where we end up with a full blown File abstraction within System -- this library ought to simply sanitize system interfaces, not reinvent them.)

I think returning the actual number of bytes read is better than throwing. How do FIFOs, etc. work in this respect?

Upon successful completion, read(), readv(), pread(), and preadv() return
the number of bytes actually read and placed in the buffer. The system
guarantees to read the number of bytes requested if the descriptor
references a normal file that has that many bytes left before the end-of-
file, but in no other case.

RETURN VALUES
If successful, the number of bytes actually read is returned. Upon reading
end-of-file, zero is returned. Otherwise, a -1 is returned and the global
variable errno is set to indicate the error.

The current state of this PR is that it will return an Int for the number of bytes read, which will be at most buffer.count. If there were enough bytes in the file to fill the buffer then it does fill the buffer and returns buffer.count, if EOF is encountered then it will not throw, but return the number of bytes read into the buffer, which will be less than buffer.count.

How do we feel about that? LMK if you want something different.

milseman · 2022-05-16T14:31:50Z

Sources/System/FileHelpers.swift

+  ///
+  /// - Parameters:
+  ///   - buffer: The region of memory to read into.
+  /// - Returns: The number of bytes that were read, equal to `buffer.count`.


If it's always equal, then we'd return a Bool. But, if we might only partly fill for an oversized buffer, then it makes sense to return the count.

That makes sense. If we go the route of returning a Bool are we happy to accept the asymmetry it will create with the writeAll(_:) helpers, which currently return Int even though the docs for writeAll(_:) state:

/// Writes a sequence of bytes to the current offset /// and then updates the offset. /// /// - Parameter sequence: The bytes to write. /// - Returns: The number of bytes written, equal to the number of elements in `sequence`. /// /// This method either writes the entire contents of `sequence`, /// or throws an error if only part of the content was written. ... @_alwaysEmitIntoClient @discardableResult public func writeAll<S: Sequence>( ...

Beyond consistency, returning an integer has the very slight convenience benefit that I can write c += read(filling: buffer) to keep a running total of bytes read. (read(filling: buffer); c += buffer.count works but it is more error prone)

Returning a boolean instead of throwing an error on EOF is an option to consider. It's sort of consistent with the original interfaces, but it has the same convenience issue as returning a partial buffer -- callers would need to manually check for it and in the partial case they will typically end up throwing anyway.

So I'm slightly preferring having this return buffer.count.

Current state in PR returns an Int, discussed in more detail in #84 (comment).

Signed-off-by: Si Beaumont <[email protected]>

simonjbeaumont · 2022-05-19T08:50:34Z

@swift-ci please test

lorentey · 2022-05-19T17:16:07Z

Sources/System/FileHelpers.swift

@@ -79,6 +79,75 @@ extension FileDescriptor {
    }
  }

+  /// Reads bytes at the current file offset into a buffer until the buffer is filled.


I've used libraries where the fill-this-buffer-by-reading-bytes operation would signal an error if it runs out of bytes and I used ones where it would return a partial buffer. Generally I found the erroring variant more convenient, as in the other case I'd usually just end up having to signal an error myself. (This is supposing that the regular partial read is available as a separate operation, like is the case with System.) So from a usability standpoint I'd prefer this threw an error on EOF.

Unfortunately, there is no errno value that represents the EOF condition -- the original i/o interfaces do not treat this as an error. So we'll have to define a custom Error type if we want to do this.

(We'll need to step carefully here, as I'd prefer if we did not start a slippery slope where we end up with a full blown File abstraction within System -- this library ought to simply sanitize system interfaces, not reinvent them.)

lorentey · 2022-05-19T17:21:57Z

Sources/System/FileHelpers.swift

+  ///
+  /// - Parameters:
+  ///   - buffer: The region of memory to read into.
+  /// - Returns: The number of bytes that were read, equal to `buffer.count`.


Beyond consistency, returning an integer has the very slight convenience benefit that I can write c += read(filling: buffer) to keep a running total of bytes read. (read(filling: buffer); c += buffer.count works but it is more error prone)

Returning a boolean instead of throwing an error on EOF is an option to consider. It's sort of consistent with the original interfaces, but it has the same convenience issue as returning a partial buffer -- callers would need to manually check for it and in the partial case they will typically end up throwing anyway.

So I'm slightly preferring having this return buffer.count.

lorentey · 2022-05-19T17:30:23Z

Sources/System/FileOperations.swift

+    _readNoThrow(into: buffer, retryOnInterrupt: retryOnInterrupt)
+  }
+
+  @usableFromInline


Unfortunately yes, and on ABI stable platforms, read will need to dispatch to either _read or _readNoThrow, depending on the platform & version we're running on.

Given the importance of this particular entry point, I think it's worth considering marking the new non-throwing variant @_alwaysEmitIntoClient @inline(never) to prevent the availability check.

Otherwise we'll need to mess with #if SWIFT_PACKAGE to enable/disable availability checking based on whether or not we're building the ABI stable variant.

I think it's okay to land this without resolving the availability mess -- ABI stable builds aren't possible to do within this repo, and I wouldn't want to force Simon to have to deal with a tricky problem based on partial info and no tooling. It's tricky enough to get right when we have the benefit of the ABI checker and a working build system. (Unless he really really wants to do it, of course!)

lorentey · 2022-05-19T17:34:44Z

Sources/System/FileHelpers.swift

+  }
+
+  @usableFromInline
+  internal func _read(


If we want to make this function opaque, then it'll need to come with availability, which means that the read(filling:) methods will need to have it too.

@milseman Is there a good reason writeAll is implemented using an opaque function?

lorentey · 2022-05-19T17:55:16Z

I'm fine with the read(filling:) name! (This operation is the reading equivalent to writeAll, so I would have been fine with readAll(_:) as well, for symmetry.)

I don't think this package needs to provide an operation that reads the entire file into memory returning a freshly allocated, dynamically sized, owning buffer. (That would be a new operation, independent of writeAll.)

Signed-off-by: Si Beaumont <[email protected]>

milseman · 2022-05-20T17:28:18Z

I've used libraries where the fill-this-buffer-by-reading-bytes operation would signal an error if it runs out of bytes and I used ones where it would return a partial buffer. Generally I found the erroring variant more convenient, as in the other case I'd usually just end up having to signal an error myself. (This is supposing that the regular partial read is available as a separate operation, like is the case with System.) So from a usability standpoint I'd prefer this threw an error on EOF.

Does that mean the use site needs to first call stat to see how big the file is in order to not over-allocate the buffer? Also, would it back out the partial fill or does that still happen as an effect?

callers would need to manually check for it and in the partial case they will typically end up throwing anyway.

What's your use case like? Is the idea that someone gave you a hard-request for exactly some number of bytes? Would you want to write the bytes you could and throw?

@simonjbeaumont, what's the immediate intended use of this? read will get every byte requested from a normal file, but I'm not as clear on what happens with fifos or sockets.

@milseman Is there a good reason writeAll is implemented using an opaque function?

Opaque is the default, and it's not as clear to me the benefits of making it not-opaque just to inline the loop into user code. I'm not sure if it would improve non-blocking I/O, though.

lorentey · 2022-05-21T19:15:15Z

Does that mean the use site needs to first call stat to see how big the file is in order to not over-allocate the buffer? Also, would it back out the partial fill or does that still happen as an effect?

No! The use site will typically want to read exactly $n$ bytes, for example, to deserialize a four-byte integer value of some endianness. If there aren't enough bytes, then the file is truncated/malformed and the caller will want to report an error.

It's not the best use of system resources to use unbuffered I/O like this (i.e., one syscall per primitive value read or written), but given that we provide writeAll, it seems silly not to also have the equivalent operation in the opposite direction.

What's your use case like? Is the idea that someone gave you a hard-request for exactly some number of bytes? Would you want to write the bytes you could and throw?

If it's okay not to fill up the buffer, then the regular read is the right operation for the job.

lorentey · 2022-05-21T19:23:09Z

Opaque is the default, and it's not as clear to me the benefits of making it not-opaque just to inline the loop into user code. I'm not sure if it would improve non-blocking I/O, though.

I'm asking because these utility functions can be expressed on top of existing public API, and moving them behind a resilience boundary forces them to come with more recent availability on Darwin.

These would probably be great use cases for the new @_backDeploy attribute; meanwhile, @_alwaysEmitIntoClient would probably be fine for them. I don't see a need to ever touch these implementations in the future.

milseman · 2022-05-22T23:17:40Z

given that we provide writeAll, it seems silly not to also have the equivalent operation in the opposite direction.

Yes, but read goes in the opposite direction and we do not have a universal appendable untyped storage object. We could have a readAll that returns a RawArray when that's a thing and we could have one that takes something appendable as inout. writeAll doesn't have this problem because it reads from its parameter.

If it's okay not to fill up the buffer, then the regular read is the right operation for the job.

Yes, for normal files that is guaranteed. I wasn't clear on what happens if it's a fifo. Would it just block and wait for more bytes?

lorentey · 2022-06-13T22:23:56Z

given that we provide writeAll, it seems silly not to also have the equivalent operation in the opposite direction.

Yes, but read goes in the opposite direction and we do not have a universal appendable untyped storage object. We could have a readAll that returns a RawArray when that's a thing and we could have one that takes something appendable as inout. writeAll doesn't have this problem because it reads from its parameter.

The equivalent operation to writeAll isn't an operation that reads every remaining byte in a file descriptor into a dynamically sized buffer. I do not think System needs to provide such an operation.

writeAll writes exactly $n$ bytes to a file from a provided buffer, where $n$ is a value that is known at the time the call is made.

The read-side equivalent to that is to read exactly $n$ bytes from a file into a provided buffer, where $n$ is a value that is known at the time the call is made.

If it's okay not to fill up the buffer, then the regular read is the right operation for the job.

Yes, for normal files that is guaranteed. I wasn't clear on what happens if it's a fifo. Would it just block and wait for more bytes?

I don't understand this question. What is guaranteed?

The exactly-fill-this-fixed-size-buffers-with-bytes-or-throw-an-error operation would call read in a loop until either (1) the buffer has been successfully filled, or (2) the EOF condition is reached or (3) read reports an error. I expect the operation would return normally in case (1), and throw an error in cases (2) and (3).

simonjbeaumont requested a review from milseman May 5, 2022 14:27

milseman requested changes May 5, 2022

View reviewed changes

simonjbeaumont force-pushed the sb/internal-read-does-not-throw branch from a54ce83 to c7cba01 Compare May 16, 2022 08:26

simonjbeaumont changed the title ~~FileDescriptor: Remove throws declaration from _read(into:retryOnInterrupt)~~ FileHelpers: Add FileDescriptor.read(filling buffer:) May 16, 2022

milseman reviewed May 16, 2022

View reviewed changes

fixup: read(filling:) returns on EOF

1740da9

Signed-off-by: Si Beaumont <[email protected]>

simonjbeaumont requested a review from milseman May 19, 2022 08:49

Merge branch 'main' into sb/internal-read-does-not-throw

b60b0fa

milseman requested a review from lorentey May 19, 2022 14:15

lorentey reviewed May 19, 2022

View reviewed changes

fixup: Docs and order of functions

440fda2

Signed-off-by: Si Beaumont <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FileHelpers: Add FileDescriptor.read(filling buffer:) #84

FileHelpers: Add FileDescriptor.read(filling buffer:) #84

simonjbeaumont commented May 5, 2022 •

edited

Loading

milseman left a comment

lorentey commented May 5, 2022

simonjbeaumont commented May 6, 2022

milseman commented May 6, 2022

simonjbeaumont commented May 6, 2022

milseman commented May 6, 2022

simonjbeaumont commented May 6, 2022

milseman commented May 6, 2022 •

edited

Loading

simonjbeaumont commented May 16, 2022

milseman left a comment

milseman May 16, 2022

lorentey May 19, 2022

simonjbeaumont May 20, 2022

milseman May 16, 2022

simonjbeaumont May 16, 2022

lorentey May 19, 2022

milseman May 19, 2022

simonjbeaumont May 20, 2022

milseman May 16, 2022

simonjbeaumont May 16, 2022

lorentey May 19, 2022

simonjbeaumont May 20, 2022

simonjbeaumont commented May 19, 2022

lorentey May 19, 2022

lorentey May 19, 2022

lorentey May 19, 2022

lorentey May 19, 2022

lorentey commented May 19, 2022

milseman commented May 20, 2022

lorentey commented May 21, 2022

lorentey commented May 21, 2022

milseman commented May 22, 2022

lorentey commented Jun 13, 2022 •

edited

Loading

FileHelpers: Add FileDescriptor.read(filling buffer:) #84

Are you sure you want to change the base?

FileHelpers: Add FileDescriptor.read(filling buffer:) #84

Conversation

simonjbeaumont commented May 5, 2022 • edited Loading

milseman left a comment

Choose a reason for hiding this comment

lorentey commented May 5, 2022

simonjbeaumont commented May 6, 2022

milseman commented May 6, 2022

simonjbeaumont commented May 6, 2022

milseman commented May 6, 2022

simonjbeaumont commented May 6, 2022

milseman commented May 6, 2022 • edited Loading

simonjbeaumont commented May 16, 2022

milseman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simonjbeaumont commented May 19, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lorentey commented May 19, 2022

milseman commented May 20, 2022

lorentey commented May 21, 2022

lorentey commented May 21, 2022

milseman commented May 22, 2022

lorentey commented Jun 13, 2022 • edited Loading

simonjbeaumont commented May 5, 2022 •

edited

Loading

milseman commented May 6, 2022 •

edited

Loading

lorentey commented Jun 13, 2022 •

edited

Loading