Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Loom] Interrupt behaviour differs from the system implementation #158

Closed
cenodis opened this issue Apr 19, 2024 · 13 comments
Closed

[Loom] Interrupt behaviour differs from the system implementation #158

cenodis opened this issue Apr 19, 2024 · 13 comments
Assignees
Labels
compatibility issue An issue with a particular runtime environment merged A fix has been merged to main verify The issue is considered fixed/done, and reassigned to the originator to verify.
Milestone

Comments

@cenodis
Copy link

cenodis commented Apr 19, 2024

Split off from #157

The current 2.10.0 snapshot has added support for virtual thread I/O. As part of this change the Sockets and SocketChannels have gained the ability to be interrupted to support task cancellation.

While junixsocket does properly respond to an interrupt by throwing an exception, the type of the exception thrown as well as the state of the socket and thread after an interruption do not match the default implementation in the JDK.

Short overview

This table shows the state of the program immediately after the virtual I/O thread has received an interrupt and returned to the calling code:

Java Junixsocket
Socket SocketChannel Socket SocketChannel
Exception thrown SocketException ClosedByInterruptException InterruptedIOException InterruptedIOException
Thread interrupt flag set set unset unset
Socket closed? closed closed open open

Details

Socket

While the current Socket implementation is valid according to the specification I think there is value in matching the behaviour of the default implementation as close as possible to minimize potential incompatibilities. This is especially relevant when using junixsocket to "fake" a normal IP socket and passing it into a library that normally does not support domain sockets.

SocketChannel

java.nio.channels.SocketChannel on the other hand does document that it must throw a ClosedByInterrupt exception, interrupt the current thread and close the channel if it is interrupted during an I/O operation. So the current implementation is not strictly compliant with the specification.

Open Socket and InterruptedIOException

The exception in combination with the still open socket may also present a rather unique issue.

The problem with leaving the socket open after an interrupt is that it's not obvious if the I/O operation has completed fully, failed or only partially completed. This leaves the socket in an inconsistent state in which the caller does not necessarily know which bytes should be processed next to ensure a well formed message is created.
If it is a read operation then certain bytes may also have been read from the underlying socket but not arrived at the caller due to the exception, requiring the socket to perform otherwise unnecessary buffering to ensure no information is lost.

It is theoretically possible to pass the necessary information to the caller via InterruptedIOException and its bytesTransferred field which according to its documentation:

Reports how many bytes had been transferred as part of the I/ O operation before it was interrupted.

I would like to advocate against using this exception however.

Currently junixsocket does not set this field at all, meaning it is always 0. I am not sure this is correct as to my understanding it's possible for the async runtime to submit the operation to the OS before the interrupt exception is thrown.

Throwing InterruptedIOException along with leaving the socket open could result in code attempting to handle this by rewinding its buffer and playing the "missing" bytes back. But if the number of missing bytes is incorrect this will instead corrupt the socket stream.

This entire "recovery" process feels very brittle and easy to misuse even if the byte count is correct. It may be even further complicated if there are multiple layers of buffering which would each need to rewind. Closing the socket implicitly on an interrupt avoids this scenario altogether since no replay can be attempted.

Considering how easy it is to implement custom interrupt behaviour with virtual threads and the fact that the JDK has also chosen this approach, I do not see much benefit to support these kind of half-writes.

kohlschuetter added a commit that referenced this issue Apr 20, 2024
... from VirtualThreads, to be extra-compatible with the standard
implementation.

Add test cases to demonstrate the feature.

#158
@kohlschuetter kohlschuetter added verify The issue is considered fixed/done, and reassigned to the originator to verify. merged A fix has been merged to main compatibility issue An issue with a particular runtime environment labels Apr 20, 2024
@kohlschuetter kohlschuetter added this to the 2.10.0 milestone Apr 20, 2024
@kohlschuetter
Copy link
Member

Thanks for reporting, @cenodis!

Please verify against the current snapshot (changes referenced above).

If there are still discrepancies between junixsocket and vanilla Java, please provide some code/unit test to demonstrate, this will be a tremendous help.

@cenodis
Copy link
Author

cenodis commented Apr 21, 2024

The current implementation still throws unusual exceptions and keeps the socket open under some conditions.

I have taken a stab at writing some unit tests for interrupt behaviour. The tests and the results on my linux machine can be found here: InterruptTest.java. They are not very pretty but should cover the interrupt behaviour for all blocking methods (accept, connect, read, write) on the standard Unix socket.

Assuming the test code is free of bugs, here are some observations I have made based on the results:

  • Once a connection is established AFSocketChannel always reports itself as open. Even if a ClosedByInterruptException was thrown.
    Looking a bit into the code, AFSocketChannel (and its subclasses) do not appear to make use of the begin() and end(boolean) methods defined in AbstractInterruptibleChannel, breaking its API contract:

    A concrete channel class must invoke the begin and end methods before and after, respectively, invoking an I/ O operation that might block indefinitely.

    These methods appear to be responsible for setting the private closed field and translating the interrupt or close to the appropriate exception.

  • The results of some test seem to depend on when the interrupt happens relative to the start of the blocking operation. Some fail when the interrupt happens immediately but succeed if the interrupt happens some time after the operation has already started.
    I suspect this might be caused by some methods performing an early check of the interrupt flag to avoid performing unnecessary, expensive operations. Those checks then fail to properly update the state of the socket resulting in these discrepancies.
    To try and cover these variations every test is excuted twice: Once with an immediate interrupt and once with an interrupt after a delay of 500ms.

The tests currently only cover the basic Unix domain socket. I unfortunately lack the means to test more OS specific socket types. But it should be fairly straightforward to extend the test with those if desired.

@cenodis

This comment was marked as outdated.

kohlschuetter added a commit that referenced this issue Jun 28, 2024
Make the behavior of interrupted junixsocket sockets closer to the
vanilla Java implementations.

#158
@kohlschuetter
Copy link
Member

kohlschuetter commented Jun 28, 2024

@cenodis Thanks for providing these additional details!
I've added some more changes to make your tests pass, apart from the aforementioned interrupt flag state — and ClosedChannelException vs. ClosedByInterruptException: ClosedByInterruptException actually is a subclass of ClosedChannelException.

Can you please try the latest code (either from main or SNAPSHOT builds, 2.10.0-20240628.170903-8 or newer) and report back? Cheers!

PS: I'm happy to add your test class you referenced above, if you're willing to contribute the code under the Apache 2.0 license to this project.

@cenodis
Copy link
Author

cenodis commented Jun 29, 2024

I have run the exact same tests with the newest snapshot version (c0853fe according to git.properties) and still get some failures:
https://gist.github.com/cenodis/a7bb9144aee2ddc4e660649557c57993

Test [3] seems to fail or succeed somewhat randomly while [5] and [6] fail consistently on my machine. For completeness I have included a report from one of the runs where [3] has also failed.

The tests are run on an Ubuntu machine with Linux 6.3.6 if that helps.

ClosedChannelException vs. ClosedByInterruptException: ClosedByInterruptException actually is a subclass of ClosedChannelException.

Not completely sure what you are trying to tell me with that. Why is the inheritance chain of ClosedByInterruptException important? The SocketChannel should throw either a ClosedByInterruptException or an AsynchronousCloseException for interrupts and closes respectively.

I'm happy to add your test class you referenced above, if you're willing to contribute the code under the Apache 2.0 license to this project.

Sure. The code is so basic I have my doubts whether its even licensable. But if it makes you more comfortable I give my blessing to use this code with whatever license this project uses.

I have also taken a look at the changes you have commited so far and there is something that stands out to me. You have a few blocks that have this shape:

begin();
try {
  // blocking op
} catch (/*exceptions*/) {
  if (Thread.currentThread().isInterrupted()) {
    throw new ClosedByInterruptException();
  }
  // more exception handling
} finally {
  end(complete);
}

This interrupt check strikes me as redundant. If an interrupt occurs after (or during) begin() then AbstractInterruptibleChannel::end(boolean) is responsible for detecting it and then throwing the appropriate ClosedByInterruptException. Checking the threads interrupt status manually is therefore unnecessary.

And another thing:

try {
  end(complete);
} catch (ClosedByInterruptException e) {
  throw closeAndThrow(e);
}

end(boolean) handles both interrupts and async close events. It can throw both ClosedByInterruptException and AsynchronousCloseException. Based on my understanding of your code it should catch both to ensure that closeAndThrow is always called.
Though I do wonder why this block is even necessary to begin with. Both interrupts and async closes always call AbstractInterruptibleChannel::implCloseChannel (which in turn calls implCloseSelectableChannel()) so the channel should already be fully closed at this point?

kohlschuetter added a commit that referenced this issue Jun 30, 2024
Make the begin/end/handle-interrupt logic for Channels a little bit more
concise.

#158
kohlschuetter added a commit that referenced this issue Jun 30, 2024
... and add license header/Javadoc.

#158
kohlschuetter added a commit that referenced this issue Jun 30, 2024
Make test compile/run on Java 8 and newer.

Change assumptions around exceptions and interrupt state:
- Change expectation of ClosedByInterruptException to
  AsynchronousCloseException (its superclass).
- Only expect Thread#interrupted on actual ClosedByInterruptException.
- Use temporary AFUNIXSocketAddress instead of hardcoded one to prevent
  flaky results.

#158
kohlschuetter added a commit that referenced this issue Jun 30, 2024
When we already get a ClosedChannelException, do not let
AbstractInterruptibleChannel#end rethrow another one.

#158
kohlschuetter added a commit that referenced this issue Jun 30, 2024
The parametrized method variants are not useful as-is for the selftest.
Let's simplify the naming a little bit, and reformat the source code so
we can find the corresponding test variant more easily.

#158
@kohlschuetter
Copy link
Member

Please try again with the latest snapshot (2.10.0-20240630.191437-9 corresponding to commit 7025a04).

I've reworked both the exception handling code as well as your unit test, which is now also included in the selftest. Thanks again, for allowing me to include the test, @cenodis!

Not completely sure what you are trying to tell me with that. Why is the inheritance chain of ClosedByInterruptException important? The SocketChannel should throw either a ClosedByInterruptException or an AsynchronousCloseException for interrupts and closes respectively.

Previously, the test code expected ClosedByInterruptException, whereas now any subclass of ClosedChannelException is valid. However, if ClosedByInterruptException is thrown, the test will check if the Thread#interrupted flag is set.

The Java API specs around SocketChannel/ServerSocketChannel/DatagramChannel permit any kind of IOException to be thrown, not just ClosedByInterruptException and AsynchronousCloseException, particularly ClosedChannelException, which is the common parent class. That one should be thrown if the channel is already closed upon calling, for example.

I think the main change, compared to older versions is that now we properly throw ClosedChannelException (an IOException subclass) instead of a junixsocket-specific SocketClosedException (a SocketException subclass), and that now in all cases the socket should be properly closed.

If there are still discrepancies in what exceptions are thrown compared to the JVM implementations, I would argue that these are an implementation detail that should be ignored. Better follow the specs as to what may happen that what currently does...

Though I do wonder why this block is even necessary to begin with. Both interrupts and async closes always call AbstractInterruptibleChannel::implCloseChannel (which in turn calls implCloseSelectableChannel()) so the channel should already be fully closed at this point?

Unfortunately, "closed" may mean multiple things , so we need to make sure that all resources are properly closed by calling our own close method when required. This currently may or may not be required, but it doesn't hurt since this code is only called when an exception is thrown. I hope that helps.

@cenodis
Copy link
Author

cenodis commented Jun 30, 2024

Previously, the test code expected ClosedByInterruptException

Yes, and in my opinion this is the behaviour required by the specification. InterruptIssue158Test always interrupts the blocking thread so a ClosedByInterruptException is always expected. Replacing it with a generic ClosedChannelException means code written like this breaks:

try {
  channel.blockingOp();
} catch (ClosedByInterruptException e) {
  // ...
} catch (ClosedChannelException e) {
  // ...
}

To quote InterruptibleChannel:

A channel that implements this interface is asynchronously closeable: If a thread is blocked in an I/O operation on an interruptible channel then another thread may invoke the channel's close method. This will cause the blocked thread to receive an AsynchronousCloseException.

A channel that implements this interface is also interruptible: If a thread is blocked in an I/O operation on an interruptible channel then another thread may invoke the blocked thread's interrupt method. This will cause the channel to be closed, the blocked thread to receive a ClosedByInterruptException, and the blocked thread's interrupt status to be set.

If a thread's interrupt status is already set and it invokes a blocking I/O operation upon a channel then the channel will be closed and the thread will immediately receive a ClosedByInterruptException; its interrupt status will remain set.

To me this reads that if the the blocking operation is interrupted then the method is required to throw a ClosedByInterruptException or subclass thereof, not just a generic ClosedChannelException.

Additionally the blocking methods on SocketChannel all explicitly document the interrupt and async close exceptions. If it was permitted to throw a generic superclass then these cases would be documented as "optional specific" exceptions (see java.nio.file.Files::delete). The fact that this isn't the case means it is required to throw the most specific exception.

Example from SocketChannel::connect(SocketAddress):

  1. ClosedChannelException – If this channel is closed
  2. AsynchronousCloseException – If another thread closes this channel while the connect operation is in progress
  3. ClosedByInterruptException – If another thread interrupts the current thread while the connect operation is in progress, thereby closing the channel and setting the current thread's interrupt status

2 and 3 should already be handled by AbstractInterruptibleChannel assuming both begin and end are properly used for all blocking operations and implCloseChannel wakes up the thread currently blocking on the channel.
If complete does not throw a ClosedByInterruptException for the scenarios in InterruptIssue158Test then there is a bug in the test or the channel (or both).

The Java API specs around SocketChannel/ServerSocketChannel/DatagramChannel permit any kind of IOException to be thrown

Yes, but only if you look exclusively at the function signature. And by that logic it could also throw HttpRetryException. As stated above the javadoc specs do list the individual exceptions that can be thrown by a method and it should always throw the most specific exception unless the exception is documented as an "optional specific exception".

Better follow the specs as to what may happen that what currently does

Agreed. I try to avoid including any implementation specific behaviour. See above for my understanding of the specification. And feel free to call out anything you see as implementation specific.

Unfortunately, "closed" may mean multiple things, [...]. This currently may or may not be required, but it doesn't hurt

Thats fine. I just saw a piece of code that looked strange to me and wanted to point it out. Not to demand any changes.

kohlschuetter added a commit that referenced this issue Jul 1, 2024
Previously, we were throwing SocketClosedException in cases where
BrokenPipeSocketException was more appropriate.

This enables us to better differentiate between
AsynchronousCloseException (broken pipe, etc.) and any other
ClosedChannelException.

Adjust the corresponding unit tests, which now always either throw
AsynchronousCloseException or ClosedByInterruptException.

#158
kohlschuetter added a commit that referenced this issue Jul 1, 2024
Under some circumstances, we would erroneously throw a
ClosedChannelException.

Rework the code to not throw an exception unless necessary, and throw
the proper exception based on the error code (ClosedChannelExceptions
are now handled separately in Java code).

#158
kohlschuetter added a commit that referenced this issue Jul 1, 2024
kohlschuetter added a commit that referenced this issue Jul 1, 2024
Previously, the test may fail sporadically due to a race condition
between receiving a "thread interrupted" state on a live connection and
one that has been closed already (resulting in an
AsynchronousCloseException instead of a ClosedByInterruptException).

Make sure that the client connection is closed from the servce side only
after a delay that is significantly longer than any other delay in this
test class. Run that close() in a separate Thread to avoid unnecessarily
long delays.

Finally, set all expected exceptions back to ClosedByInterruptException,
as intended by the original author (cenodis).

#158
kohlschuetter added a commit that referenced this issue Jul 1, 2024
Refactor the unit tests for issue 158 such that we can compare the
behavior of junixsocket AFUNIXSocketChannel etc. and the Java 16+ JEP380
as well as regular Java inet versions.

Add both tests to the selftest, but disable the JEP380/Inet versions by
default (enable with
-Dselftest.enable-module.junixsocket-common.JEP380=true
-Dselftest.enable-module.junixsocket-common.JavaInet=true
)

Add concise exception logging (enable with
-Dselftest.issue.158.debug=true
).

Lastly, move some JEP380-specific logic into its own class.

#158
kohlschuetter added a commit that referenced this issue Jul 1, 2024
... and log the stacktrace for unexpected exceptions

#158
@kohlschuetter
Copy link
Member

kohlschuetter commented Jul 1, 2024

OK, so I think I have it now. There are a few things I was able to change, and now we get the expected ClosedByInterruptException in all cases where we expect them.

I had to modify your test case again in one particular regard though: Because of a race condition, the interrupt may have occurred after closing the socket from the server-side (there was no delay). In that case, the latest iteration of my changes correctly returned AsynchronousCloseException.

Moreover, I reworked the tests so we can now verify the exact behavior of JEP380 (and Java Inet) sockets on the same code (see the three subclasses in commit 0bed969)

Surprisingly, without a delay of closing the socket (run with -Dselftest.issue.158.delay-close=false), the JEP380 implementation will not throw any exception (which is permitted by the test): the client somehow "successfully" sends the single byte to the server, even though it clearly does not read the byte on the server-side (!)

On the other end, adding a significant delay will cause the tests to work as expected in all cases.

Regarding your concern about specific exceptions, I can only again stress that checking for specific exceptions is not recommended. Using exception handling for flow control is extremely brittle (I've run into it myself while fixing this bug; see commit 66fb640).

By running the test code against the Java SDK JEP380 code, I was able to occasionally trigger a case where testClientInterruptionWithDelay variant 6 throws an unexpected exception that is just a plain IOException, not even an AsynchronousCloseException:

java.io.IOException: Broken pipe
	at java.base/sun.nio.ch.SocketDispatcher.write0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:137)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:102)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:58)
	at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:542)
	at org.newsclub.net.unix/org.newsclub.net.unix.InterruptIssue158Test.lambda$16(InterruptIssue158Test.java:115)
	at org.newsclub.net.unix/org.newsclub.net.unix.InterruptIssue158Test.runOperation(InterruptIssue158Test.java:257)
	at org.newsclub.net.unix/org.newsclub.net.unix.InterruptIssue158Test.lambda$26(InterruptIssue158Test.java:168)
	at java.base/java.lang.VirtualThread.run(VirtualThread.java:309)

So it's fair to say that your assumptions around certain exception types don't hold true even for the Java SDK JEP380 implementation. This actually goes against their own javadocs, see SocketChannel.java, lines 642-647:

    /**
     * @throws  NotYetConnectedException
     *          If this channel is not yet connected
     * @throws  ClosedChannelException      {@inheritDoc}
     * @throws  AsynchronousCloseException  {@inheritDoc}
     * @throws  ClosedByInterruptException  {@inheritDoc}
     */
    public abstract long write(ByteBuffer[] srcs, int offset, int length)
        throws IOException;

I will file a bug to Oracle shortly afterwards.
(I do not see this behavior when using standard Java AF_INET sockets, for what it's worth).

Notably, I agree that in this case it should have been an AsynchronousCloseException (or, if the interrupt was received fast enough, ClosedByInterruptException). Therefore, I'm not going to change my code back to mimic the Java SDK :-)

@kohlschuetter
Copy link
Member

Please verify with code from the latest commit 9f5377f.

You can also run the latest selftest jar as follows:

java \
    -Dselftest.issue.158.debug=true \
    -Dselftest.enable-module.junixsocket-common.JEP380=true \
    -Dselftest.enable-module.junixsocket-common.JavaInet=true \
    -jar junixsocket-selftest-2.10.0-20240701.224420-13-jar-with-dependencies.jar

Try inserting

    -Dselftest.issue.158.delay-close=false \

after the first line to disable the delay, which should then occassionally trigger the bug in JEP380 code (verified on a MacBook Pro M1 Max). It will also show that junixsocket will throw an AsynchronousCloseException (which is expected, as described above; the test will fail regardless)

@kohlschuetter
Copy link
Member

kohlschuetter commented Jul 2, 2024

Java bug reported to Oracle, JDK-8335600

@cenodis
Copy link
Author

cenodis commented Jul 4, 2024

Tested on 262827d and now all relevant methods throw a proper ClosedByInterruptException on interrupts, report the socket as closed and have the thread set as interrupted. Seems good to me.

Thank you for putting up with this nitpicky issue and updating everything to be in line with the spec. Even the optional parts in Socket. I imagine chasing specific error classes is annoying to deal with. Don't think me hammering on specific wording in the docs made it any better.

Surprisingly, without a delay of closing the socket [...] the client somehow "successfully" sends the single byte to the server, even though it clearly does not read the byte on the server-side

That does seem very strange. I could imagine a world where the OS reads the byte into cache even if the consumer application is not ready yet. But that doesn't fit with the even greater delay resulting in a "not-send" state. Can't say I have an idea on what could cause this.

I had to modify your test case

I see where I made the mistake now. Sorry about that. I reviewed it twice and have no idea how that slipped past me.

checking for specific exceptions is not recommended. Using exception handling for flow control is extremely brittle

I actually agree with this sentiment. On the other hand sometimes exception types are the only way to meaningfully distinguish between different kinds of errors.
I don't think exceptions vs explicit checks is a productive discussion, at least not in this context. Java has decided to use exceptions to model error scenarios, so its for the best that the ecosystem remains as compatible as possible when interacting with parts of the spec.

This actually goes against their own javadocs

Agreed. This does seem like a bug in the SDK.

@kohlschuetter
Copy link
Member

@cenodis I'm glad we got this resolved. Investigating this bug was quite fruitful, as we found a couple more issues in junixsocket, plus a JDK bug :)

@kohlschuetter
Copy link
Member

junixsocket 2.10.0 has been released. Please verify and re-open if necessary. Thanks again for reporting , @cenodis !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compatibility issue An issue with a particular runtime environment merged A fix has been merged to main verify The issue is considered fixed/done, and reassigned to the originator to verify.
Projects
None yet
Development

No branches or pull requests

2 participants