-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reverted: Increase stream-cancellation-delay default to 1000 millis #590
Reverted: Increase stream-cancellation-delay default to 1000 millis #590
Conversation
Reading the Akka issue and PR seems to indicate that this will cause test failures. I can help if needed to try to fix up the test issues. I'm hoping to get Pekko HTTP 1.1.0 released but it seems useful to fix this. |
The test failures are expected because some of the tests expect the stream to be cancelled within 100ms and now that is has been increased to 1000ms those tests are now broken (as per @jrudolph 's comment, the change itself is safe). |
Ill have time to look into this tonight |
Should this actually be marked as closing the issue? For reference, we've used |
Fair point, ill change this so it won't close original issue |
fe0067b
to
3998b45
Compare
3998b45
to
5bdce88
Compare
PR is ready, I managed to fix the issues with the tests. |
@@ -6,4 +6,5 @@ pekko { | |||
default-dispatcher.throughput = 1 | |||
} | |||
stream.materializer.debug.fuzzing-mode = off | |||
stream.testkit.all-stages-stopped-timeout = 20 s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timeout is related to Utils.assertAllStagesStopped
which is designed to check that all stream stages finish when the test does. Since stream-cancellation-delay
creates a stage in of itself this needs to be increased.
To me it makes sense to do this globally (for tests) rather than on a test by test basis because there is no issue in having it too long as it only increases the failure test case, not the positive.
The default value of this 5 s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot remember the reason to choose 100ms for the delay cancellation limit. Having to increase the all-stages-stopped-timeout to higher values is somewhat of an indication of what kind of real world implications this change might have: cleanup of broken/closed connections might now take longer.
IIRC, the whole reason to introduce the delay was to deal with the fact that stream cancellation could not be well-configured in Akka 2.5. With 2.6, it's not 100% clear if the delay is even still needed or whether it could be solved in a better way (i.e. by resolving the cancellation race more intelligently, taking the cancellation reason into account at the right/more places) to avoid symptoms as shown in the ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot remember the reason to choose 100ms for the delay cancellation limit. Having to increase the all-stages-stopped-timeout to higher values is somewhat of an indication of what kind of real world implications this change might have: cleanup of broken/closed connections might now take longer.
iirc its only in a few tests which used Utils.assertAllStagesStopped
started timing out so I can granularly update those tests. My ultimate reasoning for making this change global is was to reduce surprises for people writing future tests (i.e. giving more QoL to developers), more specifically avoiding the "I just updated/wrote a test and now its failing due to a finely tuned timeout and now I have to figure out that it was because of that said timeout"
I think that as long as you handle shutdown of pekko http gracefully I don't think it should cause issues in prod however I admit that many people may not even be handling shutdowns of pekko http correctly.
IIRC, the whole reason to introduce the delay was to deal with the fact that stream cancellation could not be well-configured in Akka 2.5. With 2.6, it's not 100% clear if the delay is even still needed or whether it could be solved in a better way (i.e. by resolving the cancellation race more intelligently, taking the cancellation reason into account at the right/more places) to avoid symptoms as shown in the ticket.
If this is the case then it should definitely be looked into, I am not aware of the updates to stream cancellation in akka 2.6 but judging by the date of the report at #422 (comment), if that stream cancellation update is now the default it didn't seem to fix the underlying issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My ultimate reasoning for making this change global is was to reduce surprises for people writing future tests (i.e. giving more QoL to developers), more specifically avoiding the "I just updated/wrote a test and now its failing due to a finely tuned timeout and now I have to figure out that it was because of that said timeout"
I agree with the goal of helping test writers but it's not good if tests only succeed or fail after a long timeout. Event the old 2 second timeout might have been somewhat of a test fixing bankruptcy issue where we didn't have the time to make sure all tests run through in a timely fashion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the goal of helping test writers but it's not good if tests only succeed or fail after a long timeout. Event the old 2 second timeout might have been somewhat of a test fixing bankruptcy issue where we didn't have the time to make sure all tests run through in a timely fashion.
Should I go through and undo this global timeout increase and only apply it to the specific tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a nice middle ground would be to increase the timeout in the configuration for the test classes that are affected? Not as neat as specifying it for an individual test, but more granular than applying it globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats definitely true, however its usually only one specific test per test classes that requires an override (as an example with WebSocketIntegrationSpec
its only send back 100 elements and then terminate without error even when not ordinarily closed
that even needs the override.
Basically its only tests which explicitly cancel some part of request/response as part of what is being tested and given how big test suites normally are in pekko-http it might not be enough of a difference to be useful.
@@ -306,6 +306,7 @@ class H2SpecIntegrationSpec extends PekkoFreeSpec( | |||
executable, | |||
"-k", "-t", | |||
"-p", port.toString, | |||
"-o", "9", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/summerwind/h2spec?tab=readme-ov-file#usage for reference. The default value is 2 seconds, I got to 9 with manual testing by incrememting 1 each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@pjfanning Feel free to merge when you see fit |
…)" This reverts commit 932a22f.
References: #422
Still need to update tests as per @jrudolph original comment, i.e.