Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reverted: Increase stream-cancellation-delay default to 1000 millis #590

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/src/main/paradox/release-notes/releases-1.1.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ All the changes in the @ref:[1.0.x releases](releases-1.0.md) up to and includin
### Changes
* Changed names of HTTP status codes 413 and 422 ([PR87](https://github.com/apache/pekko-http/pull/87))
* Parse entire HTTP chunk size ([PR528](https://github.com/apache/pekko-http/pull/528))
* Increased default value of `pekko.http.server.stream-cancellation-delay` and `pekko.http.client.stream-cancellation-delay` from 100 to 1000 millis ([PR590](https://github.com/apache/pekko-http/pull/590))

### Additions
* Add UnsupportedContentTypeException Java DSL ([PR376](https://github.com/apache/pekko-http/pull/376))
Expand Down
4 changes: 2 additions & 2 deletions http-core/src/main/resources/reference.conf
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,7 @@ pekko.http {
# In most cases, there should be no reason to change this setting.
#
# Set to 0 to disable the delay.
stream-cancellation-delay = 100 millis
stream-cancellation-delay = 1000 millis

http2 {
# The maximum number of request per connection concurrently dispatched to the request handler.
Expand Down Expand Up @@ -532,7 +532,7 @@ pekko.http {
# In most cases, there should be no reason to change this setting.
#
# Set to 0 to disable the delay.
stream-cancellation-delay = 100 millis
stream-cancellation-delay = 1000 millis
}
#client-settings

Expand Down
1 change: 1 addition & 0 deletions http-core/src/test/resources/reference.conf
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,5 @@ pekko {
default-dispatcher.throughput = 1
}
stream.materializer.debug.fuzzing-mode = off
stream.testkit.all-stages-stopped-timeout = 20 s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout is related to Utils.assertAllStagesStopped which is designed to check that all stream stages finish when the test does. Since stream-cancellation-delay creates a stage in of itself this needs to be increased.

To me it makes sense to do this globally (for tests) rather than on a test by test basis because there is no issue in having it too long as it only increases the failure test case, not the positive.

The default value of this 5 s

Copy link
Contributor

@jrudolph jrudolph Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot remember the reason to choose 100ms for the delay cancellation limit. Having to increase the all-stages-stopped-timeout to higher values is somewhat of an indication of what kind of real world implications this change might have: cleanup of broken/closed connections might now take longer.

IIRC, the whole reason to introduce the delay was to deal with the fact that stream cancellation could not be well-configured in Akka 2.5. With 2.6, it's not 100% clear if the delay is even still needed or whether it could be solved in a better way (i.e. by resolving the cancellation race more intelligently, taking the cancellation reason into account at the right/more places) to avoid symptoms as shown in the ticket.

Copy link
Contributor Author

@mdedetrich mdedetrich Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot remember the reason to choose 100ms for the delay cancellation limit. Having to increase the all-stages-stopped-timeout to higher values is somewhat of an indication of what kind of real world implications this change might have: cleanup of broken/closed connections might now take longer.

iirc its only in a few tests which used Utils.assertAllStagesStopped started timing out so I can granularly update those tests. My ultimate reasoning for making this change global is was to reduce surprises for people writing future tests (i.e. giving more QoL to developers), more specifically avoiding the "I just updated/wrote a test and now its failing due to a finely tuned timeout and now I have to figure out that it was because of that said timeout"

I think that as long as you handle shutdown of pekko http gracefully I don't think it should cause issues in prod however I admit that many people may not even be handling shutdowns of pekko http correctly.

IIRC, the whole reason to introduce the delay was to deal with the fact that stream cancellation could not be well-configured in Akka 2.5. With 2.6, it's not 100% clear if the delay is even still needed or whether it could be solved in a better way (i.e. by resolving the cancellation race more intelligently, taking the cancellation reason into account at the right/more places) to avoid symptoms as shown in the ticket.

If this is the case then it should definitely be looked into, I am not aware of the updates to stream cancellation in akka 2.6 but judging by the date of the report at #422 (comment), if that stream cancellation update is now the default it didn't seem to fix the underlying issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My ultimate reasoning for making this change global is was to reduce surprises for people writing future tests (i.e. giving more QoL to developers), more specifically avoiding the "I just updated/wrote a test and now its failing due to a finely tuned timeout and now I have to figure out that it was because of that said timeout"

I agree with the goal of helping test writers but it's not good if tests only succeed or fail after a long timeout. Event the old 2 second timeout might have been somewhat of a test fixing bankruptcy issue where we didn't have the time to make sure all tests run through in a timely fashion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the goal of helping test writers but it's not good if tests only succeed or fail after a long timeout. Event the old 2 second timeout might have been somewhat of a test fixing bankruptcy issue where we didn't have the time to make sure all tests run through in a timely fashion.

Should I go through and undo this global timeout increase and only apply it to the specific tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a nice middle ground would be to increase the timeout in the configuration for the test classes that are affected? Not as neat as specifying it for an individual test, but more granular than applying it globally.

Copy link
Contributor Author

@mdedetrich mdedetrich Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats definitely true, however its usually only one specific test per test classes that requires an override (as an example with WebSocketIntegrationSpec its only send back 100 elements and then terminate without error even when not ordinarily closed that even needs the override.

Basically its only tests which explicitly cancel some part of request/response as part of what is being tested and given how big test suites normally are in pekko-http it might not be enough of a difference to be useful.

}
Original file line number Diff line number Diff line change
Expand Up @@ -306,6 +306,7 @@ class H2SpecIntegrationSpec extends PekkoFreeSpec(
executable,
"-k", "-t",
"-p", port.toString,
"-o", "9",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/summerwind/h2spec?tab=readme-ov-file#usage for reference. The default value is 2 seconds, I got to 9 with manual testing by incrememting 1 each time.

"-j", junitOutput.getPath) ++
specSectionNumber.toList.map(number => s"http2/$number")

Expand Down
Loading