Performance improvements to queue(s) management in Webserver #2704

Merged (8 commits) on Feb 4, 2021

@spericas spericas commented Jan 29, 2021

The Webserver keeps a queue of queues so that it can release buffers back to Netty. Every incoming request creates a queue to track its Netty buffers. Removing one of these queues once it is no longer needed costs O(N), where N is the number of active connections. When N is large (e.g. 16K), this housekeeping operation becomes expensive.

This PR avoids the O(N) removal. It uses phantom references and lets the GC gather the queues once the associated publisher becomes eligible for collection. A clearQueues method is called periodically to clean up the queue of queues. Thanks to @olotenko.
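
For readers who want to see the shape of the idea, here is a minimal sketch, not the code from this PR. The names BufferQueue, OwnerReference and QueueTracker are hypothetical stand-ins, and buffers are plain strings. Registration is O(1) and there is no per-request removal; clearQueues only drains what has been enqueued. Note that a phantom reference is only enqueued by the GC while the reference object itself stays reachable, so the sketch assumes the caller keeps it somewhere.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the per-request queue of Netty buffers. */
final class BufferQueue {
    private final List<String> buffers = new ArrayList<>();
    private boolean released;

    void add(String buffer) {
        buffers.add(buffer);
    }

    /** Drops every tracked buffer (a real queue would release them to Netty). */
    void release() {
        buffers.clear();
        released = true;
    }

    boolean isReleased() {
        return released;
    }
}

/** Phantom reference to the owner; the buffer queue itself stays strongly reachable. */
final class OwnerReference extends PhantomReference<Object> {
    private final BufferQueue queue;

    OwnerReference(Object owner, ReferenceQueue<Object> rq, BufferQueue queue) {
        super(owner, rq);
        this.queue = queue;
    }

    BufferQueue queue() {
        return queue;
    }
}

final class QueueTracker {
    private final ReferenceQueue<Object> queues = new ReferenceQueue<>();

    /** Registers a request in O(1); the caller must keep the returned reference reachable. */
    OwnerReference track(Object owner, BufferQueue queue) {
        return new OwnerReference(owner, queues, queue);
    }

    /** Drains everything enqueued so far and returns how many queues were released. */
    int clearQueues() {
        int cleared = 0;
        Reference<?> ref;
        while ((ref = queues.poll()) != null) {
            ((OwnerReference) ref).queue().release();
            cleared++;
        }
        return cleared;
    }
}
```

In a test, calling enqueue() on the reference simulates the GC deciding that the owner is unreachable, which keeps the example deterministic.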

Signed-off-by: Santiago Pericasgeertsen <[email protected]>
…e connections. Some copyright fixes.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>
@spericas spericas added the enhancement New feature or request label Jan 29, 2021
@spericas spericas self-assigned this Jan 29, 2021
Signed-off-by: Santiago Pericasgeertsen <[email protected]>
@spericas spericas requested a review from ljnelson January 29, 2021 18:54

IndirectReference(T referent, ReferenceQueue<? super T> q, R otherRef) {
    super(referent, q);
    this.otherRef.lazySet(otherRef);
Member

Why lazySet?

@olotenko olotenko Jan 29, 2021

Then it is a plain store, with no expensive full barrier.

(On x86, think of a plain mov to memory versus a mov followed by a lock-prefixed instruction.)

In this case that is sufficient, because IndirectReference is published safely: it becomes reachable from other threads only as a result of some operation that has full barriers.

Actually, I am not sure that the atomicity of the reference is even required here.

All accesses are meant to be single-threaded in the intended use. It is atomic only for the purpose of potential other uses.

Member Author

Maybe not needed, but seems like the safer option.
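
To make the lazySet point in this thread concrete, here is a small sketch under stated assumptions. Holder and LazySetDemo are hypothetical names, not classes from the PR. The constructor uses lazySet, which the AtomicReference documentation specifies as having the memory effects of VarHandle.setRelease, and the object is then handed to another thread through a ConcurrentLinkedQueue, whose operations provide the ordering needed for safe publication.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder whose field is initialised with a release store. */
final class Holder<R> {
    private final AtomicReference<R> value = new AtomicReference<>();

    Holder(R initial) {
        // Release store instead of a volatile store: cheaper, and enough
        // as long as the Holder itself is published safely afterwards.
        value.lazySet(initial);
    }

    /** Takes the value out at most once; later calls return null. */
    R acquire() {
        return value.getAndSet(null);
    }
}

final class LazySetDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Holder<String>> shared = new ConcurrentLinkedQueue<>();
        // offer() and poll() are thread-safe; the Holder only becomes
        // visible to the reader through them.
        shared.offer(new Holder<>("buffer"));
        Thread reader = new Thread(() -> System.out.println(shared.poll().acquire()));
        reader.start();
        reader.join();
    }
}
```

If all accesses really are single-threaded, as suggested above, a plain field would behave the same; the atomic wrapper only matters for other potential uses.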

olotenko previously approved these changes Feb 1, 2021

@olotenko olotenko left a comment

I understand all apart from the removal of ChannelHandlerContext ctx from HttpRequestScopedPublisher.

Also, I would do the handling of failPublisher for 4xx errors as a separate commit, but separating that is not essential.

As far as the reference queue handling is concerned, it is good to ship.

spericas commented Feb 1, 2021

> I understand all apart from the removal of ChannelHandlerContext ctx from HttpRequestScopedPublisher.

That context is no longer used in the publisher.

> Also, I would do the handling of failPublisher for 4xx errors as a separate commit, but separating that is not essential.
>
> As far as the reference queue handling is concerned, it is good to ship.

Cool, thx.

@spericas spericas changed the title WIP: Performance improvements to queue(s) management in Webserver Performance improvements to queue(s) management in Webserver Feb 1, 2021
@barchetta barchetta added this to the 2.2.1 milestone Feb 2, 2021
* this collection that cannot be fully released (some buffers still in
* use) will be added to {@code unreleasedQueues} for later retries.
*/
private final ReferenceQueue<Object> queues = new ReferenceQueue<>();
Member

Just being careful: is Object the type argument you want to pass here?

Correct. That type-parameter is silly. (Reads: I don't get it) It is meant to represent the type of element returned by References enqueued in ReferenceQueue. But by the time a reference is enqueued the element is no longer reachable, as determined by GC. So...

What would make more sense as a type parameter for the ReferenceQueue, is the class of Reference that is going to be returned by poll.

Member

Well, I don't know about silly. Would ? super Reference<?> or something like that work better here?

@olotenko olotenko Feb 4, 2021

No.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/ReferenceQueue.html#poll() - ReferenceQueue<T>.poll() returns Reference<? extends T>. I am not interested in what T is, because that value is no longer accessible. I am only interested in the type of Reference. If it is something that has a method to release whatever resource it is associated with (ie instanceof IndirectReference in this case), I am going to call it - end of story.

I'd rather have a guarantee that I will not encounter other type of Reference here, so that I didn't have to type-cast. :)

I'd prefer:

public class ReferenceQueue<T extends Reference> {
...
   public T poll() {...}
}

Then I'd be able to declare

ReferenceQueue<IndirectReference<?, ReferenceQueue<IndirectReference<?, DataChunk>>>> queues

and get

IndirectReference<?, ReferenceQueue<IndirectReference<?, DataChunk>>> r = queues.poll()
ReferenceQueue<IndirectReference<?, DataChunk>> rq = r.acquire()
IndirectReference<?, DataChunk> rr = rq.poll()

and even DataChunk dc = rr.acquire() and dc.release() (gosh, this is actually what this is for).
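
Since java.lang.ref.ReferenceQueue cannot be parameterized this way, one possible approximation is a thin wrapper that hides the raw queue and does the cast in a single place. This is only a sketch: TypedReferenceQueue and ReleasableRef are hypothetical names, and the unchecked cast is safe only as long as every reference bound to the hidden queue is created through register and the factory does not leak the queue.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.function.Function;

/** Hypothetical wrapper typed by the kind of Reference it returns from poll(). */
final class TypedReferenceQueue<R extends Reference<?>> {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    /** Only way to bind a reference to the hidden queue: the factory passes it to super(...). */
    R register(Function<ReferenceQueue<Object>, R> factory) {
        return factory.apply(queue);
    }

    /** Returns the next enqueued reference, already typed, or null if none is ready. */
    @SuppressWarnings("unchecked")
    R poll() {
        return (R) queue.poll();
    }
}

/** Hypothetical reference that knows how to release its associated resource. */
final class ReleasableRef extends PhantomReference<Object> {
    private final Runnable onRelease;

    ReleasableRef(Object owner, ReferenceQueue<Object> q, Runnable onRelease) {
        super(owner, q);
        this.onRelease = onRelease;
    }

    void release() {
        onRelease.run();
    }
}

final class TypedQueueDemo {
    public static void main(String[] args) {
        TypedReferenceQueue<ReleasableRef> queues = new TypedReferenceQueue<>();
        Object owner = new Object();
        ReleasableRef ref = queues.register(
                q -> new ReleasableRef(owner, q, () -> System.out.println("released")));
        ref.enqueue();                        // stands in for the GC enqueueing the reference
        ReleasableRef polled = queues.poll(); // no cast at the call site
        polled.release();
        Reference.reachabilityFence(owner);
    }
}
```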

@olotenko olotenko Feb 4, 2021

Think of it this way: IndirectReference is about two things: some owner of type T that actually does not matter, and a resource R that is temporarily handed off to the owner of type T. The owner has the obligation to always terminate in a predictable path where it hands off the resource R back to some resource pool.

IndirectReference then is a safety net catching the cases when the owner T fails to fulfil the obligation (Entscheidungsproblem and a Turing-complete language), and dies before returning resource R. In this case IndirectReference ends up in the ReferenceQueue, and the queue processing routine is able to return resource R back to the pool. Note only the owner of type T is referenced phantomly. The resource R remains strongly reachable, at least through IndirectReference - unless, of course, the owner does fulfil the obligation, and returns the resource, first reclaiming it through IndirectReference.acquire().
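
A rough sketch of that contract, assuming the constructor shape shown earlier in this review and an acquire() that hands the resource out at most once. IndirectRefSketch and ResourcePool are hypothetical names, resources are plain strings, and the pool is deliberately not thread-safe to keep the example short.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicReference;

/** Phantom reference to an owner T that strongly holds a lent resource R. */
final class IndirectRefSketch<T, R> extends PhantomReference<T> {
    private final AtomicReference<R> otherRef = new AtomicReference<>();

    IndirectRefSketch(T owner, ReferenceQueue<? super T> q, R resource) {
        super(owner, q);
        otherRef.lazySet(resource);
    }

    /** Reclaims the resource; only the first caller gets it, later calls return null. */
    R acquire() {
        return otherRef.getAndSet(null);
    }
}

/** Hypothetical single-threaded pool with a safety net for owners that never give back. */
final class ResourcePool {
    private final Deque<String> pool = new ArrayDeque<>();
    private final ReferenceQueue<Object> orphans = new ReferenceQueue<>();

    ResourcePool(String... resources) {
        for (String r : resources) {
            pool.push(r);
        }
    }

    IndirectRefSketch<Object, String> lend(Object owner) {
        return new IndirectRefSketch<>(owner, orphans, pool.pop());
    }

    /** Normal path: the owner fulfils its obligation. */
    void giveBack(IndirectRefSketch<?, String> ref) {
        String r = ref.acquire();
        if (r != null) {
            pool.push(r);
        }
    }

    /** Safety net: returns resources whose owners died first; reports how many. */
    @SuppressWarnings("unchecked")
    int reclaimOrphans() {
        int reclaimed = 0;
        Reference<?> ref;
        while ((ref = orphans.poll()) != null) {
            String r = ((IndirectRefSketch<?, String>) ref).acquire();
            if (r != null) {
                pool.push(r);
                reclaimed++;
            }
        }
        return reclaimed;
    }

    int available() {
        return pool.size();
    }
}
```

Because acquire() empties the reference, a resource that was already given back is not returned to the pool a second time when its reference later shows up in the queue.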

Having explained all this, I think more work is needed.

The focus has been to reduce the cost of maintaining resources for well-terminating responses. But we also need to consider what happens to strong references when the channel is closed (channelInactive fired) and there are unfinished responses.

    }
    unreleasedQueues.removeIf(ReferenceHoldingQueue::release);
} finally {
    clearLock.lazySet(false);
Member

Just being careful: lazySet is designed for very specialized use cases. Are you sure set isn't the right choice here?

@olotenko olotenko Feb 4, 2021

Yes, this is one case where lazySet makes sense.

We only have a lock that can either be taken immediately, or the contender goes away:
https://github.com/oracle/helidon/pull/2704/files/06c459f90972ba6a14a500bfaea7cd4b91beb3cb#diff-a0cac19f24ae85fbc99afd8c0b4e8375d3c036a0ad949c68befe56fb7346461dR100 - that yellow line

In this case we only need to ensure the get() and compareAndSet on that line synchronize-with this lazySet, which they do.

We don't need to "publish" any other changes, as queue poll and removeIf are thread-safe in their own right.

In this case you can even go as weak as VarHandle.setOpaque.
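
The locking pattern described in this reply can be sketched roughly as follows. PeriodicCleaner and tryClear are hypothetical names and the cleanup work is an arbitrary Runnable; the point is only the shape of the lock: a contender either takes it at once or goes away, and the holder releases it with lazySet.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical try-once lock around a periodic cleanup. */
final class PeriodicCleaner {
    private final AtomicBoolean clearLock = new AtomicBoolean();
    private int runs; // only written while clearLock is held

    /** Runs the cleanup unless someone else is already running it; never blocks. */
    boolean tryClear(Runnable cleanup) {
        // Cheap read first, so most contenders leave without attempting the CAS.
        if (clearLock.get() || !clearLock.compareAndSet(false, true)) {
            return false;
        }
        try {
            cleanup.run();
            runs++;
            return true;
        } finally {
            // Release store: the next successful compareAndSet observes it,
            // and no full barrier is paid on the way out.
            clearLock.lazySet(false);
        }
    }

    int runs() {
        return runs;
    }
}
```

A nested or concurrent call made while the lock is held simply returns false, which matches the "contender goes away" behaviour; whether an even weaker release such as VarHandle.setOpaque is acceptable depends on what else the critical section publishes.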

@spericas spericas merged commit 0c6749d into helidon-io:master Feb 4, 2021
spericas added a commit that referenced this pull request Feb 11, 2021
* Upgrade Netty to 4.1.58 (#2678)

Signed-off-by: Tomas Langer <[email protected]>

* Added overall timeout to evictable cache (#2659)

Signed-off-by: Tomas Langer <[email protected]>

* Fix copyright year for commits broken by squashing. (#2687)

Signed-off-by: Tomas Langer <[email protected]>

* Concat array enhancement (#2508)

* Concat array enhancement

Signed-off-by: Daniel Kec <[email protected]>

* Update Jackson to 2.12.1 (#2690)

* Update Jackson to 2.12.1
* Upgrade to latest Junit5 to get fix for junit-team/junit5#2198
* Manage junit4 version

* PokemonService template fixed in SE Database Archetype. (#2701)

Signed-off-by: Tomas Kraus <[email protected]>

* Fixed different output in DbClient SE archetype (#2703)

Signed-off-by: Tomas Kraus <[email protected]>

* Fix TODO application: (#2708)

- WebSecurity needs to be passed config.get("security") to take the "security.web-server" configuration
 - Added outbound configuration for the google login
 - Upgraded cassandra driver to fix issues with old guava dependencies
 - Removed metrics to avoid issues with cassandra driver.

Fixes #2707

* Update k8s descriptors to avoid using deprecated APIs. (#2719)

* Separate execution of DataChunkReleaseTest in its own VM to prevent leak messages in other test's logs. (#2716)

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Changes in this commit: (#2727)

1. Upgrade to Jersey 2.33
2. Configuration via system properties for the Jersey Client API. Any response in an exception will be mapped to an empty one to prevent data leaks. See eclipse-ee4j/jersey#4641.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Properly release underlying buffer before passing it to WebSocket handler (#2715)

* Properly release underlying buffer before passing it to handler.

* Releases data chunks after passing them to Tyrus without any copying. Reports an error and closes connection if Tyrus is unable to handle the data. Finally, fixed a problem related to subscription requests.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Removed unused logger.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed checkstyle.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fix issue with null value in JSON. (#2723)

Signed-off-by: Tomas Langer <[email protected]>

* Upgrade grpc to v1.35.0 (#2713)

* Upgrade grpc to v1.35.0

* Update copyright

* Upgrades OCI SDK to version 1.31.0 (#2699)

* Updated OCI to 1.31.0

Signed-off-by: Laird Nelson <[email protected]>

* Fix null array values in HOCON/JSON config parser. (#2731)

Resolves #2720 (follow-up)

* Performance improvements to queue(s) management in Webserver (#2704)

* Initial patch.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed some type params and improved comments.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* More cleanup and make sure to fail publisher on an error condition.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Suppress warnings.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Call clearQueues on every new request for proper cleanup of keep-alive connections. Some copyright fixes.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed checkstyle issues.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Force logging of LEAK error even if finalize does not get called on a DataChunk.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Upgrade Weld (#2668)

Signed-off-by: Tomas Langer <[email protected]>

* Rest client async header propagation with usage of Helidon Context (#2735)

Rest client header propagation with usage of Helidon Context

Signed-off-by: David Kral <[email protected]>

* Allow override of Jersey property via config (#2737)

* Allow the default value of property jersey.config.client.ignoreExceptionResponse to be overridden via config. New test.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed copyright year.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* New implementation of LazyValue (#2738)

* New implementation of LazyValue that lazily initializes a Semaphore instead of eagerly creating a ReentrantLock. Makes use of volatile guarantees and atomicity of VarHandle updates.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* New test for LazyValueImpl.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Reduced sleep time in test.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Update CHANGELOG for 2.2.1 release (#2743)

* 2.2.1 THIRD_PARTY_LICENSES update (#2746)

* Update THIRD_PARTY_LICENSES

* Support async invocations using optional synthetic SimplyTimed behavior (#2745)

* Add support for async invocations for optional inferred SimplyTimed behavior on JAX-RS endpoints

Signed-off-by: [email protected] <[email protected]>

* Do not attempt to access the request context in Fallback callback. If used together with Retry, it is possible for the fallback to be called in a fresh thread for which there is no current request scope. Instead just use the original value obtained in this class' constructor. Updated functional test (with some class renaming) to cover this use case. (#2748)

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fix for native image. (#2753)

Signed-off-by: Tomas Langer <[email protected]>

* Fixed checkstyle issues.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

Co-authored-by: Tomas Langer <[email protected]>
Co-authored-by: Daniel Kec <[email protected]>
Co-authored-by: Joe DiPol <[email protected]>
Co-authored-by: Tomáš Kraus <[email protected]>
Co-authored-by: Romain Grecourt <[email protected]>
Co-authored-by: Jonathan Knight <[email protected]>
Co-authored-by: Laird Nelson <[email protected]>
Co-authored-by: David Král <[email protected]>
Co-authored-by: Tim Quinn <[email protected]>
spericas added a commit that referenced this pull request Nov 22, 2021
* Fault Tolerance 3.0 Support (#2680)

* Initial changes to implement new metrics layer. Moving from complex names to simpler names and tags.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* More metric updates.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Migration of most unit tests to new metrics.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Completed migration of metrics test.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* New exception to discern timeouts during retries.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Implementation of retry metrics.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Cleanup metrics between tests.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Several changes related to execution of FT 3.0 TCKs. Adjusted initial size of executors and fixed a few other problems.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Copyright and checkstyle updates.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed copyright year.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed typos and some cleanup.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Created exclude file as a workaround for a SpotBugs bug.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Updated copyright year.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* MicroProfile Opentracing 2.0 (#2676)

* MicroProfile OpenTracing upgraded to 2.0
* Unused dependencies removed
* Obsolete excludes removed

* Sync up of microprofile-4.0 with master branch (#2757)

* Fixed problems in RetryImpl after merge.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed problems with metrics after merge.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Updated version in suite file.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed problem retrieving registry for metrics.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed more problems after merge. All tests are passing now.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed checkstyle errors.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Fixed TODO.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Enabled TCK's by default and removed generated file.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* One more checkstyle violation.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

* Removed duplicate test after merge.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>

Co-authored-by: Dmitry Aleksandrov <[email protected]>
Co-authored-by: Tomas Langer <[email protected]>
Co-authored-by: Daniel Kec <[email protected]>
Co-authored-by: Joe DiPol <[email protected]>
Co-authored-by: Tomáš Kraus <[email protected]>
Co-authored-by: Romain Grecourt <[email protected]>
Co-authored-by: Jonathan Knight <[email protected]>
Co-authored-by: Laird Nelson <[email protected]>
Co-authored-by: David Král <[email protected]>
Co-authored-by: Tim Quinn <[email protected]>
Labels
enhancement New feature or request