Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dedicated health port for JITServer #19107

Merged
merged 3 commits into from
Mar 14, 2024

Conversation

mpirvu
Copy link
Contributor

@mpirvu mpirvu commented Mar 8, 2024

Kubernetes can use liveness and readiness probes to check
a pod's health. For a TCP port, the kubelet attempts to open a
socket to the container on the specified port. If a connection
can be established, the container is considered healthy.

The JITServer did not have a dedicated port for health probes;
instead, the main communication port (default 38400) could be
used for this purpose. However, this solution is not ideal
when communication between the client and server is encrypted
with OpenSSL. Starting with OpenSSL 3.0, if one of the participants
closes the connection without notifying the other party, the
OpenSSL library will log error messages like:
"A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320"
Since the health probes are sent periodically, over time, the log
of the JITServer container could be flooded with such messages.
While the error is benign in nature, these messages are annoying
and can create concerns for end-users.

This commit adds a dedicated health port for JITServer, where
communication is never encrypted. By default, the JITServer will
open a health port and the value of this port is 38600.
The following options were added:
-XX:JITServerHealthProbePort=NNN # change the health port value
-XX:-JITServerHealthProbes # disable the dedicated health port
-XX:+JITServerHealthProbes # enable the dedicated health port

Issue: OpenLiberty/open-liberty#27665

@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Mar 8, 2024
@mpirvu mpirvu requested a review from dsouzai as a code owner March 8, 2024 15:45
@mpirvu mpirvu force-pushed the jitserveprobes branch 3 times, most recently from a56bc6a to ebaeb52 Compare March 8, 2024 16:26
@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 8, 2024

@dsouzai Could you please review/merge this PR? Thanks

@dsouzai dsouzai self-assigned this Mar 8, 2024
Copy link
Contributor

@dsouzai dsouzai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, just had some questions.

runtime/compiler/runtime/Listener.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/Listener.cpp Show resolved Hide resolved
runtime/compiler/runtime/Listener.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/Listener.cpp Show resolved Hide resolved
mpirvu added 2 commits March 8, 2024 15:42
Starting with OpenSSL version 3, the SSL code will generate
SSL_R_UNEXPECTED_EOF_WHILE_READING error messages if the other
party has closed the connection without a proper shutdown
notification.
This commit prevents such messages from being printed on
the console. However, these error messages will stil be printed
to the verbose log if -Xjit:verbose={JITServer} option is
given to the JITServer process.

Signed-off-by: Marius Pirvu <[email protected]>
Kubernetes can use liveness and readiness probes to check
a pod's health. For a TCP port, the kubelet attempts to open a
socket to the container on the specified port. If a connection
can be established, the container is considered healthy.

The JITServer did not have a dedicated port for health probes;
instead, the main communication port (default 38400) could be
used for this purpose. However, this solution is not ideal
when communication between the client and server is encrypted
with OpenSSL. Starting with OpenSSL 3.0, if one of the participants
closes the connection without notifying the other party, the
OpenSSL library will log error messages like:
"A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320"
Since the health probes are sent periodically, over time, the log
of the JITServer container could be flooded with such messages.
While the error is benign in nature, these messages are annoying
and can create concerns for end-users.

This commit adds a dedicated health port for JITServer, where
communication is never encrypted. By default, the JITServer will
open a health port and the value of this port is 38600.
The following options were added:
`-XX:JITServerHealthProbePort=NNN` # change the health port value
`-XX:-JITServerHealthProbes` # disable the dedicated health port
`-XX:+JITServerHealthProbes` # enable the dedicated health port

Signed-off-by: Marius Pirvu <[email protected]>
@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 9, 2024

@KostasTsiounis Could you please review the changes that involve OpenSSL usage, i.e. 5214536
Thanks

Copy link
Contributor

@dsouzai dsouzai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

TR_VerboseLog::writeLineLocked(TR_Vlog_JITServer, "t=%lu %s: errno=%d sslError=%d errorString=%s",
(unsigned long)compInfo->getPersistentInfo()->getElapsedTime(),
errMsg, errnoCopy, sslError, errorString);
// Print the error strings for all errors that OpenSSL has recorded and empty the error queue
(*OERR_print_errors_fp)(stderr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this always print the errors in stderr? Unless you only care about what's printed on stdout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We print all errors except SSL_R_UNEXPECTED_EOF_WHILE_READING ones.
ERR_get_error() removes the oldest error from the SSL queue, so that ERR_print_errors_fp() will not find it and print it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but you seem to be calling ERR_peek_error() a few lines above, which doesn't remove it from the queue. Are you calling the ERR_get_error() at some other point that I am missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intended logic is to print all errors, except SSL_R_UNEXPECTED_EOF_WHILE_READING.
I use ERR_peek_error() to determine the error reason, and if that is SSL_R_UNEXPECTED_EOF_WHILE_READING, then I use ERR_get_error() the eliminate from the queue and prevent it from being printed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, cool. I didn't see the ERR_get_error() in the if statement. LGTM otherwise.

@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 11, 2024

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 11, 2024

Lots of:

 [ERR] can't bind server address: Address already in use
 [ERR] Failed to open health socket on port 38600

I guess I have to fix the tests.

@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 12, 2024

jenkins test sanity xlinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 13, 2024

jenkins test sanity xlinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 14, 2024

jenkins test sanity plinuxjit,zlinuxjit,alinux64jit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Mar 14, 2024

Tests have been fixed. All tests pass now.

@dsouzai dsouzai merged commit e3197be into eclipse-openj9:master Mar 14, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:doc comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants