[ML] Add a response mechanism to ML controller command processing #62823
Comments
Pinging @elastic/ml-core (:ml)
Probably the biggest problem with making this change is coordination of changes between C++ and Java without breaking every ML test that uses native processes. There are basically two ways it could be done:
The first option would look like this:
The second option would look like this:
Given the complexity of the second option I am favouring the first. The risk is that while the ML tests are muted somebody else breaks something. This risk could be mitigated by merging the changes over a weekend. The PRs would need to be approved in advance, then the merge steps could be done with very little time spent during the weekend itself, and if everything went to plan the tests would be unmuted by Monday morning.
This change makes the controller process respond to each command it receives with a document indicating whether that command was successfully executed or not. This response will be used by the Java side of the connection to determine when it is appropriate to move on to the next phase of the action that the controller command was part of. For example, when starting a process and connecting named pipes to it, it is best that the named pipe connections are not attempted until the process is confirmed to be started. Relates elastic/elasticsearch#62823
This change makes threads that send a command to the ML controller process wait for it to respond to the command. Previously such threads would block until the command was sent, but not until it was actioned. This was on the assumption that the sort of commands being sent would be actioned almost instantaneously, but that assumption has been shown to be false when anti-malware software is running. Relates elastic/ml-cpp#1520 Fixes elastic#62823
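The waiting behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the actual Elasticsearch code: the class `ControllerResponseTracker`, the use of a numeric command ID to match responses to commands, and the fields of `Response` are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: class name, command IDs and response fields are assumptions.
class ControllerResponseTracker {

    /** Assumed shape of the response document sent back by the controller. */
    static final class Response {
        final int id;
        final boolean success;
        final String reason;

        Response(int id, boolean success, String reason) {
            this.id = id;
            this.success = success;
            this.reason = reason;
        }
    }

    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, CompletableFuture<Response>> pending = new ConcurrentHashMap<>();

    /** Registers a command before it is written to the controller and returns its ID. */
    int register() {
        int id = nextId.getAndIncrement();
        pending.put(id, new CompletableFuture<>());
        return id;
    }

    /** Called by whichever thread reads the controller's output when a response arrives. */
    void onResponse(Response response) {
        CompletableFuture<Response> future = pending.get(response.id);
        if (future != null) {
            future.complete(response);
        }
    }

    /** Blocks the sending thread until the command is actioned, reported as failed, or times out. */
    void awaitActioned(int id, long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        CompletableFuture<Response> future = pending.get(id);
        if (future == null) {
            throw new IllegalArgumentException("unknown command id " + id);
        }
        try {
            Response response = future.get(timeout, unit);
            if (!response.success) {
                throw new IllegalStateException("controller failed command " + id + ": " + response.reason);
            }
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pending.remove(id);
        }
    }
}
```

In this sketch the sending thread would call `register()`, write the command, then call `awaitActioned(...)`, and only attempt the named pipe connections once that call returns normally.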
When the ML Java code needs to start one of the ML native processes (`autodetect`, `normalize` or `data_frame_analyzer`) it sends a command to the `controller` process telling it to spawn the required process. Currently the communications are one way only - the JVM sends a command to the controller and assumes it will be actioned immediately. There is no mechanism for the controller to respond when it has actioned the command. This seemed reasonable in the initial design because the controller is completely dedicated to starting and killing processes, and these were assumed to be very fast operations.

We have observed that when security software is running on a machine, spawning a new process can take a very long time - over 20 seconds has been observed between the command being received in the controller and the resulting `posix_spawn` call returning. This invalidates the assumption that commands issued to `controller` by the JVM will be near instantaneous. It causes a problem because the timeout waiting for the named pipes to connect starts immediately after the command is issued, but the process may not actually start until considerably later.

Therefore, there is a need for `controller` to be able to report back to the ES JVM when each command sent to it has been actioned. Then the ES JVM should not try to connect the named pipes to a process until the controller has reported that it has actually spawned that process. This will mean that the configured timeout for connecting the named pipes is measured from a more appropriate point in time.
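To make the timing argument concrete, here is a small worked sketch. The 20 second spawn delay is the figure observed above; the 10 second pipe timeout and the names `PipeTimeoutExample` and `remainingAfterSpawn` are assumptions chosen purely for illustration.

```java
import java.time.Duration;

// Hypothetical worked example: the timeout value and all names are assumptions.
class PipeTimeoutExample {

    /**
     * Time left for connecting the named pipes once the process actually exists.
     * A negative result means the timeout expired before the process was spawned.
     */
    static Duration remainingAfterSpawn(Duration pipeTimeout, Duration spawnDelay, boolean waitForSpawnConfirmation) {
        // If the JVM waits for the controller's confirmation, the clock starts
        // after the spawn, so the spawn delay no longer eats into the timeout.
        return waitForSpawnConfirmation ? pipeTimeout : pipeTimeout.minus(spawnDelay);
    }

    public static void main(String[] args) {
        Duration pipeTimeout = Duration.ofSeconds(10); // assumed value for illustration
        Duration spawnDelay = Duration.ofSeconds(20);  // delay observed with security software running

        // Clock starts when the command is sent: the timeout is already exhausted.
        System.out.println(remainingAfterSpawn(pipeTimeout, spawnDelay, false).isNegative());

        // Clock starts when the controller reports the spawn: the full timeout is available.
        System.out.println(remainingAfterSpawn(pipeTimeout, spawnDelay, true));
    }
}
```

Under these assumed numbers, starting the clock at command submission leaves no time at all for the pipes to connect, whereas starting it at the controller's confirmation leaves the whole configured timeout.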