-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restart preempted jobs #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far. I just had a few comments.
Also, can we have method headers and descriptions for the functions/classes? Will create less confusion down the line. I will need some time to think about the use case and what other features to include.
src/main/java/com/google/jenkins/plugins/computeengine/InstanceConfiguration.java
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineComputer.java
Outdated
Show resolved
Hide resolved
|
||
boolean getPreempted() { | ||
try { | ||
return preemptedFuture != null && preemptedFuture.isDone() && preemptedFuture.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this logic a bit further? Just want to make sure we are both thinking the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It basically checks if node was preempted :)
First we check if we have feature (if node is not preemptive we may not have it set)
Then we check if feature is done.
And if its done check if its true == preempted
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineComputerListener.java
Outdated
Show resolved
Hide resolved
@@ -89,13 +89,10 @@ protected void _terminate(TaskListener listener) throws IOException, Interrupted | |||
|
|||
// If the instance is running, attempt to terminate it. This is an asynch call and we | |||
// return immediately, hoping for the best. | |||
cloud.client.terminateInstanceWithStatus(cloud.projectId, zone, name, "RUNNING"); | |||
cloud.getClient().terminateInstance(cloud.getProjectId(), zone, name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fixes issue with leaving behind preempted instances.
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineInstance.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/client/ComputeClient.java
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/PreemptedCheckCallable.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineRetentionStrategy.java
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineRetentionStrategy.java
Outdated
Show resolved
Hide resolved
* @param retentionTimeMinutes number of minutes of idleness after which to kill the slave; serves a backup in case the strategy fails to detect the end of a task | ||
* @param oneShot create one shot instance strategy | ||
*/ | ||
ComputeEngineRetentionStrategy(int retentionTimeMinutes, boolean oneShot) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this is package scoped and not public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It dont need to be public? :)
Any reason to make it public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been the convention throughout the rest of the code base. That said, I don't have any strong opinions one way or the other :)
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/PreemptedCheckCallable.java
Show resolved
Hide resolved
Thanks for digging in to this. Would like to see test coverage for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few inline comments.
…gle-compute-engine-plugin into pr-preemptive-merge
*/ | ||
@Override | ||
public Boolean call() throws IOException { | ||
HttpTransport transport = new NetHttpTransport(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method's content should exist within a client abstraction similar to the compute client. Also, it's client should be created with the ClientFactory. This de-duplicates much of this logic, and ensures the user agent string for the plugin is preserved. Also, a generated java client lib for the API should be used if it's available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we will never use this methods in any different places..
So we dont need extra (and rather dont want) client for it..
And the smaller its the faster it works, cause everything here must be serialised and send to slave node (cause its called from slave itself)
I didnt found any lib for that purpose.. I tried but i failed.. if you know any let me know..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with our libraries team, and there currently doesn't exist a client lib for this API. That said, it would still be preferable to maintain the consistency of client abstractions we've established throughout the code base, and move this API logic into its own client class. It allows for better testing and separation of concerns.
* - setting one shot instances {@link OnceRetentionStrategy} | ||
* - in case of preemption of GCP instance to restart preempted tasks | ||
*/ | ||
public class ComputeEngineRetentionStrategy extends RetentionStrategy<ComputeEngineComputer> implements ExecutorListener { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this class. We'll need to make sure it has proper unit test coverage.
return parent; | ||
} | ||
|
||
private void checkPreempted(Executor executor, Queue.Task task) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems to encapsulate two separate concerns: 1. Checking if a task was preempted 2. Rescheduling the task if 1. is true. Suggest separating the concerns for improved orthogonality and easier testing.
Pinging this to make sure it doesn't get dropped. Please feel free to reach out if there are any questions regarding the feedback. |
@craigatgoogle I didnt forget, just dont have time to finish it.. |
Looks like #54 is merged. How can we help with this change? I could take a whirl at resolving the conflicts if that helps. |
New PR covering same #113 .. |
This PR is "working prototype" for restarting jobs after being preempted by google.
Im testing in on our system for more than 2 months and its working, Its restarting all jobs that was preempted, but there are few things that should be done to make it production ready.
Missing features:
I can finish it but maybe you have some ideas how to do some of needed things.