-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added the functionality to run TD distributed on Kubernetes. #12
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution!
Looks good overall! Clean code, automated tests, good inline documentation. We will need to improve the user docs too, but that can happen after merging the PR.
I have a few comments inline, most important ones are related to the threading and exception handling. That is why I'm currently requesting changes instead of Approving.
toughday/src/main/java/com/adobe/qe/toughday/internal/core/config/GlobalArgs.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
|
||
private static Representer getRepresenter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, SnakeYaml includes all empty collections when dumping an object (for instance, an empty Map is represented as {}). This class configures the Representer to skip all empty collections in order to keep the generated Yaml representation of the object as clean as possible.
return null; | ||
} | ||
} catch (Throwable t) { | ||
t.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do something else here beside printing the stack trace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error should not impact the overall functionality. It would only mean that the empty collections are also included in the Yaml representation on an object so I believe we can simply ignore this exception.
* It is also assumed that the only action to be taken into consideration when dumping the configuration is Actions.ADD | ||
* since all the other actions were already processed before sending the execution query to the driver. | ||
*/ | ||
public class YamlDumpConfiguration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name of the class is a bit too generic for what it does. If I see this name, I'm inclined to use it anywhere I need to dump a yaml configuration, but it looks like it is only supposed to be used when sending messages between driver and agent. Anything that we could do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved this class into the distributedTd package and I have renamed it to reflect the case in which it should be used.
} | ||
} | ||
} catch (IllegalAccessException | InvocationTargetException e) { | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here regarding stack traces and handling exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These errors should only happen when there are no @ConfigArgSet and @ConfigArgGet methods properly defined for certain configurable properties of the object instance given as a parameter. I think we could handle this exceptions by logging a warning message specifying that these properties will be ignored when dumping the configuration.
public void merge(DistributedConfig other) { | ||
this.setHeartbeatInterval(other.getHeartbeatInterval()); | ||
this.setRedistributionWaitTime(other.redistributionWaitTime); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are only these properties overwritten at merge time? What is the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These properties can be set in the configuration specified in the Driver docker image, but these values can be modified when sending the TD configuration to be executed distributed. The rest of the properties should not be modified once the Driver/Agents are deployed in the cluster.
this.inactiveAgents.add(agentIdentifier); | ||
} | ||
|
||
private void waitUntilAllAgentsAreReadyForRebalancing(Queue<String> agents) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would an agent not be ready for a redistribution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption is that when an Agent receives a redistribution request, the Configuration object is already created so that we will be able to modify the properties of certain objects (RunMode, TestSuite). The following will explain this concern better:
- A new agent(10.0.0.1) joins the cluster.
- The work redistribution process begins => agent 10.0.0.1 receives a Configuration in YAML format
- A new agent(10.0.0.2) joins the cluster
- The work redistribution process begins => agent 10.0.0.1 receives instructions to modify certain properties of the run mode.
If agent (10.0.0.1) failed to create the Configuration before step 4, it will not be able to process the redistribution request.
} catch (JsonProcessingException e) { | ||
LOG.error("Unexpected exception while sending redistribution instructions", e); | ||
LOG.warn("Agent " + ipAddress + " will continue to run with the configuration it had before " + | ||
"the process of redistribution was triggered."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the impact here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The properties of the current Phase of the execution will no longer be respected(for instance, the desired load might not be achieved). The next phase will execute properly, so I believe it is not necessary to kill the entire distributed execution because of this.
This exception should never be thrown as long as the RedistributionInstructions class has a format that allows the ObjectMapper to dump it in JSON format.
...src/main/java/com/adobe/qe/toughday/internal/core/distributedtd/splitters/PhaseSplitter.java
Show resolved
Hide resolved
public void setLoad(String load) { | ||
checkNotNegative(Long.parseLong(load), "load"); | ||
this.load = Integer.parseInt(load); | ||
} | ||
|
||
@ConfigArgGet | ||
@ConfigArgGet(redistribute = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot why we need redistribute
if the work is done by the run mode rebalancer which knows the intrinsics of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The run mode balancer knows how to interpret the redistribution instructions received from the Driver and how to modify the properties of the run mode instance so that the desired configuration will be achieved (it adapts the number of threads or the current load, etc).
We need the redistribute
field to automatically detect which properties should be included in the redistribution instruction message when creating it.
f2f751a
to
40f461e
Compare
Extend ToughDay to be able to run distributed on Kubernetes.
Related Issue
Fix for #10.
Motivation and Context
Please refer to #10 for more context.
How Has This Been Tested?
JUnit & manually.
Types of changes
Checklist: