-
Notifications
You must be signed in to change notification settings - Fork 598
Support running Heron on Amazon Ecs. #1837
base: master
Are you sure you want to change the base?
Conversation
#heron.statemgr.connection.string: "xx.xx.xx.xx:2181" | ||
|
||
|
||
# path of the root address to store the state in a local file system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a local file system or it expects zookeeper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does expect zookeeper to be running on one of the ec2 instances. As suggested I will put up a document explaining the offline set ups on Amazon ec2 and the following steps on the scheduler.
|
||
heron.statemgr.zookeeper.retry.interval.ms: 10000 | ||
|
||
#################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we require tunneling? If not, please remove the tunnel configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tunneling could be needed for any of the schedulers and they're valid configs so I think it's ok to keep them in the config file. Maybe we set it to False by default though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done set default to false
@@ -155,7 +171,14 @@ def init_parsed_args(self, args): | |||
# id within docker, rather than the host's hostname. NOTE: this 'HOST' env variable is not | |||
# guaranteed to be set in all Docker executor environments (outside of Marathon) | |||
if is_docker_environment(): | |||
self.master_host = os.environ.get('HOST') if 'HOST' in os.environ else socket.gethostname() | |||
# Need to set the HOST environment vairable if docker is for AWS ECS tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding code to this function, can we make a overall function called get_host - in which we can check whether it is running in docker or ECS or use socket.gethostname() and return the hostname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for ways to do this without one-off logic to support ECS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added gethost function
public static final String ECS_CLUSTER_BINARY = "heron-examples.jar"; | ||
public static final String COMPOSE_CMD = "ecs-cli compose --project-name "; | ||
public static final String UP = " up"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of hard coding of values here. Not sure why you need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for removing as much of this hard-coding as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@ananthgs - can you write a good concise PR description about how it works with ECS? |
@ananthgs - you might want to add a detailed documentation of how to set it up and use, as described in other schedulers for heron.io. However, this can be done in a different PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the submission! Please add unit tests and accept the CLA:
https://twitter.github.io/heron/docs/contributors/community/
|
||
heron.statemgr.zookeeper.retry.interval.ms: 10000 | ||
|
||
#################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tunneling could be needed for any of the schedulers and they're valid configs so I think it's ok to keep them in the config file. Maybe we set it to False by default though.
@@ -155,7 +171,14 @@ def init_parsed_args(self, args): | |||
# id within docker, rather than the host's hostname. NOTE: this 'HOST' env variable is not | |||
# guaranteed to be set in all Docker executor environments (outside of Marathon) | |||
if is_docker_environment(): | |||
self.master_host = os.environ.get('HOST') if 'HOST' in os.environ else socket.gethostname() | |||
# Need to set the HOST environment vairable if docker is for AWS ECS tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for ways to do this without one-off logic to support ECS.
public static final String ECS_CLUSTER_BINARY = "heron-examples.jar"; | ||
public static final String COMPOSE_CMD = "ecs-cli compose --project-name "; | ||
public static final String UP = " up"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for removing as much of this hard-coding as possible.
+ " herondata:\n" | ||
+ " driver: local"; | ||
public static final String DESTINATION_JVM = "/usr/lib/jvm/java-8-oracle"; | ||
public static final String ECS_CLUSTER_BINARY = "heron-examples.jar"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this code require a dep on heron-examples.jar?
+ "volumes:\n" | ||
+ " herondata:\n" | ||
+ " driver: local"; | ||
public static final String DESTINATION_JVM = "/usr/lib/jvm/java-8-oracle"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does jvm location need to be hardcoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your Docker image contains a pre-installed version of the Java 8 JRE, you should be able to reference $JAVA_HOME here. You should also be able to set that from within the scheduler configuration as well, via the heron.directory.sandbox.java.home
option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
public String replacePortNumbers(int container, String content) { | ||
int basePortnumber = 5000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 5000 here and the logic in this method looks duplicative of other logic in this class. Can you consolidate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
public String getDockerFileContent(String execCommand, int container) { | ||
String commandBuiler = EcsContext.PART1 + EcsContext.CMD; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Favor StringBuilder over string concat. Or better yet, use String.format(..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Will modify
public String replacePortNumbers(int container, String content) { | ||
int basePortnumber = 5000; | ||
String localContent = new String(content); | ||
for (int i = 0; i < SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please clarify in the javadocs what this method does. Also I suspect StringBuilder might be a better fit here.
} | ||
builder.append(string); | ||
} | ||
String stringToReturn = builder.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return builder.toString();
return null; | ||
} | ||
|
||
public boolean onKill(Scheduler.KillTopologyRequest request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods all need to be implemented and annotated with @OverRide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, please implement those methods in the following commits
@@ -108,6 +109,21 @@ def log_pid_for_process(process_name, pid): | |||
def is_docker_environment(): | |||
return os.path.isfile('/.dockerenv') | |||
|
|||
def isEcsAmiInstance(): | |||
meta = 'http://169.254.169.254/latest/meta-data/ami-id' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ananthgs - why is this URL hard coded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove hard coded url addresses
pass it from out side if necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, Passing it as argument
+ "services:\n" | ||
+ " container_number:\n" | ||
+ " image: ananthgs/onlyheronandubuntu\n"; | ||
public static final String CMD = " command: [\"sh\", \"-c\", \"mkdir /s3; cd /s3 ;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ananthgs - Is this image a generic ubuntu image downloaded from docker hub? or specially created image? If this is specially created image - what does it contain other than the basic OS?
public static final String ECSNETWORK = " networks:\n" | ||
+ " - heron\n" | ||
+ " ports:\n" | ||
+ " - \"5000:5000\"\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these ports are hard coded? Can we ask ECS to provide a dynamic port?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done removed hard coded ports
self.master_host = subprocess.Popen(["curl", | ||
"http://169.254.169.254/latest/meta-data/local-ipv4"] | ||
, stdout=subprocess.PIPE).communicate()[0] | ||
os.environ['HOST'] = self.master_host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this line since you won't be grabbing os.environ['HOST']
after this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
+ " image: ananthgs/onlyheronandubuntu\n"; | ||
public static final String CMD = " command: [\"sh\", \"-c\", \"mkdir /s3; cd /s3 ;" | ||
+ "aws s3 cp s3://herondockercal/TOPOLOGY_NAME/topology.tar.gz /s3 ;" | ||
+ "aws s3 cp s3://herondockercal/heron-core-testbuild-ubuntu14.04.tar.gz /s3 ;cd /s3;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That paths to the core binaries will need to be more dynamic than this and will need to feed out of the heron.package.core.uri
configuration option in the scheduler library right?
String content = null; | ||
try { | ||
tempDockerFile = File.createTempFile("docker", ".yml"); | ||
content = getDockerFileContent(finalExecCommand, container); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain what the process is here for creating this docker.yml file just so I understand what's going on? What is it used for and what is it composed of -- just so I can understand a little better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AWS ECS task can be triggered via a docker style compose command. This Task is then run on the AWS Cluster in a Container instance. The overall approach is explained here :
https://docs.google.com/document/d/1ecbCuA46cIKPfY0SP0F1dcRlei4DIPz3pZ6ZSZ5zZgc/edit?usp=sharing.
Please feel free to comment on the approach.
@@ -108,6 +109,21 @@ def log_pid_for_process(process_name, pid): | |||
def is_docker_environment(): | |||
return os.path.isfile('/.dockerenv') | |||
|
|||
def isEcsAmiInstance(): | |||
meta = 'http://169.254.169.254/latest/meta-data/ami-id' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove hard coded url addresses
pass it from out side if necessary
# Need to set the HOST environment vairable if docker is for AWS ECS tasks | ||
if isEcsAmiInstance(): | ||
self.master_host = subprocess.Popen(["curl", | ||
"http://169.254.169.254/latest/meta-data/local-ipv4"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same hard code url problem
return null; | ||
} | ||
|
||
public boolean onKill(Scheduler.KillTopologyRequest request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, please implement those methods in the following commits
} | ||
|
||
public List<String> getJobLinks() { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an empty ArrayList is better than a null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the get joblink feature to return the ecs tasks
private String[] getExecutorCommand(int container) { | ||
List<Integer> freePorts = new ArrayList<>(SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR); | ||
for (int i = 0; i < SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR; i++) { | ||
//freePorts.add(SysUtils.getFreePort()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any specific reason not using this getFreePort() method but hard code the port number?
LOG.info("Starting to deploy topology: " + EcsContext.topologyName(config)); | ||
LOG.info("Starting executor for TMaster"); | ||
startExecutor(0); | ||
// for each container, run its own executor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log the event of starting normal container here
@ajorgensen you guys also run on EC2, correct? If so could you review this to see if your approach and this approach could coexist in this codebase? |
modified: ecs-heron-role.json
modified: ecs-heron-policy.json
# Conflicts: # heron/executor/src/python/heron_executor.py
Pre requisite: AWS and ECS-CLI needs to be installed on the submitter machine as per guidelines from Amazon.