Wrapping this cookbook
This cookbook is intended to be wrapped by another cookbook, which sets this cookbook's attributes and orchestrates initialization and service states. This is necessary because of the complex nature of Hadoop, where certain actions depend on other actions having taken place, often on a different machine. To facilitate this, the cookbook declares many execute and service resources with action :nothing.
For example, you must format the NameNode, start the NameNode, and start at least one DataNode before you can create a directory in HDFS. Since it is very likely that the NameNode and DataNode reside on different machines, orchestration is required, and performing these actions automatically could be dangerous. Recipes and resources need to be executed in the following order:
- recipe[hadoop::hadoop_hdfs_namenode] on NameNode machine
- recipe[hadoop::hadoop_hdfs_datanode] on all DataNode machines
- execute[hdfs-namenode-format] with action :run from recipe[hadoop::hadoop_hdfs_namenode] on NameNode machine
- service[hadoop-hdfs-namenode] with action :start from recipe[hadoop::hadoop_hdfs_namenode] on NameNode machine
- service[hadoop-hdfs-datanode] with action :start from recipe[hadoop::hadoop_hdfs_datanode] on all DataNode machines
At this point, you will have a functional HDFS cluster and can run hdfs commands.
Chef lets you call a resource from the resources collection via the #run_action method. While this can be done in plain Ruby in a recipe, we recommend putting these calls within a named ruby_block in a recipe. This causes the resource to be called during the execution phase of a Chef run rather than during the compile phase. This is necessary because not all resources may be available during the compile phase.
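The difference between the two phases can be illustrated with a toy model in plain Ruby. This is only a sketch: ToyResource and ToyRun are hypothetical classes invented for this illustration and are not Chef's implementation. They only mimic the idea that declaring a resource adds it to a collection, while the body of a ruby_block is stored and called later.

```ruby
# Toy model only: hypothetical classes, NOT Chef's implementation.
class ToyResource
  attr_reader :name, :default_action, :actions_run

  def initialize(name, default_action, &work)
    @name = name
    @default_action = default_action
    @work = work          # optional block, called whenever an action runs
    @actions_run = []     # record of every action that was run
  end

  def run_action(action)
    @actions_run << action
    @work&.call
  end
end

class ToyRun
  def initialize
    @collection = {}      # resources declared so far, in declaration order
  end

  # "Compile phase": declaring a resource only adds it to the collection.
  def declare(name, action: :nothing, &work)
    @collection[name] = ToyResource.new(name, action, &work)
  end

  # Look up an already declared resource; raises KeyError if it is missing.
  def resources(name)
    @collection.fetch(name)
  end

  # "Execution phase": run each resource's default action, skipping :nothing.
  def converge
    @collection.each_value do |resource|
      next if resource.default_action == :nothing
      resource.run_action(resource.default_action)
    end
  end
end

run = ToyRun.new

# A lookup made while the "recipe" is still being compiled fails,
# because the execute resource has not been declared yet.
begin
  run.resources('execute[hive-hdfs-homedir]')
rescue KeyError
  puts 'compile phase: resource not declared yet'
end

# Wrapping the lookup in a block defers it until converge.
run.declare('ruby_block[initaction]', action: :run) do
  run.resources('execute[hive-hdfs-homedir]').run_action(:run)
end

# Declared later, with action :nothing, so it never runs on its own.
run.declare('execute[hive-hdfs-homedir]', action: :nothing)

run.converge
p run.resources('execute[hive-hdfs-homedir]').actions_run # => [:run]
```

In the toy model, the deferred lookup succeeds because the whole collection exists by the time converge runs, which is the same reason the ruby_block approach is recommended here.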
Here is an example, taken from continuuity/hadoop_wrapper_cookbook's hive_init recipe.
dfs = node['hadoop']['core_site']['fs.defaultFS']

ruby_block 'initaction-create-hive-hdfs-homedir' do
  block do
    resources('execute[hive-hdfs-homedir]').run_action(:run)
  end
  not_if "hdfs dfs -test -d #{dfs}/user/hive", :user => 'hdfs'
end
First, we set the dfs variable from the attribute that points to the location of our HDFS NameNode. Next is a ruby_block, which calls the execute[hive-hdfs-homedir] resource from the resources collection with the :run action. We guard execution of this ruby_block with a call to HDFS, using the hdfs shell command, so that the block is skipped when the directory already exists. This block can only be executed during the execution phase of a Chef run, and only after HDFS is fully operational. Otherwise, the hdfs shell command may not be available, may fail, or may not return an accurate result.
Service resources are similar to execute resources. The main difference is that service resources support multiple actions. This is an example that simply starts a service.
ruby_block 'service-hadoop-hdfs-namenode-start' do
  block do
    resources('service[hadoop-hdfs-namenode]').run_action(:start)
  end
end
Since service resources are idempotent, we do not need to add a guard to the ruby_block. However, what if we wanted to both start the service and enable it to start at boot? The Chef #run_action method only supports a single action, so we must perform both actions ourselves.
ruby_block 'service-hadoop-hdfs-namenode-start-and-enable' do
  block do
    %w(enable start).each do |action|
      resources('service[hadoop-hdfs-namenode]').run_action(action.to_sym)
    end
  end
end
This causes our ruby_block to signal the service[hadoop-hdfs-namenode] resource from the resource collection to enable, then start.
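The same pattern applies to the NameNode format step from the list above. The following is a minimal sketch of a wrapper recipe fragment, not taken from any existing cookbook. The name_dir path is an assumption for illustration only; substitute the local directory your dfs.namenode.name.dir setting points to. The wrapper cookbook is also assumed to declare a dependency on the hadoop cookbook in its metadata.rb.

```ruby
# Hypothetical wrapper recipe fragment for the NameNode machine.
# Pulls in the resources declared by the hadoop cookbook.
include_recipe 'hadoop::hadoop_hdfs_namenode'

# ASSUMPTION: illustrative path only. Use the local directory that
# dfs.namenode.name.dir refers to in your environment.
name_dir = '/data/hadoop/hdfs/name'

ruby_block 'initaction-format-namenode' do
  block do
    resources('execute[hdfs-namenode-format]').run_action(:run)
  end
  # Formatting discards existing NameNode metadata, so skip it once the
  # directory has been formatted (a formatted directory has current/VERSION).
  not_if { ::File.exist?("#{name_dir}/current/VERSION") }
end
```

Unlike the service examples, this block needs a guard, because formatting a NameNode is not safe to repeat. Starting the DataNodes still has to happen on their own machines, so the ordering across machines remains the responsibility of whatever drives the Chef runs.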