This cookbook installs and configures DataStax Enterprise (DSE). More information is available here (DataStax Enterprise).
It uses officially released DataStax packages. It can adjust the Cassandra configuration files, but it cannot add data or create keyspaces in Cassandra (yet).
This cookbook is designed to be used in conjunction with a wrapper cookbook. Used alone, it can create a single-node cluster; to create a multi-node cluster, a wrapper cookbook is recommended.
Example in a wrapper cookbook recipe:

```ruby
node.default['java']['jdk_version'] = "7"
node.default['cassandra']['seeds'] = "192.168.1.1, 192.168.1.2"
node.default['cassandra']['dse_version'] = "4.0.3-1"
node.default['cassandra']['max_heap_size'] = "12G"
node.default['cassandra']['heap_newsize'] = "1200M"
include_recipe "dse::cassandra"
```
## Scope
This cookbook attempts to manage almost all Apache Cassandra configuration settings. It can also create Hadoop and Solr nodes, though fewer attributes are available to manage their configuration.
This cookbook currently provides:
- DataStax Enterprise 4.x.x via packages.

It requires Chef 11 or higher.

Tested on:
- RHEL 6.3, 6.4
- Ubuntu 14.04.1 LTS
- Ubuntu 12.04 (limited testing; will require some edits)
The provided recipes are `dse::cassandra`, `dse::solr`, and `dse::hadoop`:
- `dse::cassandra` provisions DSE as a Cassandra node.
- `dse::solr` provisions DSE with Solr enabled.
- `dse::hadoop` provisions DSE with Hadoop enabled.
There are also recipes used for configuration that should not be called directly:
- `dse::default` sets up the templates.
- `dse::datastax` sets up the DataStax repositories.
- `dse::datastax-agent` configures the datastax-agent if needed.
- `dse::ssl` (work in progress) sets up SSL keys on all nodes.
This cookbook will install DSE Cassandra by default. Other attributes you can set are:
- `node["cassandra"]["cluster_name"]` (default: `Test Cluster`): the name of the cluster to provision
- `node["cassandra"]["vnodes"]` (default: `true`): enable or disable vnodes
- `node["cassandra"]["intial_token"]` (default: `nil`): the initial token to use; leave blank when using vnodes
- `node["cassandra"]["num_tokens"]` (default: `256`): the number of tokens per node when using vnodes
- `node["cassandra"]["solr"]` (default: `false`): whether to enable Solr
- `node["cassandra"]["hadoop"]` (default: `false`): whether to enable Hadoop
- `node["cassandra"]["dse_version"]` (default: `4.0.3-1`): the DSE version to install
- `node["cassandra"]["user"]` (default: `cassandra`): the Cassandra user
- `node["cassandra"]["group"]` (default: `cassandra`): the Cassandra group
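When `vnodes` is disabled, each node needs its own `initial_token`. A minimal sketch of the usual even-spacing calculation, assuming the Murmur3Partitioner (the Cassandra 2.0 default, with tokens in the range -2**63 to 2**63 - 1); the helper `murmur3_tokens` is hypothetical and not part of this cookbook.

```ruby
# Hypothetical helper: evenly spaced initial tokens for a ring with one token
# per node (vnodes disabled), assuming the Murmur3Partitioner token range
# [-2**63, 2**63 - 1].
def murmur3_tokens(node_count)
  raise ArgumentError, 'node_count must be at least 1' if node_count < 1

  # Split the 2**64-wide ring into equal slices, starting at -2**63.
  (0...node_count).map { |i| (2**64 / node_count) * i - 2**63 }
end

murmur3_tokens(4).each_with_index do |token, i|
  puts "node #{i}: initial_token = #{token}"
end
```

Each value would then be assigned to one node, for example via `node.default['cassandra']['intial_token']` in a wrapper cookbook, together with `node.default['cassandra']['vnodes'] = false`.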
- `node["cassandra"]["listen_address"]` (default: `node['ipaddress']`): the IP address to use for the listen address
- `node["cassandra"]["rpc_address"]` (default: `node['ipaddress']`): the IP address to use for the RPC address
- `node["cassandra"]["broadcast_address"]` (default: `nil`): the IP address to use for the broadcast address
- `node["cassandra"]["seeds"]` (default: `node['ipaddress']`): the IP addresses to use for the seed list
- `node["cassandra"]["concurrent_reads"]` (default: `32`): the concurrent reads setting
- `node["cassandra"]["concurrent_writes"]` (default: `32`): the concurrent writes setting
- `node["cassandra"]["compaction_thruput"]` (default: `16`): the compaction throughput limit, in MB/s
- `node["cassandra"]["multithreaded_compaction"]` (default: `false`): enable or disable multithreaded compaction
- `node["cassandra"]["in_memory_compaction_limit"]` (default: `64`): the size limit for in-memory compactions, in MB
- `node["cassandra"]["trickle_fsync"]` (default: `false`): enable trickle fsync, usually for SSDs
- `node["cassandra"]["range_request_timeout_in_ms"]` (default: `10000`): the timeout for range requests
- `node["cassandra"]["thrift_framed_transport_size_in_mb"]` (default: `15`): the maximum size of a Thrift frame
- `node["cassandra"]["thrift_max_message_length_in_mb"]` (default: `nil`): the maximum message length of a Thrift call
- `node["cassandra"]["concurrent_compactors"]` (default: `nil`): the number of concurrent compactors to allow
- `node["cassandra"]["role_based_seeds"]` (default: `false`): set to `true` to assign seeds based on the members of the `dse-seed` role
- `node['cassandra']['seed_role']` (default: `role:dse-seed`): set to a different role to select seeds
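With `role_based_seeds` enabled, the seed list comes from the nodes matching `seed_role`. In a recipe this would typically use Chef's `search(:node, node['cassandra']['seed_role'])`; the sketch below stands in plain hashes for the search results so it runs outside Chef. The helper `seeds_from_search` is hypothetical and may not match how this cookbook builds the list.

```ruby
# Hypothetical helper: build a comma-separated seed list from node search
# results. Each element stands in for a Chef node object; only 'ipaddress'
# is read.
def seeds_from_search(nodes, fallback_ip)
  # Sort so the rendered seed list is identical on every Chef run.
  ips = nodes.map { |n| n['ipaddress'] }.compact.uniq.sort
  # Fall back to this node's own address when no seed nodes are found,
  # mirroring the documented default of seeds = node['ipaddress'].
  ips.empty? ? fallback_ip : ips.join(', ')
end

search_results = [
  { 'name' => 'dse-1', 'ipaddress' => '192.168.1.2' },
  { 'name' => 'dse-2', 'ipaddress' => '192.168.1.1' }
]
puts seeds_from_search(search_results, '10.0.0.5')
```

Sorting the addresses keeps the generated configuration stable, so search result order alone does not trigger a config change and restart.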
- `node["cassandra"]["CMSInitiatingOccupancyFraction"]` (default: `65`): the CMS initiating occupancy fraction for garbage collection
- `node["cassandra"]["max_heap_size"]` (default: `8192M`): the maximum heap size for Cassandra
- `node["cassandra"]["heap_newsize"]` (default: `800M`): the new generation size of the heap
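For comparison, when `MAX_HEAP_SIZE` and `HEAP_NEWSIZE` are left unset, Cassandra 2.0's `cassandra-env.sh` computes them: the maximum heap is the larger of min(RAM/2, 1024 MB) and min(RAM/4, 8192 MB), and the new generation is the smaller of heap/4 and 100 MB per core. A sketch of that heuristic in Ruby; check the `cassandra-env.sh` shipped with your DSE version before relying on it. On a host with 64 GB of RAM and 8 cores it yields `8192M` and `800M`, the same as this cookbook's defaults.

```ruby
# Sketch of the default heap heuristic from Cassandra 2.0's cassandra-env.sh,
# for comparison with the explicit max_heap_size / heap_newsize attributes.
def default_heap_sizes(system_memory_mb, cores)
  half    = [system_memory_mb / 2, 1024].min
  quarter = [system_memory_mb / 4, 8192].min
  max_heap_mb = [half, quarter].max

  # New generation: 1/4 of the heap, capped at 100 MB per core.
  new_gen_mb = [max_heap_mb / 4, 100 * cores].min
  { max_heap_size: "#{max_heap_mb}M", heap_newsize: "#{new_gen_mb}M" }
end

p default_heap_sizes(65_536, 8) # 64 GB of RAM, 8 cores
```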
- `node["cassandra"]["authentication"]` (default: `false`): enable or disable authentication
- `node["cassandra"]["authorization"]` (default: `false`): enable or disable authorization
- `node["cassandra"]["authenticator"]` (default: empty): the authenticator to use (e.g. `org.apache.cassandra.auth.AllowAllAuthenticator`)
- `node["cassandra"]["authorizor"]` (default: empty): the authorizer to use (e.g. `org.apache.cassandra.auth.AllowAllAuthorizer`)
- `node["cassandra"]["log_level"]` (default: `INFO`): the log level for Cassandra (or Solr/Hadoop)
- `node["cassandra"]["audit_logging"]` (default: `false`): turn on audit logging
- `node["cassandra"]["audit_dir"]` (default: `/var/log/cassandra`): the directory for audit logs
- `node["cassandra"]["active_categories"]` (default: `ADMIN,AUTH,DDL,DCL`): the categories to audit
- `node['cassandra']['metrics_reporter']['enabled']` (default: `false`): enable or disable the metrics reporter jar
- `node['cassandra']['metrics_reporter']['name']` (default: `metrics-graphite`): the name of the jar to use; Graphite is a popular choice
- `node['cassandra']['metrics_reporter']['jar_url']` (default: `http://search.maven.org/remotecontent?filepath=com/yammer/metrics/metrics-graphite/2.2.0/metrics-graphite-2.2.0.jar`): the URL to download the jar from
- `node['cassandra']['metrics_reporter']['sha256sum']` (default: `6b4042aabf532229f8678b8dcd34e2215d94a683270898c162175b1b13d87de4`): the SHA-256 checksum of the jar
- `node['cassandra']['metrics_reporter']['jar_name']` (default: `metrics-graphite-2.2.0.jar`): the full file name of the jar
- `node['cassandra']['metrics_reporter']['config']` (default: `{}`): a hash of the reporter configuration, for example:
```ruby
node.default['cassandra']['metrics_reporter'] = {
  'enabled' => true,
  'name' => 'metrics-graphite',
  'jar_url' => 'http://search.maven.org/remotecontent?filepath=com/yammer/metrics/metrics-graphite/2.2.0/metrics-graphite-2.2.0.jar',
  'sha256sum' => '6b4042aabf532229f8678b8dcd34e2215d94a683270898c162175b1b13d87de4',
  'jar_name' => 'metrics-graphite-2.2.0.jar',
  'config' => {
    'graphite' => [{
      'timeunit' => 'SECONDS',
      'hosts' => [{
        'host' => 'graphite.host.com',
        'port' => 2003
      }],
      'prefix' => "servers.#{node.name}.cassandra",
      'period' => 60,
      'predicate' => {
        'color' => 'white',
        'useQualifiedName' => true,
        'patterns' => [
          '^org.apache.cassandra.metrics.Cache.+',
        ]
      }
    }]
  }
}
```
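The `sha256sum` attribute allows the downloaded jar to be verified, and the `config` hash mirrors the YAML layout of Cassandra's metrics reporter configuration file. A minimal sketch of both steps in plain Ruby, assuming the recipe checks the checksum and renders `config` as YAML; the helper names are hypothetical, and the cookbook may do this differently (for example through the `checksum` property of a Chef `remote_file` resource and a template). The `prefix` below uses a placeholder in place of `node.name`.

```ruby
require 'digest/sha2'
require 'yaml'

# Hypothetical helper: true when the file's SHA-256 matches the expected hex digest.
def jar_checksum_ok?(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256.downcase
end

# Hypothetical helper: render the reporter 'config' hash as YAML text.
def render_reporter_config(config)
  config.to_yaml
end

config = {
  'graphite' => [{
    'timeunit' => 'SECONDS',
    'hosts' => [{ 'host' => 'graphite.host.com', 'port' => 2003 }],
    'prefix' => 'servers.example-node.cassandra', # placeholder for node.name
    'period' => 60,
    'predicate' => {
      'color' => 'white',
      'useQualifiedName' => true,
      'patterns' => ['^org.apache.cassandra.metrics.Cache.+']
    }
  }]
}
puts render_reporter_config(config)
```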
- `node["cassandra"]["dse"]["delegated_snitch"]` (default: `org.apache.cassandra.locator.SimpleSnitch`): the snitch to use for DSE
- `node["cassandra"]["dse"]["snitch"]` (default: `com.datastax.bdp.snitch.DseDelegateSnitch`): the snitch to use in `dse.yaml`
- `node["cassandra"]["dse"]["service_name"]` (default: `dse`): the name of the service
- `node["cassandra"]["dse"]["conf_dir"]` (default: `/etc/dse`): the directory of DSE configuration files
- `node["cassandra"]["dse"]["repo_user"]` (default: empty): the DataStax username for the repository
- `node["cassandra"]["dse"]["repo_pass"]` (default: empty): the DataStax password for the repository
- `node["cassandra"]["dse"]["rhel_repo_url"]` (default: `http://#{node['cassandra']['dse']['repo_user']}:#{node['cassandra']['dse']['repo_pass']}@rpm.datastax.com/enterprise`): the RHEL repository URL
- `node["cassandra"]["dse"]["debian_repo_url"]` (default: `http://#{node['cassandra']['dse']['repo_user']}:#{node['cassandra']['dse']['repo_pass']}@debian.datastax.com/enterprise`): the Debian repository URL
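The default repository URLs interpolate `repo_user` and `repo_pass` directly into the URL's userinfo. If either value contains characters such as `@`, `:` or `/`, it needs percent-encoding or the URL will not parse as intended. A minimal sketch using Ruby's `ERB::Util.url_encode`; the helper `dse_repo_url` is hypothetical and not part of this cookbook.

```ruby
require 'erb'

# Hypothetical helper: build a DataStax repository URL with percent-encoded
# credentials. host_path is e.g. 'rpm.datastax.com/enterprise' or
# 'debian.datastax.com/enterprise'.
def dse_repo_url(user, pass, host_path)
  "http://#{ERB::Util.url_encode(user)}:#{ERB::Util.url_encode(pass)}@#{host_path}"
end

puts dse_repo_url('me@example.com', 'p@ss:w/rd', 'rpm.datastax.com/enterprise')
```

A wrapper cookbook could assign the result to `node.default['cassandra']['dse']['rhel_repo_url']` (or `debian_repo_url`) instead of relying on the raw interpolation.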
- `node["hadoop"]["max_heap_size"]` (default: `10G`): the heap size for Hadoop
- `node["hadoop"]["heap_newsize"]` (default: `800M`): the new generation heap size for Hadoop
- `node["hadoop"]["map_child_java_opts"]` (default: `4G`): the heap size of each map child JVM
- `node["hadoop"]["reduce_child_java_opts"]` (default: `4G`): the heap size of each reduce child JVM
- `node["hadoop"]["map_red_localdir"]` (default: `/data/mapredlocal`): the local directory to use for MapReduce
- `node["hive"]["scratch_dir"]` (default: `/data/hive`): the scratch directory to use for Hive
- `node["hadoop"]["map_reduce_parallel_copies"]` (default: `20`): the number of parallel copies used by MapReduce
- `node["hadoop"]["mapred_tasktracker_map_tasks_max"]` (default: `23`): the maximum number of map tasks per TaskTracker
- `node["hadoop"]["mapred_tasktracker_reduce_tasks_max"]` (default: `12`): the maximum number of reduce tasks per TaskTracker
- `node["hadoop"]["io_sort_mb"]` (default: `512M`): the size of the I/O sort buffer
- `node["hadoop"]["io_sort_factor"]` (default: `64`): the I/O sort factor
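The task slot and child heap defaults assume a large machine. A rough worst-case estimate of the heap memory they imply, assuming each map and reduce child JVM uses the full heap given in `map_child_java_opts` / `reduce_child_java_opts` and that every slot is busy at once; it ignores off-heap and OS memory. The helpers are hypothetical and the size parser only understands `M` and `G` suffixes. With the defaults above this comes to 150 GB, so smaller hosts will likely need lower values in a wrapper cookbook.

```ruby
# Hypothetical helper: convert sizes like "4G" or "800M" to megabytes.
def size_to_mb(size)
  match = /\A(\d+)([MG])\z/i.match(size.to_s.strip)
  raise ArgumentError, "unsupported size: #{size.inspect}" unless match

  value = match[1].to_i
  match[2].casecmp('G').zero? ? value * 1024 : value
end

# Rough upper bound: the max_heap_size heap plus one child JVM heap per
# map and reduce slot, assuming all slots run at their configured maximum.
def worst_case_heap_mb(hadoop)
  size_to_mb(hadoop['max_heap_size']) +
    hadoop['mapred_tasktracker_map_tasks_max'] * size_to_mb(hadoop['map_child_java_opts']) +
    hadoop['mapred_tasktracker_reduce_tasks_max'] * size_to_mb(hadoop['reduce_child_java_opts'])
end

defaults = {
  'max_heap_size' => '10G',
  'map_child_java_opts' => '4G',
  'reduce_child_java_opts' => '4G',
  'mapred_tasktracker_map_tasks_max' => 23,
  'mapred_tasktracker_reduce_tasks_max' => 12
}
puts "worst case: #{worst_case_heap_mb(defaults) / 1024} GB"
```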
- `node["solr"]["max_heap_size"]` (default: `14G`): the heap size for Solr
- `node["solr"]["heap_newsize"]` (default: `2400M`): the new generation heap size for Solr
These are generic Java settings. DataStax recommends Oracle Java, so these defaults select Oracle instead of the OpenJDK default, downloaded from a location you specify.
- `node["dse"]["manage_java"]` (default: `true`): whether to use the `java` recipe to manage the Java installation
- `node["java"]["install_flavor"]` (default: `oracle`): the flavor of Java to install
- `node["java"]["jdk_version"]` (default: `7`): the version of Java to use
- `node['java']['jdk']['7']['x86_64']['url']` (default: empty): the URL to download the Java 7 package from
This portion is under construction; SSL support does not fully work yet.
- `node["cassandra"]["dse"]["cassandra_ssl_dir"]` (default: `/etc/cassandra`): the directory to use for PEM files
- `node["cassandra"]["dse"]["password_file"]` (default: `cassandra_pass.txt`): the file in which to store the keystore password
- `node["cassandra"]["dse"]["internode_encyption"]` (default: `none`): the internode encryption to use (`none`, `all`, `dc`, or `rack`)
- `node["cassandra"]["dse"]["keystore"]` (default: `#{node["cassandra"]["dse"]["cassandra_ssl_dir"]}/#{node["hostname"]}.keystore`): the keystore name
- `node["cassandra"]["dse"]["truststore"]` (default: `#{node["cassandra"]["dse"]["cassandra_ssl_dir"]}/#{node["hostname"]}.truststore`): the truststore name
These attributes configure the datastax-agent, which is used with DataStax OpsCenter.
- `node["datastax-agent"]["enabled"]` (default: `false`): whether to install and configure the DataStax agent
- `node["datastax-agent"]["version"]` (default: `4.1.1-1`): the version of the DataStax agent to install
- `node["datastax-agent"]["conf_dir"]` (default: `/var/lib/datastax-agent/conf`): the directory containing the datastax-agent configuration file
- `node["datastax-agent"]["opscenter_ip"]` (default: `192.168.32.3`): the OpsCenter IP address to connect to
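The DataStax agent reads the address of its OpsCenter server from the `stomp_interface` setting in `address.yaml`, inside its configuration directory. A minimal sketch of rendering that file from the attributes above; the helper `agent_address_yaml` is hypothetical, and the cookbook's own template may write additional settings.

```ruby
require 'yaml'

# Hypothetical helper: build the contents of the agent's address.yaml.
# stomp_interface tells the agent which OpsCenter host to connect to.
def agent_address_yaml(agent_attrs)
  { 'stomp_interface' => agent_attrs['opscenter_ip'] }.to_yaml
end

attrs = {
  'enabled' => true,
  'conf_dir' => '/var/lib/datastax-agent/conf',
  'opscenter_ip' => '192.168.32.3'
}
path = File.join(attrs['conf_dir'], 'address.yaml')
puts "# would write #{path}:"
puts agent_address_yaml(attrs)
```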
This cookbook depends on the following cookbooks:
- java
- yum
- apt
DataStax recommends using the Oracle JDK. You can select it by setting an attribute in your environment or run list.
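For example, a Chef environment can select Oracle JDK 7. The sketch below builds such an environment as JSON with Ruby's standard `json` library, in the format accepted by `knife environment from file`. The environment name and JDK URL are placeholders; `accept_oracle_download_terms` is an attribute of the community `java` cookbook that its Oracle install requires, so check it against the `java` cookbook version you use.

```ruby
require 'json'

# Sketch of a Chef environment that selects Oracle JDK 7 for DSE nodes.
# The environment name and the JDK download URL are placeholders.
def dse_environment(jdk_url)
  {
    'name' => 'dse_production',
    'description' => 'DSE nodes using Oracle JDK 7',
    'json_class' => 'Chef::Environment',
    'chef_type' => 'environment',
    'default_attributes' => {
      'java' => {
        'install_flavor' => 'oracle',
        'jdk_version' => '7',
        'oracle' => { 'accept_oracle_download_terms' => true },
        'jdk' => { '7' => { 'x86_64' => { 'url' => jdk_url } } }
      }
    },
    'override_attributes' => {}
  }
end

puts JSON.pretty_generate(dse_environment('http://repo.example.com/jdk-7-linux-x64.tar.gz'))
```

Save the output as, for example, `dse_production.json` and upload it with `knife environment from file dse_production.json`.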
The integration test environment consists of:
- Chef-DK 0.4.0
- VirtualBox 4.3.24
- Vagrant 1.7.2
- vagrant-omnibus
- vagrant-berkshelf
- vagrant-share
- vagrant-login
Edit the `.kitchen.yml` file in the root of the cookbook and set your DataStax repository username and password to run the tests. Run `rake` in the root of the cookbook to run the full automated test suite.
- Author: Daniel Parker
- Reviewer: Eric Helgeson
Released under the Apache 2.0 License.