
About the developer

analytically
418 Stars 130 Forks Apache License 2.0 329 Commits 6 Opened issues

Description

Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.


Hadoop Ansible Playbook

Ansible playbook that installs a CDH 4.6.0 Hadoop cluster (running on Java 7, supported from CDH 4.4), with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.

Follow @analytically. Browse the CI build screenshots.

Requirements

  • Ansible 1.5 or later (pip install ansible)
  • 6 + 1 Ubuntu 12.04 LTS/13.04/13.10 or Debian "wheezy" hosts - see ubuntu-netboot-tftp if you need automated server installation
  • Mandrill username and API key for sending email notifications
  • ansibler user in the sudo group without a sudo password prompt (see the Bootstrapping section below)

Cloudera (CDH4) Hadoop Roles

If you're assembling your own Hadoop playbook, these roles are available for you to reuse:

Facebook Presto Roles

Configuration

Set the following variables using --extra-vars or by editing group_vars/all:

Required:

  • site_name - used as the Hadoop nameservice and in various directory names. Alphanumeric only.
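
A minimal sketch of the site_name check, assuming "alphanumeric only" means ASCII letters and digits. The helper validate_site_name is hypothetical and not part of the playbook; the value itself would be set in group_vars/all or passed as, e.g., --extra-vars "site_name=analytics01".

```python
import re

# Assumption: "alphanumeric only" means ASCII letters and digits, nothing else.
_SITE_NAME = re.compile(r"[A-Za-z0-9]+")


def validate_site_name(name: str) -> str:
    """Return name unchanged if it is a usable site_name, else raise ValueError."""
    # fullmatch rejects empty strings, hyphens, underscores, dots and spaces.
    if _SITE_NAME.fullmatch(name) is None:
        raise ValueError(f"site_name must be alphanumeric (A-Z, a-z, 0-9), got {name!r}")
    return name
```

Running a check like this before the playbook catches a name such as my-site early, rather than partway through the Hadoop configuration.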

Optional:

  • Network interface: if you'd like to use a different IP address per host (e.g. an internal interface), edit site.yml and change set_fact: ipv4_address=... to determine the correct IP address to use per host. If this fact is not set, ansible_default_ipv4.address will be used.
  • Email notification: notify_email, postfix_domain, mandrill_username, mandrill_api_key
  • roles/common: kernel_swappiness (0), nofile limits, ntp servers and rsyslog_polling_interval_secs (10)
  • roles/2_aggregated_links: bond_mode (balance-alb) and mtu (9216)
  • roles/cdh_hadoop_config: dfs_blocksize (268435456), max_xcievers (4096), heapsize (12278)
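
A simplified model of how these settings combine, assuming the values in parentheses above are role defaults. In Ansible, role defaults have the lowest precedence, group_vars/all sits above them, and --extra-vars always wins; real Ansible precedence has many more levels than the three modelled here. The effective_ipv4 helper mirrors the Network interface fallback described above. Both function names are hypothetical.

```python
from typing import Any, Dict, Mapping

# Values shown in parentheses in the list above, treated here as role defaults.
ROLE_DEFAULTS: Dict[str, Any] = {
    "kernel_swappiness": 0,
    "rsyslog_polling_interval_secs": 10,
    "bond_mode": "balance-alb",
    "mtu": 9216,
    "dfs_blocksize": 268435456,  # 256 MiB
    "max_xcievers": 4096,
    "heapsize": 12278,
}


def resolve_vars(role_defaults: Mapping[str, Any],
                 group_vars: Mapping[str, Any],
                 extra_vars: Mapping[str, Any]) -> Dict[str, Any]:
    """Later sources win: role defaults < group_vars/all < --extra-vars."""
    merged: Dict[str, Any] = dict(role_defaults)  # copy, so the defaults stay untouched
    merged.update(group_vars)
    merged.update(extra_vars)
    return merged


def effective_ipv4(host_vars: Mapping[str, Any]) -> str:
    """Use ipv4_address if site.yml set it, else fall back to ansible_default_ipv4.address."""
    explicit = host_vars.get("ipv4_address")
    if explicit:
        return str(explicit)
    return str(host_vars["ansible_default_ipv4"]["address"])
```

For example, resolve_vars(ROLE_DEFAULTS, {"mtu": 1500}, {"heapsize": 8192}) lowers the MTU and heap size while keeping the 256 MiB block size (268435456 / 2**20 = 256).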

Adding hosts

Edit the hosts file and list hosts per group (see Inventory for more examples):

[datanodes]
hslave010
hslave[090:252]
hadoop-slave-[a:f].example.com
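
Ansible expands the bracketed ranges in the inventory itself: numeric ranges keep the leading zeros of the start value, and letter ranges step through the alphabet. The sketch below is a hypothetical re-implementation (expand_host_pattern is not an Ansible function), handy for counting how many hosts a pattern covers before running the playbook. It only handles the [start:end] forms used above.

```python
import re
from typing import List

# Groups: prefix, range start, range end, rest of the pattern.
_RANGE = re.compile(r"^(.*?)\[([0-9]+|[a-z]):([0-9]+|[a-z])\](.*)$")


def expand_host_pattern(pattern: str) -> List[str]:
    """Expand [start:end] ranges the way the inventory examples above read."""
    match = _RANGE.match(pattern)
    if match is None:
        return [pattern]  # plain host name, nothing to expand
    prefix, start, end, rest = match.groups()
    if start.isdigit() and end.isdigit():
        width = len(start)  # "090" keeps its leading zero: 090, 091, ...
        values = [str(i).zfill(width) for i in range(int(start), int(end) + 1)]
    elif start.isalpha() and end.isalpha():
        values = [chr(c) for c in range(ord(start), ord(end) + 1)]
    else:
        raise ValueError(f"mixed numeric/alphabetic range in {pattern!r}")
    hosts: List[str] = []
    for value in values:
        # Recurse so any further range later in the pattern is expanded too.
        hosts.extend(expand_host_pattern(prefix + value + rest))
    return hosts
```

Applied to the three lines above, this gives 1 + 163 + 6 = 170 datanodes.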

Make sure that the zookeepers and journalnodes groups contain at least 3 hosts and have an odd number of hosts.
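
Both ZooKeeper and the HDFS JournalNodes only make progress while a majority of their members is up, so a group of n hosts tolerates (n - 1) // 2 failures. A fourth host therefore adds no fault tolerance over three. A minimal sketch of the inventory check; tolerated_failures is a hypothetical helper, not part of the playbook.

```python
from typing import Sequence


def tolerated_failures(group: str, hosts: Sequence[str]) -> int:
    """Check a quorum group and return how many host failures it survives."""
    n = len(set(hosts))  # ignore accidental duplicate entries
    if n < 3 or n % 2 == 0:
        raise ValueError(f"{group}: need an odd number of hosts, at least 3 (got {n})")
    # A majority (n // 2 + 1) must stay up, so (n - 1) // 2 hosts may fail.
    return (n - 1) // 2
```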

Ganglia nodes

Since we're using unicast mode for Ganglia (which significantly reduces chatter), you may have to wait 60 seconds after a node starts before it shows up in the web interface.

Installation

To run Ansible:

./site.sh

To install only ZooKeeper, for example, pass the zookeeper tag as an argument (available tags: apache, bonding, configuration, elasticsearch, elasticsearch_curator, fluentd, ganglia, hadoop, hbase, hive, java, kibana, ntp, postfix, postgres, presto, rsyslog, tdagent, zookeeper):

./site.sh zookeeper
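
Tag selection can be sketched as building an ansible-playbook command line, assuming (not confirmed here) that site.sh wraps ansible-playbook with the hosts inventory and site.yml. The -i and --tags flags are real ansible-playbook options, and --tags takes a comma-separated list; playbook_command itself is hypothetical.

```python
from typing import List, Sequence

# Tags listed in the Installation section above.
AVAILABLE_TAGS = frozenset({
    "apache", "bonding", "configuration", "elasticsearch", "elasticsearch_curator",
    "fluentd", "ganglia", "hadoop", "hbase", "hive", "java", "kibana", "ntp",
    "postfix", "postgres", "presto", "rsyslog", "tdagent", "zookeeper",
})


def playbook_command(tags: Sequence[str] = ()) -> List[str]:
    """Build a hypothetical ansible-playbook invocation limited to the given tags."""
    unknown = sorted(set(tags) - AVAILABLE_TAGS)
    if unknown:
        raise ValueError(f"unknown tag(s): {', '.join(unknown)}")
    # Assumption: inventory file "hosts" and playbook "site.yml" in the repo root.
    command = ["ansible-playbook", "-i", "hosts", "site.yml"]
    if tags:
        # Only tasks carrying one of these tags will run.
        command += ["--tags", ",".join(tags)]
    return command
```

On a control machine with Ansible installed, the resulting list could be handed to subprocess.run. Rejecting unknown tags up front matters because ansible-playbook may otherwise simply run nothing for a misspelled tag.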

What else is installed?

URLs

After the installation, go here:

Performance testing

Instructions on how to test the performance of your CDH4 cluster.

  • SSH into one of the machines.
  • Change to the hdfs user: sudo su - hdfs
  • Set HADOOP_MAPRED_HOME: export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  • cd /usr/lib/hadoop-mapreduce

TeraGen and TeraSort
  • To run TeraGen: hadoop jar hadoop-mapreduce-examples.jar teragen -Dmapred.map.tasks=1000 10000000000 /tera/in
  • To run TeraSort: hadoop jar hadoop-mapreduce-examples.jar terasort /tera/in /tera/out
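
TeraGen writes fixed 100-byte rows, so the 10,000,000,000 rows requested above come to 10^12 bytes (1 TB). The sketch below estimates the output layout, assuming the rows split evenly over the 1000 requested map tasks (one output file each) and the default dfs_blocksize of 268435456 bytes. Replication is not counted, so raw disk usage is these figures times the HDFS replication factor. teragen_footprint is a hypothetical helper.

```python
from typing import Tuple

TERAGEN_ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_footprint(rows: int, map_tasks: int, block_size: int) -> Tuple[int, int, int]:
    """Return (total bytes, bytes per output file, HDFS blocks before replication)."""
    total_bytes = rows * TERAGEN_ROW_BYTES
    # Assumes rows split evenly across map tasks, one output file per map.
    bytes_per_file = total_bytes // map_tasks
    # Ceiling division: a partially filled last block still counts as a block.
    blocks_per_file = -(-bytes_per_file // block_size)
    return total_bytes, bytes_per_file, blocks_per_file * map_tasks


total, per_file, blocks = teragen_footprint(10_000_000_000, 1000, 268435456)
print(f"{total / 1e12:.0f} TB in {per_file / 1e9:.0f} GB files, {blocks} blocks")
# prints "1 TB in 1 GB files, 4000 blocks"
```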
DFSIO
  • hadoop jar hadoop-mapreduce-client-jobclient-2.0.0-cdh4.6.0-tests.jar TestDFSIO -write
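
TestDFSIO reports two headline figures that are easy to confuse. As we understand its report, "Throughput mb/sec" divides all data written by the summed time of all map tasks, while "Average IO rate mb/sec" is the mean of each task's own rate; both are per-task figures rather than aggregate cluster bandwidth. A sketch of the two formulas, with dfsio_summary as a hypothetical helper fed (megabytes, seconds) per task.

```python
from typing import Sequence, Tuple


def dfsio_summary(tasks: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """tasks holds (megabytes, seconds) per map task; returns (throughput, average IO rate)."""
    total_mb = sum(mb for mb, _ in tasks)
    total_seconds = sum(seconds for _, seconds in tasks)
    # Throughput: all data divided by the summed time of all tasks.
    throughput = total_mb / total_seconds
    # Average IO rate: mean of each task's individual MB/s rate.
    average_rate = sum(mb / seconds for mb, seconds in tasks) / len(tasks)
    return throughput, average_rate


print(dfsio_summary([(100.0, 10.0), (100.0, 40.0)]))  # prints (4.0, 6.25)
```

A large gap between the two numbers, as in this example, points to a few slow tasks (for instance a single slow disk or node) dragging down throughput.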

Bootstrapping

Paste your public SSH RSA key in bootstrap/ansible_rsa.pub and run bootstrap.sh to bootstrap the nodes specified in bootstrap/hosts. See bootstrap/bootstrap.yml for more information.
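
Before running bootstrap.sh, a quick sanity check can confirm that the file holds an OpenSSH RSA public key ("ssh-rsa <base64> [comment]") and not, say, the private key. The sketch relies on the OpenSSH key blob starting with a 4-byte big-endian length followed by the key type "ssh-rsa"; looks_like_rsa_pubkey is hypothetical and only checks the format, not the key itself.

```python
import base64
import binascii
import struct


def looks_like_rsa_pubkey(line: str) -> bool:
    """Rough check that line is an OpenSSH-format RSA public key."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        return False  # e.g. a pasted "-----BEGIN ... PRIVATE KEY-----" line
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except binascii.Error:
        return False  # not valid base64
    if len(blob) < 4:
        return False
    # The blob starts with a length-prefixed key type string that must repeat "ssh-rsa".
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == b"ssh-rsa"
```

For example, looks_like_rsa_pubkey(open("bootstrap/ansible_rsa.pub").read()) should be True for a correctly pasted key.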

What about Pig, Flume, etc.?

You can manually install additional components after running this playbook. Follow the official CDH4 Installation Guide.

Screenshots

[Screenshots: zookeeper, hmaster01, ganglia, kibana, smokeping]

License

Licensed under the Apache License, Version 2.0.

Copyright 2013-2014 Mathias Bogaert.
