This is the documentation for LucidWorks Big Data v1.1.

These instructions outline the steps for installing LucidWorks Big Data on a local cluster using Chef Server. This is the preferred method of installation, and is the simplest approach. However, if you cannot use Chef Server because your nodes do not have access to the internet, please see the instructions on Provisioning a Cluster with Chef Solo.

These instructions are designed for a three-node cluster, but can be easily extended for larger clusters.

Before starting the installation, please review the section Planning Your Cluster. It is important to have a plan for how the cluster will be structured, and which services will be co-located, before starting this process.

The installation process is simpler with Chef Server than with Chef Solo, but still requires a number of steps. After reviewing the system requirements and setting up each node of the cluster, you will install Chef Server, configure Knife, and then finally, set up LucidWorks Big Data with a series of knife commands.

System Requirements

Ubuntu

  • Ubuntu 12.04 installed on all virtual machines (VMs) and hosting boxes, using ubuntu-12.04-server-amd64.iso. During installation:
    • Install SSH Server
    • Create user 'ubuntu' with password of 'lucid'
    • Set UTC as the time zone
  • Oracle Java JDK v6.x
  • Internet access is required on all nodes for the installation of system packages

CentOS

  • CentOS 5.8 installed on all virtual machines and hosting boxes. During installation:
    • Install xfsprogs
    • Install zip and unzip
    • Set UTC as the time zone
  • Oracle Java JDK v6.x
  • Internet access is required on all nodes for the installation of system packages

Installing Oracle JDK 6.x

  1. Download the Java JDK .bin file from http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html.
  2. Copy the .bin file to the server.
  3. Run the installer as ./filename.bin. Accept the license and proceed with the installation.

Due to compatibility requirements, OpenJDK is not supported.
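
As a rough sanity check after installing Java, the sketch below inspects the text printed by java -version and accepts only a 1.6.x runtime that does not identify itself as OpenJDK. The function name check_jdk is hypothetical, and the check assumes the usual banner format (a line containing version "1.6..."). It does not distinguish a JDK from a JRE.

```shell
# check_jdk reads `java -version` output on stdin and succeeds only for a
# 1.6.x runtime that does not report itself as OpenJDK.
# (Hypothetical helper; assumes the usual `java version "1.6.0_NN"` banner.)
check_jdk() {
  version_text=$(cat)
  case "$version_text" in
    *OpenJDK*) echo "unsupported: OpenJDK"; return 1 ;;
  esac
  case "$version_text" in
    *'version "1.6.'*) echo "ok"; return 0 ;;
  esac
  echo "unsupported: not a 1.6.x JDK"
  return 1
}

# Typical use on a node (java -version writes to stderr):
#   java -version 2>&1 | check_jdk
```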

Pre-Requisites

Virtualization Software

The cluster installation is done with a prepared virtual machine with Chef Server and the cookbooks to install LucidWorks Big Data. To use the VM, you should have Oracle VM VirtualBox installed on the hosting box. Other virtualization packages may work, but are not tested and not guaranteed.

Create Appliances

Create two appliances, one small and one large. The small appliance should have 4 GB of memory and 20 GB of disk, and the large appliance should have 12 GB of memory and 100 GB of disk.

VNC

Install x11vnc on the hosting box with the following steps:

  1. Install VNC packages with sudo apt-get install x11vnc vnc-java
  2. Set up a client password with x11vnc -storepasswd
  3. Open ports 5800 and 5900 through your firewall
  4. Run sudo x11vnc -auth guess -forever -usepw -httpdir /usr/share/vnc-java/ -httpport 5800 &

Downloads and Credentials

After you have signed up for LucidWorks Big Data, you will be given instructions separately on how to access the virtual machine disk images.

The Chef Server VM is distributed with the LucidWorks Big Data package to help provision an on-premise cluster.

Requirements

Hardware Requirements

The following hardware requirements are for the VM only. They are in addition to the hardware required to run the operating system and existing applications.

  • CPU: Dual Core or higher
  • Memory: 2 GB RAM or higher
  • Disk Space: 20 GB Disk Space available for the VM

The VM is configured with a bridged network mode, which will automatically take an IP address from the local network. Accessing the web-based user interfaces or the APIs will require using either the IP or a hostname you specify for the VM.

Set Up VM with Chef Server

We have included general instructions that can be used with any virtualization software package, and specific instructions using VirtualBox.

General Instructions

The general steps to install the virtual machine package are:

  1. Import the VM according to the instructions for your virtualization software.
  2. Log in to the new machine with username 'ubuntu' and password 'lucid123'.
  3. Validate Chef Server with the steps in Validate Chef Server Installation.
  4. Copy the LucidWorks Big Data installation package to the other nodes of the cluster.
  5. Prepare Chef Server with the steps in Prepare Chef Server.

Specific Instructions for Some Virtualization Tools

Importing VM with VirtualBox

If you are using VirtualBox, follow these specific instructions to launch the VM.

  1. From the File menu, select "Import Appliance". This opens the Appliance Import Wizard. Click Choose, browse to the directory where you downloaded the package, and select the .ovf file.

  2. The next screen shows "Appliance Import Settings", where you can customize the VM configuration. We recommend assigning two or more CPUs and 2 GB or more of memory to this VM. Review the settings, then choose Import. Importing the complete VM takes a few minutes. Once done, the VM appears in your VM list in Oracle VM VirtualBox Manager.

  3. Start the VM and wait for it to boot. Once booted, you can log in with username 'ubuntu' and password 'lucid123'.

Validate Chef Server Installation

Once logged in, check the client list by running knife client list. You should see three clients, like this:

chef-validator
chef-webui
root
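
If you prefer a scripted check over reading the list by eye, a minimal sketch follows. The function name check_clients is hypothetical. It reads the command output on stdin and tolerates leading or trailing whitespace, on the assumption that knife prints one client name per line.

```shell
# check_clients reads `knife client list` output on stdin and verifies that
# each of the three expected client names appears on a line of its own.
# (Hypothetical helper; assumes one client name per output line.)
check_clients() {
  list=$(cat)
  missing=""
  for name in chef-validator chef-webui root; do
    printf '%s\n' "$list" | grep -Eq "^[[:space:]]*${name}[[:space:]]*\$" \
      || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}

# Typical use on the Chef Server VM:
#   knife client list | check_clients
```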

Prepare Chef Server

  1. Copy and extract the LWBD installation package.

  2. Change into the top-level directory of the LWBD installation package and run the following script to upload the Chef code to Chef Server:
    ./chef_upload.sh
    
  3. Validate by listing roles and cookbooks
    1. knife role list
      1. Expected output: hadoop, sda, lwe roles
    2. knife cookbook list
      1. Expected output: hadoop, zookeeper cookbooks

Installation

Prepare Servers for Cluster Node

If you create snapshots, it will be easy to repeat these steps across each client node.

  1. Set password-less sudo:
    1. sudo visudo
    2. Add ubuntu ALL=(ALL) NOPASSWD: ALL as the last line of the file. If there is another line with NOPASSWD above this one, remove it.
  2. Synchronize the date using ntpdate -s -b -p 8 -u 129.132.2.21
  3. Run sudo apt-get update
  4. Run sudo apt-get install curl

Prepare Client Nodes Before First Boot

On each client box, before the first time they are booted, perform the following steps:

  1. Select Bridged Adapter under "Network -> Attached to"
  2. Select Paravirtualized Network under "Network -> Adapter Type"
  3. Reinitialize the MAC address under "Network -> MAC Address"
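
For VirtualBox users who would rather script these three settings, the sketch below prints a VBoxManage modifyvm command that should be roughly equivalent to the GUI steps. It is untested against this appliance. The VM name and host interface are placeholders, the function name nic_settings_cmd is hypothetical, and it assumes the first network adapter (nic1) is the one to change. The command is only printed, so you can review it before running it while the VM is powered off.

```shell
# nic_settings_cmd prints (does not run) a VBoxManage command mirroring the
# three GUI steps: bridged adapter, paravirtualized (virtio) type, new MAC.
# Both arguments are placeholders that you supply.
nic_settings_cmd() {
  vm_name=$1      # VM name as shown in VirtualBox Manager
  host_iface=$2   # host network interface to bridge to, e.g. eth0
  printf 'VBoxManage modifyvm "%s" --nic1 bridged --bridgeadapter1 "%s" --nictype1 virtio --macaddress1 auto\n' \
    "$vm_name" "$host_iface"
}

# Example with hypothetical names:
nic_settings_cmd "lwbd-node1" "eth0"
```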

Configure Network Access

  1. Set Static IP. For example, assuming 172.16.10.199 as the 'master' node, /etc/network/interfaces might look like:
    iface eth0 inet static
    address 172.16.10.199
    netmask 255.255.255.0
    network 172.16.10.0
    broadcast 172.16.10.255
    gateway 172.16.10.1
    


  2. Make sure the DNS can be resolved by updating /etc/resolv.conf:
    nameserver 4.2.2.2
    nameserver 66.7.224.17
    


  3. Assign a unique hostname to each host in /etc/hostname

  4. Modify /etc/hosts so that each node can look up the other nodes and the Chef server:
    172.16.10.197 sda
    172.16.10.198 slave
    172.16.10.199 master
    172.16.10.189 chef
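
Because each node needs its own static address, it can help to generate the interface stanza rather than edit it by hand. The sketch below does this under two assumptions taken from the example in step 1: a 255.255.255.0 netmask, and a gateway at the .1 address of the same subnet. The function name render_static_iface is hypothetical. Adjust the values if your network differs.

```shell
# render_static_iface prints an eth0 stanza for the given IPv4 address.
# Assumes a 255.255.255.0 netmask and a gateway at .1 (as in the example
# stanza for the master node); other network layouts need different values.
render_static_iface() {
  address=$1
  prefix=${address%.*}   # first three octets, e.g. 172.16.10
  cat <<EOF
iface eth0 inet static
address $address
netmask 255.255.255.0
network $prefix.0
broadcast $prefix.255
gateway $prefix.1
EOF
}

# Example: stanza for the 'slave' node from the /etc/hosts listing.
render_static_iface 172.16.10.198
```

Review the output before copying it into /etc/network/interfaces on the node.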
    

System Usage Monitor

By default, the LucidWorks System Usage Monitor, which sends anonymous usage statistics to LucidWorks, will be enabled during provisioning. For more information about this feature, see the section System Usage Monitor. If you would prefer to disable the monitor, you can edit the Chef cookbook on the master node that enables it during installation. In cookbooks/hadoop/attributes/default.rb, find the line:

default["heartbeat"]["enabled"] = true

Change true to false. Save the file, and then proceed with the rest of the installation.
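
The same edit can be made from the command line. The following is a minimal sketch, assuming the attribute line appears exactly as shown above at the start of a line. The function name disable_heartbeat is hypothetical. It rewrites only that line and leaves the rest of the file unchanged.

```shell
# disable_heartbeat rewrites the heartbeat attribute line in the given file
# from true to false. All other lines are left as they are.
# (Hypothetical helper; assumes the line starts with default["heartbeat"].)
disable_heartbeat() {
  attrs_file=$1
  tmp_file=$(mktemp) || return 1
  sed 's/^\(default\["heartbeat"]\["enabled"] *= *\)true/\1false/' \
    "$attrs_file" > "$tmp_file" && cat "$tmp_file" > "$attrs_file"
  status=$?
  rm -f "$tmp_file"
  return $status
}

# Typical use from the top-level directory of the installation package:
#   disable_heartbeat cookbooks/hadoop/attributes/default.rb
```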

Launch Nodes

Ubuntu

Run the knife commands from the Chef Server VM to install the LucidWorks Big Data services. Run the following commands in the order shown below, replacing <IP_ADDRESS> in each command with the actual IP of the target node. If multiple services will be installed on the same node ("kafka" and "sda", for example), you should still run the commands separately, as below.

knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
   -r "role[zabbix_server]" -N env01-zabbix-server
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
   -r "role[zookeeper]" -N env01-zk
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
   -r "role[hadoop_namenode],role[hadoop_jobtracker],role[hbase_master]" \
   -N env01-NameNode-JobTracker-HbaseMaster
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
   -r "role[hadoop_slave]" -N env01-Hadoop_Slave
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
   -r "role[kafka]" -N env01-kafka
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
   -r "role[lwe-core],role[lwe-ui],role[lwe-connectors]" -N env01-lwe
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
   -r "role[sda]" -N env01-sda

If expanding beyond three nodes, you would add the additional IPs to /etc/hosts, then run the knife command to install the services for the additional node.
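
To avoid retyping IP addresses, you could generate the Ubuntu commands from your cluster plan and review them before running anything. The sketch below only prints the commands, in the documented order. The function names are hypothetical, and so is the mapping of services to the three nodes (master, slave, sda), which is for illustration only. Use the layout you chose when planning your cluster.

```shell
# bootstrap_cmd prints one knife bootstrap command for the Ubuntu case.
# Arguments: node IP, run list, node name.
bootstrap_cmd() {
  printf 'knife bootstrap %s -x ubuntu --sudo -E env01 -r "%s" -N %s\n' "$1" "$2" "$3"
}

# print_plan emits the seven commands in the documented order.
# The service-to-node mapping below is a hypothetical three-node layout.
print_plan() {
  master_ip=$1; slave_ip=$2; sda_ip=$3
  bootstrap_cmd "$master_ip" 'role[zabbix_server]' env01-zabbix-server
  bootstrap_cmd "$master_ip" 'role[zookeeper]' env01-zk
  bootstrap_cmd "$master_ip" 'role[hadoop_namenode],role[hadoop_jobtracker],role[hbase_master]' env01-NameNode-JobTracker-HbaseMaster
  bootstrap_cmd "$slave_ip" 'role[hadoop_slave]' env01-Hadoop_Slave
  bootstrap_cmd "$sda_ip" 'role[kafka]' env01-kafka
  bootstrap_cmd "$sda_ip" 'role[lwe-core],role[lwe-ui],role[lwe-connectors]' env01-lwe
  bootstrap_cmd "$sda_ip" 'role[sda]' env01-sda
}

# Example using the addresses from the /etc/hosts listing:
print_plan 172.16.10.199 172.16.10.198 172.16.10.197
```

Once the printed commands look right, run them one at a time from the Chef Server VM so that a failure in one bootstrap is noticed before the next begins.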

CentOS

Run the knife commands from the Chef Server VM to install the LucidWorks Big Data services. Run the following commands in the order shown below, replacing <IP_ADDRESS> in each command with the actual IP of the target node. If multiple services will be installed on the same node ("kafka" and "sda", for example), you should still run the commands separately, as below.

knife bootstrap <IP_ADDRESS> -x root -E env01 \
   -r "role[zabbix_server]" -N env01-zabbix-server
knife bootstrap <IP_ADDRESS> -x root -E env01 \
   -r "role[zookeeper]" -N env01-zk
knife bootstrap <IP_ADDRESS> -x root -E env01 \
   -r "role[hadoop_namenode],role[hadoop_jobtracker],role[hbase_master]" \
   -N env01-NameNode-JobTracker-HbaseMaster
knife bootstrap <IP_ADDRESS> -x root -E env01 \
   -r "role[hadoop_slave]" -N env01-Hadoop_Slave
knife bootstrap <IP_ADDRESS> -x root -E env01 \
   -r "role[kafka]" -N env01-kafka
knife bootstrap <IP_ADDRESS> -x root -E env01 \
   -r "role[lwe-core],role[lwe-ui],role[lwe-connectors]" -N env01-lwe
knife bootstrap <IP_ADDRESS> -x root -E env01 \
   -r "role[sda]" -N env01-sda

If expanding beyond three nodes, you would add the additional IPs to /etc/hosts, then run the knife command to install the services for the additional node.

Attach 'sdaevents'

The 'sdaevents' role (and associated service) should run on each node that hosts the HBase services (hbasemaster and regionserver). However, it can only be launched after all the other services. Once the other nodes have been launched, run this command:

knife node run_list add env01-sda "role[sdaevents]"

Validating the Installation

Once the knife commands have been completed, you can validate the installation with any of the approaches described in Validating Installations.
