These instructions outline the steps for installing LucidWorks Big Data on a local cluster using Chef Server. This is the preferred method of installation, and is the simplest approach. However, if you cannot use Chef Server because your nodes do not have access to the internet, please see the instructions on Provisioning a Cluster with Chef Solo.
Before starting the installation, please review the section Planning Your Cluster. It is important to plan how the cluster will be structured, and which services will be co-located, before starting this process. The installation process is simpler with Chef Server than with Chef Solo, but still requires a number of steps. After reviewing the system requirements and setting up each node of the cluster, you will install Chef Server, configure Knife, and finally set up LucidWorks Big Data with a series of knife commands.
System Requirements
Ubuntu
- Ubuntu 12.04 installed on all virtual machines (VMs) and hosting boxes, using ubuntu-12.04-server-amd64.iso. During installation:
- Install SSH Server
- Create user 'ubuntu' with password of 'lucid'
- Set UTC as the time zone
- Oracle Java JDK v6.x
- Internet access is required on all nodes for the installation of system packages
CentOS
- CentOS 5.8 installed on all virtual machines and hosting boxes. During installation:
- Install xfsprogs
- Install zip and unzip
- Set UTC as the time zone
- Oracle Java JDK v6.x
- Internet access is required on all nodes for the installation of system packages
Installing Oracle JDK 6.x
- Download the Java JDK .bin file from http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html.
- Copy .bin file to the server.
- Install it by running ./filename.bin. Accept the license and proceed with the installation.
Due to compatibility requirements, OpenJDK is not supported.
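Because OpenJDK is not supported, it can be worth confirming which JDK a node actually picks up after installation. The following is a minimal sketch, not part of the LucidWorks package: the function name is_oracle_jdk6 is hypothetical, and it assumes the usual java -version output, where Java 6 reports a 1.6.x version and OpenJDK builds name themselves in the runtime line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with LucidWorks Big Data): decide whether
# `java -version` output describes a non-OpenJDK Java 6 build.
# Reads the version text on stdin; returns 0 if it looks like JDK 6.
is_oracle_jdk6() {
  local text
  text="$(cat)"
  # OpenJDK builds identify themselves in the runtime line.
  if printf '%s\n' "$text" | grep -qi 'openjdk'; then
    return 1
  fi
  # Java 6 reports its version as 1.6.x.
  printf '%s\n' "$text" | grep -q 'version "1\.6\.'
}

# Usage (java -version writes to stderr, so redirect it):
#   java -version 2>&1 | is_oracle_jdk6 && echo "JDK 6 found"
```

If the check fails, make sure the directory of the newly installed JDK comes first on the PATH of the user that runs the services.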
Pre-Requisites
Virtualization Software
The cluster installation is done with a prepared virtual machine with Chef Server and the cookbooks to install LucidWorks Big Data. To use the VM, you should have Oracle VM VirtualBox installed on the hosting box. Other virtualization packages may work, but are not tested and not guaranteed.
Create Appliances
Create two appliances, one small and one large. The small appliance should have 4 GB of memory and 20 GB of disk, while the large appliance should have 12 GB of memory and 100 GB of disk.
VNC
Install x11vnc on the hosting box with the following steps:
- Install VNC packages with sudo apt-get install x11vnc vnc-java
- Set up a client password with x11vnc -storepasswd
- Open ports 5800 and 5900 through your firewall
- Run sudo x11vnc -auth guess -forever -usepw -httpdir /usr/share/vnc-java/ -httpport 5800 &
Downloads and Credentials
After you have signed up for LucidWorks Big Data, you will be given instructions separately on how to access the virtual machine disk images.
The Chef Server VM is distributed along with the LucidWorks Big Data package to help provision an on-premises cluster.
Requirements
Hardware Requirements
The following hardware requirements are for the VM only. They are in addition to the hardware required to run the operating system and existing applications.
- CPU: Dual Core or higher
- Memory: 2 GB RAM or higher
- Disk Space: 20 GB Disk Space available for the VM
The VM is configured with a bridged network mode, which will automatically take an IP address from the local network. Accessing the web-based user interfaces or the APIs will require using either the IP or a hostname you specify for the VM.
Set Up VM with Chef Server
We have included general instructions that can be used with any virtualization software package, and specific instructions using VirtualBox.
General Instructions
The general steps to install the virtual machine package are:
- Import the VM according to the instructions for your virtualization software.
- Log in to the new machine with username 'ubuntu' and password 'lucid123'.
- Validate Chef Server with the steps in Validate Chef Server Installation.
- Copy the LucidWorks Big Data installation package to the other nodes of the cluster.
- Prepare Chef Server with the steps in Prepare Chef Server.
Specific Instructions for Some Virtualization Tools
Importing VM with VirtualBox
If you are using VirtualBox, follow these specific instructions to launch the VM.
- From the File menu, select "Import Appliance". This opens the Appliance Import Wizard. Click Choose, browse to the directory where you downloaded the package, and select the .ovf file.

- The next screen shows "Appliance Import Settings", where you can customize the VM configuration. It is recommended to assign two or more CPUs and 2 GB or more of memory to this VM. Choose Import after reviewing the configuration. It will take a few minutes for the complete VM to be imported. Once done, it will appear in the VM list in Oracle VM VirtualBox Manager.

- Start the VM and wait for it to boot. Once it has booted, you can log in with username 'ubuntu' and password 'lucid123'.
Validate Chef Server Installation
Once logged in, check the client list by running the knife command knife client list. You should get a list of 3 clients, like this:
chef-validator
chef-webui
root
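The check above can also be scripted. The following is a minimal sketch, assuming knife prints one client name per line, possibly with leading spaces. The function name missing_names is hypothetical and is not part of Chef or of the LucidWorks package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected name that is absent from the
# list read on stdin (one name per line). Prints nothing if all are present.
missing_names() {
  local listing name
  # Trim leading and trailing whitespace from each input line.
  listing="$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"
  for name in "$@"; do
    # -x matches whole lines only, -F treats the name as a literal string.
    if ! printf '%s\n' "$listing" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage on the Chef Server VM:
#   knife client list | missing_names chef-validator chef-webui root
```

An empty result means all three clients are present. The same helper could be pointed at the output of knife role list later in this procedure.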
Prepare Chef Server
- Copy and extract the LWBD installation package
- Change into the top-level directory of the LWBD installation package and run the following script to upload the Chef code to the Chef server:
./chef_upload.sh
- Validate by listing roles and cookbooks
- knife role list
- Expected output: hadoop, sda, lwe roles
- knife cookbook list
- Expected output: hadoop, zookeeper cookbooks
Installation
Prepare Servers for Cluster Node
If you create snapshots, it will be easy to repeat these steps across each client node.
- Set password-less sudo:
- sudo visudo
- Add ubuntu ALL=(ALL) NOPASSWD: ALL as the last line of the file. If there is another line with NOPASSWD above this one, remove it.
- Synchronize the date using ntpdate -s -b -p 8 -u 129.132.2.21
- Run sudo apt-get update
- Run sudo apt-get install curl
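The password-less sudo rule described above is easy to get subtly wrong, so a quick check on a copy of the sudoers file may help before taking a snapshot. This is only a sketch: the function name check_sudoers is hypothetical, and it assumes the rule is written exactly as shown in the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the NOPASSWD rule for a user is the last
# non-empty line of a sudoers file and the only non-comment NOPASSWD line.
#   $1 = path to a copy of the sudoers file, $2 = user (default: ubuntu)
check_sudoers() {
  local file="$1" user="${2:-ubuntu}"
  local expected="$user ALL=(ALL) NOPASSWD: ALL"
  # The last non-empty line must be the expected rule.
  [ "$(grep -v '^[[:space:]]*$' "$file" | tail -n 1)" = "$expected" ] || return 1
  # Ignoring lines that start with '#', exactly one line may mention NOPASSWD.
  [ "$(grep -v '^[[:space:]]*#' "$file" | grep -c 'NOPASSWD')" -eq 1 ]
}

# Usage:
#   sudo cat /etc/sudoers > /tmp/sudoers.copy
#   check_sudoers /tmp/sudoers.copy && echo "sudoers rule looks right"
```

Running sudo visudo -c afterwards checks the syntax of the live file.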
Prepare Client Nodes Before First Boot
On each client box, perform the following steps before it is booted for the first time:
- Select Bridged Adapter under "Network -> Attached to"
- Select Paravirtualized Network under "Network -> Adapter Type"
- Reinitialize the MAC address under "Network -> MAC Address"
Configure Network Access
- Set Static IP. For example, assuming 172.16.10.199 as the 'master' node, /etc/network/interfaces might look like:
iface eth0 inet static
address 172.16.10.199
netmask 255.255.255.0
network 172.16.10.0
broadcast 172.16.10.255
gateway 172.16.10.1
- Make sure the DNS can be resolved by updating /etc/resolv.conf:
nameserver 4.2.2.2
nameserver 66.7.224.17
- Assign a unique hostname to each host in /etc/hostname
- Modify /etc/hosts so that each node can look up the other nodes and the Chef server:
172.16.10.197 sda
172.16.10.198 slave
172.16.10.199 master
172.16.10.189 chef
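Since every node must be able to look up every other node, it may be useful to confirm that no name was left out of /etc/hosts. The following is a minimal sketch: the function name unresolved_hosts is hypothetical, and the host names in the usage line are the example names used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each name that has no entry in a hosts file.
#   $1 = hosts file, remaining arguments = names to look for
unresolved_hosts() {
  local file="$1"
  shift
  local name
  for name in "$@"; do
    # Skip comment lines; compare the name against every field after the IP,
    # stopping at an inline comment.
    if ! awk -v n="$name" '
        /^[ \t]*#/ { next }
        {
          for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break
            if ($i == n) found = 1
          }
        }
        END { exit (found ? 0 : 1) }' "$file"; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage on each node:
#   unresolved_hosts /etc/hosts sda slave master chef
```

No output means every listed name has an entry.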
System Usage Monitor
By default, the LucidWorks System Usage Monitor, which sends anonymous usage statistics to LucidWorks, will be enabled during provisioning. For more information about this feature, see the section System Usage Monitor. If you would prefer to disable the monitor, you can edit the Chef cookbook on the master node that enables it during installation. In cookbooks/hadoop/attributes/default.rb, find the line:
default["heartbeat"]["enabled"] = true
Change true to false. Save the file, and then proceed with the rest of the installation.
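If you prefer to script this change instead of editing the file by hand, the following sketch shows one way to do it with sed. The function name disable_usage_monitor is hypothetical, and the sketch assumes the line appears exactly as shown above, apart from spacing around the equals sign.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the heartbeat attribute to false in the given
# attributes file. The original file is kept as <file>.bak.
disable_usage_monitor() {
  local file="$1"
  # Only the heartbeat line is rewritten; other attributes are untouched.
  sed -i.bak \
    's/^\(default\["heartbeat"]\["enabled"][[:space:]]*=[[:space:]]*\)true/\1false/' \
    "$file"
}

# Usage from the top-level directory of the installation package:
#   disable_usage_monitor cookbooks/hadoop/attributes/default.rb
```

The .bak copy makes it easy to compare or restore the original file.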
Launch Nodes
Ubuntu
Run knife commands from the Chef Server VM to install the LucidWorks Big Data services. Run the following commands in the order shown below, replacing <IP_ADDRESS> in each command with the actual IP address of the target node. If multiple services will be installed on the same node ("kafka" and "sda", for example), you should still run the commands separately as below.
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
  -r "role[zabbix_server]" -N env01-zabbix-server
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
  -r "role[zookeeper]" -N env01-zk
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
  -r "role[hadoop_namenode],role[hadoop_jobtracker],role[hbase_master]" \
  -N env01-NameNode-JobTracker-HbaseMaster
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
  -r "role[hadoop_slave]" -N env01-Hadoop_Slave
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
  -r "role[kafka]" -N env01-kafka
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
  -r "role[lwe-core],role[lwe-ui],role[lwe-connectors]" -N env01-lwe
knife bootstrap <IP_ADDRESS> -x ubuntu --sudo -E env01 \
  -r "role[sda]" -N env01-sda
If expanding beyond three nodes, add the additional IPs to /etc/hosts, then run the knife command to install the services for each additional node.
CentOS
Run knife commands from the Chef Server VM to install the LucidWorks Big Data services. Run the following commands in the order shown below, replacing <IP_ADDRESS> in each command with the actual IP address of the target node. If multiple services will be installed on the same node ("kafka" and "sda", for example), you should still run the commands separately as below.
knife bootstrap <IP_ADDRESS> -x root -E env01 \
  -r "role[zabbix_server]" -N env01-zabbix-server
knife bootstrap <IP_ADDRESS> -x root -E env01 \
  -r "role[zookeeper]" -N env01-zk
knife bootstrap <IP_ADDRESS> -x root -E env01 \
  -r "role[hadoop_namenode],role[hadoop_jobtracker],role[hbase_master]" \
  -N env01-NameNode-JobTracker-HbaseMaster
knife bootstrap <IP_ADDRESS> -x root -E env01 \
  -r "role[hadoop_slave]" -N env01-Hadoop_Slave
knife bootstrap <IP_ADDRESS> -x root -E env01 \
  -r "role[kafka]" -N env01-kafka
knife bootstrap <IP_ADDRESS> -x root -E env01 \
  -r "role[lwe-core],role[lwe-ui],role[lwe-connectors]" -N env01-lwe
knife bootstrap <IP_ADDRESS> -x root -E env01 \
  -r "role[sda]" -N env01-sda
If expanding beyond three nodes, add the additional IPs to /etc/hosts, then run the knife command to install the services for each additional node.
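When adding nodes, it is easy to mistype one of these long commands. The sketch below builds the command line for one node and prints it without running it, so it can be reviewed first. The function name bootstrap_cmd is hypothetical. The flags are exactly those used in the commands above, with --sudo added for the 'ubuntu' user and omitted for root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the knife bootstrap command for a node.
#   $1 = SSH user (ubuntu or root)   $2 = node IP address
#   $3 = run list                    $4 = node name
bootstrap_cmd() {
  local user="$1" ip="$2" run_list="$3" node_name="$4"
  local sudo_flag=""
  # The Ubuntu commands log in as 'ubuntu' and pass --sudo;
  # the CentOS commands log in as root and do not.
  [ "$user" = "root" ] || sudo_flag=" --sudo"
  printf 'knife bootstrap %s -x %s%s -E env01 -r "%s" -N %s\n' \
    "$ip" "$user" "$sudo_flag" "$run_list" "$node_name"
}

# Usage on the Chef Server VM (the IP address is an example value):
#   bootstrap_cmd ubuntu 172.16.10.198 'role[hadoop_slave]' env01-Hadoop_Slave
```

Printing the command rather than executing it keeps the helper safe to run anywhere. Once the output looks right, it can be pasted into the shell on the Chef Server VM.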
Attach 'sdaevents'
The 'sdaevents' role (and associated service) should run on each node that hosts the HBase services (hbasemaster and regionserver). However, it can only be launched after all the other services. Once the other nodes have been launched, run this command:
knife node run_list add env01-sda "role[sdaevents]"
Validating the Installation
Once the knife commands have been completed, you can validate the installation with any of the approaches described in Validating Installations.