INTRODUCTION:
StackIQ has partnered with IBM to simplify the process of deploying IBM InfoSphere BigInsights. StackIQ Cluster Manager is ideal for deploying the hardware infrastructure and application stacks of heterogeneous data center environments. For InfoSphere BigInsights this includes proper configuration of disk, InfoSphere BigInsights accounts, and passwordless SSH required for a fully functioning InfoSphere BigInsights cluster.
In this post, we’ll discuss how this is done, followed by a step-by-step guide to installing InfoSphere BigInsights with StackIQ.
Components:
The hardware used for this deployment was a small cluster: 1 node (i.e., 1 server) is used for the StackIQ Cluster Manager, 1 node serves as the BigInsights manager, and 4 nodes are used as backend or data nodes. In the simplest example, each node has 1 disk and all nodes are connected together via 1Gb Ethernet on a private network. StackIQ Cluster Manager and the InfoSphere BigInsights manager server are also connected to a corporate public network using a second NIC. Additional networks dedicated to Hadoop services can also be connected but are not used for purposes of this example. StackIQ Cluster Manager has been used in similar deployments ranging from 2 nodes to well over 4,000 nodes.
Step 1: Getting Started
The StackIQ Cluster Manager node is installed from bare metal (i.e., there is no software pre-installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it (the StackIQ Cluster Core Roll can be obtained from the “Rolls” section after registering at http://www.stackiq.com/download/).
Let’s pause for a moment. For those of you unfamiliar with StackIQ, Rolls are additional software packages that allow for extending the base system through mass installation and configuration of many nodes in parallel. These Rolls are what make our automation platform flexible and easily customizable.
The Cluster Core Roll leads the user through a few short forms (e.g., what is the IP address of StackIQ Cluster Manager, what is the gateway, DNS server, etc.) and then asks for a base OS DVD (for example, Red Hat Enterprise Linux 6.5; other Red Hat-like distributions such as CentOS are supported as well, but for Red Hat Enterprise Linux, only certified media is acceptable). The installer copies all the bits from both DVDs and automatically generates a new Red Hat distribution by blending the packages from both DVDs together.
The remainder of the StackIQ Cluster Manager installation requires no further manual steps, and this entire step takes between 30 and 40 minutes.
A detailed description of StackIQ Cluster Manager can be found in section 3 of the StackIQ Users Guide. It is highly recommended that you familiarize yourself with at least this section before proceeding. (The print is large and there are plenty of pictures so it isn’t that bad.)
https://s3.amazonaws.com/stackiq-release/stack3/roll-cluster-core-usersguide.pdf
If you have further questions, please contact [email protected] for additional information.
This is what you'll need:
- An installed StackIQ Cluster Manager frontend. See the above documentation.
- An ISO of CentOS or RHEL 6.5. NOT 6.6, really, seriously, 6.5 or InfoSphere BigInsights won't install.
- The InfoSphere BigInsights tar file, either Community or Enterprise edition. Community can be downloaded from IBM: http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/downloads.html
- The InfoSphere BigInsights Bridge Roll from StackIQ. It can be downloaded from here:
- Patience and the support email: [email protected] should you run into difficulties.
Step 2: Install the BigInsights Bridge Roll
StackIQ has developed software, named the BigInsights Bridge Roll (now there's a surprise), that “bridges” our core infrastructure management solution to InfoSphere BigInsights. The BigInsights Bridge Roll is used to create the biadmin/bigsql/catalog user accounts, set up passwordless SSH access for these accounts, and perform other critical configuration steps as indicated in the InfoSphere BigInsights documentation. The BigInsights Bridge Roll prepares the cluster to allow the deployment of InfoSphere BigInsights via the BigInsights installer without any further configuration from you. (We do recommend that you set site-specific passwords, and we'll show you how this is done shortly.) This allows you to leverage the InfoSphere BigInsights manager to install a fully functioning Hadoop and Analytics cluster with minimal interaction.
StackIQ Cluster Manager uses “Rolls” to combine packages (RPMs) and configuration (XML files which are used to build custom kickstart files) to dynamically add and automatically configure software services and applications.
The first step is to install a StackIQ Cluster Manager as a deployment machine. This requires that you use, at a minimum, the cluster-core and RHEL 6.5 ISOs. It’s not possible to add StackIQ Cluster Manager on an already existing RHEL 6.5 machine. You must begin with the installation of StackIQ Cluster Manager. The biginsights-bridge roll can be added once the StackIQ Cluster Manager is up and running or during installation of the frontend.
Please be aware RHEL/CentOS 6.5 is a hard requirement for IBM InfoSphere BigInsights. As of this writing, RHEL/CentOS 6.6 is not supported by InfoSphere BigInsights.
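If you want to confirm the release on a node that is already installed, a small check along these lines may help. This is a minimal sketch and not part of the StackIQ or IBM tooling: the function name is_supported_release is made up for illustration, and it assumes the usual /etc/redhat-release wording (for example, "CentOS release 6.5 (Final)").

```shell
#!/bin/sh
# Hypothetical helper (not a StackIQ or IBM tool): succeeds only when the
# given release string names version 6.5.
is_supported_release() {
    case $1 in
        *"release 6.5 "*|*"release 6.5") return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on an installed node (assumes /etc/redhat-release exists):
#   is_supported_release "$(cat /etc/redhat-release)" || echo "Not 6.5" >&2
```

The pattern is deliberately strict, so a 6.6 machine fails the check instead of slipping through.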
It is highly recommended that you check the MD5 checksums of the downloaded media.
You must burn the cluster-core roll and RHEL Server 6.5 ISOs to disc, or, if installing via virtual CD/DVD, simply mount the ISOs on the machine's virtual media via the BMC.
Then follow the instructions in section 3 of the Users Guide (https://s3.amazonaws.com/stackiq-release/stack3/roll-cluster-core-usersguide.pdf) to install StackIQ Cluster Manager. (Yes! I mentioned it again.)
Copy the roll to a directory on the StackIQ Cluster Manager. "/export" is a good place as it should be the largest partition.
Verify the MD5 checksums:
# md5sum biginsights-bridge-1.1-stack4.x86_64.disk1.iso
Should return:
7f6b9e9d5008e6833d7cc9e1b1862c6b biginsights-bridge-1.1-stack4.x86_64.disk1.iso
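If you'd rather not compare 32 hex characters by eye, the comparison can be scripted. The following is a minimal sketch, assuming the GNU coreutils md5sum found on RHEL and CentOS; verify_md5 is a made-up name for illustration, not a StackIQ command.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's MD5 sum against an expected value.
# Returns 0 on a match and non-zero otherwise.
verify_md5() {
    expected=$1
    file=$2
    # md5sum -c expects "<hash>  <filename>" (two spaces) on each line.
    printf '%s  %s\n' "$expected" "$file" | md5sum -c -
}

# Example, using the checksum quoted above:
#   verify_md5 7f6b9e9d5008e6833d7cc9e1b1862c6b \
#       biginsights-bridge-1.1-stack4.x86_64.disk1.iso || exit 1
```

Because the function returns a non-zero status on a mismatch, it can gate the steps that follow in a script.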
Then execute the following commands on the frontend:
# rocks add roll biginsights-bridge*.iso
# rocks enable roll biginsights-bridge
# rocks create distro
# rocks run roll biginsights-bridge | bash
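If you do this more than once, it may be handy to generate the command sequence for a specific ISO and review it before running anything. This is only a sketch: bridge_roll_commands is a hypothetical helper name, the /export path in the example is an assumption based on the suggestion above, and the rocks commands it prints are exactly the four from this step.

```shell
#!/bin/sh
# Hypothetical helper: print the four roll-installation commands from this
# post for a given ISO path, one per line. Nothing is executed here.
# Assumes the ISO path contains no spaces.
bridge_roll_commands() {
    iso=$1
    printf '%s\n' \
        "rocks add roll $iso" \
        "rocks enable roll biginsights-bridge" \
        "rocks create distro" \
        "rocks run roll biginsights-bridge | bash"
}

# Example: review the commands, then run them, stopping if a step fails:
#   bridge_roll_commands /export/biginsights-bridge-1.1-stack4.x86_64.disk1.iso
#   sh -e -c "$(bridge_roll_commands /export/biginsights-bridge-1.1-stack4.x86_64.disk1.iso)"
```

Printing first and executing second keeps a typo in the ISO name from half-installing the roll.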
The BigInsights Bridge Roll will enable you to set up a BigInsights Manager node from which to install BigInsights on the rest of the cluster.
Step 3: Install BigInsights Manager and backend nodes
The next step is to install the BigInsights Manager. Before we do this, however, it is advisable to change the default passwords that were installed for the biadmin, bigsql, and catalog users.
StackIQ Cluster Manager drives infrastructure automation via key/value pairs called "attributes" in the cluster manager database. These attributes can be set and overridden at the global, appliance, and host level. There are several attributes, including these user passwords, for InfoSphere BigInsights.
To see these attributes do the following:
# rocks list attr attr=biginsights.

(The period at the end of that is required. You can also do "rocks list attr | grep biginsights" but it's not as cool.)
At the moment we will deal with the three password values. You'll notice that these values are not shown in the output. This is because they are set to "shadow" and are only available to the root and apache users during kickstart.
The current passwords are all set to "biadmin." You want to change this largely because everyone who reads this blog post now knows what your passwords are. (I admit this is not likely to approach millions of views, but it will be searchable so....)
Change them like this:

The "rocks set attr" command will change the password as given on the command line. The SSL command will hash these passwords in the database and hide them with the "shadow=true" flag. You can use different passwords for each account or the same password. You will need to know these passwords when configuring BigInsights with the BigInsights UI installation. It is highly recommended that you clear your history after running the above commands. (Also, don't use "mynewpassword" as the password because, well, you know, millions of views and all that.)
You'll also notice two other attributes. These control partitioning schemas. For the default installation, these are fine as is. The default partitioning uses only the system disk (assuming it's large). The biginsights.data_mountpoint attribute only matters when biginsights.partitioning is set to "multidisk." In the default "singledisk" case, only the system disk is used. In the multidisk case, any disks other than the system disk will be mounted at /hadoop0X, where X is the number of the disk in the array. You can change this mountpoint by changing the attribute value. Further elucidation for more advanced configuration will be found in a follow-up blog post. If you need to know how to do it now, send email to [email protected], and we'll walk you through the proper changes.
So let's install some backend nodes.
There are two ways to do this: using "discovery" mode or using a properly formatted host CSV file. We will use discovery and leave the configuration of a host CSV file for later. The discovery mode assumes you have full control of your network and can set the frontend into promiscuous DHCP mode. If you don't have this control over the network, you'll have to add hosts via spreadsheet. Instructions for configuration with a host.csv fall under more advanced configuration and will be covered in a further blog post.
Go to the StackIQ Cluster Manager Web UI via the public hostname or the public IP. In this example the public IP is 192.168.1.50.

Go to the Discovery tab.

Verify "Automatically Detect" is chosen and click "Continue."

Click "Enable" to start discovery mode.

Click "Start" and it will ask you to log in. Log in as "root" with the root password you supplied during installation.

On the "Appliance" drop down, choose "bi-manager" and click "Start."
Note: You only need one bi-manager appliance. Don't install more than one.

Turn on the machine that will act as the bi-manager. It should be set to PXE boot first. It will be discovered and installed. Once the button turns gray and the visualization starts, it has been kickstarted. It will look something like this:

You can now install the rest of the backend nodes. Click "Stop."

Choose "Compute" in the Appliance dropdown.

Click "Start." Now boot all the other machines which also should have been set to PXE boot first.

These will be discovered and start installing. The visualization shows the peer-to-peer installer sharing the RPM packages. This allows for scaling during installation: 2 or 1,000 nodes take about the same time.

Once the buttons next to the compute nodes turn green, click "Stop." You are ready for the next step, installing BigInsights.

Step 4: Installing IBM InfoSphere BigInsights
This tutorial assumes the use of the BigInsights Community Edition. The Enterprise Edition should be similar but requires a purchased license from IBM. Our goal is to automate as much as possible while still allowing the full use of a product's capabilities. Sometimes this means automation takes a back seat, so there are some steps that must be done by hand. This gives both users (you) and the vendor (IBM) the correct set of tools required to fully deploy and support the application. Using the IBM BigInsights Installer in the following manner allows for greater site customization and better support capabilities.
Before we begin, we need an IP address from which we can reach the BigInsights Installer Web UI. You can use the private subnet IP of the bi-manager, but it may be easier to assign it a public IP (well, public to your subnet), so we'll add an interface on our public network.
Set the IP:
# rocks set host interface ip bi-manager-0-0 eth1 192.168.1.51
Set the network subnet:
# rocks set host interface subnet bi-manager-0-0 eth1 public
Verify:
# rocks list host interface bi-manager-0-0
Sync the network on the bi-manager:
# rocks sync host network bi-manager-0-0

This machine should now have an interface on the public subnet.
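A mistyped address in the "set host interface ip" step is easy to miss, so you may want to sanity-check the value first. The following is a rough sketch in plain POSIX shell; is_ipv4 is a hypothetical helper, not a rocks command, and it only checks the dotted-quad format, not whether the address is free on your network.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument looks like a dotted-quad
# IPv4 address with each octet in the range 0-255.
is_ipv4() {
    case $1 in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1          # split on dots; safe because only digits and dots remain
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Example, with the address used in this post:
#   ip=192.168.1.51
#   is_ipv4 "$ip" && rocks set host interface ip bi-manager-0-0 eth1 "$ip"
```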
Now we need to install the BigInsights installer tar file. You should have downloaded this from IBM. Here is a link to the community edition. The Enterprise Edition must be purchased and downloaded with a license.
http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/downloads.html
Copy the tar file to either the frontend or the bi-manager machine directly.
# scp iibi3001_QuickStart_x86_64.tar.gz bi-manager-0-0:/home/biadmin
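The tar file is large, so before extracting it you may want to confirm that it survived the download and the copy. Here is a minimal sketch, assuming a gzip-compressed tar archive and the standard tar on RHEL/CentOS; archive_ok is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the gzip-compressed tar archive can be
# listed from start to finish. A truncated or corrupted copy should fail here.
archive_ok() {
    tar -tzf "$1" >/dev/null 2>&1
}

# Example, on the bi-manager after the copy:
#   archive_ok /home/biadmin/iibi3001_QuickStart_x86_64.tar.gz || echo "Bad archive" >&2
```

Listing the archive reads it end to end without writing anything to disk, so it is a cheap check to run before the untar step.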
Log into the bi-manager machine and change the permissions on the tar file.
Change to the biadmin user.
# su - biadmin
Untar it.
# tar -xvf iibi3001_QuickStart_x86_64.tar.gz
Change to the BigInsights installer directory and start the installer.
# cd biginsights-3.0.0.1-quickstart-nonproduction-Linux-amd64-b20140918_1248
# ./start.sh
This is what the above steps look like:

When the installer is started, it will list public and private URLs to continue the BigInsights Web UI installation. Go to any of the URLs you have access to and follow the installation prompts.
Step 5: Installing BigInsights using the BigInsights Installer Web UI
In our example case, the bi-manager is at 192.168.1.51, so we'll open a browser at 192.168.1.51:8300.

Click "Next" on the intro page. It's likely a good idea to read it.

Accept the license and choose "Next."

Since this is a "singledisk" install and the system disk is large, we can accept the default directory structure. If we had chosen "multidisk," we would reconsider this. But for now, defaults are fine.

On this next screen choose the second option: "Use the current user biadmin with passwordless sudo privileges on all nodes." Trust me on this. The biginsights-bridge roll sets up passwordless sudo access on all nodes for biadmin, catalog, and bigsql. This greases the install and cuts some time off the installation. Then click "Next."

Let's add the nodes we've installed to the BigInsights instance. Choose "Add Nodes."

You'll get a window in which to add nodes. Keep it simple: use a regex and click "OK."

You can never have too many buttons to push, so make sure your nodes are correct and available and then "Accept" them. They have fragile egos and need your validation.

And then do the "Next" thing.

Now add the passwords you defined above in the biginsights.bigsql_password and the biginsights.biadmin_password attributes and click "Next."

Accept the defaults on this screen and hit "Next."

In the default install, the bi-manager is the name node. Change these if your site specifications warrant it and hit "Next."

Since this is a pretty basic install, we'll set up PAM with flat file authentication. If you have LDAP, please send email to [email protected] for the process to make LDAP work. Then hit "Next."

Check over everything to see if it meets your site criteria and then click "Install."

The installation is going to take a bit, far longer than the installation of the actual machines. The more machines you have in the cluster, the longer the BigInsights installation will take because not all aspects of the installer are parallel. However, when it succeeds you'll have a bullet-proof BigInsights installation.

The log can be watched during installation from a terminal window on the bi-manager or from the Log tab in the StackIQ UI. Cut and paste the log path and you can watch it there. This will be more fully covered in a following post.
Sooner or later it will be done:

Click "Finish."

Go to the BigInsights Console URL on the bi-manager at port 8080. In this case it is 192.168.1.51:8080.

Log in as the "biadmin" user with the password you supplied for the biginsights.biadmin_password attribute. If you kept the defaults (really?) it will be "biadmin."

This should bring you to the BigInsights Console. From here, consult the IBM BigInsights documentation for further use.

CONCLUSION:
With the help of StackIQ and IBM you should now have a functioning IBM InfoSphere BigInsights installation on the pile of machines that have been glaring at you in your data center. StackIQ is ideal for automating some of the more tedious parts of cluster installation and allows you to fully deploy a functioning Hadoop and analytics cluster to further your business needs.