IBM InfoSphere User Manual

Installing and configuring InfoSphere Streams on a virtual machine
Red Hat Enterprise Linux on VMware
Skill Level: Intermediate
Edward J Pring (pring@us.ibm.com)
Senior Software Engineer IBM
08 Apr 2010
Section 1. Introduction
IBM InfoSphere Streams provides a highly scalable platform for analyzing structured and unstructured data while it is in motion. InfoSphere Streams provides an intuitive and extensible development environment for creating, compiling, and deploying streaming applications.
Streaming applications are composed of streams (reliable, ordered, one-way message flows), operators (configurable functions that filter, aggregate, enrich, or transform the messages in streams) and adapters (specialized operators that continuously ingest data and output analysis results).
InfoSphere Streams provides a rich set of general-purpose operators, plus containers for reusing existing C/C++ and Java® code as streaming operators. InfoSphere Streams can also be extended with toolkits of domain-specific operators.
© Copyright IBM Corporation 2010. All rights reserved.
developerWorks® ibm.com/developerWorks
Streaming applications are declared as a data flow graph with the Stream Processing Language. The flow graph specifies the data types the application's streams will carry, which adapters and operators will process the data as it flows through the application, and how the operators will be interconnected by streams. Figure 1 illustrates the data flow graph for a streaming application.
Figure 1. Streaming application flow graph
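To make the terms concrete, the following sketch mimics a very small flow graph with Python generators. It is only an analogy: it is not Stream Processing Language and it does not use any InfoSphere Streams API. The function names (sensor_adapter, filter_operator, aggregate_operator), the threshold, and the sample readings are all invented for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# The "data type" this toy stream carries: (sensor id, value).
Reading = Tuple[str, float]


def sensor_adapter() -> Iterator[Reading]:
    """Source adapter: ingests data (here, a fixed hypothetical sample)."""
    yield from [("a", 1.0), ("b", 7.5), ("a", 9.0), ("b", 2.5), ("a", 8.0)]


def filter_operator(stream: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Operator: passes only readings at or above the threshold."""
    for sensor, value in stream:
        if value >= threshold:
            yield sensor, value


def aggregate_operator(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Operator: emits the running maximum per sensor, in arrival order."""
    best: Dict[str, float] = {}
    for sensor, value in stream:
        best[sensor] = max(value, best.get(sensor, value))
        yield sensor, best[sensor]


def run_graph() -> List[Reading]:
    """Connects adapter -> filter -> aggregate, like the edges of a flow graph."""
    return list(aggregate_operator(filter_operator(sensor_adapter(), 5.0)))


if __name__ == "__main__":
    for item in run_graph():
        print(item)
```

Each generator consumes the one before it, so messages move one at a time and in order. This loosely mirrors the ordered, one-way streams described above.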
Large streaming applications can span more than a hundred Linux server machines. When developing applications for InfoSphere Streams, you may find it more convenient to install it onto a virtual machine. Installing onto a virtual machine enables you to design and test streaming applications from your regular laptop or workstation computer.
This tutorial guides you through a step-by-step procedure for creating a self-contained InfoSphere Streams development environment on a virtual machine. To accomplish this, you install and configure these four software products:
• VMware provides a virtual machine capability for Microsoft Windows and Apple Mac computers. (Refer to http://www.vmware.com/products/.)
• Red Hat Enterprise Linux provides the operating system for IBM InfoSphere Streams. (Refer to https://www.redhat.com/rhel/server/.)
• IBM InfoSphere Streams provides a streaming runtime and application development tools. (Refer to http://www.ibm.com/software/data/infosphere/streams/.)
• Eclipse provides the integrated application development platform for the InfoSphere Streams Studio tools. (Refer to http://www.eclipse.org/.)
This tutorial outlines the specific installation steps you need to take with each product and suggests specific values for many configuration steps. However, you should refer to the official documentation for each product for details, options, and clarification. Refer to the Resources section of this tutorial for links to the products' documentation.
Following are the main tasks covered by the tutorial:
• Obtain product distribution packages
• Install VMware
• Install and configure Red Hat Enterprise Linux
• Install IBM InfoSphere Streams
• Install Eclipse and InfoSphere Streams Studio
• Verify the install
Many of the steps depend on previous steps, so you should execute all the steps in the order in which they are presented.
Section 2. Obtain product distribution packages
Before you begin, you need to obtain each of the software products listed below. You should have at least 30GB of available disk space on your computer for the distribution packages and the virtual machine that you will create.
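If you want to check the free space before downloading anything, a short script can do it. The following is a minimal sketch that assumes a Python 3 interpreter is available on your Windows or Mac OS X host. The 30GB threshold comes from the paragraph above; the function names are hypothetical.

```python
import shutil

REQUIRED_GB = 30  # figure taken from the tutorial text


def free_gigabytes(path: str = ".") -> float:
    """Returns the free space, in GB (10**9 bytes), on the volume holding path."""
    return shutil.disk_usage(path).free / 10**9


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the volume holding path has at least required_gb free."""
    return free_gigabytes(path) >= required_gb


if __name__ == "__main__":
    # "." means the current directory; pass the folder you plan to use instead.
    print("Free space: %.1f GB" % free_gigabytes("."))
    print("Enough for this tutorial:", has_enough_space("."))
```

Run it from the folder where you plan to keep the distribution packages and the virtual machine, because free space is reported per volume.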
You can obtain the distribution packages for these products and technologies through your company, or download them from the Web sites that are provided. In either case, you need to obtain licenses for the products. Free time-limited licenses are available for the first three products in the list, and the Eclipse license is free with no time limit. Refer to the Resources section of this tutorial for additional links for each of the products.
Note: Make sure you obtain the same architecture (either 32-bit or 64-bit) of Red Hat Enterprise Linux, IBM InfoSphere Streams, and Eclipse.
• VMware Workstation for Windows, release 7, or VMware Fusion for Mac OS X, version 3. Refer to http://www.vmware.com/products/ to obtain VMware products. The distribution package is an executable install program of about 400MB. Depending on your operating system, the package has a name similar to either VMware-workstation-full-7.0.1-227600.exe for Microsoft Windows, or Vmware-Fusion-3.0.0-204229.dmg for Mac OS X.
• Red Hat Enterprise Linux, release 5. Refer to https://www.redhat.com/rhel/server/ to obtain the Red Hat Enterprise Linux product. The distribution package is a DVD disc image of about 3,330MB. The 64-bit version has a name similar to RHEL5.4-Server-20090819.0-x86_64-DVD.iso.
• IBM InfoSphere Streams, release 1.2. Refer to https://www14.software.ibm.com/webapp/iwm/web/reg/pick.do?lang=en_US&source=SWG-STREAMS_TRIAL to obtain a trial version of IBM InfoSphere Streams. The distribution package is a compressed directory archive of about 300MB. It has a name of either Streams-1.2.0-i386-el5-trial.tar.gz for the 32-bit version, or Streams-1.2.0-x86_64-el5-trial.tar.gz for the 64-bit version.
If you use the trial version of IBM InfoSphere Streams, you also need to download the license file from the same Web site as the distribution package. The license file is named LicenseCert_1.0.0.0.trial.txt.
• Eclipse integrated development platform, release 3.5, plus the IMP technology for Eclipse, version 0.1.v201001291500. The Eclipse distribution package is a compressed directory archive of about 160MB. The 64-bit version has a name similar to eclipse-SDK-3.5.2-linux-gtk-x86_64.tar.gz.
You also need the IMP technology for the Eclipse platform, which is available from http://download.eclipse.org/technology/imp/. InfoSphere Streams requires IMP technology release v0.1.v201001291500. The IMP technology distribution package is a compressed directory archive of about 45MB with the name org.eclipse.imp.update_0.1.v201001291500.zip.
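The note about matching 32-bit and 64-bit packages can be checked mechanically from the file names. The following sketch assumes that the architecture shows up as one of the tokens seen in the example names above (x86_64 or i386). A file name that carries neither token is reported as unknown rather than guessed, so check such files by hand. The function names are hypothetical, and the VMware package is left out because it runs on the host rather than inside the virtual machine.

```python
from typing import Iterable, Optional

# Tokens taken from the example file names in this tutorial.
ARCH_TOKENS = {"x86_64": "64-bit", "i386": "32-bit"}


def detect_arch(filename: str) -> Optional[str]:
    """Returns '64-bit' or '32-bit' if a known token is in the name, else None."""
    for token, arch in ARCH_TOKENS.items():
        if token in filename:
            return arch
    return None


def consistent_arch(filenames: Iterable[str]) -> Optional[str]:
    """Returns the shared architecture, or None if any is unknown or they differ."""
    archs = {detect_arch(name) for name in filenames}
    if None in archs or len(archs) != 1:
        return None
    return archs.pop()


if __name__ == "__main__":
    packages = [
        "RHEL5.4-Server-20090819.0-x86_64-DVD.iso",
        "Streams-1.2.0-x86_64-el5-trial.tar.gz",
        "eclipse-SDK-3.5.2-linux-gtk-x86_64.tar.gz",
    ]
    print(consistent_arch(packages) or "mismatch or unknown: check manually")
```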
Section 3. Install VMware
VMware Workstation (for Microsoft Windows operating system) or VMware Fusion (for Mac OS X operating system) allows you to create a virtual machine on your computer. Within that virtual machine, you can then run Red Hat Enterprise Linux, which is the operating system that InfoSphere Streams requires.
This section of the tutorial provides a summary of the VMware install procedure. For more details, refer to the Resources section of this tutorial for links to the VMware Workstation User's Manual or the Getting Started with VMware Fusion manual.
Locate the VMware distribution package
Locate your VMware distribution package. Depending on your operating system, the package has a name similar to either VMware-workstation-full-7.0.1-227600.exe for Microsoft Windows, or Vmware-Fusion-3.0.0-204229.dmg for Mac OS X.
This file contains the VMware install program.
Install VMware Workstation or VMware Fusion
Install the VMware Workstation or VMware Fusion product from the distribution package as you would any other software product for your computer.
Follow the instructions that accompany your evaluation or purchase license to obtain a license key. To activate the product, launch the VMware application. Then, from the menu bar select VMware > License ....
Copy and paste your license key into the Serial Number field of the "Licensing" dialog.
Section 4. Install and configure Red Hat Enterprise Linux
Red Hat Enterprise Linux provides the operating system for InfoSphere Streams and Eclipse.
Follow the steps in this section to install Red Hat Enterprise Linux in a virtual machine provided by VMware. For more details, refer to the Resources section of this tutorial for links to the Red Hat Enterprise Linux Installation Guide and Deployment Guide.
Note: Red Hat Enterprise Linux, InfoSphere Streams, and Eclipse are available in both 32-bit and 64-bit versions. You may use either version, but you must use the same version for all three products.
Locate the Red Hat Enterprise Linux distribution package
Locate your Red Hat Enterprise Linux distribution package file. This file contains a DVD disc image, which contains the Red Hat Enterprise Linux install program. The 64-bit version has a name similar to RHEL5.4-Server-20090819.0-x86_64-DVD.iso.
Start installing Red Hat Enterprise Linux
Follow these steps to create a virtual machine within your computer and begin to install Red Hat Enterprise Linux in it. You need about 20 gigabytes of free space on your computer's disk drive for the virtual machine's disk.
1. Launch the VMware application that you installed in the previous section.
2. From the VMware menu bar, select File > New ....
3. On the "Create a new virtual machine" dialog, click Continue without a disc.
4. On the "Installation Media" dialog, select Use operating system installation disc image file, select the .iso file that contains your Red Hat Enterprise Linux distribution package (Figure 2), and then click Continue.
Figure 2. VMware Installation Media is RHEL DVD image
5. On the "Choose Operating System" dialog, verify that the Operating System field is set to Linux.
6. Also on the "Choose Operating System" dialog, verify that the Version field is set to either Red Hat Enterprise Linux 5 or Red Hat Enterprise Linux 5 64-bit, depending on whether you downloaded the 32-bit or 64-bit version, and click Continue.
7. When you see a dialog that offers to install Linux automatically, choose to install manually instead. Do this by either deselecting the Use Easy Install option (Figure 3), or by selecting I will install the operating system later. This ensures that you see all of the Red Hat Enterprise Linux install dialogs described below.
Figure 3. VMware Linux Easy Install option disabled
8. On the "Finish" dialog, accept the default virtual machine configuration.
9. On the "RED HAT ENTERPRISE LINUX 5" dialog, go to the boot prompt, and press your Enter/Return key.
10. On the "CD Found" dialog, verify that Skip is selected (with the keyboard, not the mouse), and press your Enter/Return key.
11. On the "Language Selection" dialogs, click Next.
12. On the "Installation Number" dialog, select Skip entering installation number, click OK, and then click Skip.
13. On the "Partition Table" warning dialog, click Yes.
14. On the "Partitioning Layout" dialog, verify that Remove Linux partitions on selected drive and create default layout is selected, click Next, and then click Yes.
15. On the "Network Devices" dialog, verify that a virtual ethernet device named eth0 is defined and active (Figure 4), and then click Next.
Figure 4. RHEL verifying ethernet interface
16. On the "Region" dialog, select your local time zone and click Next.
17. On the "Root Account" dialog, enter a password twice and click Next. Make sure you remember this password — you will need to enter it several times in subsequent steps of this tutorial.
18. On the "Software Customization" dialog, select the Software Development option (Figure 5), select Customize now, and then click Next.
Figure 5. RHEL selecting Software Development packages
19. On the "Software Packages" dialog, accept at least the default packages in each category plus any additional packages you want and click Next.
20. Click Next again to start the Linux install process. You can expect the Linux install process to continue for about 15 to 20 minutes without requiring any further interaction.
21. When the Linux install process prompts you to reboot, do so.
Finish installing Red Hat Enterprise Linux
After the Linux install process reboots, follow these steps to finish installing Red Hat Enterprise Linux.
1. On the "Welcome" dialog, accept the defaults and click Forward.
2. On the "License Agreement" dialog, accept the defaults and click Forward.
3. On the "Firewall" dialog, verify that the SSH service is selected (Figure 6), click Forward, and then click Yes.
Figure 6. RHEL enabling SSH service
4. On the "SELinux Setting" dialog, select Permissive (Figure 7). (Do not select Enforcing or Disabled.) Click Forward, and then click Yes. (For more information on SELinux, see the "A note about SELinux" section of this tutorial.)
Figure 7. RHEL changing SELinux
5. On the "Kdump" dialog, accept the default and click Forward.
6. On the "Date and Time" dialog, set the date and local time, and click Forward.
7. On the "Software Updates" dialog, select No, I prefer to register at a later time, click Forward, click No, thanks, and then click Forward again.
8. On the "Create User" dialog, do not enter any names or passwords, just click Forward, and then click Continue.
9. On the "Sound Card" dialog, click Play to test, and then click Forward.
10. On the "Additional CDs" dialog, click Finish.
11. Reboot Linux again if you are prompted to do so.
12. After Linux reboots, when it prompts you to log in, log in as username root with the password you specified on the "Root Account" dialog in the steps of the previous section. The following steps for configuring Linux must be executed while logged in as root. However, InfoSphere Streams does not require root privileges. Subsequent steps in this tutorial instruct you on how to create a Linux user account for InfoSphere Streams.
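The "Firewall" step above leaves the SSH service open. If you later want to confirm from the host computer that the virtual machine's SSH port answers, a plain TCP connection attempt is enough. The following is a minimal sketch under several assumptions that this tutorial does not state: a Python 3 interpreter on the host, an SSH daemon running in the virtual machine on the default port 22, and a virtual machine address that is reachable from the host. The function name is hypothetical, and you supply the address yourself.

```python
import socket
import sys


def port_is_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_ssh.py <virtual-machine-address>")
    else:
        state = "open" if port_is_open(sys.argv[1]) else "not reachable"
        print("SSH port on %s is %s" % (sys.argv[1], state))
```

A successful connection shows only that something is listening on the port. It does not verify a login.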
Install VMware Tools
By installing the VMware Tools package on your Linux virtual machine, you get access to convenient connections between Linux and Windows or Mac OS X for common user tasks.
Before installing the VMware Tools package, make sure the Red Hat Enterprise Linux disc image has been disconnected from your virtual machine's CD/DVD drive. If not, you can disconnect it by right-clicking its Linux Desktop icon, and selecting Eject from the context menu.
After the disc image has been ejected from the virtual CD/DVD drive, follow these steps to install the VMware Tools package.
1. From the VMware menu bar, select Virtual Machine > Install VMware Tools (Figure 8).
Figure 8. RHEL mounting VMware Tools DVD image
2. When the "VMware Tools" window appears on the Linux Desktop, open the VMwareTools-xxxx.tar.gz package with the Archive Manager by double-clicking its icon.
3. In the Archive Manager, select the vmware-tools-distrib package and extract it onto the Linux Desktop.
4. Open the vmware-tools-distrib folder on the desktop by double-clicking its icon.
5. Run the vmware-install.pl program by double-clicking its icon in the folder, and then clicking Run in Terminal (Figure 9).
Figure 9. RHEL executing VMware Tools install program