SAS/IML Studio User Manual

SAS/IML® Studio 3.3 User’s Guide
SAS® Documentation
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2010. SAS/IML® Studio 3.3 User’s Guide. Cary, NC: SAS Institute Inc.
SAS/IML® Studio 3.3 User’s Guide
Copyright © 2010, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-60764-676-1
All rights reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st electronic book, November 2010
1st printing, November 2010
SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.

Contents

Chapter 1. Introduction to SAS/IML Studio
Chapter 2. Getting Started with SAS/IML Studio
Chapter 3. Creating and Editing Data
Chapter 4. Interacting with the Data Table
Chapter 5. Exploring Data in One Dimension
Chapter 6. Exploring Data in Two Dimensions
Chapter 7. Exploring Data in Three Dimensions
Chapter 8. Interacting with Plots
Chapter 9. General Plot Properties
Chapter 10. Axis Properties
Chapter 11. Techniques for Exploring Data
Chapter 12. Plotting Subsets of Data
Chapter 13. Distribution Analysis: Descriptive Statistics
Chapter 14. Distribution Analysis: Location and Scale Statistics
Chapter 15. Distribution Analysis: Distributional Modeling
Chapter 16. Distribution Analysis: Frequency Counts
Chapter 17. Distribution Analysis: Outlier Detection
Chapter 18. Data Smoothing: Loess
Chapter 19. Data Smoothing: Thin-Plate Spline
Chapter 20. Data Smoothing: Polynomial Regression
Chapter 21. Model Fitting: Linear Regression
Chapter 22. Model Fitting: Robust Regression
Chapter 23. Model Fitting: Logistic Regression
Chapter 24. Model Fitting: Generalized Linear Models
Chapter 25. Multivariate Analysis: Correlation Analysis
Chapter 26. Multivariate Analysis: Principal Component Analysis
Chapter 27. Multivariate Analysis: Common Factor Analysis
Chapter 28. Multivariate Analysis: Canonical Correlation Analysis
Chapter 29. Multivariate Analysis: Canonical Discriminant Analysis
Chapter 30. Multivariate Analysis: Discriminant Analysis
Chapter 31. Multivariate Analysis: Correspondence Analysis
Chapter 32. Variable Transformations
Chapter 33. Running Custom Analyses
Chapter 34. Configuring the SAS/IML Studio Interface
Appendix A. Sample Data Sets
Appendix B. SAS/INSIGHT Features Not Available in SAS/IML Studio
Index
Release Notes
The following release notes pertain to SAS/IML® Studio 3.3:
SAS/IML Studio was formerly named SAS® Stat Studio. SAS/IML Studio can run SAS Stat Studio programs and modules without modification. For information about how to migrate your SAS Stat Studio files and directories to SAS/IML Studio, see the “Changes and Enhancements” topic in the online Help.
SAS/IML Studio requires the second maintenance release of SAS 9.2 or any later release.
SAS/IML Studio includes an interface to the R language. The IMLPlus language includes functions that transfer data between SAS data sets and R data frames, and between SAS/IML matrices and R matrices.
You can now run portions of a program by highlighting certain statements and clicking Program ► Run. Only the highlighted statements are run.
SAS/IML Studio contains a new program editor.
SAS/IML Studio can now read and write JMP® data files.
The SAS/IML Studio user interface is available in the following languages: English, Japanese, Korean, and Simplified Chinese.
If you need to open a data set that contains Chinese, Japanese, or Korean characters, it is important that you configure the “Regional and Language Options” in the Windows Control Panel for the appropriate country. It is not necessary to change the Windows setting called “Language for non-Unicode programs,” which is also referred to as the system locale.
When you are running SAS/IML Studio on a Windows system configured for a language other than English, you can still use English fonts. For details, search for the term “IMLStudio_ForceEnglishUI” in the online Help.
Chapter 1

Introduction to SAS/IML Studio

Contents
What Is SAS/IML Studio?
Related Software and Documentation
Exploratory and Confirmatory Data Analysis
How Many Observations Can You Analyze?
Summary of Features
Comparison with SAS/INSIGHT Software
Accessibility Features of SAS/IML Studio
References

What Is SAS/IML Studio?

SAS/IML Studio is a tool for data exploration and analysis. Figure 1.1 shows a typical SAS/IML Studio analysis. You can use SAS/IML Studio to do the following:
explore data through graphs linked across multiple windows
subset data
analyze univariate distributions
fit explanatory models
investigate multivariate relationships
In addition, SAS/IML Studio provides an integrated development environment that enables you to write, debug, and execute programs that combine the following:
the flexibility of the SAS/IML® matrix language
the analytical power of SAS/STAT® procedures
the data manipulation capabilities of Base SAS® software
the dynamically linked graphics of SAS/IML Studio
the functions and user-contributed packages of the open-source R language
The programming language in SAS/IML Studio, which is called IMLPlus, is an enhanced version of the SAS/IML programming language. The “Plus” part of the name refers to new features that extend the SAS/IML language, including the ability to create and manipulate statistical graphs, to call SAS procedures, and to call functions in the R programming language.
SAS/IML Studio requires that you have a license for Base SAS, SAS/STAT, and SAS/IML software. SAS/IML Studio runs on a PC in the Microsoft Windows operating environment.
Figure 1.1 The SAS/IML Studio Interface

Related Software and Documentation

This book is one of three documents about SAS/IML Studio. In this book you learn how to use the SAS/IML Studio GUI to conduct exploratory data analysis and standard statistical analyses.
A second book, SAS/IML Studio for SAS/STAT Users, is intended for SAS/STAT programmers. In it, you learn how to use SAS/IML Studio in conjunction with SAS/STAT software in order to
explore data and visualize statistical models. In particular, you learn to call procedures in other SAS products such as SAS/STAT or Base SAS software by using the SUBMIT statement.
The third source of documentation is the SAS/IML Studio online Help. You can display the online Help by selecting Help ► Help Topics from the main menu. The online Help includes documentation for all IMLPlus classes and associated methods.
SAS/IML Studio is part of SAS/IML software. The language used to write programs in SAS/IML Studio is called IMLPlus. This language contains SAS/IML functions and statements implemented in the IML procedure and documented in the SAS/IML User’s Guide. The IML procedure runs entirely on a SAS Workspace Server, whereas IMLPlus switches dynamically between a SAS server (for computations) and the PC client (for graphics). In short, the IMLPlus language consists of SAS/IML functions and subroutines “plus” additional syntax to support the creation and manipulation of statistical graphics. The SAS/IML Studio program window uses color coding to display keywords in the IMLPlus language.
Most SAS/IML programs run without modification in the IMLPlus environment. The SAS/IML Studio online Help includes a list of differences between the SAS/IML language and IMLPlus.
For your convenience in referencing related SAS software, the SAS/IML User’s Guide, the SAS/STAT User’s Guide, and the Base SAS Procedures Guide are available from the SAS/IML Studio Help menu.
Exploratory and Confirmatory Data Analysis
Data analysis often falls into two phases: exploratory and confirmatory. The exploratory phase “isolates patterns and features of the data and reveals these forcefully to the analyst” (Hoaglin, Mosteller, and Tukey 1983). If a model is fit to the data, exploratory analysis finds patterns that represent deviations from the model. These patterns lead the analyst to revise the model, and the process is repeated.
In contrast, confirmatory data analysis “quantifies the extent to which [deviations from a model] could be expected to occur by chance” (Gelman 2004). Confirmatory analysis uses the traditional statistical tools of inference, significance, and confidence.
Exploratory data analysis is sometimes compared to detective work: it is the process of gathering evidence. Confirmatory data analysis is comparable to a court trial: it is the process of evaluating evidence. Exploratory analysis and confirmatory analysis “can—and should—proceed side by side” (Tukey 1977).

How Many Observations Can You Analyze?

SAS/IML Studio provides the data analyst with interactive and dynamic statistical graphics. By definition, interactive graphics must respond quickly to the changes and manipulations of the analyst. This quick response restricts the size of data sets that can be handled while still maintaining interactivity.
Wegman (1995) points out that the number of observations you can analyze depends on the algorithmic complexity of the statistical algorithms you are using. For example, if you have $n$ observations, computing a mean and variance is $O(n)$, sorting is $O(n \log n)$, and solving a least squares regression on $p$ variables is $O(np^2)$. Furthermore, visualization of individual observations is limited by the number of pixels that can be represented on a display device.
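To make these growth rates concrete, the following Python sketch tabulates rough operation counts for the three tasks named above. It is only an illustration: the function name `rough_cost` and the choice of p = 10 are assumptions made here, constant factors are ignored, and the numbers say nothing about actual SAS/IML Studio run times.

```python
import math

def rough_cost(n, p):
    """Return rough operation counts (constant factors ignored)
    for n observations and p regression variables."""
    return {
        "mean_variance": n,             # O(n)
        "sort": n * math.log2(n),       # O(n log n)
        "least_squares": n * p ** 2,    # O(n p^2)
    }

# Compare a moderately sized data set with one of a million observations.
for n in (10_000, 1_000_000):
    costs = rough_cost(n, p=10)
    print(n, {name: f"{value:.2e}" for name, value in costs.items()})
```

Under these assumptions, the regression term dominates as soon as $p^2$ exceeds $\log_2 n$, which is one reason model fitting slows down before simple summaries do.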
Wegman’s conclusion is that “visualization of data sets say of size $10^6$ or more is clearly a wide open field.” More recently, Unwin, Theus, and Hofmann (2006) discuss the challenges of “visualizing a million,” including a chapter dedicated to interactive graphics.
On a typical PC (for example, a 1.8 GHz CPU with 512 MB of RAM), SAS/IML Studio can help you analyze dozens of variables and tens of thousands of observations. Visualization of data with graphics such as histograms and box plots remains feasible for hundreds of thousands of observations, although the interactive graphics become less responsive. Scatter plots of this many observations suffer from overplotting.
SAS/IML Studio uses the RAM on your PC to facilitate interaction and linking between plots and data tables. If you routinely analyze large data sets, increasing the RAM on your PC might increase SAS/IML Studio’s interactivity. For example, if you routinely examine hundreds of thousands of observations in dozens of variables, 1 GB of RAM is preferable to 512 MB.

Summary of Features

SAS/IML Studio provides tools for exploring data, analyzing distributions, fitting parametric and nonparametric regression models, and analyzing multivariate relationships. In addition, you can extend the set of available analyses by writing programs.
To explore data, you can do the following:
identify observations in plots
select observations in linked data tables, bar charts, box plots, contour plots, histograms, line plots, mosaic plots, and two- and three-dimensional scatter plots
exclude observations from graphs and analyses
search, sort, subset, and extract data
transform variables
change the color and shape of observation markers based on the value of a variable
To analyze distributions, you can do the following:
compute descriptive statistics
create quantile-quantile plots
create mosaic plots of cross-classified data
fit parametric and kernel density estimates for distributions
detect outliers in contaminated Gaussian data
To fit parametric and nonparametric regression models, you can do the following:
smooth two-dimensional data by using polynomials, loess curves, and thin-plate splines
add confidence bands for mean and predicted values
create residual and influence diagnostic plots
fit robust regression models and detect outliers and high-leverage observations
fit logistic models
fit the generalized linear model with a wide variety of response and link functions
include classification effects in logistic and generalized linear models
To analyze multivariate relationships, you can do the following:
calculate correlation matrices and scatter plot matrices with confidence ellipses for relationships among pairs of variables
reduce dimensionality with principal component analysis
examine relationships between a nominal variable and a set of interval variables with discriminant analysis
examine relationships between two sets of interval variables with canonical correlation analysis
reduce dimensionality by computing common factors for a set of interval variables with factor analysis
reduce dimensionality and graphically examine relationships between categorical variables in a contingency table with correspondence analysis
To extend the set of available analyses, you can do the following:
write, debug, and execute IMLPlus programs in an integrated development environment
add legends, curves, maps, or other custom features to statistical graphics
create new static graphics
animate graphics
execute SAS procedures or DATA steps from within your IMLPlus programs
develop interactive data analysis programs that use dialog boxes
call computational routines written in C, FORTRAN, Java, R, or the SAS/IML language
Comparison with SAS/INSIGHT Software
SAS/IML Studio and SAS/INSIGHT® software have the same goal: to be a tool for data exploration and analysis. Both have dynamically linked statistical graphics. Both come with pre-written statistical analyses for analyzing distributions, regression models, and multivariate relationships.
Figure 1.2 shows a typical SAS/INSIGHT analysis. Figure 1.3 shows the same analysis performed
in SAS/IML Studio. You can see that the analyses are qualitatively similar.
Figure 1.2 A SAS/INSIGHT Analysis
Figure 1.3 A Comparable SAS/IML Studio Analysis
However, there are three major differences between the two products. The first is that SAS/IML Studio runs on a PC in the Microsoft Windows operating environment. It is client software that can connect to SAS servers. The SAS server might be running on a different computer than SAS/IML Studio. In contrast, SAS/INSIGHT software runs on the same computer on which the SAS software is installed.
A second major difference is that SAS/IML Studio is programmable, and therefore extensible. SAS/INSIGHT software contains standard statistical analyses that are commonly used in data analysis, but you cannot create new analyses. In contrast, you can write programs in SAS/IML Studio that call any licensed SAS procedure, and you can include the results of that procedure in graphics, tables, and data sets. Because of this, SAS/IML Studio is sometimes referred to as the “programmable successor to SAS/INSIGHT software.”
A third major difference is that the SAS/IML Studio statistical graphics are programmable. You can add legends, curves, and other features to the graphics in order to better analyze and visualize your data.
SAS/IML Studio contains many features that are not available in SAS/INSIGHT software. General features that are unique to SAS/IML Studio include the following:
SAS/IML Studio can connect to multiple SAS servers simultaneously.
SAS/IML Studio can run multiple programs simultaneously in different threads; each program has its own WORK library.
SAS/IML Studio sessions can be driven by a program and rerun.
SAS/IML Studio provides the following features of data views (tables and plots) which are not included in SAS/INSIGHT software:
modern dialog boxes with a native Windows look and feel
a line plot in which the lines can be defined by specifying a single X variable and a single Y variable, and one or more grouping variables
a polygon plot that can be used to build interactive regions such as maps
programmatic methods to draw legends, curves, or other decorations on any plot
programmatic methods to attach a menu to any plot. After the menu is selected, a user-specified program is run.
arbitrary unions and intersections of observations selected in different views
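The last item in this list, unions and intersections of selections, can be pictured with ordinary set operations. The following Python sketch is only a conceptual illustration: the observation numbers and variable names are invented here, and it does not show how SAS/IML Studio implements linked selection.

```python
# Hypothetical observation numbers selected in two different data views.
selected_in_bar_chart = {0, 1, 2, 5}
selected_in_histogram = {2, 3, 5, 8}

# Union: observations selected in at least one of the views.
union = selected_in_bar_chart | selected_in_histogram

# Intersection: observations selected in both views.
intersection = selected_in_bar_chart & selected_in_histogram

print(sorted(union), sorted(intersection))
```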
SAS/IML Studio also provides the following analyses and options that are not included in SAS/INSIGHT software:
a programming language that can call any licensed SAS analytical procedure and any SAS/IML function or subroutine
outlier detection in contaminated Gaussian data
robust regression models and detection of outliers and high-leverage observations
the generalized linear model with a multinomial response
graphical results for the analysis of logistic models with one continuous effect and a small number of levels for classification effects
parametric and nonparametric methods of discriminant analysis
common factor analysis for interval variables
correspondence analysis for nominal variables
Features of SAS/INSIGHT software that are not included in SAS/IML Studio are presented in Appendix B, “SAS/INSIGHT Features Not Available in SAS/IML Studio.”

Accessibility Features of SAS/IML Studio

The user interface of SAS/IML Studio includes accessibility and compatibility features that improve the usability of the product for users with disabilities, with exceptions noted below. These features are related to accessibility standards for electronic information technology that were adopted by the U.S. Government under Section 508 of the U.S. Rehabilitation Act of 1973, as amended.
If you have questions or concerns about the accessibility of SAS products, send e-mail to accessibility@sas.com.
SAS/IML Studio supports Section 508 standards with the following exceptions:
When you type data into a data table, the JAWS screen-reading software does not indicate which cell in the table contains the focus.
As a partial workaround, you can access the data set in Base SAS software and create an accessible HTML version of the data table, which is viewable in a browser. A SAS Note that provides this code as a macro is available from SAS Technical Support.
In the New Data Set dialog box, the labels of the Width and Decimal boxes are not read properly by JAWS screen-reading software.
You can view SAS/IML Studio in high-contrast mode. In high-contrast mode, text is displayed in a larger font and is usually represented by white text on a black background. High-contrast modes and themes are provided by the Microsoft Windows operating system for users who cannot easily see subtle differences in shade.
You can turn on high-contrast mode by completing the following steps:
1. Open the Control Panel by selecting Start → Settings → Control Panel.
2. Double-click Accessibility Options. The Accessibility Options dialog box appears.
3. Select the Display tab, and then select Use High Contrast.
4. Click OK to accept the high-contrast setting and close the Accessibility Options dialog box.

References

Gelman, A. (2004), “Exploratory Data Analysis for Complex Models,” Journal of Computational and Graphical Statistics, 13(4), 755–779.
Hoaglin, D. C., Mosteller, F., and Tukey, J. W., eds. (1983), Understanding Robust and Exploratory Data Analysis, Wiley Series in Probability and Mathematical Statistics, New York: John Wiley & Sons.
Tukey, J. W. (1977), Exploratory Data Analysis, Reading, MA: Addison-Wesley.
Unwin, A., Theus, M., and Hofmann, H. (2006), Graphics of Large Datasets, New York: Springer.
Wegman, E. J. (1995), “Huge Data Sets and the Frontiers of Computational Feasibility,” Journal of Computational and Graphical Statistics, 4(4), 281–295.
Chapter 2

Getting Started with SAS/IML Studio

Contents
Overview of SAS/IML Studio
Overview of the Sample Data
Open the Data Set
Create a Bar Chart
Exclude Observations
Create a Histogram
Create a Box Plot
Create a Scatter Plot
Model Variable Relationships
References

Overview of SAS/IML Studio

SAS/IML Studio provides a powerful programming environment that enables you to combine SAS/IML statements with calls to SAS procedures, and also enables you to create and manipulate the attributes of dynamically linked statistical graphics. SAS/IML Studio also provides a GUI that enables you to visualize the results of statistical analyses. Furthermore, SAS/IML Studio provides several prewritten analyses (all implemented in IMLPlus, the SAS/IML Studio programming language) that you can access from the Analysis menu.
This chapter describes how you can use the SAS/IML Studio GUI for exploratory data analysis. The example in this chapter uses a sample data set, Hurricanes, that is distributed with SAS/IML Studio. The example covers the following activities:
1. Opening a data set. When you open a data set, the data are displayed in a data table. Features of the data table are described in Chapter 4, “Interacting with the Data Table.”
2. Creating graphical views of the data, such as a bar chart, a histogram, a box plot, and a scatter plot. SAS/IML Studio plots and data tables are collectively known as data views. All data views are dynamically linked, which means that observations that you select in one data view are displayed as selected in all other views of the same data. Several chapters of this book are devoted to describing the SAS/IML Studio plots and how you can interact with them.
Especially relevant to this example are Chapter 5, “Exploring Data in One Dimension,” and Chapter 6, “Exploring Data in Two Dimensions.”
3. Modeling relationships between variables. The example uses the correlation analysis and the polynomial regression analysis. These analyses are described further in Chapter 20, “Data Smoothing: Polynomial Regression,” and Chapter 25, “Multivariate Analysis: Correlation Analysis.”

Overview of the Sample Data

This example shows how you can use SAS/IML Studio to explore data about North Atlantic tropical cyclones. (A cyclone is a large system of winds that rotate about a center of low atmospheric pressure.) The data were recorded by the U.S. National Hurricane Center at six-hour intervals during the years 1988 to 2003.
The example analyzes the following variables:
category        indicator variable that corresponds to the Saffir-Simpson wind intensity scale
latitude        latitude of observation, in degrees north latitude
min_pressure    minimum central sea-level pressure, in hPa
radius_eye      radius of eye (if an eye exists), in nautical miles
wind_kts        maximum low-level sustained wind speed, in knots
The category variable is a measure of wind intensity, corresponding to the Saffir-Simpson wind intensity scale in Table 2.1.
Table 2.1 The Saffir-Simpson Intensity Scale
Category    Description             Wind Speed (Knots)
TD          Tropical depression     22–33
TS          Tropical storm          34–63
Cat1        Category 1 hurricane    64–82
Cat2        Category 2 hurricane    83–95
Cat3        Category 3 hurricane    96–113
Cat4        Category 4 hurricane    114–134
Cat5        Category 5 hurricane    135 or greater
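The wind-speed thresholds in Table 2.1 can be read as a simple lookup. The following Python sketch encodes the lower bound of each category and returns the matching label; the function name `saffir_simpson` and the use of `None` for wind speeds below 22 knots are assumptions made for this illustration (the Hurricanes data set already contains a category variable, so you do not need to compute it yourself).

```python
# Lower wind-speed bound (knots) for each category, taken from Table 2.1.
# Listed from strongest to weakest so the first match wins.
SCALE = [
    (135, "Cat5"),
    (114, "Cat4"),
    (96, "Cat3"),
    (83, "Cat2"),
    (64, "Cat1"),
    (34, "TS"),
    (22, "TD"),
]

def saffir_simpson(wind_kts):
    """Return the Table 2.1 category for a wind speed in knots,
    or None when the speed is missing or below 22 knots."""
    if wind_kts is None:
        return None
    for lower_bound, label in SCALE:
        if wind_kts >= lower_bound:
            return label
    return None

print(saffir_simpson(100))
```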
The analysis presented in this chapter is based on Mulekar and Kimball (2004) and Kimball and Mulekar (2004). A full description of the Hurricanes data set is included in Appendix A, “Sample Data Sets.”

Open the Data Set

This chapter analyzes the Hurricanes data set, which is distributed with SAS/IML Studio.
To use the GUI to open the data set:
1 Select File ► Open ► File from the main menu. The Open File dialog box appears. (See Figure 2.1.)
2 Click Go to Installation directory near the bottom of the dialog box.
3 Double-click the Data Sets folder.
4 Select the Hurricanes.sas7bdat file.
Figure 2.1 Opening a Sample Data Set
5 Click Open.
The data table in Figure 2.2 appears.
Figure 2.2 The Hurricanes Data
The row heading of the data table includes two special cells for each observation: one that shows the location of the observation in the data set, and the other that shows the status of the observation in analyses and plots. The status of each observation is indicated by the presence or absence of a marker and a χ² symbol. The presence of a marker (by default, a filled square) indicates that the observation is included in plots; observations that are excluded from plots do not display a marker. Similarly, the χ² symbol indicates that the observation is included in analyses. The Hurricanes data initially has all observations included in plots and analyses. See Chapter 4, “Interacting with the Data Table,” for more information about the data table symbols.

Create a Bar Chart

To create a bar chart of the category variable:
1 Select Graph ► Bar Chart from the main menu.
The Bar Chart dialog box appears. (See Figure 2.3.)
2 Select the variable category, and click Set X.
NOTE: In most dialog boxes, double-clicking a variable name adds the variable to the next appropriate box.
Figure 2.3 Bar Chart Dialog Box
3 Click OK.
The bar chart in Figure 2.4 appears. The bar chart shows the number of observations for storms in each Saffir-Simpson intensity category.
Figure 2.4 A Bar Chart

Exclude Observations

To exclude observations of less than tropical storm intensity (wind speeds less than 34 knots):
1 In the bar chart, click the bar labeled with the missing-value symbol.
This selects observations for which the category variable has a missing value. For these data, “missing” is equivalent to an intensity of less than tropical depression strength (wind speeds less than 22 knots).
2 Hold down the CTRL key and click the bar labeled “TD.”
When you hold down the CTRL key and click, you extend the set of selected observations. In this example, you select observations with tropical depression strength (wind speeds of 22–33 knots) without deselecting previously selected observations. The bars that contain selected observations are shown as crosshatched in Figure 2.5.
Figure 2.5 A Bar Chart with Selected Observations
3 In the data table, right-click in the row heading (to the left) of any selected observation, and select Exclude from Plots from the pop-up menu (shown in Figure 2.6).
Notice that the bar chart redraws itself to reflect that all observations being displayed in the plots now have at least 34-knot winds. Notice also that the square symbol in the data table is removed from observations with wind speeds less than 34 knots.
Figure 2.6 Data Table Pop-up Menu
4 In the data table, right-click in the row heading of any selected observation, and select Exclude from Analyses from the pop-up menu.
Notice that the χ² symbol is removed from observations with wind speeds less than 34 knots. Future analyses (for example, correlation analysis and regression analysis) will not use the excluded observations.
5 Click any data table cell to clear the selected observations.
NOTE: You can also exclude selected observations by using a keyboard shortcut. Select a plot and press the ‘e’ key to exclude selected observations from plots and from analyses. Additional keyboard shortcuts are described in Chapter 8, “Interacting with Plots.”
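The effect of the preceding steps can be summarized as setting two flags on each observation: one for inclusion in plots and one for inclusion in analyses. The following Python sketch mimics that bookkeeping on a handful of invented records; the field names `in_plots` and `in_analyses` and the function `exclude_weak` are hypothetical and are not part of SAS/IML Studio.

```python
# Invented records; these are not values from the Hurricanes data set.
# A category of None stands for a missing value (below 22 knots).
observations = [
    {"category": None, "wind_kts": 20},
    {"category": "TD", "wind_kts": 30},
    {"category": "TS", "wind_kts": 45},
    {"category": "Cat1", "wind_kts": 70},
]

def exclude_weak(obs_list):
    """Flag observations below tropical storm intensity as excluded
    from both plots and analyses."""
    for obs in obs_list:
        weak = obs["category"] in (None, "TD")
        obs["in_plots"] = not weak
        obs["in_analyses"] = not weak
    return obs_list

exclude_weak(observations)

# Only observations still flagged for analyses would be used later.
kept = [obs for obs in observations if obs["in_analyses"]]
print(len(kept), "of", len(observations), "observations remain")
```

Keeping the two flags separate mirrors the data table, where the marker and the analysis symbol can be removed independently.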

Create a Histogram

In this section you create a histogram of the latitude variable and examine relationships between the category and latitude variables. The figures in this section assume that you have excluded observations with low wind speeds as described in the section “Exclude Observations.”
To create a histogram:
1 Select Graph ► Histogram from the main menu.
The Histogram dialog box appears. (See Figure 2.7.)
2 Select the variable latitude, and click Set X.
Figure 2.7 Histogram Dialog Box
3 Click OK.
A histogram (Figure 2.8) appears, which shows the distribution of the latitude variable for the storms that are included in the plots. Move the histogram so that it does not cover the bar chart or data table.
Figure 2.8 Histogram of Latitudes of Storms
You have seen that you can select observations in a plot by clicking bars or observation markers. You can also select observations by drawing a selection rectangle. To draw a selection rectangle, click in a graph and hold down the left mouse button while you move the mouse pointer to a new location.
4 Draw a selection rectangle in the bar chart to select all storms of category 3, 4, and 5.
The bar chart looks like the one in Figure 2.9.
Figure 2.9 Selecting the Most Intense Storms
Note that these selected observations are also shown in the histogram in Figure 2.10. The histogram shows the conditional distribution of latitude, given that a storm is greater than or equal to category 3 intensity. The conditional distribution shows that very strong hurricanes tend to occur between 11 and 37 degrees north latitude, with a median latitude of about 22 degrees. If these data are representative of all Atlantic hurricanes, you might conjecture that it would be relatively rare for a category 3 hurricane to strike north of the North Carolina-Virginia border (roughly 36.5° north latitude).
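Selecting the intense storms and reading the highlighted part of the histogram amounts to conditioning on the category variable. The following Python sketch shows the same idea numerically on a few invented (category, latitude) pairs; the records and the function name `conditional_latitudes` are assumptions for illustration and do not reproduce the values quoted above.

```python
from statistics import median

# Invented (category, latitude) pairs; not values from the Hurricanes data set.
storms = [
    ("TS", 30.5),
    ("Cat3", 18.0),
    ("Cat4", 22.0),
    ("Cat1", 35.0),
    ("Cat5", 25.5),
    ("Cat3", 36.0),
]
INTENSE = {"Cat3", "Cat4", "Cat5"}

def conditional_latitudes(records, categories):
    """Return the latitudes of records whose category is in `categories`."""
    return [lat for cat, lat in records if cat in categories]

lats = conditional_latitudes(storms, INTENSE)

# Range and median of latitude, given category 3 or greater.
print(min(lats), median(lats), max(lats))
```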
Figure 2.10 Latitudes of Intense Storms

Create a Box Plot

The data set contains several variables that measure the size of a tropical cyclone. One of these is the radius_eye variable, which contains the radius of a cyclone’s eye in nautical miles. (The eye of a cyclone is a calm, relatively cloudless central region.) The radius_eye variable has many missing values, because not all storms have well-defined eyes.
The following steps create a box plot that shows how the radius of a cyclone’s eye varies with the Saffir-Simpson category. The figures in this section assume that you have excluded observations with low wind speeds as described in the section “Exclude Observations.”
1 Select Graph ► Box Plot from the main menu.
The Box Plot dialog box appears. (See Figure 2.11.)
Figure 2.11 Box Plot Dialog Box
2 Select the variable radius_eye, and click Set Y.
3 Select the variable category, and click Add X.
4 Click OK.
A box plot appears as in Figure 2.12. Move the box plot so that it does not cover the data table or other plots.
The box plot summarizes the distribution of eye radii for each Saffir-Simpson category. The plot indicates that the median eye radius tends to increase with storm intensity for tropical storms, category 1, and category 2 hurricanes. Category 2–4 storms have similar distributions, while the most intense hurricanes (category 5) in this data set tend to have eyes that are small and compact. The box plot also indicates considerable spread in the radii of eyes.
Recall that the radius_eye variable contains many missing values. These missing values are not displayed by the box plot. You might wonder what percentage of all storms of a given Saffir-Simpson intensity have well-defined eyes. You can determine this percentage by selecting all observations in one of the box plots and noting the proportion of observations that are selected in the bar chart.
5 Draw a selection rectangle in the box plot around the category 1 storms.
In the bar chart in Figure 2.12, note that approximately 25% of the bar for category 1 storms is displayed as selected, which means that approximately one quarter of the category 1 storms in this data set have nonmissing measurements for radius_eye.
Figure 2.12 Proportion of Category 1 Storms with Well-Defined Eyes
6 Drag the selection rectangle to select eye radii in other categories.
The selected observations displayed in the bar chart reveal the proportion of storms in each Saffir-Simpson category that have nonmissing values for radius_eye. Note in particular that very few tropical storms have eyes, whereas almost all category 4 and 5 storms have well-defined eyes.
7 Click outside the plot area in any plot to deselect all observations.
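The proportion that you read from the bar chart in the preceding steps is the fraction of observations in each category with a nonmissing radius_eye value. The following Python sketch computes that fraction for a few invented records; the sample data and the function name `nonmissing_fraction` are hypothetical, and `None` stands in for a missing value.

```python
from collections import defaultdict

def nonmissing_fraction(records):
    """Map each category to the fraction of its records
    that have a nonmissing radius_eye value."""
    total = defaultdict(int)
    present = defaultdict(int)
    for category, radius_eye in records:
        total[category] += 1
        if radius_eye is not None:
            present[category] += 1
    return {cat: present[cat] / total[cat] for cat in total}

# Invented (category, radius_eye) pairs; not values from the Hurricanes data set.
sample = [
    ("Cat1", 15.0),
    ("Cat1", None),
    ("Cat1", None),
    ("Cat1", None),
    ("TS", None),
    ("Cat4", 10.0),
]

fractions = nonmissing_fraction(sample)
print(fractions)
```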

Create a Scatter Plot

The following steps examine the relationship between wind speed and atmospheric pressure for tropical cyclones. The National Hurricane Center routinely reports both of these quantities as indi-