SPSS® 13.0 Base User's Guide

For more information about SPSS® software products, please visit our Web site at http://www.spss.com or contact SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.
SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer software. No material describing such software may be produced or distributed without the written permission of the owners of the trademark and license rights in the software and the copyrights in the published materials.
The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.
General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective companies.
TableLook is a trademark of SPSS Inc. Windows is a registered trademark of Microsoft Corporation. DataDirect, DataDirect Connect, INTERSOLV, and SequeLink are registered trademarks of DataDirect Technologies. Portions of this product were created using LEADTOOLS © 1991–2000, LEAD Technologies, Inc. ALL RIGHTS RESERVED. LEAD, LEADTOOLS, and LEADVIEW are registered trademarks of LEAD Technologies, Inc. Sax Basic is a trademark of Sax Software Corporation. Copyright © 1993–2004 by Polar Engineering and Consulting. All rights reserved. Portions of this product were based on the work of the FreeType Team (http://www.freetype.org). A portion of the SPSS software contains zlib technology. Copyright © 1995–2002 by Jean-loup Gailly and Mark Adler. The zlib software is provided “as is,” without express or implied warranty. A portion of the SPSS software contains Sun Java Runtime libraries. Copyright © 2003 by Sun Microsystems, Inc. All rights reserved. The Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the libraries are licensed from IBM and are available at http://oss.software.ibm.com/icu4j/.
SPSS® Base 13.0 User’s Guide Copyright © 2004 by SPSS Inc. All rights reserved. Printed in the United States of America.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.
1234567890 060504
ISBN 0-13-185723-1
Preface
SPSS 13.0 is a comprehensive system for analyzing data. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts and plots of distributions and trends, descriptive statistics, and complex statistical analyses.
This manual, the SPSS® Base 13.0 User’s Guide, documents the graphical user interface of SPSS for Windows. Examples using the statistical procedures found in SPSS Base 13.0 are provided in the Help system, installed with the software. Algorithms used in the statistical procedures are available on the product CD-ROM.
In addition, beneath the menus and dialog boxes, SPSS uses a command language. Some extended features of the system can be accessed only via command syntax. (Those features are not available in the Student Version.) Complete command syntax is documented in the SPSS 13.0 Command Syntax Reference, available on the Help menu.
SPSS Options
The following options are available as add-on enhancements to the full (not Student Version) SPSS Base system:
SPSS Regression Models™ provides techniques for analyzing data that do not fit traditional linear statistical models. It includes procedures for probit analysis, logistic regression, weight estimation, two-stage least-squares regression, and general nonlinear regression.
SPSS Advanced Models™ focuses on techniques often used in sophisticated experimental and biomedical research. It includes procedures for general linear models (GLM), linear mixed models, variance components analysis, loglinear analysis, ordinal regression, actuarial life tables, Kaplan-Meier survival analysis, and basic and extended Cox regression.
SPSS Tables™ creates a variety of presentation-quality tabular reports, including
complex stub-and-banner tables and displays of multiple response data.
SPSS Trends™ performs comprehensive forecasting and time series analyses with multiple curve-fitting models, smoothing models, and methods for estimating autoregressive functions.
SPSS Categories® performs optimal scaling procedures, including correspondence analysis.
SPSS Conjoint™ performs conjoint analysis.
SPSS Exact Tests™ calculates exact p values for statistical tests when small or very unevenly distributed samples could make the usual tests inaccurate.
SPSS Missing Value Analysis™ describes patterns of missing data, estimates means and other statistics, and imputes values for missing observations.
SPSS Maps™ turns your geographically distributed data into high-quality maps with symbols, colors, bar charts, pie charts, and combinations of themes to present not only what is happening but where it is happening.
SPSS Complex Samples™ allows survey, market, health, and public opinion researchers, as well as social scientists who use sample survey methodology, to incorporate their complex sample designs into data analysis.
SPSS Classification Trees™ creates a tree-based classification model. It classifies cases into groups or predicts values of a dependent (target) variable based on values of independent (predictor) variables. The procedure provides validation tools for exploratory and confirmatory classification analysis.
The SPSS family of products also includes applications for data entry, text analysis, classification, neural networks, and flowcharting.
Installation
To install the Base system, run the License Authorization Wizard using the authorization code that you received from SPSS Inc. For more information, see the installation instructions supplied with the SPSS Base system.
Compatibility
SPSS is designed to run on many computer systems. See the installation instructions that came with your system for specific information on minimum and recommended requirements.
Serial Numbers
Your serial number is your identification number with SPSS Inc. You will need this serial number when you contact SPSS for information regarding support, payment, or an upgraded system. The serial number was provided with your Base system.
Customer Service
If you have any questions concerning your shipment or account, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide. Please have your serial number ready for identification.
Training Seminars
SPSS Inc. provides both public and onsite training seminars. All seminars feature hands-on workshops. Seminars will be offered in major cities on a regular basis. For more information on these seminars, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide.
Technical Support
The services of SPSS Technical Support are available to registered customers. Customers may contact Technical Support for assistance in using SPSS or for installation help for one of the supported hardware environments. To reach Technical Support, see the SPSS Web site at http://www.spss.com, or contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide. Be prepared to identify yourself, your organization, and the serial number of your system.
Additional Publications
Additional copies of SPSS product manuals may be purchased directly from SPSS Inc. Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local SPSS office, listed on the SPSS Web site at http://www.spss.com/worldwide. For telephone orders in the United States and Canada, call SPSS Inc. at 800-543-2185. For telephone orders outside of North America, contact your local office, listed on the SPSS Web site.
The SPSS Statistical Procedures Companion, by Marija Norušis, has been published by Prentice Hall. A new version of this book, updated for SPSS 13.0, is planned. The SPSS Advanced Statistical Procedures Companion, also based on SPSS 13.0, is forthcoming. The SPSS Guide to Data Analysis for SPSS 13.0 is also in development. Announcements of publications available exclusively through Prentice Hall will be available on the SPSS Web site at http://www.spss.com/estore (select your home country, and then click Books).
Tell Us Your Thoughts
Your comments are important. Please let us know about your experiences with SPSS products. We especially like to hear about new and interesting applications using the SPSS system. Please send e-mail to suggest@spss.com or write to SPSS Inc., Attn.: Director of Product Planning, 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.
About This Manual
This manual documents the graphical user interface for the procedures included in the Base system. Illustrations of dialog boxes are taken from SPSS for Windows. Dialog boxes in other operating systems are similar. Detailed information about the command syntax for features in this module is provided in the SPSS Command Syntax Reference, available from the Help menu.
Contacting SPSS
If you would like to be on our mailing list, contact one of our offices, listed on our Web site at http://www.spss.com/worldwide.
Contents

1 Overview 1
What’s New in SPSS 13.0? .... 2
Windows .... 4
Menus .... 6
Status Bar .... 7
Dialog Boxes .... 7
Variable Names and Variable Labels in Dialog Box Lists .... 8
Dialog Box Controls .... 8
Subdialog Boxes .... 9
Selecting Variables .... 9
Getting Information about Variables in Dialog Boxes .... 10
Getting Information about Dialog Box Controls .... 10
Basic Steps in Data Analysis .... 11
Statistics Coach .... 12
Finding Out More about SPSS .... 12

2 Getting Help 13
Using the Help Table of Contents .... 14
Using the Help Index .... 15
Getting Help on Dialog Box Controls .... 16
Getting Help on Output Terms .... 17
Using Case Studies .... 18
Copying Help Text from a Pop-Up Window .... 18

3 Data Files 19
Opening a Data File .... 19
To Open Data Files .... 19
Data File Types .... 20
Opening File Options .... 21
Reading Excel Files .... 21
How the Data Editor Reads Older Excel Files and Other Spreadsheets .... 22
How the Data Editor Reads dBASE Files .... 22
Reading Database Files .... 23
Selecting a Data Source .... 24
Database Login .... 25
Selecting Data Fields .... 26
Creating a Parameter Query .... 34
Aggregating Data .... 35
Defining Variables .... 36
Sorting Cases .... 37
Results .... 38
Text Wizard .... 40
File Information .... 51
Saving Data Files .... 51
To Save Modified Data Files .... 51
Saving Data Files in Excel Format .... 52
Saving Data Files in SAS Format .... 53
Saving Data Files in Other Formats .... 55
Saving Data: Data File Types .... 55
Saving Subsets of Variables .... 57
Saving File Options .... 58
Protecting Original Data .... 58
Virtual Active File .... 59

4 Distributed Analysis Mode 63
Distributed versus Local Analysis .... 63

5 Data Editor 75
Data View .... 75
Variable View .... 76
Entering Data .... 88
Editing Data .... 90
Go to Case .... 94
Case Selection Status in the Data Editor .... 95
Data Editor Display Options .... 95
Data Editor Printing .... 96

6 Data Preparation 99
Defining Variable Properties .... 100
Copying Data Properties .... 107
Identifying Duplicate Cases .... 116
Visual Bander .... 119
Banding Variables .... 121
Automatically Generating Banded Categories .... 124
Copying Banded Categories .... 127
User-Missing Values in the Visual Bander .... 128

7 Data Transformations 129
Computing Variables .... 129
Functions .... 132
Missing Values in Functions .... 133
Random Number Generators .... 133
Count Occurrences of Values within Cases .... 135
Recoding Values .... 137
Recode into Same Variables .... 137
Recode into Different Variables .... 140
Rank Cases .... 143
Automatic Recode .... 147
Date and Time Wizard .... 150
Time Series Data Transformations .... 166
Scoring Data with Predictive Models .... 174

8 File Handling and File Transformations 175
Sort Cases .... 175
Transpose .... 176
Merging Data Files .... 177
Add Cases .... 177
Add Variables .... 181
Aggregate Data .... 184
Split File .... 189
Select Cases .... 190
Weight Cases .... 195
Restructuring Data .... 197

9 Working with Output 221
Viewer .... 221
Using Output in Other Applications .... 229
Pasting Objects into the Viewer .... 233
Paste Special .... 233
Pasting Objects from Other Applications into the Viewer .... 234
Export Output .... 234
Viewer Printing .... 245
Saving Output .... 251

10 Draft Viewer 253
To Create Draft Output .... 254
Controlling Draft Output Format .... 255
Fonts in Draft Output .... 260
To Print Draft Output .... 260
To Save Draft Viewer Output .... 262

11 Pivot Tables 263
Manipulating a Pivot Table .... 263
Working with Layers .... 268
Bookmarks .... 272
Showing and Hiding Cells .... 273
Editing Results .... 275
Changing the Appearance of Tables .... 275
Table Properties .... 278
To Change Pivot Table Properties .... 278
Table Properties: General .... 279
To Change General Table Properties .... 279
Table Properties: Footnotes .... 280
To Change Footnote Marker Properties .... 280
Table Properties: Cell Formats .... 281
To Change Cell Formats .... 282
Table Properties: Borders .... 282
To Change Borders in a Table .... 283
To Display Hidden Borders in a Pivot Table .... 284
Table Properties: Printing .... 284
To Control Pivot Table Printing .... 284
Font .... 285
Data Cell Widths .... 286
Cell Properties .... 287
To Change Cell Properties .... 287
Cell Properties: Value .... 288
To Change Value Formats in a Cell .... 288
To Change Value Formats for a Column .... 288
Cell Properties: Alignment .... 289
To Change Alignment in Cells .... 289
Cell Properties: Margins .... 290
To Change Margins in Cells .... 290
Cell Properties: Shading .... 291
To Change Shading in Cells .... 291
Footnote Marker .... 291
Selecting Rows and Columns in Pivot Tables .... 292
To Select a Row or Column in a Pivot Table .... 293
Modifying Pivot Table Results .... 293
Printing Pivot Tables .... 294
To Print Hidden Layers of a Pivot Table .... 294
Controlling Table Breaks for Wide and Long Tables .... 295

12 Working with Command Syntax 297
Syntax Rules .... 298
Pasting Syntax from Dialog Boxes .... 299
Copying Syntax from the Output Log .... 300
Editing Syntax in a Journal File .... 302
To Run Command Syntax .... 303
Multiple Execute Commands .... 304

13 Frequencies 307
Frequencies Statistics .... 310
Frequencies Charts .... 312
Frequencies Format .... 312

14 Descriptives 315
Descriptives Options .... 317

15 Explore 319
Explore Statistics .... 323
Explore Plots .... 324
Explore Options .... 325

16 Crosstabs 327
Crosstabs Layers .... 329
Crosstabs Clustered Bar Charts .... 330
Crosstabs Statistics .... 330
Crosstabs Cell Display .... 334
Crosstabs Table Format .... 335

17 Summarize 337
Summarize Options .... 339
Summarize Statistics .... 340

18 Means 343
Means Options .... 346

19 OLAP Cubes 349
OLAP Cubes Statistics .... 352
OLAP Cubes Differences .... 355
OLAP Cubes Title .... 356

20 T Tests 357
Independent-Samples T Test .... 357
Paired-Samples T Test .... 361
One-Sample T Test .... 364

21 One-Way ANOVA 367
One-Way ANOVA Contrasts .... 370
One-Way ANOVA Post Hoc Tests .... 371
One-Way ANOVA Options .... 374

22 GLM Univariate Analysis 377
GLM Model .... 381
GLM Contrasts .... 383
GLM Profile Plots .... 385
GLM Post Hoc Comparisons .... 386
GLM Save .... 389
GLM Options .... 391
UNIANOVA Command Additional Features .... 392

23 Bivariate Correlations 395
Bivariate Correlations Options .... 398
CORRELATIONS and NONPAR CORR Command Additional Features .... 399

24 Partial Correlations 401
Partial Correlations Options .... 404

25 Distances 405
Distances Dissimilarity Measures .... 407
Distances Similarity Measures .... 408

26 Linear Regression 409
Linear Regression Variable Selection Methods .... 414
Linear Regression Set Rule .... 415
Linear Regression Plots .... 416
Linear Regression: Saving New Variables .... 417
Linear Regression Statistics .... 420
Linear Regression Options .... 422

27 Curve Estimation 425
Curve Estimation Models .... 429
Curve Estimation Save .... 430

28 Discriminant Analysis 431
Discriminant Analysis Define Range .... 434
Discriminant Analysis Select Cases .... 435
Discriminant Analysis Statistics .... 435
Discriminant Analysis Stepwise Method .... 437
Discriminant Analysis Classification .... 438
Discriminant Analysis Save .... 440

29 Factor Analysis 441
Factor Analysis Select Cases .... 446
Factor Analysis Descriptives .... 447
Factor Analysis Extraction .... 448
Factor Analysis Rotation .... 450
Factor Analysis Scores .... 451
Factor Analysis Options .... 452

30 Choosing a Procedure for Clustering 453

31 TwoStep Cluster Analysis 455
TwoStep Cluster Analysis Options .... 459
TwoStep Cluster Analysis Plots .... 462
TwoStep Cluster Analysis Output .... 463

32 Hierarchical Cluster Analysis 465
Hierarchical Cluster Analysis Method .... 469
Hierarchical Cluster Analysis Statistics .... 470
Hierarchical Cluster Analysis Plots .... 471
Hierarchical Cluster Analysis Save New Variables .... 471

33 K-Means Cluster Analysis 473
K-Means Cluster Analysis Efficiency .... 477
K-Means Cluster Analysis Iterate .... 478
K-Means Cluster Analysis Save .... 478
K-Means Cluster Analysis Options .... 479

34 Nonparametric Tests 481
Chi-Square Test .... 482
Binomial Test .... 487
Runs Test .... 489
One-Sample Kolmogorov-Smirnov Test .... 492
Two-Independent-Samples Tests .... 495
Two-Related-Samples Tests .... 499
Tests for Several Independent Samples .... 503
Tests for Several Related Samples .... 507

35 Multiple Response Analysis 511
Multiple Response Define Sets .... 512
Multiple Response Frequencies .... 513
Multiple Response Crosstabs .... 516
Multiple Response Crosstabs Define Ranges .... 518
Multiple Response Crosstabs Options .... 518
MULT RESPONSE Command Additional Features .... 519

36 Reporting Results 521
Report Summaries in Rows .... 521
Report Summaries in Columns .... 529
REPORT Command Additional Features .... 534

37 Reliability Analysis 537
Reliability Analysis Statistics .... 539
RELIABILITY Command Additional Features .... 541

38 Multidimensional Scaling 543
Multidimensional Scaling Shape of Data .... 545
Multidimensional Scaling Create Measure .... 546
Multidimensional Scaling Model .... 547
Multidimensional Scaling Options .... 548
ALSCAL Command Additional Features .... 549

39 Ratio Statistics 551
Ratio Statistics .... 553

40 Overview of the Chart Facility 555
Creating and Modifying a Chart .... 555
Chart Definition Options .... 561

41 ROC Curves 569
ROC Curve Options .... 572

42 Utilities 573
Variable Information .... 573
Data File Comments .... 574
Variable Sets .... 575
Define Variable Sets .... 575
Use Sets .... 576
Reordering Target Variable Lists .... 577

43 Options 579
General Options .... 580
Viewer Options .... 582
Draft Viewer Options .... 583
Output Label Options .... 585
Chart Options .... 587
Interactive Chart Options .... 591
Pivot Table Options .... 593
Data Options .... 594
Currency Options .... 595
Script Options .... 597

44 Customizing Menus and Toolbars 599
Menu Editor .... 599
Customizing Toolbars .... 600
Show Toolbars .... 600
To Customize Toolbars .... 601

45 Production Facility 607
Using the Production Facility .... 609
Export Options .... 610
User Prompts .... 612
Production Macro Prompting .... 614
Production Options .... 614
Format Control for Production Jobs .... 615
Running Production Jobs from a Command Line .... 618
Publish to Web .... 619
SmartViewer Web Server Login .... 621

46 SPSS Scripting Facility 623
To Run a Script .... 623
Scripts Included with SPSS .... 624
Autoscripts .... 625
Creating and Editing Scripts .... 626
To Edit a Script .... 627
Script Window .... 628
Starter Scripts .... 631
Creating Autoscripts .... 632
How Scripts Work .... 636
Table of Object Classes and Naming Conventions .... 638
New Procedure (Scripting) .... 643
Adding a Description to a Script .... 646
Scripting Custom Dialog Boxes .... 646
Debugging Scripts .... 650
Script Files and Syntax Files .... 653

47 Output Management System 657
Output Object Types .... 661
Command Identifiers and Table Subtypes .... 663
Table Labels .... 664
OMS Options .... 666
Logging .... 671
Excluding Output Display from the Viewer .... 671
Routing Output to SPSS Data Files .... 672
OXML Table Structure .... 682
OMS Identifiers .... 686

Appendices
A Database Access Administrator 689
B Customizing HTML Documents 691
To Add Customized HTML Code to Exported Output Documents .... 691
Content and Format of the Text File for Customized HTML .... 692
To Use a Different File or Location for Custom HTML Code .... 692

Index 695
Page 24
Page 25

Overview

Chapter 1

SPSS for Windows provides a powerful statistical analysis and data management system in a graphical environment, using descriptive menus and simple dialog boxes to do most of the work for you. Most tasks can be accomplished simply by pointing and clicking the mouse.
In addition to the simple point-and-click interface for statistical analysis, SPSS for Windows provides:
Data Editor. A versatile spreadsheet-like system for defining, entering, editing, and displaying data.
Viewer. The Viewer makes it easy to browse your results, selectively show and hide output, change the display order of results, and move presentation-quality tables and charts between SPSS and other applications.
Multidimensional pivot tables. Your results come alive with multidimensional pivot tables. Explore your tables by rearranging rows, columns, and layers. Uncover important findings that can get lost in standard reports. Compare groups easily by splitting your table so that only one group is displayed at a time.
High-resolution graphics. High-resolution, full-color pie charts, bar charts, histograms, scatterplots, 3-D graphics, and more are included as standard features in SPSS.
Database access. Retrieve information from databases by using the Database Wizard instead of complicated SQL queries.
Data transformations. Transformation features help get your data ready for analysis. You can easily subset data, combine categories, add, aggregate, merge, split, and transpose files, and more.
Electronic distribution. Send e-mail reports to others with the click of a button, or export tables and charts in HTML format for Internet and intranet distribution.
Online Help. Detailed tutorials provide a comprehensive overview; context-sensitive Help topics in dialog boxes guide you through specific tasks; pop-up definitions in pivot table results explain statistical terms; the Statistics Coach helps you find the procedures that you need; and Case Studies provide hands-on examples of how to use statistical procedures and interpret the results.
Command language. Although most tasks can be accomplished with simple point-and-click gestures, SPSS also provides a powerful command language that allows you to save and automate many common tasks. The command language also provides some functionality not found in the menus and dialog boxes.
Complete command syntax documentation is automatically installed when you install SPSS. To access the syntax documentation:
E From the menus choose:
Help
Command Syntax Reference

What’s New in SPSS 13.0?

There are many new features in SPSS 13.0.
Data Management
The Date and Time Wizard makes it easy to perform many calculations with dates and times, including calculating the difference between dates and adding or subtracting a duration from a date. For more information, see “Date and Time Wizard” in Chapter 7 on p. 150.
You can append aggregated results to the working data file. For more information, see “Aggregate Data” in Chapter 8 on p. 184.
You can create consistent autorecoding schemes for multiple variables and data files. For more information, see “Automatic Recode” in Chapter 7 on p. 147.
String values can be up to 32,767 bytes long. Previously, the limit was 255 bytes.
You can create multiple panes in Data View of the Data Editor. For more information, see “Data Editor Display Options” in Chapter 5 on p. 95.
Charts
3-D bar charts.
Population pyramids.
Dot plots.
Paneled charts.
Statistical Enhancements
New Classification Tree option for building tree models.
New GLM procedure in the Complex Samples option.
New Logistic Regression procedure in the Complex Samples option.
New Multiple Correspondence procedure in the Categories option.
AIC and BIC statistics added to Multinomial Logistic Regression in the Regression Models option, plus the ability to specify the type of statistic used for determining the addition and removal of model terms when using various stepwise methods.
Output
Output Management System Control Panel. For more information, see “Output Management System” in Chapter 47 on p. 657.
Export Viewer output to PowerPoint. For more information, see “Export Output” in Chapter 9 on p. 234.
More output sorting options and the ability to hide subtotaled categories in Custom Tables (Tables option).
Pivot table output for Curve Estimation and Multiple Response in the Base system, all time series procedures (including the Trends option), Kaplan-Meier and Life Tables in the Advanced Models option, and Nonlinear Regression in the Regression Models option.
Command Syntax
You can control the working directory with the new CD command.
You can control the treatment of error conditions in included command syntax files with the new INSERT command.
You can use the FILE HANDLE command to define directory paths.
More information about command syntax is available in the SPSS Command Syntax Reference PDF file, which you can access from the Help menu.
SPSS Server
You can score data based on models built with many SPSS procedures. For more information, see “Scoring Data with Predictive Models” in Chapter 7 on p. 174.
You can work with database sources more efficiently by preaggregating and/or presorting data in the database before reading it into SPSS. For more information, see “Aggregating Data” in Chapter 3 on p. 35.

Windows

There are a number of different types of windows in SPSS:
Data Editor. This window displays the contents of the data file. You can create new data files or modify existing ones with the Data Editor. The Data Editor window opens automatically when you start an SPSS session. You can have only one data file open at a time.
Viewer. All statistical results, tables, and charts are displayed in the Viewer. You can edit the output and save it for later use. A Viewer window opens automatically the first time you run a procedure that generates output.
Draft Viewer. You can display output as simple text (instead of interactive pivot tables) in the Draft Viewer.
Pivot Table Editor. Output displayed in pivot tables can be modified in many ways with the Pivot Table Editor. You can edit text, swap data in rows and columns, add color, create multidimensional tables, and selectively hide and show results.
Chart Editor. You can modify high-resolution charts and plots in chart windows. You can change the colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate 3-D scatterplots, and even change the chart type.
Text Output Editor. Text output not displayed in pivot tables can be modified with the Text Output Editor. You can edit the output and change font characteristics (type, style, color, size).
Syntax Editor. You can paste your dialog box choices into a syntax window, where your selections appear in the form of command syntax. You can then edit the command syntax to use special features of SPSS not available through dialog boxes. You can save these commands in a file for use in subsequent SPSS sessions.
Script Editor. Scripting and OLE automation allow you to customize and automate many tasks in SPSS. Use the Script Editor to create and modify basic scripts.
Figure 1-1
Data Editor and Viewer
Designated versus Active Window
If you have more than one open Viewer window, output is routed to the designated Viewer window. If you have more than one open Syntax Editor window, command syntax is pasted into the designated Syntax Editor window. The designated windows are indicated by an exclamation point (!) in the status bar. You can change the designated windows at any time.
The designated window should not be confused with the active window, which is the currently selected window. If you have overlapping windows, the active window appears in the foreground. If you open a window, that window automatically becomes the active window and the designated window.
Changing the Designated Window
E Make the window that you want to designate the active window (click anywhere in the window).
E Click the Designate Window tool on the toolbar (the one with the exclamation point).
or
E From the menus choose:
Utilities
Designate Window
Figure 1-2
Designate Window tool

Menus

Many of the tasks that you want to perform with SPSS start with menu selections. Each window in SPSS has its own menu bar with menu selections appropriate for that window type.
The Analyze and Graphs menus are available in all windows, making it easy to generate new output without having to switch windows.

Status Bar

The status bar at the bottom of each SPSS window provides the following information:
Command status. For each procedure or command that you run, a case counter indicates the number of cases processed so far. For statistical procedures that require iterative processing, the number of iterations is displayed.
Filter status. If you have selected a random sample or a subset of cases for analysis, the message Filter on indicates that some type of case filtering is currently in effect and not all cases in the data file are included in the analysis.
Weight status. The message Weight on indicates that a weight variable is being used to weight cases for analysis.
Split File status. The message Split File on indicates that the data file has been split into separate groups for analysis, based on the values of one or more grouping variables.
Showing and Hiding the Status Bar
E From the menus choose:
View
Status Bar

Dialog Boxes

Most menu selections open dialog boxes. You use dialog boxes to select variables and options for analysis.
Dialog boxes for statistical procedures and charts typically have two basic components:
Source variable list. A list of variables in the working data file. Only variable types allowed by the selected procedure are displayed in the source list. Use of short string and long string variables is restricted in many procedures.
Target variable list(s). One or more lists indicating the variables that you have chosen for the analysis, such as dependent and independent variable lists.
Variable Names and Variable Labels in Dialog Box Lists
You can display either variable names or variable labels in dialog box lists.
To control the display of variable names or labels, choose Options from the Edit menu in any window.
To define or modify variable labels, use Variable View in the Data Editor.
For data imported from database sources, field names are used as variable labels.
For long labels, position the mouse pointer over the label in the list to view the entire label.
If no variable label is defined, the variable name is displayed.
Figure 1-3
Variable labels displayed in a dialog box

Dialog Box Controls

There are five standard controls in most dialog boxes:
OK. Runs the procedure. After you select your variables and choose any additional specifications, click OK to run the procedure. This also closes the dialog box.
Paste. Generates command syntax from the dialog box selections and pastes the syntax into a syntax window. You can then customize the commands with additional features not available from dialog boxes.
Reset. Deselects any variables in the selected variable list(s) and resets all specifications in the dialog box and any subdialog boxes to the default state.
Cancel. Cancels any changes in the dialog box settings since the last time it was opened and closes the dialog box. Within a session, dialog box settings are persistent. A dialog box retains your last set of specifications until you override them.
Help. Context-sensitive Help. This takes you to a Help window that contains information about the current dialog box. You can also get help on individual dialog box controls by clicking the control with the right mouse button.

Subdialog Boxes

Since most procedures provide a great deal of flexibility, not all of the possible choices can be contained in a single dialog box. The main dialog box usually contains the minimum information required to run a procedure. Additional specifications are made in subdialog boxes.
In the main dialog box, controls with an ellipsis (...) after the name indicate that a subdialog box will be displayed.

Selecting Variables

To select a single variable, you simply highlight it on the source variable list and click the right arrow button next to the target variable list. If there is only one target variable list, you can double-click individual variables to move them from the source list to the target list.
You can also select multiple variables:
To select multiple variables that are grouped together on the variable list, click the first one and then Shift-click the last one in the group.
To select multiple variables that are not grouped together on the variable list, use the Ctrl-click method. Click the first variable, then Ctrl-click the next variable, and so on.
Getting Information about Variables in Dialog Boxes
E Right-click on a variable in the source or target variable list.
E Select Variable Information from the pop-up context menu.
Figure 1-4
Variable information with right mouse button

Getting Information about Dialog Box Controls

E Right-click the control you want to know about.
E Select What’s This? on the pop-up context menu.
A pop-up window displays information about the control.
Figure 1-5
Right mouse button “What’s This?” pop-up Help for dialog box controls

Basic Steps in Data Analysis

Analyzing data with SPSS is easy. All you have to do is:
Get your data into SPSS. You can open a previously saved SPSS data file; read a spreadsheet, database, or text data file; or enter your data directly in the Data Editor.
Select a procedure. Select a procedure from the menus to calculate statistics or to create a chart.
Select the variables for the analysis. The variables in the data file are displayed in a dialog box for the procedure.
Run the procedure and look at the results. Results are displayed in the Viewer.

Statistics Coach

If you are unfamiliar with SPSS or with the statistical procedures available in SPSS, the Statistics Coach can help you get started by prompting you with simple questions, nontechnical language, and visual examples that help you select the basic statistical and charting features that are best suited for your data.
To use the Statistics Coach, from the menus in any SPSS window choose:
Help
Statistics Coach
The Statistics Coach covers only a selected subset of procedures in the SPSS Base system. It is designed to provide general assistance for many of the basic, commonly used statistical techniques.
Finding Out More about SPSS
For a comprehensive overview of SPSS basics, see the online tutorial. From any SPSS menu choose:
Help
Tutorial

Getting Help

Chapter 2

Help is provided in many different forms:
Help menu. The Help menu in most SPSS windows provides access to the main Help system, plus tutorials and technical reference material.
Topics. Provides access to the Contents, Index, and Search tabs, which you can use to find specific Help topics.
Tutorial. Illustrated, step-by-step instructions on how to use many of the basic features in SPSS. You don’t have to view the whole tutorial from start to finish. You can choose the topics you want to view, skip around and view topics in any order, and use the index or table of contents to find specific topics.
Case Studies. Hands-on examples of how to create various types of statistical analyses and how to interpret the results. The sample data files used in the examples are also provided so that you can work through the examples to see exactly how the results were produced. You can choose the specific procedure(s) you want to learn about from the table of contents or search for relevant topics in the index.
Statistics Coach. A wizard-like approach to guide you through the process of finding the procedure that you want to use. After you make a series of selections, the Statistics Coach opens the dialog box for the statistical, reporting, or charting procedure that meets your selected criteria. The Statistics Coach provides access to most statistical and reporting procedures in the Base system and many charting procedures.
Command Syntax Reference. Detailed command syntax reference information is provided in the SPSS Command Syntax Reference, available from the Help menu.
Context-sensitive Help. In many places in the user interface, you can get context-sensitive Help.
Dialog box Help buttons. Most dialog boxes have a Help button that takes you directly to a Help topic for that dialog box. The Help topic provides general information and links to related topics.
Dialog box context menu Help. Many dialog boxes provide context-sensitive Help for individual controls and features. Right-click on any control in a dialog box and select What’s This? from the context menu to display a description of the control and directions for its use. (If What’s This? does not appear on the context menu, then this form of Help is not available for that dialog box.)
Pivot table context menu Help. Right-click on terms in an activated pivot table in the Viewer and select What’s This? from the context menu to display definitions of the terms.
Case Studies. Right-click on a pivot table and select Case Studies from the context menu to go directly to a detailed example for the procedure that produced that table. (If Case Studies does not appear on the context menu, then this form of Help is not available for that procedure.)
Command syntax charts. In a command syntax window, position the cursor anywhere within a syntax block for a command and press F1 on the keyboard. A complete command syntax chart for that command will be displayed.
Other resources. If you can’t find the information you want in the Help system, these other resources may have the answers you need:
SPSS for Windows Developer’s Guide. Provides information and examples for the developer’s tools included with SPSS for Windows, including OLE automation, third-party API, input/output DLL, production facility, and scripting facility. The Developer’s Guide is available in PDF form in the SPSS\developer directory on the installation CD.
Technical Support Web site. Answers to many common problems can be found at http://support.spss.com. (The Technical Support Web site requires a login ID and password. Information on how to obtain an ID and password is provided at the URL listed above.)

Using the Help Table of Contents

E In any window, from the menus choose:
Help
Topics
E Click the Contents tab.
E Double-click items with a book icon to expand or collapse the contents.
E Click an item to go to that Help topic.
Figure 2-1
Help window with Contents tab displayed
Using the Help Index
E In any window, from the menus choose:
Help
Topics
E Click the Index tab.
E Enter a term to search for in the index.
E Double-click the topic that you want.
The Help index uses incremental search to find the text that you enter and selects the closest match in the index.
Figure 2-2
Index tab and incremental search
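The incremental search described for the Help index can be pictured with a short Python sketch. This is only an illustration, not the Help system's actual code: the function name closest_index_match and the sample index entries are hypothetical, and "closest match" is assumed here to mean the first entry that sorts at or after the typed text, ignoring case.

```python
import bisect


def closest_index_match(entries, typed):
    """Return the first index entry at or after the typed text, ignoring case.

    Assumption: this ordering rule stands in for "closest match";
    the real Help index may rank matches differently.
    """
    keys = sorted(entries, key=str.lower)
    lowered = [k.lower() for k in keys]
    pos = bisect.bisect_left(lowered, typed.lower())
    if pos == len(keys):
        # Typed text sorts after every entry: fall back to the last one.
        return keys[-1] if keys else None
    return keys[pos]


# Hypothetical index entries; the selection narrows as more text is typed.
index = ["Aggregate Data", "Automatic Recode", "Date and Time Wizard", "Export Output"]
for typed in ("a", "au", "d"):
    print(typed, "->", closest_index_match(index, typed))
```

Each additional character moves the selection, which is why typing only the first few letters of a term is usually enough.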
Getting Help on Dialog Box Controls
E Right-click on the dialog box control that you want information about.
E Choose What’s This? from the pop-up context menu.
A description of the control and how to use it is displayed in a pop-up window. General information about a dialog box is available from the Help button in the dialog box.
Figure 2-3
Dialog box control Help with right mouse button
Getting Help on Output Terms

E Double-click the pivot table to activate it.
E Right-click on the term that you want to be explained.
E Choose What’s This? from the context menu.
A definition of the term is displayed in a pop-up window.
Figure 2-4
Activated pivot table glossary Help with right mouse button

Using Case Studies

E Right-click on a pivot table in the Viewer window.
E Choose Case Studies from the pop-up context menu.
Copying Help Text from a Pop-Up Window
E Right-click anywhere in the pop-up window.
E Choose Copy from the context menu.
The entire text of the pop-up window is copied.

Data Files

Chapter 3

Data files come in a wide variety of formats, and this software is designed to handle many of them, including:
Spreadsheets created with Lotus 1-2-3 and Excel
Database files created with dBASE and various SQL formats
Tab-delimited and other types of ASCII text files
Data files in SPSS format created on other operating systems
SYSTAT data files
SAS data files
Opening a Data File

In addition to files saved in SPSS format, you can open Excel, Lotus 1-2-3, dBASE, and tab-delimited files without converting the files to an intermediate format or entering data definition information.
To Open Data Files
E From the menus choose:
File
Open
Data...
E In the Open File dialog box, select the file that you want to open.
E Click Open.
Optionally, you can:
Read variable names from the first row for spreadsheet and tab-delimited files.
Specify a range of cells to read for spreadsheet files.
Specify a sheet within an Excel file to read (Excel 5 or later).
Data File Types
SPSS. Opens data files saved in SPSS format, including SPSS for Windows, Macintosh, UNIX, and also the DOS product SPSS/PC+.
SPSS/PC+. Opens SPSS/PC+ data files.
SYSTAT. Opens SYSTAT data files.
SPSS Portable. Opens data files saved in SPSS portable format. Saving a file in portable format takes considerably longer than saving the file in SPSS format.
Excel. Opens Excel files.
Lotus 1-2-3. Opens data files saved in 1-2-3 format for release 3.0, 2.0, or 1A of Lotus.
SYLK. Opens data files saved in SYLK (symbolic link) format, a format used by some spreadsheet applications.
dBASE. Opens dBASE-format files for either dBASE IV, dBASE III or III PLUS, or dBASE II. Each case is a record. Variable and value labels and missing-value specifications are lost when you save a file in this format.
SAS Long File Name. SAS version 7-9 for Windows, long extension.
SAS Short File Name. SAS version 7-9 for Windows, short extension.
SAS v6 for Windows. SAS version 6.08 for Windows and OS2.
SAS v6 for UNIX. SAS version 6 for UNIX (Sun, HP, IBM).
SAS Transport. SAS transport file.
Text. ASCII text file.
Opening File Options
Read variable names. For spreadsheets, you can read variable names from the first row of the file or the first row of the defined range. The values are converted as necessary to create valid variable names, including converting spaces to underscores.
Worksheet. Excel 5 or later files can contain multiple worksheets. By default, the Data Editor reads the first worksheet. To read a different worksheet, select the worksheet from the drop-down list.
Range. For spreadsheet data files, you can also read a range of cells. Use the same method for specifying cell ranges as you would with the spreadsheet application.
Reading Excel Files
Read variable names. You can read variable names from the first row of the file or the first row of the defined range. Values that don’t conform to variable naming rules are converted to valid variable names, and the original names are used as variable labels.
Worksheet. Excel files can contain multiple worksheets. By default, the Data Editor reads the first worksheet. To read a different worksheet, select the worksheet from the drop-down list.
Range. You can also read a range of cells. Use the same method for specifying cell ranges as you would in Excel.
How the Data Editor Reads Excel 5 or Later Files
The following rules apply to reading Excel 5 or later files:
Data type and width. Each column is a variable. The data type and width for each variable is determined by the data type and width in the Excel file. If the column contains more than one data type (for example, date and numeric), the data type is set to string, and all values are read as valid string values.
Blank cells. For numeric variables, blank cells are converted to the system-missing value, indicated by a period. For string variables, a blank is a valid string value, and blank cells are treated as valid string values.
Variable names. If you read the first row of the Excel file (or the first row of the specified range) as variable names, values that don’t conform to variable naming rules are converted to valid variable names, and the original names are used as variable labels. If you do not read variable names from the Excel file, default variable names are assigned.
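The data type and blank cell rules above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the Data Editor's implementation: the function read_column is hypothetical, blank cells are represented as None on input, the system-missing value is modeled as None, and the handling of a column with no data at all is a guess that the text does not cover.

```python
import datetime


def read_column(cells):
    """Apply the stated Excel 5+ rules to one column (a list of cell values).

    Returns (data_type, values). Blank input cells are None.
    """
    kinds = set()
    for cell in cells:
        if cell is None:
            continue  # blanks do not influence the column type
        if isinstance(cell, (datetime.date, datetime.datetime)):
            kinds.add("date")
        elif isinstance(cell, (int, float)):
            kinds.add("numeric")
        else:
            kinds.add("string")

    if kinds in ({"numeric"}, {"date"}):
        # Single non-string type: blanks become system-missing (None here).
        return kinds.pop(), list(cells)

    # Mixed types or string data: everything is read as a valid string,
    # and a blank cell is a valid (empty) string value.
    # Assumption: an all-blank column also lands here; the text is silent on it.
    return "string", ["" if c is None else str(c) for c in cells]


print(read_column([1, 2.5, None]))
print(read_column([1, datetime.date(2004, 1, 1), None]))
```

The second call shows the mixed-type case: one date among numbers is enough to turn the whole column into a string variable.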

How the Data Editor Reads Older Excel Files and Other Spreadsheets

The following rules apply to reading Excel files prior to version 5 and other spreadsheet data:
Data type and width. The data type and width for each variable are determined by the column width and data type of the first data cell in the column. Values of other types are converted to the system-missing value. If the first data cell in the column is blank, the global default data type for the spreadsheet (usually numeric) is used.
Blank cells. For numeric variables, blank cells are converted to the system-missing value, indicated by a period. For string variables, a blank is a valid string value, and blank cells are treated as valid string values.
Variable names. If you do not read variable names from the spreadsheet, the column letters (A, B, C, ...) are used for variable names for Excel and Lotus files. For SYLK files and Excel files saved in R1C1 display format, the software uses the column number preceded by the letter C for variable names (C1, C2, C3, ...).
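The default naming scheme for spreadsheet columns can be sketched as follows. The helper names column_letters and default_names are hypothetical, and the continuation past Z (AA, AB, ...) follows the usual spreadsheet convention, which is an assumption here because the text lists only A, B, C.

```python
def column_letters(index):
    """Convert a 0-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def default_names(n_columns, r1c1=False):
    """Default variable names when names are not read from the spreadsheet.

    r1c1=True models SYLK files and Excel files saved in R1C1 display format.
    """
    if r1c1:
        return ["C%d" % (i + 1) for i in range(n_columns)]
    return [column_letters(i) for i in range(n_columns)]


print(default_names(3))             # Excel and Lotus style names
print(default_names(3, r1c1=True))  # SYLK / R1C1 style names
```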

How the Data Editor Reads dBASE Files

Database files are logically very similar to SPSS-format data files. The following general rules apply to dBASE files:
Field names are converted to valid variable names.
Colons used in dBASE field names are translated to underscores.
Records marked for deletion but not actually purged are included. The software creates a new string variable, D_R, which contains an asterisk for cases marked for deletion.
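The dBASE rules above can be illustrated with a brief Python sketch. The function convert_dbase and the sample records are hypothetical. Only the colon-to-underscore translation and the D_R variable come from the text; the value stored in D_R for records that are not marked for deletion is assumed to be an empty string, and no other name conversion is attempted.

```python
def convert_dbase(field_names, records):
    """Model the stated dBASE rules.

    records is a list of (values_dict, marked_for_deletion) pairs.
    Returns (variable_names, cases).
    """
    # Colons in field names are translated to underscores.
    names = [name.replace(":", "_") for name in field_names]
    cases = []
    for values, deleted in records:
        case = {new: values[old] for old, new in zip(field_names, names)}
        # D_R holds an asterisk for records marked for deletion.
        # Assumption: it is blank otherwise.
        case["D_R"] = "*" if deleted else ""
        cases.append(case)
    return names + ["D_R"], cases


names, cases = convert_dbase(
    ["EMP:ID", "REGION"],
    [({"EMP:ID": 1, "REGION": "East"}, False),
     ({"EMP:ID": 2, "REGION": "West"}, True)],
)
print(names)
print(cases)
```

Because marked records are still read, the D_R variable is what lets you filter them out afterward.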
Reading Database Files
You can read data from any database format for which you have a database driver. In local analysis mode, the necessary drivers must be installed on your local computer. In distributed analysis mode (available with the server version), the drivers must be installed on the remote server. For more information, see “Distributed Analysis Mode” in Chapter 4 on p. 63.
To Read Database Files
E From the menus choose:
File
Open Database
New Query...
E Select the data source.
E Depending on the data source, you may need to select the database file and/or enter a login name, password, and other information.
E Select the table(s) and fields.
E Specify any relationships between your tables.
Optionally, you can:
Specify any selection criteria for your data.
Add a prompt for user input to create a parameter query.
Save the query you have constructed before running it.
To Edit Saved Database Queries
E From the menus choose:
File
Open Database
Edit Query...
E Select the query file (*.spq) that you want to edit.
E Follow the instructions for creating a new query.
To Read Database Files with Saved Queries
E From the menus choose:
File
Open Database
Run Query...
E Select the query file (*.spq) that you want to run.
E Depending on the database file, you may need to enter a login name and password.
E If the query has an embedded prompt, you may need to enter other information (for example, the quarter for which you want to retrieve sales figures).

Selecting a Data Source

Use the first screen to select the type of data source to read. After you have chosen the file type, the Database Wizard may prompt you for the path to your data file.
If you do not have any data sources configured, or if you want to add a new data source, click Add Data Source. In distributed analysis mode (available with the server version), this button is not available. To add data sources in distributed analysis mode, see your system administrator.
Data sources. A data source consists of two essential pieces of information: the driver that will be used to access the data and the location of the database that you want to access. To specify data sources, you must have the appropriate drivers installed. For local analysis mode, you can install drivers from the CD-ROM for this product:
SPSS Data Access Pack. Installs drivers for a variety of database formats. Available on the AutoPlay menu.
Microsoft Data Access Pack. Installs drivers for Microsoft products, including Microsoft Access. To install the Microsoft Data Access Pack, double-click Microsoft Data Access Pack in the Microsoft Data Access Pack folder on the CD-ROM.
Figure 3-1
Database Wizard dialog box
Database Login

If your database requires a password, the Database Wizard will prompt you for one before it can open the data source.
Figure 3-2
Login dialog box

Selecting Data Fields

The Select Data step controls which tables and fields are read. Database fields (columns) are read as variables.
If a table has any field(s) selected, all of its fields will be visible in the following Database Wizard windows, but only those fields selected in this dialog box will be imported as variables. This enables you to create table joins and to specify criteria using fields that you are not importing.
Figure 3-3
Select Data dialog box
Displaying field names. To list the fields in a table, click the plus sign (+) to the left of
a table name. To hide the fields, click the minus sign (–) to the left of a table name.
To add a field. Double-click any field in the Available Tables list, or drag it to the
Retrieve Fields in This Order list. Fields can be reordered by dragging and dropping them within the selected fields list.
To remove a field. Double-click any field in the Retrieve Fields in This Order list,
or drag it to the Available Tables list.
Sort field names. If selected, the Database Wizard will display your available fields
in alphabetical order.
Creating a Relationship between Tables
The Specify Relationships dialog box allows you to define the relationships between the tables. If fields from more than one table are selected, you must define at least one join.
Figure 3-4
Specify Relationships dialog box
Establishing relationships. To create a relationship, drag a field from any table onto the field to which you want to join it. The Database Wizard will draw a join line between the two fields, indicating their relationship. These fields must be of the same data type.
Auto Join Tables. Attempts to automatically join tables based on primary/foreign keys or matching field names and data type.
Specifying join types. If outer joins are supported by your driver, you can specify either inner joins, left outer joins, or right outer joins. To select the type of join, double-click the join line between the fields, and the wizard will display the Relationship Properties dialog box. You can also use the icons in the upper right corner of the dialog box to choose the type of join.
Relationship Properties
This dialog box allows you to specify which type of relationship joins your tables.
Figure 3-5
Relationship Properties dialog box
Inner joins. An inner join includes only rows where the related fields are equal. Completing this would give you a data set that contains the variables ID, REGION, SALES95, and AVGINC for each employee who worked in a fixed region.
Figure 3-6
Creating an inner join
Outer joins. A left outer join includes all records from the table on the left and only those records from the table on the right in which the related fields are equal. In a right outer join, this relationship is switched, so that the join imports all records from the table on the right and only those records from the table on the left in which the related fields are equal.
Figure 3-7
Creating a right outer join
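The difference between the three join types can be seen in a small Python sketch. This is a conceptual model only; the database driver performs the actual join. The function join and the two sample tables are hypothetical (the variable names ID, REGION, SALES95, and AVGINC are borrowed from the example above), and fields with no matching record are modeled as None.

```python
def join(left, right, key, how="inner"):
    """Join two tables (lists of dicts) where the key fields are equal.

    how is "inner", "left" (left outer), or "right" (right outer).
    """
    # A right outer join keeps every record of the right table instead.
    keep_all, other = (right, left) if how == "right" else (left, right)
    other_fields = {f for row in other for f in row if f != key}
    result = []
    for row in keep_all:
        matches = [o for o in other if o[key] == row[key]]
        if matches:
            for o in matches:
                result.append({**o, **row})
        elif how in ("left", "right"):
            # Outer join: keep the record, with missing values for the other table.
            result.append({**{f: None for f in other_fields}, **row})
    return result


# Hypothetical tables for illustration.
employees = [
    {"ID": 1, "REGION": "East", "SALES95": 100},
    {"ID": 2, "REGION": "West", "SALES95": 150},
    {"ID": 3, "REGION": "North", "SALES95": 90},
]
regions = [
    {"REGION": "East", "AVGINC": 40000},
    {"REGION": "West", "AVGINC": 45000},
    {"REGION": "South", "AVGINC": 38000},
]

inner = join(employees, regions, "REGION", "inner")
left_outer = join(employees, regions, "REGION", "left")
right_outer = join(employees, regions, "REGION", "right")
print(len(inner), len(left_outer), len(right_outer))
```

In this sketch the inner join drops the employee in the North region and the South region itself, the left outer join keeps the North employee with a missing AVGINC, and the right outer join keeps the South region with missing employee fields.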
Limiting Retrieved Cases
The Limit Retrieved Cases dialog box allows you to specify the criteria to select subsets of cases (rows). Limiting cases generally consists of filling the criteria grid with one or more criteria. Criteria consist of two expressions and some relation between them. They return a value of true, false, or missing for each case.
If the result is true, the case is selected.
If the result is false or missing, the case is not selected.
Most criteria use one or more of the six relational operators (<, >, <=, >=, =, and <>).
Expressions can include field names, constants, arithmetic operators, numeric and other functions, and logical variables. You can use fields that you do not plan to import as variables.
Figure 3-8
Limit Retrieved Cases dialog box
To build your criteria, you need at least two expressions and a relation to connect them.
E To build an expression, put your cursor in an Expression cell. You can type field
names, constants, arithmetic operators, numeric and other functions, and logical variables. Other methods of putting a field into a criteria cell include double-clicking the field in the Fields list, dragging the field from the Fields list, or selecting a field from the drop-down menu that is available in any active Expression cell.
E The two expressions are usually connected by a relational operator, such as = or >. To choose the relation, put your cursor in the Relation cell and either type the operator or select it from the drop-down menu.
Dates and times in expressions need to be specified in a special manner (including the curly braces shown in the examples):
Date literals should be specified using the general form: {d yyyy-mm-dd}.
Time literals should be specified using the general form: {t hh:mm:ss}.
Date/time literals (timestamps) should be specified using the general form: {dt yyyy-mm-dd hh:mm:ss}.
Functions. A selection of built-in arithmetic, logical, string, date, and time SQL functions is provided. You can select a function from the list and drag it into the expression, or you can enter any valid SQL function. See your database documentation for valid SQL functions. A list of standard functions is available at:
http://msdn.microsoft.com/library/en-us/odbc/htm/odbcscalar_functions.asp
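If you build criteria outside the dialog box, the date and time literal forms described above can be produced with a few helper functions. The function names are hypothetical, and the sketch simply follows the three general forms given in this section; check your database documentation for the exact form your driver expects.

```python
from datetime import date, datetime, time


def date_literal(d):
    """Format a date as {d yyyy-mm-dd}."""
    return "{d " + d.strftime("%Y-%m-%d") + "}"


def time_literal(t):
    """Format a time as {t hh:mm:ss}."""
    return "{t " + t.strftime("%H:%M:%S") + "}"


def timestamp_literal(ts):
    """Format a date/time as {dt yyyy-mm-dd hh:mm:ss}."""
    return "{dt " + ts.strftime("%Y-%m-%d %H:%M:%S") + "}"


start = date_literal(date(2004, 6, 5))
opening = time_literal(time(9, 30, 0))
stamp = timestamp_literal(datetime(2004, 6, 5, 9, 30, 0))
```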
Use Random Sampling. Selects a random sample of cases from the data source. For large data sources, you may want to limit the number of cases to a small, representative sample. This can significantly reduce the time that it takes to run procedures. Native random sampling, if available for the data source, is faster than SPSS random sampling, since SPSS random sampling must still read the entire data source to extract a random sample.
Approximately. Generates a random sample of approximately the specified percentage of cases. Since this routine makes an independent pseudo-random decision for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage.
Exactly. Selects a random sample of the specified number of cases from the specified total number of cases. If the total number of cases specified exceeds the total number of cases in the data file, the sample will contain proportionally fewer cases than the requested number.
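The two sampling options behave differently, which a short Python sketch can make concrete. The "exactly" routine below is a standard one-pass selection-sampling scheme chosen for illustration; it is an assumption, not necessarily the routine SPSS uses, and the function names are hypothetical.

```python
import random


def sample_approximately(cases, percent, rng):
    """Independent pseudo-random decision per case; the count is approximate."""
    return [c for c in cases if rng.random() < percent / 100.0]


def sample_exactly(cases, n, total, rng):
    """Select n cases from the first `total` cases in a single pass."""
    selected = []
    for index, case in enumerate(cases):
        remaining_cases = total - index
        remaining_picks = n - len(selected)
        if remaining_cases <= 0 or remaining_picks <= 0:
            break
        # Pick with probability remaining_picks / remaining_cases.
        if rng.random() * remaining_cases < remaining_picks:
            selected.append(case)
    return selected


rng = random.Random(0)
approx = sample_approximately(list(range(1000)), 10, rng)
exact = sample_exactly(list(range(100)), 10, 100, rng)
# The stated total (100) exceeds the 50 cases actually present.
short = sample_exactly(list(range(50)), 10, 100, rng)
```

Here exact always holds 10 cases, approx holds roughly 100, and short can hold fewer than 10 because the pass ends before the stated total is reached.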
Prompt for Value. You can embed a prompt in your query to create a parameter query. When users run the query, they will be asked to enter information specified here. You might want to do this if you need to see different views of the same data. For example, you may want to run the same query to see sales figures for different fiscal quarters. Place your cursor in any Expression cell, and click Prompt for Value to create a prompt.
Note: If you use random sampling, aggregation (available in distributed mode with SPSS Server) is not available.
Creating a Parameter Query
Use the Prompt for Value dialog box to create a dialog box that solicits information from users each time someone runs your query. It is useful if you want to query the same data source using different criteria.
Figure 3-9
Prompt for Value dialog box
To build a prompt, you need to enter a prompt string and a default value. The prompt string is displayed each time a user runs your query. It should specify the kind of information to enter, and, if the user is not selecting from a list, it should give hints about how the input should be formatted. For example, “Enter a Quarter (Q1, Q2, Q3, ...)”.
Allow user to select value from list. If this is selected, you can limit the user to the values you place here, which are separated by returns.
Data type. Specify the data type here, either Number, String, or Date.
The final result looks like this:
Figure 3-10
User-defined prompt dialog box

Aggregating Data

If you are in distributed mode, connected to a remote server (available with SPSS Server), you can aggregate the data before reading it into SPSS.
Figure 3-11
Aggregate Data dialog box
You can also aggregate data after reading it into SPSS, but preaggregating may save time for large data sources.
E Select one or more break variables that define how cases are grouped to create aggregated data.
E Select one or more aggregate variables.
E Select an aggregate function for each aggregate variable.
Optionally, you can create a variable that contains the number of cases in each break group.
Note: If you use random sampling, aggregation is not available.
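The steps above can be sketched in Python to show what break variables, aggregate variables, and aggregate functions each contribute. This is an illustration of the idea only; the function, the data, and the variable names (region, amount, n_cases) are hypothetical.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def aggregate(cases, break_vars, agg_specs, count_name=None):
    """Group cases by the break variables and summarize each group."""
    groups = defaultdict(list)
    for case in cases:
        groups[tuple(case[v] for v in break_vars)].append(case)
    result = []
    for key, members in groups.items():
        row = dict(zip(break_vars, key))
        # One aggregate function per aggregate variable.
        for out_name, (source_var, func) in agg_specs.items():
            row[out_name] = func([m[source_var] for m in members])
        if count_name:
            row[count_name] = len(members)  # number of cases in the break group
        result.append(row)
    return result


sales = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": 300},
    {"region": "West", "amount": 50},
]
summary = aggregate(sales, ["region"], {"amount_mean": ("amount", mean)},
                    count_name="n_cases")
```

The result has one case per break group, so three input cases become two aggregated cases.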

Defining Variables

Variable names and labels. The complete database field (column) name is used as the variable label. Unless you modify the variable name, the Database Wizard assigns variable names to each column from the database in one of two ways:
If the name of the database field forms a valid, unique variable name, it is used as the variable name.
If the name of the database field does not form a valid, unique variable name, a new, unique name is automatically generated.
Click any cell to edit the variable name.
Converting strings to numeric values. Check the Recode to Numeric box for a string variable if you want to automatically convert it to a numeric variable. String values are converted to consecutive integer values based on alphabetic order of the original values. The original values are retained as value labels for the new variables.
Width for variable-width strings. Controls the width of variable-width string values. By default, the width is 255 bytes, and only the first 255 bytes (typically 255 characters in single-byte languages) will be read. The width can be up to 32,767 bytes. Although you probably don’t want to truncate string values, you also don’t want to specify an unnecessarily large value, since excessively large values will cause SPSS processing to be inefficient.
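The Recode to Numeric conversion described above can be sketched as follows. The sketch assumes that numbering starts at 1 and that Python's default string ordering is an adequate stand-in for alphabetic order; neither detail is stated in this guide, and blank or missing strings are not modeled.

```python
def recode_to_numeric(values):
    """Map distinct strings to consecutive integers in alphabetic order."""
    distinct = sorted(set(values))
    codes = {text: number for number, text in enumerate(distinct, start=1)}
    # The original strings are kept as value labels for the new codes.
    value_labels = {number: text for text, number in codes.items()}
    return [codes[v] for v in values], value_labels


recoded, labels = recode_to_numeric(["West", "East", "North", "East"])
```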
Figure 3-12
Define Variables dialog box

Sorting Cases

If you are in distributed mode, connected to a remote server (available with SPSS Server), you can sort the data before reading it into SPSS.
Figure 3-13
Sort Cases dialog box
You can also sort data after reading it into SPSS, but presorting may save time for large data sources.

Results

The Results dialog box displays the SQL Select statement for your query.
You can edit the SQL Select statement before you run the query, but if you click the Back button to make changes in previous steps, the changes to the Select statement will be lost.
You can save the query for future use with Save query to a file.
Select Paste it into the syntax editor for further modification to paste complete GET DATA syntax into a syntax window. Copying and pasting the Select statement from the Results window will not paste the necessary command syntax.
Note: The pasted syntax contains a blank space before the closing quote on each line of SQL generated by the wizard. These blanks are not superfluous. When the command is processed, all of the lines of the SQL statement are merged together in a very literal fashion. Without the space, there would be no space between the last character on one line and the first character on the next line.
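The effect of the trailing blank can be shown with a literal concatenation in Python. The SQL text, table name, and field names are hypothetical; the point is only what happens when the quoted fragments are joined with nothing added between them.

```python
def merge_sql_lines(lines):
    """Concatenate quoted SQL fragments literally, adding nothing between them."""
    return "".join(lines)


with_blanks = ["SELECT id, name ", "FROM employees ", "WHERE id > 10"]
without_blanks = ["SELECT id, name", "FROM employees", "WHERE id > 10"]

good = merge_sql_lines(with_blanks)
bad = merge_sql_lines(without_blanks)
```

With the blanks, the merged statement reads SELECT id, name FROM employees WHERE id > 10. Without them, the keywords run into the preceding names (nameFROM, employeesWHERE) and the statement is no longer valid SQL.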
Figure 3-14
Results dialog box

Text Wizard

The Text Wizard can read text data files formatted in a variety of ways:
Tab-delimited files
Space-delimited files
Comma-delimited files
Fixed-field format files
For delimited files, you can also specify other characters as delimiters between values, and you can specify multiple delimiters.
To Read Text Data Files
E From the menus choose:
File
Read Text Data
E Select the text file in the Open dialog box.
E Follow the steps in the Text Wizard to define how to read the data file.
Text Wizard Step 1
Figure 3-15
Text Wizard Step 1
The text file is displayed in a preview window. You can apply a predefined format (previously saved from the Text Wizard) or follow the steps in the Text Wizard to specify how the data should be read.
Text Wizard Step 2
Figure 3-16
Text Wizard Step 2
This step provides information about variables. A variable is similar to a field in a database. For example, each item in a questionnaire is a variable.
How are your variables arranged? To read your data properly, the Text Wizard needs to
know how to determine where the data value for one variable ends and the data value for the next variable begins. The arrangement of variables defines the method used to differentiate one variable from the next.
Delimited. Spaces, commas, tabs, or other characters are used to separate
variables. The variables are recorded in the same order for each case but not necessarily in the same column locations.
Fixed width. Each variable is recorded in the same column location on the same record (line) for each case in the data file. No delimiter is required between variables. In fact, in many text data files generated by computer programs, data values may appear to run together without even spaces separating them. The column location determines which variable is being read.
Are variable names included at the top of your file? If the first row of the data file contains descriptive labels for each variable, you can use these labels as variable names. Values that don’t conform to variable naming rules are converted to valid variable names.
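The two arrangements of variables described in this step call for different reading logic, sketched below in Python. The function names and the column positions are hypothetical, and the sketch ignores text qualifiers and multiple delimiters.

```python
def read_delimited(line, delimiter=","):
    """Values are separated by a delimiter; column positions may vary."""
    return line.split(delimiter)


def read_fixed_width(line, columns):
    """Values are located by position; columns holds (start, end) pairs,
    zero-based and end-exclusive."""
    return [line[start:end] for start, end in columns]


delimited = read_delimited("1,23,456")
# The same kind of record with values run together and no delimiter at all.
fixed = read_fixed_width("0123456", [(0, 2), (2, 4), (4, 7)])
```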
Text Wizard Step 3: Delimited Files
Figure 3-17
Text Wizard Step 3 for delimited files
This step provides information about cases. A case is similar to a record in a database. For example, each respondent to a questionnaire is a case.
The first case of data begins on which line number? Indicates the first line of the data
file that contains data values. If the top line(s) of the data file contain descriptive labels or other text that does not represent data values, this will not be line 1.
How are your cases represented? Controls how the Text Wizard determines where each case ends and the next one begins.
Each line represents a case. Each line contains only one case. It is fairly common for each case to be contained on a single line (row), even though this can be a very long line for data files with a large number of variables. If not all lines contain the same number of data values, the number of variables for each case is determined by the line with the greatest number of data values. Cases with fewer data values are assigned missing values for the additional variables.
A specific number of variables represents a case. The specified number of variables for each case tells the Text Wizard where to stop reading one case and start reading the next. Multiple cases can be contained on the same line, and cases can start in the middle of one line and be continued on the next line. The Text Wizard determines the end of each case based on the number of values read, regardless of the number of lines. Each case must contain data values (or missing values indicated by delimiters) for all variables, or the data file will be read incorrectly.
How many cases do you want to import? You can import all cases in the data file, the first n cases (n is a number you specify), or a random sample of a specified percentage. Since the random sampling routine makes an independent pseudo-random decision for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage.
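The two ways of representing cases can be compared with a small Python sketch. It assumes a comma delimiter, uses None to stand in for a missing value, and treats a line break as one more separator between values in the second function; the function names are hypothetical.

```python
def cases_by_line(lines, delimiter=","):
    """Each line is one case; short lines are padded with missing values."""
    rows = [line.split(delimiter) for line in lines]
    width = max(len(row) for row in rows)  # the longest line sets the variable count
    return [row + [None] * (width - len(row)) for row in rows]


def cases_by_count(lines, n_vars, delimiter=","):
    """Every n_vars values form one case, regardless of line breaks."""
    values = [v for line in lines for v in line.split(delimiter)]
    return [values[i:i + n_vars] for i in range(0, len(values), n_vars)]


per_line = cases_by_line(["1,2,3", "4,5"])
per_count = cases_by_count(["1,2,3,4", "5,6"], 3)
```

In per_line the second case receives a missing value for the third variable. In per_count the second case starts in the middle of the first line and continues on the next.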
Text Wizard Step 3: Fixed-Width Files
Figure 3-18
Text Wizard Step 3 for fixed-width files
This step provides information about cases. A case is similar to a record in a database. For example, each respondent to a questionnaire is a case.
The first case of data begins on which line number? Indicates the first line of the data file that contains data values. If the top line(s) of the data file contain descriptive labels or other text that does not represent data values, this will not be line 1.
How many lines represent a case? Controls how the Text Wizard determines where each case ends and the next one begins. Each variable is defined by its line number within the case and its column location. You need to specify the number of lines for each case to read the data correctly.
How many cases do you want to import? You can import all cases in the data file, the first n cases (n is a number you specify), or a random sample of a specified percentage. Since the random sampling routine makes an independent pseudo-random decision for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage.
Text Wizard Step 4: Delimited Files
Figure 3-19
Text Wizard Step 4 for delimited files
This step displays the Text Wizard’s best guess on how to read the data file and allows you to modify how the Text Wizard will read variables from the data file.
Which delimiters appear between variables? Indicates the characters or symbols that
separate data values. You can select any combination of spaces, commas, semicolons, tabs, or other characters. Multiple, consecutive delimiters without intervening data values are treated as missing values.
What is the text qualifier? Characters used to enclose values that contain delimiter characters. For example, if a comma is the delimiter, values that contain commas will be read incorrectly unless there is a text qualifier enclosing the value, preventing the commas in the value from being interpreted as delimiters between values. CSV-format data files exported from Excel use a double quotation mark (") as a text qualifier. The text qualifier appears at both the beginning and the end of the value, enclosing the entire value.
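The role of the text qualifier can be demonstrated with Python's standard csv module, which uses the same idea under the name quotechar. This is an analogy, not the Text Wizard's own parser; in the sketch an empty value is converted to None to stand in for a missing value.

```python
import csv
import io


def read_values(text, delimiter=",", qualifier='"'):
    """Split delimited text, honoring a text qualifier around values."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=qualifier)
    # Consecutive delimiters yield empty strings; treat them as missing.
    return [[v if v != "" else None for v in row] for row in reader]


rows = read_values('1,"Smith, Jane",42\n2,,17\n')
```

The comma inside the qualified value stays part of the value, and the two consecutive commas on the second line produce a missing value.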
Text Wizard Step 4: Fixed-Width Files
Figure 3-20
Text Wizard Step 4 for fixed-width files
This step displays the Text Wizard’s best guess on how to read the data file and allows you to modify how the Text Wizard will read variables from the data file. Vertical lines in the preview window indicate where the Text Wizard currently thinks each variable begins in the file.
Insert, move, and delete variable break lines as necessary to separate variables. If multiple lines are used for each case, select each line from the drop-down list and modify the variable break lines as necessary.
Note: For computer-generated data files that produce a continuous stream of data values with no intervening spaces or other distinguishing characteristics, it may be difficult to determine where each variable begins. Such data files usually rely on a data definition file or some other written description that specifies the line and column location for each variable.
Text Wizard Step 5
Figure 3-21
Text Wizard Step 5
This step controls the variable name and the data format that the Text Wizard will use to read each variable and which variables will be included in the final data file.
Variable name. You can overwrite the default variable names with your own variable names. If you read variable names from the data file, the Text Wizard will automatically modify variable names that don’t conform to variable naming rules. Select a variable in the preview window and then enter a variable name.
Data format. Select a variable in the preview window and then select a format from the drop-down list. Shift-click to select multiple contiguous variables or Ctrl-click to select multiple noncontiguous variables.
Text Wizard Formatting Options
Formatting options for reading variables with the Text Wizard include:
Do not import. Omit the selected variable(s) from the imported data file.
Numeric. Valid values include numbers, a leading plus or minus sign, and a decimal indicator.
String. Valid values include virtually any keyboard characters and embedded blanks. For delimited files, you can specify the number of characters in the value, up to a maximum of 32,767. By default, the Text Wizard sets the number of characters to the longest string value encountered for the selected variable(s). For fixed-width files, the number of characters in string values is defined by the placement of variable break lines in step 4.
Date/Time. Valid values include dates of the general format dd-mm-yyyy, mm/dd/yyyy, dd.mm.yyyy, yyyy/mm/dd, hh:mm:ss, and a variety of other date and time formats. Months can be represented in digits, Roman numerals, or three-letter abbreviations, or they can be fully spelled out. Select a date format from the list.
Dollar. Valid values are numbers with an optional leading dollar sign and optional commas as thousands separators.
Comma. Valid values include numbers that use a period as a decimal indicator and commas as thousands separators.
Dot. Valid values include numbers that use a comma as a decimal indicator and periods as thousands separators.
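The Dollar, Comma, and Dot formats differ only in which characters act as the decimal indicator and the thousands separator. The Python sketch below illustrates the distinction; the function names are hypothetical and the functions do no validation of separator placement.

```python
def parse_comma_format(text):
    """Comma format: period is the decimal indicator, commas group thousands."""
    return float(text.replace(",", ""))


def parse_dot_format(text):
    """Dot format: comma is the decimal indicator, periods group thousands."""
    return float(text.replace(".", "").replace(",", "."))


def parse_dollar_format(text):
    """Dollar format: Comma format with an optional leading dollar sign."""
    if text.startswith("$"):
        text = text[1:]
    return parse_comma_format(text)


comma_value = parse_comma_format("1,234.56")
dot_value = parse_dot_format("1.234,56")
dollar_value = parse_dollar_format("$1,234.50")
```

The same digits, 1.234, would be read as about one and a quarter in Comma format but as one thousand two hundred thirty-four in Dot format, so choosing the wrong format silently changes the data.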
Note: Values that contain invalid characters for the selected format will be treated as missing. Values that contain any of the specified delimiters will be treated as multiple values.
Text Wizard Step 6
Figure 3-22
Text Wizard Step 6
This is the final step of the Text Wizard. You can save your specifications in a file for use when importing similar text data files. You can also paste the syntax generated by the Text Wizard into a syntax window. You can then customize and/or save the syntax for use in other sessions or in production jobs.
Cache data locally. A data cache is a complete copy of the data file, stored in temporary disk space. Caching the data file can improve performance.
File Information
A data file contains much more than raw data. It also contains any variable definition information, including:
Variable names
Variable formats
Descriptive variable and value labels
This information is stored in the dictionary portion of the data file. The Data Editor provides one way to view the variable definition information. You can also display complete dictionary information for the working data file or any other data file.
To Obtain Data File Information
E From the menus in the Data Editor window choose:
File
Display Data File Information
E For the currently open data file, choose Working File.
E For other data files, choose External File, and then select the data file.
The data file information is displayed in the Viewer.

Saving Data Files

Any changes that you make in a data file last only for the duration of the current session unless you explicitly save the changes.

To Save Modified Data Files

E Make the Data Editor the active window (click anywhere in the window to make it
active).
E From the menus choose:
File
Save
The modified data file is saved, overwriting the previous version of the file.
Saving Data Files in Excel Format
You can save your data in one of three Microsoft Excel file formats. The choice of format depends on the version of Excel that will be used to open the data. The Excel application cannot open an Excel file from a newer version of the application. For example, Excel 5.0 cannot open an Excel 2000 document. However, Excel 2000 can easily read an Excel 5.0 document.
There are a few limitations to the Excel file format that don’t exist in SPSS. These limitations include:
Variable information, such as missing values and variable labels, is not included in exported Excel files.
When exporting to Excel 97 and later, an option is provided to include value labels instead of values.
Because all Excel files are limited to 256 columns of data, only the first 256 variables are included in the exported file.
Excel 4.0 and Excel 5.0/95 files are limited to 16,384 records, or rows of data. Excel 97–2000 files allow 65,536 records. If your data exceed these limits, a warning message is displayed and the data are truncated to the maximum size allowed by Excel.
Variable Types
The following table shows the variable type matching between the original data in SPSS and the exported data in Excel.
SPSS Variable Type    Excel Data Format
Numeric               0.00; #,##0.00;...
Comma                 0.00; #,##0.00;...
Dollar                $#,##0_);...
Date                  d-mmm-yyyy
Time                  hh:mm:ss
String                General
Saving Data Files in SAS Format
Special handling is given to various aspects of your data when saved as a SAS file. These cases include:
Certain characters that are allowed in SPSS variable names are not valid in SAS, such as @, #, and $. These illegal characters are replaced with an underscore when the data are exported.
SPSS variable labels containing more than 40 characters are truncated when exported to a SAS v6 file.
Where they exist, SPSS variable labels are mapped to the SAS variable labels. If no variable label exists in the SPSS data, the variable name is mapped to the SAS variable label.
SAS allows only one value for system-missing, whereas SPSS allows numerous user-missing values in addition to system-missing. As a result, all user-missing values in SPSS are mapped to a single system-missing value in the SAS file.
Save Value Labels
You have the option of saving the values and value labels associated with your data file to a SAS syntax file. For example, when the value labels for the cars.sav data file are exported, the generated syntax file contains:
libname library 'd:\spss\' ;
proc format library = library ;
value ORIGIN /* Country of Origin */
1 = 'American'
2 = 'European'
3 = 'Japanese' ;
value CYLINDER /* Number of Cylinders */
3 = '3 Cylinders'
4 = '4 Cylinders'
5 = '5 Cylinders'
6 = '6 Cylinders'
8 = '8 Cylinders' ;
value FILTER__ /* cylrec=1|cylrec = 2 (FILTER) */
0 = 'Not Selected'
1 = 'Selected' ;
proc datasets library = library ;
modify cars;
format ORIGIN ORIGIN.;
format CYLINDER CYLINDER.;
format FILTER__ FILTER__.;
quit;
This feature is not supported for the SAS transport file.
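Two of the SAS export rules in this section, replacing @, #, and $ in variable names and writing value labels as a proc format value statement, can be sketched in Python. This is an illustration under stated assumptions, not the code SPSS uses: the function names are hypothetical, values are written in sorted order, and quotation marks inside labels are not escaped.

```python
import re


def sas_variable_name(spss_name):
    """Replace the characters @, #, and $ with an underscore."""
    return re.sub(r"[@#$]", "_", spss_name)


def proc_format_value(name, description, labels):
    """Render one 'value' statement in the layout shown in the listing above."""
    lines = ["value %s /* %s */" % (name, description)]
    items = sorted(labels.items())
    for position, (value, label) in enumerate(items):
        # Only the last entry carries the terminating semicolon.
        terminator = " ;" if position == len(items) - 1 else ""
        lines.append("%s = '%s'%s" % (value, label, terminator))
    return "\n".join(lines)


safe_name = sas_variable_name("filter_$")
statement = proc_format_value(
    "ORIGIN", "Country of Origin",
    {1: "American", 2: "European", 3: "Japanese"})
```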
Variable Types
The following table shows the variable type matching between the original data in SPSS and the exported data in SAS:
SPSS Variable Type     SAS Variable Type    SAS Data Format
Numeric                Numeric              12
Comma                  Numeric              12
Dot                    Numeric              12
Scientific Notation    Numeric              12
Date                   Numeric              (Date) for example, MMDDYY10, ...
Date (Time)            Numeric              Time18
Dollar                 Numeric              12
Custom Currency        Numeric              12
String                 Character            $8
Saving Data Files in Other Formats
E Make the Data Editor the active window (click anywhere in the window to make it active).
E From the menus choose:
File
Save As...
E Select a file type from the drop-down list.
E Enter a filename for the new data file.
To write variable names to the first row of a spreadsheet or tab-delimited data file:
E Click Write variable names to spreadsheet in the Save Data As dialog box.
To save value labels instead of data values in Excel 97 format:
E Click Save value labels where defined instead of data values in the Save Data As dialog box.
To save value labels to a SAS syntax file (active only when a SAS file type is selected):
E Click Save value labels into a .sas file in the Save Data As dialog box.

Saving Data: Data File Types

You can save data in the following formats:
SPSS (*.sav). SPSS format.
Data files saved in SPSS format cannot be read by versions of the software
prior to version 7.5.
When using data files with variable names longer than eight bytes in SPSS 10.x or 11.x, unique, eight-byte versions of variable names are used, but the original variable names are preserved for use in release 12.0 or later. In releases prior to SPSS 10, the original long variable names are lost if you save the data file.
When using data files with string variables longer than 255 bytes in versions of SPSS prior to release 13.0, those string variables are broken up into multiple 255-byte string variables.
SPSS 7.0 (*.sav). SPSS 7.0 for Windows format. Data files saved in SPSS 7.0 format can be read by SPSS 7.0 and earlier versions of SPSS for Windows but do not include defined multiple response sets or Data Entry for Windows information.
SPSS/PC+ (*.sys). SPSS/PC+ format. If the data file contains more than 500 variables, only the first 500 will be saved. For variables with more than one defined user-missing value, additional user-missing values will be recoded into the first defined user-missing value.
SPSS Portable (*.por). SPSS portable format that can be read by other versions of SPSS and versions on other operating systems (for example, Macintosh or UNIX). Variable names are limited to eight bytes and are automatically converted to unique eight-byte names if necessary.
Tab-delimited (*.dat). ASCII text files with values separated by tabs.
Fixed ASCII (*.dat). ASCII text file in fixed format, using the default write formats for all variables. There are no tabs or spaces between variable fields.
Excel 2.1 (*.xls). Microsoft Excel 2.1 spreadsheet file. The maximum number of variables is 256, and the maximum number of rows is 16,384.
Excel 97 and later (*.xls). Microsoft Excel 97/2000/XP spreadsheet file. The maximum number of variables is 256, and the maximum number of rows is 65,536.
1-2-3 Release 3.0 (*.wk3). Lotus 1-2-3 spreadsheet file, release 3.0. The maximum number of variables that you can save is 256.
1-2-3 Release 2.0 (*.wk1). Lotus 1-2-3 spreadsheet file, release 2.0. The maximum number of variables that you can save is 256.
1-2-3 Release 1.0 (*.wks). Lotus 1-2-3 spreadsheet file, release 1A. The maximum number of variables that you can save is 256.
SYLK (*.slk). Symbolic link format for Microsoft Excel and Multiplan spreadsheet files. The maximum number of variables that you can save is 256.
dBASE IV (*.dbf). dBASE IV format.
dBASE III (*.dbf). dBASE III format.
dBASE II (*.dbf). dBASE II format.
SAS v6 for Windows (*.sd2). SAS v6 file format for Windows/OS2.
SAS v6 for UNIX (*.ssd01). SAS v6 file format for UNIX (Sun, HP, IBM).
SAS v6 for Alpha/OSF (*.ssd04). SAS v6 file format for Alpha/OSF (DEC UNIX).
SAS v7+ Windows short extension (*.sd7). SAS version 7–8 for Windows short filename format.
SAS v7+ Windows long extension (*.sas7bdat). SAS version 7–8 for Windows long filename format.
SAS v7+ for UNIX (*.ssd01). SAS v8 for UNIX.
SAS Transport (*.xpt). SAS transport file.

Saving Subsets of Variables

Figure 3-23
Save Data As Variables dialog box
For data saved as an SPSS data file, the Save Data As Variables dialog box allows you to select the variables that you want to be saved in the new data file. By default, all variables will be saved. Deselect the variables that you don’t want to save, or click Drop All and then select the variables that you want to save.
To Save a Subset of Variables
E Make the Data Editor the active window (click anywhere in the window to make it active).
E From the menus choose:
File
Save As...
E Select SPSS (*.sav) from the list of file types.
E Click Variables.
E Select the variables that you want to save.

Saving File Options

For spreadsheet and tab-delimited files, you can write variable names to the first row of the file.

Protecting Original Data

To prevent the accidental modification/deletion of your original data, you can mark the file as read-only.
E From the Data Editor menus choose:
File
Mark File
Read Only
If you make subsequent modifications to the data and then try to save the data file, you can save the data only with a different filename, so the original data are not affected.
You can change the file permissions back to read/write by selecting Mark File Read Write from the File menu.

Virtual Active File

The virtual active file enables you to work with large data files without requiring equally large (or larger) amounts of temporary disk space. For most analysis and charting procedures, the original data source is reread each time you run a different procedure. Procedures that modify the data require a certain amount of temporary disk space to keep track of the changes, and some actions always require enough disk space for at least one entire copy of the data file.
Figure 3-24
Temporary disk space requirements
Action                                                 Data Stored in Temporary Disk Space
GET FILE = 'v1-5.sav'. FREQUENCIES…                    None
COMPUTE v6 = … RECODE v4… REGRESSION … /SAVE ZPRED.    v4 v6 zpre
SORT CASES BY… or CACHE                                v1 v2 v3 v4 v5 v6 zpre
Actions that don’t require any temporary disk space include:
Reading SPSS data files
Merging two or more SPSS data files
Reading database tables with the Database Wizard
Merging an SPSS data file with a database table
Running procedures that read data (for example, Frequencies, Crosstabs, Explore)
Actions that create one or more columns of data in temporary disk space include:
Computing new variables
Recoding existing variables
Running procedures that create or modify variables (for example, saving predicted values in Linear Regression)
Actions that create an entire copy of the data file in temporary disk space include:
Reading Excel files
Running procedures that sort data (for example, Sort Cases, Split File)
Reading data with GET TRANSLATE or DATA LIST commands
Using the Cache Data facility or the CACHE command
Launching other applications from SPSS that read the data file (for example, AnswerTree, DecisionTime)
Note: The GET DATA command provides functionality comparable to DATA LIST without creating an entire copy of the data file in temporary disk space. The SPLIT FILE command in command syntax does not sort the data file and therefore does not create a copy of the data file. This command, however, requires sorted data for proper operation, and the dialog box interface for this procedure will automatically sort the data file, resulting in a complete copy of the data file. (Command syntax is not available with the Student Version.)
Actions that create an entire copy of the data file by default:
Reading databases with the Database Wizard
Reading text files with the Text Wizard
The Text Wizard provides an optional setting to automatically cache the data. By default, this option is selected. You can turn it off by deselecting Cache data locally. For the Database Wizard you can paste the generated command syntax and delete the CACHE command.
Creating a Data Cache
Although the virtual active file can vastly reduce the amount of temporary disk space required, the absence of a temporary copy of the “active” file means that the original data source has to be reread for each procedure. For large data files read from an external source, creating a temporary copy of the data may improve performance. For example, for data tables read from a database source, the SQL query that reads the information from the database must be reexecuted for any command or procedure that needs to read the data. Since virtually all statistical analysis procedures and charting procedures need to read the data, the SQL query is reexecuted for each procedure you run, which can result in a significant increase in processing time if you run a large number of procedures.
If you have sufficient disk space on the computer performing the analysis (either your local computer or a remote server), you can eliminate multiple SQL queries and improve processing time by creating a data cache of the active file. The data cache is a temporary copy of the complete data.
Note: By default, the Database Wizard automatically creates a data cache, but if you use the GET DATA command in command syntax to read a database, a data cache is not automatically created. (Command syntax is not available with the Student Version.)
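The trade-off described above can be modeled with a small Python sketch that counts how often the data source is read. The classes are hypothetical stand-ins: DataSource.read represents re-executing the SQL query, and ActiveFile.cache represents creating the data cache. Nothing here reflects how SPSS is implemented internally.

```python
class DataSource:
    """Stand-in for an external source such as a database table."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def read(self):
        self.query_count += 1  # stands in for re-executing the SQL query
        return list(self._rows)


class ActiveFile:
    """Rereads the source for every data pass unless a cache exists."""

    def __init__(self, source):
        self._source = source
        self._cache = None

    def cache(self):
        self._cache = self._source.read()  # temporary copy of the complete data

    def rows(self):
        if self._cache is not None:
            return self._cache
        return self._source.read()


def run_procedures(active_file, count):
    """Each procedure needs one pass over the data."""
    return [len(active_file.rows()) for _ in range(count)]


uncached_source = DataSource([1, 2, 3])
run_procedures(ActiveFile(uncached_source), 5)

cached_source = DataSource([1, 2, 3])
cached_file = ActiveFile(cached_source)
cached_file.cache()
run_procedures(cached_file, 5)
```

Running five procedures reads the uncached source five times but the cached source only once, at the cost of holding a full copy of the data.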
To Create a Data Cache
E From the menus choose:
File
Cache Data...
E Click OK or Cache Now.
OK creates a data cache the next time the program reads the data (for example, the next time you run a statistical procedure), which is usually what you want because it doesn’t require an extra data pass. Cache Now creates a data cache immediately, which shouldn’t be necessary under most circumstances. Cache Now is useful primarily for two reasons:
A data source is “locked” and can’t be updated by anyone until you end your session, open a different data source, or cache the data.
For large data sources, scrolling through the contents of the Data View tab in the Data Editor will be much faster if you cache the data.
To Cache Data Automatically
You can use the SET command to automatically create a data cache after a specified number of changes in the active data file. By default, the active data file is automatically cached after 20 changes in the active data file.
E From the menus choose:
File
New
Syntax
E In the syntax window, type SET CACHE n. (where n represents the number of changes in the active data file before the data file is cached).
E From the menus in the syntax window choose:
Run
All
Note: The cache setting is not persistent across sessions. Each time you start a new session, the value is reset to the default of 20.
Page 87
Chapter 4
Distributed Analysis Mode

Distributed analysis mode allows you to use a computer other than your local (or desktop) computer for memory-intensive work. Since remote servers used for distributed analysis are typically more powerful and faster than your local computer, appropriate use of distributed analysis mode can significantly reduce computer processing time. Distributed analysis with a remote server can be useful if your work involves:
Large data files, particularly data read from database sources.
Memory-intensive tasks. Any task that takes a long time in local analysis mode might be a good candidate for distributed analysis.
Distributed analysis affects only data-related tasks, such as reading data, transforming data, computing new variables, and calculating statistics. It has no effect on tasks related to editing output, such as manipulating pivot tables or modifying charts.
Note: Distributed analysis is available only if you have both a local version and access to a licensed server version of the software installed on a remote server.

Distributed versus Local Analysis

Following are some guidelines for choosing distributed or local analysis mode:
Database access. Jobs that perform database queries may run faster in distributed mode if the server has superior access to the database or if the server is running on the same machine as the database engine. If the necessary database access software is available only on the server or if your network administrator does not permit you to download large data tables, you will be able to access the database only in distributed mode.
Page 88
Ratio of computation to output. Commands that perform a lot of computation and produce small output results (for example, few and small pivot tables, brief text results, or few and simple charts) have the most to gain from running in distributed mode. The degree of improvement depends largely on the computing power of the remote server.
Small jobs. Jobs that run quickly in local mode will almost always run slower in distributed mode because of inherent client/server overhead.
Charts. Case-oriented charts, such as scatterplots, regression residual plots, and sequence charts, require raw data on your local computer. For large data files or database tables, this can result in slower performance in distributed mode because the data have to be sent from the remote server to your local computer. Other charts are based on summarized or aggregated data and should perform adequately because the aggregation is performed on the server.
Interactive graphics. Since it is possible to save raw data with interactive graphics (an optional setting), this can result in large amounts of data being transferred from the remote server to your local computer, significantly increasing the time it takes to save your results.
Pivot tables. Large pivot tables may take longer to create in distributed mode. This is particularly true for the OLAP Cubes procedure and tables that contain individual case data, such as those available in the Summarize procedure.
Text output. The more text that is produced, the slower it will be in distributed mode because this text is produced on the remote server and copied to your local computer for display. Text results have low overhead, however, and tend to transmit quickly.
Server Login
The Server Login dialog box allows you to select the computer that processes commands and runs procedures. This can be either your local computer or a remote server.
Page 89
Figure 4-1
Server Login dialog box
You can add, modify, or delete remote servers in the list. Remote servers usually require a user ID and a password, and a domain name may also be necessary. Contact your system administrator for information about available servers, a user ID and password, domain names, and other connection information.
You can select a default server and save the user ID, domain name, and password associated with any server. You are automatically connected to the default server when you start a new session.
Adding and Editing Server Login Settings
Use the Server Login Settings dialog box to add or edit connection information for remote servers for use in distributed analysis mode.
Page 90
Figure 4-2
Server Login Settings dialog box
Contact your system administrator for a list of available servers, port numbers for the servers, and additional connection information. Do not use the Secure Socket Layer unless instructed to do so by your administrator.
Server Name. A server “name” can be an alphanumeric name assigned to a computer (for example, hqdev001) or a unique IP address assigned to a computer (for example, 202.123.456.78).
Port Number. The port number is the port that the server software uses for communications.
Description. Enter an optional description to display in the servers list.
Connect with Secure Socket Layer. Secure Socket Layer (SSL) encrypts requests for distributed analysis when they are sent to the remote SPSS server. Before you use SSL, check with your administrator. SSL must be configured on your desktop computer and the server for this option to be enabled.
Page 91
To Select, Switch, or Add Servers
E From the menus choose:
File
Switch Server...
To select a default server:
E In the server list, select the box next to the server that you want to use.
E Enter the user ID, domain name, and password provided by your administrator.
Note: You are automatically connected to the default server when you start a new session.
To switch to another server:
E Select the server from the list.
E Enter your user ID, domain name, and password (if necessary).
Note: When you switch servers during a session, all open windows are closed. You will be prompted to save changes before the windows are closed.
To add a server:
E Get the server connection information from your administrator.
E Click Add to open the Server Login Settings dialog box.
E Enter the connection information and optional settings and click OK.
To edit a server:
E Get the revised connection information from your administrator.
E Click Edit to open the Server Login Settings dialog box.
E Enter the changes and click OK.
Page 92
Opening Data Files from a Remote Server
Figure 4-3
Open Remote File dialog box
In distributed analysis mode, the Open Remote File dialog box replaces the standard Open File dialog box.
The list of available files, folders, and drives is dependent on what is available on or from the remote server. The current server name is indicated at the top of the dialog box.
You will not have access to files on your local computer in distributed analysis mode unless you specify the drive as a shared device or the folders containing your data files as shared folders.
If the server is running a different operating system (for example, you are running Windows and the server is running UNIX), you probably won’t have access to local data files in distributed analysis mode even if they are in shared folders.
Only one data file can be open at a time. The current data file is automatically closed when a new data file is opened. If you want to have multiple data files open at the same time, you can start multiple sessions.
To Open Data Files from a Remote Server
E If you aren’t already connected to the remote server, log in to the remote server.
Page 93
E Depending on the type of data file that you want to open, from the menus choose:
File
Open
Data...
or
File
Open Database
or
File
Read Text Data...
Saving Data Files from a Remote Server
Figure 4-4
Save Remote File dialog box
In distributed analysis mode, the Save Remote File dialog box replaces the standard Save File dialog box.
The list of available folders and drives is dependent on what is available on or from the remote server. The current server name is indicated at the top of the dialog box. You will not have access to folders on your local computer unless you specify the drive as a shared device and the folders as shared folders. If the server is running a different operating system (for example, you are running Windows and the server is running UNIX), you probably will not have access to local data files in distributed analysis mode even if they are in shared folders. Permissions for shared folders must include the ability to write to the folder if you want to save data files in a local folder.
Page 94
To Save Data Files from a Remote Server
E Make the Data Editor the active window.
E From the menus choose:
File
Save (or Save As...)
Data File Access in Local and Distributed Analysis Mode
The view of data files, folders (directories), and drives for both your local computer and the network is based on the computer you are currently using to process commands and run procedures, which is not necessarily the computer in front of you.
Local analysis mode. When you use your local computer as your “server,” the view of data files, folders, and drives that you see in the file access dialog box (for opening data files) is similar to what you see in other applications or in the Windows Explorer. You can see all of the data files and folders on your computer and any files and folders on mounted network drives that you normally see.
Distributed analysis mode. When you use another computer as a “remote server” to run commands and procedures, the view of data files, folders, and drives represents the view from the perspective of the remote server computer. Although you may see familiar folder names such as Program Files and drives such as C, these are not the folders and drives on your computer; they are the folders and drives on the remote server.
Page 95
Figure 4-5
Local and remote views
Local View
Remote View
In distributed analysis mode, you will not have access to data files on your local computer unless you specify the drive as a shared device or the folders containing your data files as shared folders. If the server is running a different operating system (for example, if you are running Windows and the server is running UNIX), you probably won’t have access to local data files in distributed analysis mode even if they are in shared folders.
Distributed analysis mode is not the same as accessing data files that reside on another computer on your network. You can access data files on other network devices in local analysis mode or in distributed analysis mode. In local mode, you access other devices from your local computer. In distributed mode, you access other network devices from the remote server.
Page 96
If you’re not sure if you’re using local analysis mode or distributed analysis mode, look at the title bar in the dialog box for accessing data files. If the title of the dialog box contains the word Remote (as in Open Remote File), or if the text Remote Server: [server name] appears at the top of the dialog box, you’re using distributed analysis mode.
Note: This affects only dialog boxes for accessing data files (for example, Open Data, Save Data, Open Database, and Apply Data Dictionary). For all other file types (for example, Viewer files, syntax files, and script files), the local view is always used.
To Set Sharing Permissions for a Drive or Folder
E In My Computer, click the folder (directory) or drive that you want to share.
E On the File menu, click Properties.
E Click the Sharing tab, and then click Shared As.
For more information about sharing drives and folders, see the Help for your operating system.
Availability of Procedures in Distributed Analysis Mode
In distributed analysis mode, only procedures installed on both your local version and the version on the remote server are available. You cannot use procedures installed on the server that are not also installed on your local version, and you cannot use procedures installed on your local version that are not also installed on the remote server.
While the latter situation may be unlikely, it is possible that you may have optional components installed locally that are not available on the remote server. If this is the case, switching from your local computer to a remote server will result in the removal of the affected procedures from the menus, and the corresponding command syntax will result in errors. Switching back to local mode will restore all affected procedures.
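The availability rule amounts to a set intersection. The Python sketch below is purely illustrative: the function name and the lists of installed procedures are hypothetical, and the sketch does not query any real installation.

```python
def available_procedures(local, remote):
    """Return the procedures usable in distributed mode: those installed on both sides."""
    return set(local) & set(remote)


# Hypothetical installations, for illustration only.
local_install = {"Frequencies", "Regression", "Tables"}
remote_install = {"Frequencies", "Regression", "Trends"}

usable = available_procedures(local_install, remote_install)
removed_from_menus = local_install - usable  # local-only procedures

print(sorted(usable))              # ['Frequencies', 'Regression']
print(sorted(removed_from_menus))  # ['Tables']
```

Under these assumed installations, Tables would disappear from the menus after switching to the remote server and would return after switching back to local mode.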
Page 97
Using UNC Path Specifications
With the Windows NT server version of SPSS, relative path specifications for data files are relative to the current server in distributed analysis mode, not relative to your local computer. In practical terms, this means that a path specification such as c:\mydocs\mydata.sav does not point to a directory and file on your C drive; it points to a directory and file on the remote server’s hard drive. If the directory and/or file do not exist on the remote server, this will result in an error in command syntax, as in:
GET FILE='c:\mydocs\mydata.sav'.
If you are using the Windows NT server version of SPSS, you can use universal naming convention (UNC) specifications when accessing data files with command syntax. The general form of a UNC specification is:
\\servername\sharename\path\filename
Servername is the name of the computer that contains the data file.
Sharename is the folder (directory) on that computer that is designated as a shared folder.
Path is any additional folder (subdirectory) path below the shared folder.
Filename is the name of the data file.
For example:
GET FILE='\\hqdev001\public\july\sales.sav'.
If the computer does not have a name assigned to it, you can use its IP address, as in:
GET FILE='\\204.125.125.53\public\july\sales.sav'.
Even with UNC path specifications, you can access data files only from devices and folders designated as shared. When you use distributed analysis mode, this includes data files on your local computer.
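As a minimal sketch of how the four parts of a UNC specification fit together, the following Python helpers assemble a path and wrap it in a GET FILE command. The function names unc_path and get_file_command are hypothetical and are not part of SPSS. The sketch only builds strings and does not check that the share exists or is accessible.

```python
def unc_path(servername, sharename, filename, path=""):
    r"""Build \\servername\sharename\path\filename; path may be empty."""
    parts = [servername, sharename]
    if path:
        # Accept either slash style in the optional subfolder path.
        parts.extend(p for p in path.replace("/", "\\").split("\\") if p)
    parts.append(filename)
    for part in parts:
        if not part or "\\" in part:
            raise ValueError(f"invalid UNC component: {part!r}")
    return "\\\\" + "\\".join(parts)


def get_file_command(file_spec):
    """Wrap a file specification in a GET FILE command string."""
    return f"GET FILE='{file_spec}'."


print(get_file_command(unc_path("hqdev001", "public", "sales.sav", path="july")))
# GET FILE='\\hqdev001\public\july\sales.sav'.
```

The same helper accepts an IP address in place of the server name, for example unc_path("204.125.125.53", "public", "sales.sav", path="july").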
UNIX servers. On UNIX platforms, there is no equivalent of the UNC path, and all directory paths must be absolute paths that start at the root of the server; relative paths are not allowed. For example, if the data file is located in /bin/spss/data and the current directory is also /bin/spss/data, GET FILE='sales.sav' is not valid; you must specify the entire path, as in:
GET FILE='/bin/spss/data/sales.sav'.
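The absolute-path rule for UNIX servers can be checked before a command is written. This Python sketch uses the standard posixpath module; the helper names require_absolute and resolve are hypothetical, and the check only mimics the rule stated above rather than reproducing what the server itself does.

```python
import posixpath


def require_absolute(path):
    """Return path unchanged if it starts at the root; otherwise raise ValueError."""
    if not posixpath.isabs(path):
        raise ValueError(f"relative path not allowed on a UNIX server: {path!r}")
    return path


def resolve(filename, directory):
    """Build the full path by joining an absolute directory and a file name."""
    return require_absolute(posixpath.join(directory, filename))


print(resolve("sales.sav", "/bin/spss/data"))  # /bin/spss/data/sales.sav
```

Calling require_absolute("sales.sav") raises ValueError, which corresponds to the invalid relative specification in the text.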
Page 98
Page 99

Chapter 5
Data Editor

The Data Editor provides a convenient, spreadsheet-like method for creating and editing data files. The Data Editor window opens automatically when you start a session.
The Data Editor provides two views of your data:
Data view. Displays the actual data values or defined value labels.
Variable view. Displays variable definition information, including defined variable and value labels, data type (for example, string, date, and numeric), measurement level (nominal, ordinal, or scale), and user-defined missing values.
In both views, you can add, change, and delete information contained in the data file.

Data View

Figure 5-1
Data view
Page 100
Many of the features of the Data view are similar to those found in spreadsheet applications. There are, however, several important distinctions:
Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case.
Columns are variables. Each column represents a variable or characteristic being measured. For example, each item on a questionnaire is a variable.
Cells contain values. Each cell contains a single value of a variable for a case. The cell is the intersection of the case and the variable. Cells contain only data values. Unlike spreadsheet programs, cells in the Data Editor cannot contain formulas.
The data file is rectangular. The dimensions of the data file are determined by the number of cases and variables. You can enter data in any cell. If you enter data in a cell outside the boundaries of the defined data file, the data rectangle is extended to include any rows and/or columns between that cell and the file boundaries. There are no “empty” cells within the boundaries of the data file. For numeric variables, blank cells are converted to the system-missing value. For string variables, a blank is considered a valid value.
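The rectangular rule can be modeled in a few lines. The Python sketch below is an assumption-laden illustration, not the Data Editor's implementation: the enter_value function is hypothetical, only numeric variables are modeled, and None stands in for the system-missing value.

```python
SYSMIS = None  # stand-in for the system-missing value (hypothetical representation)


def enter_value(data, row, col, value):
    """Set data[row][col] = value, extending the rectangle if the cell lies outside it.

    `data` is a list of equal-length rows of numeric values. Cells created by
    the extension are filled with SYSMIS, so no cell is left "empty".
    """
    n_cols = max(col + 1, len(data[0]) if data else 0)
    for r in data:
        r.extend([SYSMIS] * (n_cols - len(r)))  # add any new columns
    while len(data) <= row:
        data.append([SYSMIS] * n_cols)          # add any new rows
    data[row][col] = value
    return data


data = [[1.0, 2.0],
        [3.0, 4.0]]
enter_value(data, row=3, col=2, value=9.0)
for r in data:
    print(r)
# [1.0, 2.0, None]
# [3.0, 4.0, None]
# [None, None, None]
# [None, None, 9.0]
```

Entering one value two rows and one column beyond the original 2 x 2 file produces a 4 x 3 rectangle in which every new cell other than the one entered holds the missing-value stand-in.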
Variable View
Figure 5-2

Variable view
