
SPSS® 13.0 Base User's Guide

For more information about SPSS® software products, please visit our Web site at http://www.spss.com or contact

SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668

SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer software. No material describing such software may be produced or distributed without the written permission of the owners of the trademark and license rights in the software and the copyrights in the published materials.

The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.

General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective companies.

TableLook is a trademark of SPSS Inc. Windows is a registered trademark of Microsoft Corporation. DataDirect, DataDirect Connect, INTERSOLV, and SequeLink are registered trademarks of DataDirect Technologies. Portions of this product were created using LEADTOOLS © 1991–2000, LEAD Technologies, Inc. ALL RIGHTS RESERVED. LEAD, LEADTOOLS, and LEADVIEW are registered trademarks of LEAD Technologies, Inc. Sax Basic is a t All rights reserved. Portions of this product were based on the work of the FreeType Team (http://www.freetype.org). A portion of th software is provided “as is,” without express or implied warranty. A portion of the SPSS software contains Sun Java Runtime libraries. Copyright © 2003 by Sun Microsystems, Inc. All rights reserved. The Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the libraries are licensed from IBM and are available at http://oss.software.ibm.com/icu4j/.

Copyright © 2004 by SPSS Inc.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

1234567890 060504

ISBN 0-13-185723-1

SPSS 13.0

Preface

SPSS 13.0 is a comprehensive system for analyzing data. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts and plots of distributions and trends, descriptive statistics, and complex statistical analyses.

This manual, the SPSS® Base 13.0 User’s Guide, documents the graphical user interface of SPSS for Windows. Examples using the statistical procedures found in SPSS Base 13.0 are provided in the Help system, installed with the software. Algorithms used in the statistical procedures are available on the product CD-ROM.

In addition, beneath the menus and dialog boxes, SPSS uses a command language. Some extended features of the system can be accessed only via command syntax. (Those features are not available in the Student Version.) Complete command syntax is documented in the SPSS 13.0 Command Syntax Reference, available on the Help menu.

SPSS Options

The following options are available as add-on enhancements to the full (not Student Version) SPSS Base system:

SPSS Regression Models™ provides techniques for analyzing data that do not fit traditional linear statistical models. It includes procedures for probit analysis, logistic regression, weight estimation, two-stage least-squares regression, and general nonlinear regression.

SPSS Advanced Models™ focuses on techniques often used in sophisticated experimental and biomedical research. It includes procedures for general linear models (GLM), linear mixed models, variance components analysis, loglinear analysis, ordinal regression, actuarial life tables, Kaplan-Meier survival analysis, and basic and extended Cox regression.

SPSS Tables™ creates a variety of presentation-quality tabular reports, including complex stub-and-banner tables and displays of multiple response data.

SPSS Trends™ performs comprehensive forecasting and time series analyses with multiple curve-fitting models, smoothing models, and methods for estimating autoregressive functions.

SPSS Categories® performs optimal scaling procedures, including correspondence analysis.

SPSS Conjoint™ performs conjoint analysis.

SPSS Exact Tests™ calculates exact p values for statistical tests when small or very unevenly distributed samples could make the usual tests inaccurate.

SPSS Missing Value Analysis™ describes patterns of missing data, estimates means and other statistics, and imputes values for missing observations.

SPSS Maps™ turns your geographically distributed data into high-quality maps with symbols, colors, bar charts, pie charts, and combinations of themes to present not only what is happening but where it is happening.

SPSS Complex Samples™ allows survey, market, health, and public opinion researchers, as well as social scientists who use sample survey methodology, to incorporate their complex sample designs into data analysis.

SPSS Classification Trees™ creates a tree-based classification model. It classifies cases into groups or predicts values of a dependent (target) variable based on values of independent (predictor) variables. The procedure provides validation tools for exploratory and confirmatory classification analysis.

The SPSS family of products also includes applications for data entry, text analysis, classification, neural networks, and flowcharting.

Installation

To install the Base system, run the License Authorization Wizard using the authorization code that you received from SPSS Inc. For more information, see the installation instructions supplied with the SPSS Base system.

Compatibility

SPSS is designed to run on many computer systems. See the installation instructions that came with your system for specific information on minimum and recommended requirements.

Serial Numbers

Your serial number is your identification number with SPSS Inc. You will need this serial number when you contact SPSS for information regarding support, payment, or an upgraded system. The serial number was provided with your Base system.

Customer Service

If you have any questions concerning your shipment or account, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide. Please have your serial number ready for identification.

Training Seminars

SPSS Inc. provides both public and onsite training seminars. All seminars feature hands-on workshops. Seminars will be offered in major cities on a regular basis. For more information on these seminars, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide.

Technical Support

The services of SPSS Technical Support are available to registered customers. Customers may contact Technical Support for assistance in using SPSS or for installation help for one of the supported hardware environments. To reach Technical Support, see the SPSS Web site at http://www.spss.com, or contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide. Be prepared to identify yourself, your organization, and the serial number of your system.

Additional Publications

Additional copies of SPSS product manuals may be purchased directly from SPSS Inc. Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local SPSS office, listed on the SPSS Web site at http://www.spss.com/worldwide. For telephone orders in the United States and Canada, call SPSS Inc. at 800-543-2185. For telephone orders outside of North America, contact your local office, listed on the SPSS Web site.

The SPSS Statistical Procedures Companion, by Marija Norušis, has been published by Prentice Hall. A new version of this book, updated for SPSS 13.0, is planned. The SPSS Advanced Statistical Procedures Companion, also based on SPSS 13.0, is forthcoming. The SPSS Guide to Data Analysis for SPSS 13.0 is also in development. Announcements of publications available exclusively through Prentice Hall will be available on the SPSS Web site at http://www.spss.com/estore (select your home country, and then click Books).

Tell Us Your Thoughts

Your comments are important. Please let us know about your experiences with SPSS products. We especially like to hear about new and interesting applications using the SPSS system. Please send e-mail to suggest@spss.com or write to SPSS Inc., Attn.: Director of Product Planning, 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.

About This Manual

This manual documents the graphical user interface for the procedures included in the Base system. Illustrations of dialog boxes are taken from SPSS for Windows. Dialog boxes in other operating systems are similar. Detailed information about the command syntax for features in this module is provided in the SPSS Command Syntax Reference, available from the Help menu.

Contacting SPSS

If you would like to be on our mailing list, contact one of our offices, listed on our Web site at http://www.spss.com/worldwide.

Contents

1 Overview
  What’s New in SPSS 13.0?
  Windows
  Menus
  Status Bar
  Dialog Boxes
  Variable Names and Variable Labels in Dialog Box Lists
  Dialog Box Controls
  Subdialog Boxes
  Selecting Variables
  Getting Information about Variables in Dialog Boxes
  Getting Information about Dialog Box Controls
  Basic Steps in Data Analysis
  Statistics Coach
  Finding Out More about SPSS

2 Getting Help
  Using the Help Table of Contents
  Using the Help Index
  Getting Help on Dialog Box Controls
  Getting Help on Output Terms
  Using Case Studies
  Copying Help Text from a Pop-Up Window

3 Data Files
  Opening a Data File
  To Open Data Files
  Data File Types
  Opening File Options
  Reading Excel Files
  How the Data Editor Reads Older Excel Files and Other Spreadsheets
  How the Data Editor Reads dBASE Files
  Reading Database Files
  Selecting a Data Source
  Database Login
  Selecting Data Fields
  Creating a Parameter Query
  Aggregating Data
  Defining Variables
  Sorting Cases
  Results
  Text Wizard
  File Information
  Saving Data Files
  To Save Modified Data Files
  Saving Data Files in Excel Format
  Saving Data Files in SAS Format
  Saving Data Files in Other Formats
  Saving Data: Data File Types
  Saving Subsets of Variables
  Saving File Options
  Protecting Original Data
  Virtual Active File

4 Distributed Analysis Mode
  Distributed versus Local Analysis

5 Data Editor
  Data View
  Variable View
  Entering Data
  Editing Data
  Go to Case
  Case Selection Status in the Data Editor
  Data Editor Display Options
  Data Editor Printing

6 Data Preparation
  Defining Variable Properties
  Copying Data Properties
  Identifying Duplicate Cases
  Visual Bander
  Banding Variables
  Automatically Generating Banded Categories
  Copying Banded Categories
  User-Missing Values in the Visual Bander

7 Data Transformations
  Computing Variables
  Functions
  Missing Values in Functions
  Random Number Generators
  Count Occurrences of Values within Cases
  Recoding Values
  Recode into Same Variables
  Recode into Different Variables
  Rank Cases
  Automatic Recode
  Date and Time Wizard
  Time Series Data Transformations
  Scoring Data with Predictive Models

8 File Handling and File Transformations
  Sort Cases
  Transpose
  Merging Data Files
  Add Cases
  Add Variables
  Aggregate Data
  Split File
  Select Cases
  Weight Cases
  Restructuring Data

9 Working with Output
  Viewer
  Using Output in Other Applications
  Pasting Objects into the Viewer
  Paste Special
  Pasting Objects from Other Applications into the Viewer
  Export Output
  Viewer Printing
  Saving Output

10 Draft Viewer
  To Create Draft Output
  Controlling Draft Output Format
  Fonts in Draft Output
  To Print Draft Output
  To Save Draft Viewer Output

11 Pivot Tables
  Manipulating a Pivot Table
  Working with Layers
  Bookmarks
  Showing and Hiding Cells
  Editing Results
  Changing the Appearance of Tables
  Table Properties

  To Change Pivot Table Properties
  Table Properties: General
  To Change General Table Properties
  Table Properties: Footnotes
  To Change Footnote Marker Properties
  Table Properties: Cell Formats
  To Change Cell Formats
  Table Properties: Borders
  To Change Borders in a Table
  To Display Hidden Borders in a Pivot Table
  Table Properties: Printing
  To Control Pivot Table Printing
  Font
  Data Cell Widths
  Cell Properties
  To Change Cell Properties
  Cell Properties: Value
  To Change Value Formats in a Cell
  To Change Value Formats for a Column
  Cell Properties: Alignment
  To Change Alignment in Cells
  Cell Properties: Margins
  To Change Margins in Cells
  Cell Properties: Shading
  To Change Shading in Cells
  Footnote Marker
  Selecting Rows and Columns in Pivot Tables
  To Select a Row or Column in a Pivot Table
  Modifying Pivot Table Results
  Printing Pivot Tables

  To Print Hidden Layers of a Pivot Table
  Controlling Table Breaks for Wide and Long Tables

12 Working with Command Syntax
  Syntax Rules
  Pasting Syntax from Dialog Boxes
  Copying Syntax from the Output Log
  Editing Syntax in a Journal File
  To Run Command Syntax
  Multiple Execute Commands

13 Frequencies
  Frequencies Statistics
  Frequencies Charts
  Frequencies Format

14 Descriptives
  Descriptives Options

15 Explore
  Explore Statistics

  Explore Plots
  Explore Options

16 Crosstabs
  Crosstabs Layers
  Crosstabs Clustered Bar Charts
  Crosstabs Statistics
  Crosstabs Cell Display
  Crosstabs Table Format

17 Summarize
  Summarize Options
  Summarize Statistics

18 Means
  Means Options

19 OLAP Cubes
  OLAP Cubes Statistics
  OLAP Cubes Differences
  OLAP Cubes Title

20 T Tests
  Independent-Samples T Test
  Paired-Samples T Test
  One-Sample T Test

21 One-Way ANOVA
  One-Way ANOVA Contrasts
  One-Way ANOVA Post Hoc Tests
  One-Way ANOVA Options

22 GLM Univariate Analysis
  GLM Model
  GLM Contrasts
  GLM Profile Plots
  GLM Post Hoc Comparisons
  GLM Save
  GLM Options
  UNIANOVA Command Additional Features

23 Bivariate Correlations
  Bivariate Correlations Options
  CORRELATIONS and NONPAR CORR Command Additional Features

24 Partial Correlations
  Partial Correlations Options

25 Distances
  Distances Dissimilarity Measures
  Distances Similarity Measures

26 Linear Regression
  Linear Regression Variable Selection Methods
  Linear Regression Set Rule
  Linear Regression Plots
  Linear Regression: Saving New Variables
  Linear Regression Statistics
  Linear Regression Options

27 Curve Estimation
  Curve Estimation Models
  Curve Estimation Save

28 Discriminant Analysis
  Discriminant Analysis Define Range

  Discriminant Analysis Select Cases
  Discriminant Analysis Statistics
  Discriminant Analysis Stepwise Method
  Discriminant Analysis Classification
  Discriminant Analysis Save

29 Factor Analysis
  Factor Analysis Select Cases
  Factor Analysis Descriptives
  Factor Analysis Extraction
  Factor Analysis Rotation
  Factor Analysis Scores
  Factor Analysis Options

30 Choosing a Procedure for Clustering

31 TwoStep Cluster Analysis
  TwoStep Cluster Analysis Options
  TwoStep Cluster Analysis Plots
  TwoStep Cluster Analysis Output

32 Hierarchical Cluster Analysis
  Hierarchical Cluster Analysis Method

  Hierarchical Cluster Analysis Statistics
  Hierarchical Cluster Analysis Plots
  Hierarchical Cluster Analysis Save New Variables

33 K-Means Cluster Analysis
  K-Means Cluster Analysis Efficiency
  K-Means Cluster Analysis Iterate
  K-Means Cluster Analysis Save
  K-Means Cluster Analysis Options

34 Nonparametric Tests
  Chi-Square Test
  Binomial Test
  Runs Test
  One-Sample Kolmogorov-Smirnov Test
  Two-Independent-Samples Tests
  Two-Related-Samples Tests
  Tests for Several Independent Samples
  Tests for Several Related Samples

35 Multiple Response Analysis
  Multiple Response Define Sets
  Multiple Response Frequencies
  Multiple Response Crosstabs

  Multiple Response Crosstabs Define Ranges
  Multiple Response Crosstabs Options
  MULT RESPONSE Command Additional Features

36 Reporting Results
  Report Summaries in Rows
  Report Summaries in Columns
  REPORT Command Additional Features

37 Reliability Analysis
  Reliability Analysis Statistics
  RELIABILITY Command Additional Features

38 Multidimensional Scaling
  Multidimensional Scaling Shape of Data
  Multidimensional Scaling Create Measure
  Multidimensional Scaling Model
  Multidimensional Scaling Options
  ALSCAL Command Additional Features

39 Ratio Statistics
  Ratio Statistics

40 Overview of the Chart Facility
  Creating and Modifying a Chart
  Chart Definition Options

41 ROC Curves
  ROC Curve Options

42 Utilities
  Variable Information
  Data File Comments
  Variable Sets
  Define Variable Sets
  Use Sets
  Reordering Target Variable Lists

43 Options
  General Options
  Viewer Options
  Draft Viewer Options
  Output Label Options
  Chart Options
  Interactive Chart Options
  Pivot Table Options

  Data Options
  Currency Options
  Script Options

44 Customizing Menus and Toolbars
  Menu Editor
  Customizing Toolbars
  Show Toolbars
  To Customize Toolbars

45 Production Facility
  Using the Production Facility
  Export Options
  User Prompts
  Production Macro Prompting
  Production Options
  Format Control for Production Jobs
  Running Production Jobs from a Command Line
  Publish to Web
  SmartViewer Web Server Login

46 SPSS Scripting Facility
  To Run a Script
  Scripts Included with SPSS

  Autoscripts
  Creating and Editing Scripts
  To Edit a Script
  Script Window
  Starter Scripts
  Creating Autoscripts
  How Scripts Work
  Table of Object Classes and Naming Conventions
  New Procedure (Scripting)
  Adding a Description to a Script
  Scripting Custom Dialog Boxes
  Debugging Scripts
  Script Files and Syntax Files

47 Output Management System
  Output Object Types
  Command Identifiers and Table Subtypes
  Table Labels
  OMS Options
  Logging
  Excluding Output Display from the Viewer
  Routing Output to SPSS Data Files
  OXML Table Structure
  OMS Identifiers

Appendices

A Database Access Administrator

B Customizing HTML Documents
  To Add Customized HTML Code to Exported Output Documents
  Content and Format of the Text File for Customized HTML
  To Use a Different File or Location for Custom HTML Code

Index

Chapter 1

Overview

SPSS for Windows provides a powerful statistical analysis and data management system in a graphical environment, using descriptive menus and simple dialog boxes to do most of the work for you. Most tasks can be accomplished simply by pointing and clicking the mouse.

In addition to the simple point-and-click interface for statistical analysis, SPSS for Windows provides:

Data Editor. A versatile spreadsheet-like system for defining, entering, editing, and displaying data.

Viewer. The Viewer makes it easy to browse your results, selectively show and hide output, change the display order of results, and move presentation-quality tables and charts between SPSS and other applications.

Multidimensional pivot tables. Your results come alive with multidimensional pivot tables. Explore your tables by rearranging rows, columns, and layers. Uncover important findings that can get lost in standard reports. Compare groups easily by splitting your table so that only one group is displayed at a time.

High-resolution graphics. High-resolution, full-color pie charts, bar charts, histograms, scatterplots, 3-D graphics, and more are included as standard features in SPSS.

Database access. Retrieve information from databases by using the Database Wizard instead of complicated SQL queries.

Data transformations. Transformation features help get your data ready for analysis. You can easily subset data, combine categories, add, aggregate, merge, split, and transpose files, and more.

Electronic distribution. Send e-mail reports to others with the click of a button, or export tables and charts in HTML format for Internet and intranet distribution.

Online Help. Detailed tutorials provide a comprehensive overview; context-sensitive Help topics in dialog boxes guide you through specific tasks; pop-up definitions in pivot table results explain statistical terms; the Statistics Coach helps you find the procedures that you need; and Case Studies provide hands-on examples of how to use statistical procedures and interpret the results.

Command language. Although most tasks can be accomplished with simple point-and-click gestures, SPSS also provides a powerful command language that allows you to save and automate many common tasks. The command language also provides some functionality not found in the menus and dialog boxes.

Complete command syntax documentation is automatically installed when you install SPSS. To access the syntax documentation:

E From the menus choose:

Help

Command Syntax Reference

What’s New in SPSS 13.0?

There are many new features in SPSS 13.0.

Data Management

• The Date and Time Wizard makes it easy to perform many calculations with dates and times, including calculating the difference between dates and adding or subtracting a duration from a date. For more information, see “Date and Time Wizard” in Chapter 7 on p. 150.
• You can append aggregated results to the working data file. For more information, see “Aggregate Data” in Chapter 8 on p. 184.
• You can create consistent autorecoding schemes for multiple variables and data files. For more information, see “Automatic Recode” in Chapter 7 on p. 147.
• String values can be up to 32,767 bytes long. Previously, the limit was 255 bytes.
• You can create multiple panes in Data View of the Data Editor. For more information, see “Data Editor Display Options” in Chapter 5 on p. 95.

Charts

• 3-D bar charts.
• Population pyramids.
• Dot plots.
• Paneled charts.

Statistical Enhancements

• New Classification Tree option for building tree models.
• New GLM procedure in the Complex Samples option.
• New Logistic Regression procedure in the Complex Samples option.
• New Multiple Correspondence procedure in the Categories option.
• AIC and BIC statistics added to Multinomial Logistic Regression in the Regression Models option, plus the ability to specify the type of statistic used for determining the addition and removal of model terms when using various stepwise methods.

Output

• Output Management System Control Panel. For more information, see “Output Management System” in Chapter 47 on p. 657.
• Export Viewer output to PowerPoint. For more information, see “Export Output” in Chapter 9 on p. 234.
• More output sorting options and the ability to hide subtotaled categories in Custom Tables (Tables option).
• Pivot table output for Curve Estimation and Multiple Response in the Base system, all time series procedures (including the Trends option), Kaplan-Meier and Life Tables in the Advanced Models option, and Nonlinear Regression in the Regression Models option.

Command Syntax

• You can control the working directory with the new CD command.
• You can control the treatment of error conditions in included command syntax files with the new INSERT command.
• You can use the FILE HANDLE command to define directory paths.

More information about command syntax is available in the SPSS Command Syntax Reference PDF file, which you can access from the Help menu.

SPSS Server

• You can score data based on models built with many SPSS procedures. For more information, see “Scoring Data with Predictive Models” in Chapter 7 on p. 174.
• You can work with database sources more efficiently by preaggregating and/or presorting data in the database before reading it into SPSS. For more information, see “Aggregating Data” in Chapter 3 on p. 35.

Windows

There are a number of different types of windows in SPSS:

Data Editor. This window displays the contents of the data file. You can create new data files or modify existing ones with the Data Editor. The Data Editor window opens automatically when you start an SPSS session. You can have only one data file open at a time.

Viewer. All statistical results, tables, and charts are displayed in the Viewer. You can edit the output and save it for later use. A Viewer window opens automatically the first time you run a procedure that generates output.

Draft Viewer. You can display output as simple text (instead of interactive pivot tables) in the Draft Viewer.

Pivot Table Editor. Output displayed in pivot tables can be modified in many ways with the Pivot Table Editor. You can edit text, swap data in rows and columns, add color, create multidimensional tables, and selectively hide and show results.

Chart Editor. You can modify high-resolution charts and plots in chart windows. You can change the colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate 3-D scatterplots, and even change the chart type.

Text Output Editor. Text output not displayed in pivot tables can be modified with the Text Output Editor. You can edit the output and change font characteristics (type, style, color, size).

Syntax Editor. You can paste your dialog box choices into a syntax window, where your selections appear in the form of command syntax. You can then edit the command syntax to use special features of SPSS not available through dialog boxes. You can save these commands in a file for use in subsequent SPSS sessions.

Script Editor. Scripting and OLE automation allow you to customize and automate many tasks in SPSS. Use the Script Editor to create and modify basic scripts.

Figure 1-1

Data Editor and Viewer

Designated versus Active Window

If you have more than one open Viewer window, output is routed to the designated Viewer window. If you have more than one open Syntax Editor window, command syntax is pasted into the designated Syntax Editor window. The designated windows are indicated by an exclamation point (!) in the status bar. You can change the designated windows at any time.

The designated window should not be confused with the active window, which is the currently selected window. If you have overlapping windows, the active window appears in the foreground. If you open a window, that window automatically becomes the active window and the designated window.

Changing the Designated Window

E Make the window that you want to designate the active window (click anywhere in the window).

E Click the Designate Window tool on the toolbar (the one with the exclamation point).

or

E From the menus choose:

Utilities

Designate Window

Figure 1-2

Designate Window tool

Menus

Many of the tasks that you want to perform with SPSS start with menu selections. Each window in SPSS has its own menu bar with menu selections appropriate for that window type.

The Analyze and Graphs menus are available in all windows, making it easy to generate new output without having to switch windows.

Status Bar

The status bar at the bottom of each SPSS window provides the following information:

Command status. For each procedure or command that you run, a case counter indicates the number of cases processed so far. For statistical procedures that require iterative processing, the number of iterations is displayed.

Filter status. If you have selected a random sample or a subset of cases for analysis, the message Filter on indicates that some type of case filtering is currently in effect and not all cases in the data file are included in the analysis.

Weight status. The message Weight on indicates that a weight variable is being used to weight cases for analysis.

Split File status. The message Split File on indicates that the data file has been split into separate groups for analysis, based on the values of one or more grouping variables.

Showing and Hiding the Status Bar

E From the menus choose:

View

Status Bar

Dialog Boxes

Most menu selections open dialog boxes. You use dialog boxes to select variables and options for analysis.

Dialog boxes for statistical procedures and charts typically have two basic components:

Source variable list. A list of variables in the working data file. Only variable types allowed by the selected procedure are displayed in the source list. Use of short string and long string variables is restricted in many procedures.

Target variable list(s). One or more lists indicating the variables that you have chosen for the analysis, such as dependent and independent variable lists.

Variable Names and Variable Labels in Dialog Box Lists

You can display either variable names or variable labels in dialog box lists.

• To control the display of variable names or labels, choose Options from the Edit menu in any window.
• To define or modify variable labels, use Variable View in the Data Editor.
• For data imported from database sources, field names are used as variable labels.
• For long labels, position the mouse pointer over the label in the list to view the entire label.
• If no variable label is defined, the variable name is displayed.

Figure 1-3

Variable labels displayed in a dialog box

Dialog Box Controls

There are five standard controls in most dialog boxes:

OK. Runs the procedure. After you select your variables and choose any additional specifications, click OK to run the procedure. This also closes the dialog box.

Paste. Generates command syntax from the dialog box selections and pastes the syntax into a syntax window. You can then customize the commands with additional features not available from dialog boxes.

Reset. Deselects any variables in the selected variable list(s) and resets all specifications in the dialog box and any subdialog boxes to the default state.

Cancel. Cancels any changes in the dialog box settings since the last time it was opened and closes the dialog box. Within a session, dialog box settings are persistent. A dialog box retains your last set of specifications until you override them.

Help. Context-sensitive Help. This takes you to a Help window that contains information about the current dialog box. You can also get help on individual dialog box controls by clicking the control with the right mouse button.

Subdialog Boxes

Since most procedures provide a great deal of flexibility, not all of the possible choices can be contained in a single dialog box. The main dialog box usually contains the minimum information required to run a procedure. Additional specifications are made in subdialog boxes.

In the main dialog box, controls with an ellipsis (...) after the name indicate that a subdialog box will be displayed.

Selecting Variables

To select a single variable, you simply highlight it on the source variable list and click the right arrow button next to the target variable list. If there is only one target variable list, you can double-click individual variables to move them from the source list to the target list.

You can also select multiple variables:

• To select multiple variables that are grouped together on the variable list, click the first one and then Shift-click the last one in the group.
• To select multiple variables that are not grouped together on the variable list, use the Ctrl-click method. Click the first variable, then Ctrl-click the next variable, and so on.

Getting Information about Variables in Dialog Boxes

E Right-click on a variable in the source or target variable list.

E Select Variable Information from the pop-up context menu.

Figure 1-4

Variable information with right mouse button

Getting Information about Dialog Box Controls

E Right-click the control you want to know about.

E Select What’s This? on the pop-up context menu.

A pop-up window displays information about the control.

Figure 1-5

Right mouse button “What’s This?” pop-up Help for dialog box controls

Basic Steps in Data Analysis

Analyzing data with SPSS is easy. All you have to do is:

Get your data into SPSS. You can open a previously saved SPSS data file; read a spreadsheet, database, or text data file; or enter your data directly in the Data Editor.

Select a procedure. Select a procedure from the menus to calculate statistics or to create a chart.

Select the variables for the analysis. The variables in the data file are displayed in a dialog box for the procedure.

Run the procedure and look at the results. Results are displayed in the Viewer.

Statistics Coach

If you are unfamiliar with SPSS or with the statistical procedures available in SPSS, the Statistics Coach can help you get started by prompting you with simple questions, nontechnical language, and visual examples that help you select the basic statistical and charting features that are best suited for your data.

To use the Statistics Coach, from the menus in any SPSS window choose:

Help

Statistics Coach

The Statistics Coach covers only a selected subset of procedures in the SPSS Base system. It is designed to provide general assistance for many of the basic, commonly used statistical techniques.

Finding Out More about SPSS

For a comprehensive overview of SPSS basics, see the online tutorial. From any SPSS menu choose:

Help

Tutorial

Chapter 2

Getting Help

Help is provided in many different forms:

Help menu. The Help menu in most SPSS windows provides access to the main Help system, plus tutorials and technical reference material.

• Topics. Provides access to the Contents, Index, and Search tabs, which you can use to find specific Help topics.
• Tutorial. Illustrated, step-by-step instructions on how to use many of the basic features in SPSS. You don’t have to view the whole tutorial from start to finish. You can choose the topics you want to view, skip around and view topics in any order, and use the index or table of contents to find specific topics.
• Case Studies. Hands-on examples of how to create various types of statistical analyses and how to interpret the results. The sample data files used in the examples are also provided so that you can work through the examples to see exactly how the results were produced. You can choose the specific procedure(s) you want to learn about from the table of contents or search for relevant topics in the index.
• Statistics Coach. A wizard-like approach to guide you through the process of finding the procedure that you want to use. After you make a series of selections, the Statistics Coach opens the dialog box for the statistical, reporting, or charting procedure that meets your selected criteria. The Statistics Coach provides access to most statistical and reporting procedures in the Base system and many charting procedures.
• Command Syntax Reference. Detailed command syntax reference information is provided in the SPSS Command Syntax Reference, available from the Help menu.

Context-sensitive Help. In many places in the user interface, you can get context-sensitive Help.

• Dialog box Help buttons. Most dialog boxes have a Help button that takes you directly to a Help topic for that dialog box. The Help topic provides general information and links to related topics.
• Dialog box context menu Help. Many dialog boxes provide context-sensitive Help for individual controls and features. Right-click on any control in a dialog box and select What’s This? from the context menu to display a description of the control and directions for its use. (If What’s This? does not appear on the context menu, then this form of Help is not available for that dialog box.)
• Pivot table context menu Help. Right-click on terms in an activated pivot table in the Viewer and select What’s This? from the context menu to display definitions of the terms.
• Case Studies. Right-click on a pivot table and select Case Studies from the context menu to go directly to a detailed example for the procedure that produced that table. (If Case Studies does not appear on the context menu, then this form of Help is not available for that procedure.)
• Command syntax charts. In a command syntax window, position the cursor anywhere within a syntax block for a command and press F1 on the keyboard. A complete command syntax chart for that command will be displayed.

Other resources. If you can’t find the information you want in the Help system, these other resources may have the answers you need:

• SPSS for Windows Developer’s Guide. Provides information and examples for the developer’s tools included with SPSS for Windows, including OLE automation, third-party API, input/output DLL, production facility, and scripting facility. The Developer’s Guide is available in PDF form in the SPSS\developer directory on the installation CD.
• Technical Support Web site. Answers to many common problems can be found at http://support.spss.com. (The Technical Support Web site requires a login ID and password. Information on how to obtain an ID and password is provided at the URL listed above.)

Using the Help Table of Contents

E In any window, from the menus choose:

Help

Topics

E Click the Contents tab.

E Double-click items with a book icon to expand or collapse the contents.

E Click an item to go to that Help topic.

Figure 2-1

Help window with Contents tab displayed

Using the Help Index

E In any window, from the menus choose:

Help

Topics

E Click the Index tab.

E Enter a term to search for in the index.

E Double-click the topic that you want.

The Help index uses incremental search to find the text that you enter and selects the closest match in the index.

Figure 2-2

Index tab and incremental search

Getting Help on Dialog Box Controls

E Right-click on the dialog box control that you want information about.

E Choose What’s This? from the pop-up context menu.

A description of the control and how to use it is displayed in a pop-up window. General information about a dialog box is available from the Help button in the dialog box.

Figure 2-3

Dialog box control Help with right mouse button

Getting Help on Output Terms

E Double-click the pivot table to activate it.

E Right-click on the term that you want to be explained.

E Choose What’s This? from the context menu.

A definition of the term is displayed in a pop-up window.

Figure 2-4

Activated pivot table glossary Help with right mouse button

Using Case Studies

E Right-click on a pivot table in the Viewer window.

E Choose Case Studies from the pop-up context menu.

Copying Help Text from a Pop-Up Window

E Right-click anywhere in the pop-up window.

E Choose Copy from the context menu.

The entire text of the pop-up window is copied.

Chapter 3

Data Files

Data files come in a wide variety of formats, and this software is designed to handle many of them, including:

• Spreadsheets created with Lotus 1-2-3 and Excel
• Database files created with dBASE and various SQL formats
• Tab-delimited and other types of ASCII text files
• Data files in SPSS format created on other operating systems
• SYSTAT data files
• SAS data files

Opening a Data File

In addition to files saved in SPSS format, you can open Excel, Lotus 1-2-3, dBASE, and tab-delimited files without converting the files to an intermediate format or entering data definition information.

To Open Data Files

E From the menus choose:

File

Open

Data...

E In the Open File dialog box, select the file that you want to open.

E Click Open.

Optionally, you can:

• Read variable names from the first row for spreadsheet and tab-delimited files.
• Specify a range of cells to read for spreadsheet files.
• Specify a sheet within an Excel file to read (Excel 5 or later).

Data File Types

SPSS. Opens data files saved in SPSS format, including SPSS for Windows, Macintosh, UNIX, and also the DOS product SPSS/PC+.

SPSS/PC+. Opens SPSS/PC+ data files.

SYSTAT. Opens SYSTAT data files.

SPSS Portable. Opens data files saved in SPSS portable format. Saving a file in portable format takes considerably longer than saving the file in SPSS format.

Excel. Opens Excel files.

Lotus 1-2-3. Opens data files saved in 1-2-3 format for release 3.0, 2.0, or 1A of Lotus.

SYLK. Opens data files saved in SYLK (symbolic link) format, a format used by some spreadsheet applications.

dBASE. Opens dBASE-format files for either dBASE IV, dBASE III or III PLUS, or dBASE II. Each case is a record. Variable and value labels and missing-value specifications are lost when you save a file in this format.

SAS Long File Name. SAS version 7-9 for Windows, long extension.

SAS Short File Name. SAS version 7-9 for Windows, short extension.

SAS v6 for Windows. SAS version 6.08 for Windows and OS2.

SAS v6 for UNIX. SAS version 6 for UNIX (Sun, HP, IBM).

SAS Transport. SAS transport file.

Text. ASCII text file.

Opening File Options

Read variable names. For spreadsheets, you can read variable names from the first row of the file or the first row of the defined range. The values are converted as necessary to create valid variable names, including converting spaces to underscores.

Worksheet. Excel 5 or later files can contain multiple worksheets. By default, the Data Editor reads the first worksheet. To read a different worksheet, select the worksheet from the drop-down list.

Range. For spreadsheet data files, you can also read a range of cells. Use the same method for specifying cell ranges as you would with the spreadsheet application.

Reading Excel Files

Read variable names. You can read variable names from the first row of the file or the first row of the defined range. Values that don’t conform to variable naming rules are converted to valid variable names, and the original names are used as variable labels.

Worksheet. Excel files can contain multiple worksheets. By default, the Data Editor reads the first worksheet. To read a different worksheet, select the worksheet from the drop-down list.

Range. You can also read a range of cells. Use the same method for specifying cell ranges as you would in Excel.

How the Data Editor Reads Excel 5 or Later Files

The following rules apply to reading Excel 5 or later files:

Data type and width. Each column is a variable. The data type and width for each variable are determined by the data type and width in the Excel file. If the column contains more than one data type (for example, date and numeric), the data type is set to string, and all values are read as valid string values.

Blank cells. For numeric variables, blank cells are converted to the system-missing value, indicated by a period. For string variables, a blank is a valid string value, and blank cells are treated as valid string values.

Variable names. If you read the first row of the Excel file (or the first row of the specified range) as variable names, values that don’t conform to variable naming rules are converted to valid variable names, and the original names are used as variable labels. If you do not read variable names from the Excel file, default variable names are assigned.
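The data type and blank-cell rules for Excel 5 or later files can be illustrated with a short Python sketch. This is not SPSS code: the function names, the use of None for a blank cell and for the system-missing value, and the handling of date columns and all-blank columns are assumptions made only for the illustration.

```python
from datetime import date

SYSMIS = None  # stand-in for the system-missing value (displayed as a period)


def cell_kind(cell):
    """Classify one non-blank cell as numeric, date, or string."""
    if isinstance(cell, (int, float)):
        return "numeric"
    if isinstance(cell, date):  # also matches datetime values
        return "date"
    return "string"


def read_column(cells):
    """Return (variable_type, values) for one column; None marks a blank cell."""
    kinds = {cell_kind(c) for c in cells if c is not None}
    if kinds == {"numeric"}:
        # Numeric variable: blank cells become system-missing.
        return "numeric", [SYSMIS if c is None else float(c) for c in cells]
    if kinds == {"date"}:
        # Assumption: blanks in a date column are treated like numeric blanks.
        return "date", [SYSMIS if c is None else c for c in cells]
    # Mixed types (or plain strings): everything is read as a string,
    # and a blank cell is a valid (empty) string value.
    # Assumption: an all-blank column also ends up here.
    return "string", ["" if c is None else str(c) for c in cells]


print(read_column([1, None, 2.5]))
print(read_column([1, date(2004, 1, 2)]))
```

The second call mixes a number and a date, so the whole column is read as strings, which mirrors the mixed-type rule described above.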

How the Data Editor Reads Older Excel Files and Other Spreadsheets

The following rules apply to reading Excel files prior to version 5 and other spreadsheet data:

Data type and width. The data type and width for each variable are determined by the column width and data type of the first data cell in the column. Values of other types are converted to the system-missing value. If the first data cell in the column is blank, the global default data type for the spreadsheet (usually numeric) is used.

Blank cells. For numeric variables, blank cells are converted to the system-missing value, indicated by a period. For string variables, a blank is a valid string value, and blank cells are treated as valid string values.

Variable names. If you do not read variable names from the spreadsheet, the column letters (A, B, C, ...) are used for variable names for Excel and Lotus files. For SYLK files and Excel files saved in R1C1 display format, the software uses the column number preceded by the letter C for variable names (C1, C2, C3, ...).

How the Data Editor Reads dBASE Files

Database files are logically very similar to SPSS-format data files. The following general rules apply to dBASE files:

• Field names are converted to valid variable names.
• Colons used in dBASE field names are translated to underscores.
• Records marked for deletion but not actually purged are included. The software creates a new string variable, D_R, which contains an asterisk for cases marked for deletion.
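As a rough illustration of the dBASE rules above, the following Python sketch applies them to records held in memory. It does not read real dBASE files; the record layout (a field dictionary plus a deletion flag), the function name, the sample field names, and the blank D_R value for records that are not marked are all hypothetical.

```python
def import_dbase_records(records):
    """Apply the dBASE rules to a list of (fields, marked_for_deletion) pairs."""
    cases = []
    for fields, deleted in records:
        # Colons in field names are translated to underscores.
        case = {name.replace(":", "_"): value for name, value in fields.items()}
        # Records marked for deletion are kept and flagged with an asterisk in D_R.
        # Assumption: unmarked records get an empty D_R value.
        case["D_R"] = "*" if deleted else ""
        cases.append(case)
    return cases


sample = [
    ({"CUST:ID": 7, "NAME": "Lee"}, False),
    ({"CUST:ID": 8, "NAME": "Kim"}, True),  # marked for deletion, not purged
]
for case in import_dbase_records(sample):
    print(case)
```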

Reading Database Files

You can read data from any database format for which you have a database driver. In local analysis mode, the necessary drivers must be installed on your local computer. In distributed analysis mode (available with the server version), the drivers must be installed on the remote server. For more information, see “Distributed Analysis Mode” in Chapter 4 on p. 63.

To Read Database Files

E From the menus choose:

File

Open Database

New Query...

E Select the data source.

E Depending on the data source, you may need to select the database file and/or enter a login name and password.

E Select the table(s) and fields.

E Specify any relationships between your tables.

Optionally, you can:

• Specify any selection criteria for your data.
• Add a prompt for user input to create a parameter query.
• Save the query you have constructed before running it.

To Edit Saved Database Queries

E From the menus choose:

File

Open Database

Edit Query...

E Select the query file (*.spq) that you want to edit.

E Follow the instructions for creating a new query.

To Read Database Files with Saved Queries

E From the menus choose:

File

Open Database

Run Query...

E Select the query file (*.spq) that you want to run.

E Depending on the database file, you may need to enter a login name and password.

E If the query has an embedded prompt, you may need to enter other information (for example, the quarter for which you want to retrieve sales figures).

Selecting a Data Source

Use the first screen to select the type of data source to read. After you have chosen the file type, the Database Wizard may prompt you for the path to your data file.

If you do not have any data sources configured, or if you want to add a new data source, click Add Data Source. In distributed analysis mode (available with the server version), this button is not available. To add data sources in distributed analysis mode, see your system administrator.

Data sources. A data source consists of two essential pieces of information: the driver that will be used to access the data and the location of the database that you want to access. To specify data sources, you must have the appropriate drivers installed. For local analysis mode, you can install drivers from the CD-ROM for this product:

• SPSS Data Access Pack. Installs drivers for a variety of database formats. Available on the AutoPlay menu.
• Microsoft Data Access Pack. Installs drivers for Microsoft products, including Microsoft Access. To install the Microsoft Data Access Pack, double-click Microsoft Data Access Pack in the Microsoft Data Access Pack folder on the CD-ROM.

Figure 3-1

Database Wizard dialog box

Database Login

If your database requires a password, the Database Wizard will prompt you for one before it can open the data source.

Figure 3-2

Login dialog box

Selecting Data Fields

The Select Data step controls which tables and fields are read. Database fields (columns) are read as variables.

If a table has any field(s) selected, all of its fields will be visible in the following Database Wizard windows, but only those fields selected in this dialog box will be imported as variables. This enables you to create table joins and to specify criteria using fields that you are not importing.

Figure 3-3

Select Data dialog box

Displaying field names. To list the fields in a table, click the plus sign (+) to the left of a table name. To hide the fields, click the minus sign (–) to the left of a table name.

To add a field. Double-click any field in the Available Tables list, or drag it to the Retrieve Fields in This Order list. Fields can be reordered by dragging and dropping them within the selected fields list.

To remove a field. Double-click any field in the Retrieve Fields in This Order list, or drag it to the Available Tables list.

Sort field names. If selected, the Database Wizard will display your available fields in alphabetical order.

Creating a Relationship between Tables

The Specify Relationships dialog box allows you to define the relationships between the tables. If fields from more than one table are selected, you must define at least one join.

Figure 3-4

Specify Relationships dialog box

Establishing relationships. To create a relationship, drag a field from any table onto the field to which you want to join it. The Database Wizard will draw a join line between the two fields, indicating their relationship. These fields must be of the same data type.

Auto Join Tables. Attempts to automatically join tables based on primary/foreign keys or matching field names and data type.

Specifying join types. If outer joins are supported by your driver, you can specify either inner joins, left outer joins, or right outer joins. To select the type of join, double-click the join line between the fields, and the wizard will display the Relationship Properties dialog box. You can also use the icons in the upper right corner of the dialog box to choose the type of join.

Relationship Properties

This dialog box allows you to specify which type of relationship joins your tables.

Figure 3-5

Relationship Properties dialog box

Inner joins. An inner join includes only rows where the related fields are equal. Completing this would give you a data set that contains the variables ID, REGION, SALES95, and AVGINC for each employee who worked in a fixed region.

Figure 3-6

Creating an inner join

Outer joins. A left outer join includes all records from the table on the left and only those records from the table on the right in which the related fields are equal. In a right outer join, this relationship is switched, so that the join imports all records from the table on the right and only those records from the table on the left in which the related fields are equal.
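The difference between the three join types can be sketched in a few lines of Python. This is an illustration only, not what the Database Wizard or a database driver actually runs. The field names ID, REGION, SALES95, and AVGINC come from the example above; the two small tables, their values, and the join function are hypothetical.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict records on a shared key field."""
    if how not in ("inner", "left", "right"):
        raise ValueError("how must be 'inner', 'left', or 'right'")
    # For a right outer join, the table on the right is the one kept in full.
    keep_all, other = (right, left) if how == "right" else (left, right)
    other_fields = {f for rec in other for f in rec if f != key}
    rows = []
    for rec in keep_all:
        matches = [o for o in other if o[key] == rec[key]]
        if matches:
            rows.extend({**o, **rec} for o in matches)
        elif how != "inner":
            # Outer join: keep the unmatched record, with missing values
            # (None) for the fields that would come from the other table.
            rows.append({**dict.fromkeys(other_fields), **rec})
    return rows


# Hypothetical sample tables.
employees = [
    {"ID": 1, "REGION": "East", "SALES95": 100},
    {"ID": 2, "REGION": "Roving", "SALES95": 250},  # no fixed region
]
regions = [
    {"REGION": "East", "AVGINC": 30000},
    {"REGION": "West", "AVGINC": 35000},
]

print(join(employees, regions, "REGION", "inner"))
print(join(employees, regions, "REGION", "left"))
print(join(employees, regions, "REGION", "right"))
```

The inner join keeps only the employee whose region appears in both tables, the left outer join keeps both employees, and the right outer join keeps both regions.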

Figure 3-7

Creating a right outer join

Limiting Retrieved Cases

The Limit Retrieved Cases dialog box allows you to specify the criteria to select subsets of cases (rows). Limiting cases generally consists of filling the criteria grid with one or more criteria. Criteria consist of two expressions and some relation between them. They return a value of true, false, or missing for each case.

• If the result is true, the case is selected.
• If the result is false or missing, the case is not selected.
• Most criteria use one or more of the six relational operators (<, >, <=, >=, =, and <>).
• Expressions can include field names, constants, arithmetic operators, numeric and other functions, and logical variables. You can use fields that you do not plan to import as variables.
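The true, false, or missing outcome of a criterion can be sketched in Python as follows. This is an illustration, not the wizard's implementation. It assumes that a criterion evaluates to missing whenever either operand is missing (represented here by None), and the field name and values are hypothetical.

```python
import operator

# The six relational operators listed above.
RELATIONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "=": operator.eq,
    "<>": operator.ne,
}


def evaluate(left, relation, right):
    """Return True, False, or None (missing) for one criterion."""
    if left is None or right is None:
        return None  # assumption: a missing operand gives a missing result
    return RELATIONS[relation](left, right)


def select_cases(cases, field, relation, constant):
    """Keep only the cases for which the criterion is true."""
    return [c for c in cases
            if evaluate(c.get(field), relation, constant) is True]


cases = [{"SALES95": 120}, {"SALES95": 80}, {"SALES95": None}]
print(select_cases(cases, "SALES95", ">", 100))
```

Only the first case is selected: the second evaluates to false and the third to missing, and neither outcome selects a case.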

Figure 3-8

Limit Retrieved Cases dialog box

To build your criteria, you need at least two expressions and a relation to connect them.

E To build an expression, put your cursor in an Expression cell. You can type field names, constants, arithmetic operators, numeric and other functions, and logical variables. Other methods of putting a field into a criteria cell include double-clicking the field in the Fields list, dragging the field from the Fields list, or selecting a field from the drop-down menu that is available in any active Expression cell.

The two expressions are usually connected by a relational operator, such as = or >. To choose the relation, put your cursor in the Relation cell and either type the operator or select it from the drop-down menu.

Dates and times in expressions need to be specified in a special manner (including the curly braces shown in the examples):

• Date literals should be specified using the general form: {d yyyy-mm-dd}.
• Time literals should be specified using the general form: {t hh:mm:ss}.
• Date/time literals (timestamps) should be specified using the general form: {dt yyyy-mm-dd hh:mm:ss}.

Functions. A selection of built-in arithmetic, logical, string, date, and time SQL functions is provided. You can select a function from the list and drag it into the expression, or you can enter any valid SQL function. See your database documentation for valid SQL functions. A list of standard functions is available at:

http://msdn.microsoft.com/library/en-us/odbc/htm/odbcscalar_functions.asp
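The date, time, and timestamp literal forms given earlier in this section can be generated programmatically. The following Python sketch simply formats values into the general forms stated in the text; the helper names are hypothetical, and whether a particular driver accepts the result is not verified here.

```python
from datetime import date, datetime, time


def date_literal(value):
    """Format a date in the general form {d yyyy-mm-dd}."""
    return "{d %s}" % value.strftime("%Y-%m-%d")


def time_literal(value):
    """Format a time in the general form {t hh:mm:ss}."""
    return "{t %s}" % value.strftime("%H:%M:%S")


def timestamp_literal(value):
    """Format a timestamp in the general form {dt yyyy-mm-dd hh:mm:ss}."""
    return "{dt %s}" % value.strftime("%Y-%m-%d %H:%M:%S")


example_date = date_literal(date(2004, 3, 31))
example_time = time_literal(time(9, 5, 0))
example_stamp = timestamp_literal(datetime(2004, 3, 31, 9, 5, 0))
```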

Use Random Sampling. Selects a random sample of cases from the data source. For large data sources, you may want to limit the number of cases to a small, representative sample. This can significantly reduce the time that it takes to run procedures. Native random sampling, if available for the data source, is faster than SPSS random sampling, since SPSS random sampling must still read the entire data source to extract a random sample.

• Approximately. Generates a random sample of approximately the specified percentage of cases. Since this routine makes an independent pseudo-random decision for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage.
• Exactly. Selects a random sample of the specified number of cases from the specified total number of cases. If the total number of cases specified exceeds the total number of cases in the data file, the sample will contain proportionally fewer cases than the requested number.
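The two sampling modes behave differently, and a small simulation can make the distinction concrete. The sketch below is written in Python under stated assumptions: the "approximately" mode is modeled as an independent decision per case, as the text describes, while the "exactly" mode uses a standard sequential selection scheme that is an assumption of this example and not necessarily the routine SPSS uses.

```python
import random


def sample_approximately(cases, percent, rng):
    """Make an independent pseudo-random decision for each case."""
    return [c for c in cases if rng.random() < percent / 100.0]


def sample_exactly(cases, n, total, rng):
    """Select n cases from the first `total` cases by sequential selection.

    If `total` exceeds the number of cases actually present, the loop ends
    early and fewer than n cases are typically returned.
    """
    selected = []
    for i, case in enumerate(cases):
        if i >= total:
            break
        slots_left = n - len(selected)
        cases_left = total - i
        if rng.random() * cases_left < slots_left:
            selected.append(case)
    return selected


rng = random.Random(0)
cases = list(range(10000))
approx = sample_approximately(cases, 10, rng)          # close to 1,000 cases
exact = sample_exactly(cases, 50, len(cases), rng)     # exactly 50 cases
short = sample_exactly(cases[:1000], 50, 2000, rng)    # total overstated
```

When the stated total matches the file, the second function always returns the requested number of cases; when the total is overstated, it can return no more than the requested number and usually returns fewer.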

Prompt for Value. You can embed a prompt in your query to create a parameter query. When users run the query, they will be asked to enter information specified here. You might want to do this if you need to see different views of the same data. For example, you may want to run the same query to see sales figures for different fiscal quarters. Place your cursor in any Expression cell, and click Prompt for Value to create a prompt.

Note: If you use random sampling, aggregation (available in distributed mode with SPSS Server) is not available.

Creating a Parameter Query

Use the Prompt for Value dialog box to create a dialog box that solicits information from users each time someone runs your query. It is useful if you want to query the same data source using different criteria.

Figure 3-9

Prompt for Value dialog box

To build a prompt, you need to enter a prompt string and a default value. The prompt string is displayed each time a user runs your query. It should specify the kind of information to enter, and, if the user is not selecting from a list, it should give hints about how the input should be formatted. For example, “Enter a Quarter (Q1, Q2, Q3, ...)”.

Allow user to select value from list. If this is selected, you can limit the user to the values you place here, which are separated by returns.

Data type. Specify the data type here, either Number, String, or Date.

The final result looks like this:

Figure 3-10

User-defined prompt dialog box

Aggregating Data

If you are in distributed mode, connected to a remote server (available with SPSS Server), you can aggregate the data before reading it into SPSS.

Figure 3-11

Aggregate Data dialog box

You can also aggregate data after reading it into SPSS, but preaggregating may save time for large data sources.

E Select one or more break variables that define how cases are grouped to create aggregated data.

E Select one or more aggregate variables.

E Select an aggregate function for each aggregate variable.

Optionally, you can create a variable that contains the number of cases in each break group.

Note: If you use random sampling, aggregation is not available.
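The roles of break variables, aggregate variables, and aggregate functions can be sketched in Python. This is an illustration of the concept only; the `aggregate` helper, the `N_BREAK` variable name, and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate(cases, break_vars, functions, count_name=None):
    """Group cases by the break variables and summarize each group.

    functions maps each aggregate variable to its aggregate function;
    count_name optionally adds the number of cases in each break group.
    """
    groups = defaultdict(list)
    for case in cases:
        groups[tuple(case[v] for v in break_vars)].append(case)

    result = []
    for key, members in groups.items():
        row = dict(zip(break_vars, key))
        for var, func in functions.items():
            row[var] = func([m[var] for m in members])
        if count_name:
            row[count_name] = len(members)
        result.append(row)
    return result


cases = [
    {"REGION": "East", "SALES95": 100},
    {"REGION": "East", "SALES95": 200},
    {"REGION": "West", "SALES95": 50},
]
totals = aggregate(cases, ["REGION"], {"SALES95": sum}, count_name="N_BREAK")
```

Three cases collapse into two aggregated cases, one for each value of the break variable.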

Defining Variables

Variable names and labels. The complete database field (column) name is used as the variable label. Unless you modify the variable name, the Database Wizard assigns variable names to each column from the database in one of two ways:

• If the name of the database field forms a valid, unique variable name, it is used as the variable name.
• If the name of the database field does not form a valid, unique variable name, a new, unique name is automatically generated.

Click any cell to edit the variable name.

Converting strings to numeric values. Check the Recode to Numeric box for a string variable if you want to automatically convert it to a numeric variable. String values are converted to consecutive integer values based on alphabetic order of the original values. The original values are retained as value labels for the new variables.

Width for variable-width strings. Controls the width of variable-width string values. By default, the width is 255 bytes, and only the first 255 bytes (typically 255 characters in single-byte languages) will be read. The width can be up to 32,767 bytes. Although you probably don’t want to truncate string values, you also don’t want to specify an unnecessarily large value, since excessively large values will cause SPSS processing to be inefficient.
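The Recode to Numeric conversion described above can be sketched in Python. The text states that values become consecutive integers in alphabetic order and that the original strings are kept as value labels; starting the codes at 1 and using Python's default string ordering are assumptions of this example.

```python
def recode_to_numeric(values):
    """Map strings to consecutive integers in alphabetic order.

    Returns the recoded values and a value-label dictionary that retains
    the original strings. Codes start at 1 (an assumption of this sketch).
    """
    distinct = sorted(set(values))
    codes = {text: code for code, text in enumerate(distinct, start=1)}
    value_labels = {code: text for text, code in codes.items()}
    return [codes[v] for v in values], value_labels


recoded, labels = recode_to_numeric(["West", "East", "North", "East"])
```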

Figure 3-12

Define Variables dialog box

Sorting Cases

If you are in distributed mode, connected to a remote server (available with SPSS Server), you can sort the data before reading it into SPSS.

Figure 3-13

Sort Cases dialog box

You can also sort data after reading it into SPSS, but presorting may save time for large data sources.

Results

The Results dialog box displays the SQL Select statement for your query.

• You can edit the SQL Select statement before you run the query, but if you click the Back button to make changes in previous steps, the changes to the Select statement will be lost.
• You can save the query for future use with Save query to a file.
• Select Paste it into the syntax editor for further modification to paste complete GET DATA syntax into a syntax window. Copying and pasting the Select statement from the Results window will not paste the necessary command syntax.

Note: The pasted syntax contains a blank space before the closing quote on each line of SQL generated by the wizard. These blanks are not superfluous. When the command is processed, all of the lines of the SQL statement are merged together in a very literal fashion. Without the space, there would be no space between the last character on one line and the first character on the next line.

Figure 3-14

Results dialog box
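The reason for the blank before each closing quote in the pasted syntax can be seen by merging quoted SQL fragments literally. The Python sketch below uses a hypothetical query (the table and field names are illustrative) and plain string concatenation to stand in for the merge.

```python
# Each fragment ends with a blank, as in the syntax pasted by the wizard.
with_blanks = [
    "SELECT ID, REGION ",
    "FROM Employees ",
    "WHERE SALES95 > 100 ",
]
# The same fragments with the trailing blanks removed.
without_blanks = [line.rstrip() for line in with_blanks]

# Merging the lines literally, with nothing inserted between them.
merged_ok = "".join(with_blanks)
merged_bad = "".join(without_blanks)
```

With the blanks, the merged statement reads `SELECT ID, REGION FROM Employees WHERE SALES95 > 100 `; without them, keywords fuse with the preceding names (`REGIONFROM`, `EmployeesWHERE`) and the statement is no longer valid SQL.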

Text Wizard

The Text Wizard can read text data files formatted in a variety of ways:

• Tab-delimited files
• Space-delimited files
• Comma-delimited files
• Fixed-field format files

For delimited files, you can also specify other characters as delimiters between values, and you can specify multiple delimiters.

To Read Text Data Files

E From the menus choose:

File

Read Text Data

E Select the text file in the Open dialog box.

E Follow the steps in the Text Wizard to define how to read the data file.

Text Wizard Step 1

Figure 3-15

Text Wizard Step 1

The text file is displayed in a preview window. You can apply a predefined format (previously saved from the Text Wizard) or follow the steps in the Text Wizard to specify how the data should be read.

Text Wizard Step 2

Figure 3-16

Text Wizard Step 2

This step provides information about variables. A variable is similar to a field in a database. For example, each item in a questionnaire is a variable.

How are your variables arranged? To read your data properly, the Text Wizard needs to know how to determine where the data value for one variable ends and the data value for the next variable begins. The arrangement of variables defines the method used to differentiate one variable from the next.

• Delimited. Spaces, commas, tabs, or other characters are used to separate variables. The variables are recorded in the same order for each case but not necessarily in the same column locations.
• Fixed width. Each variable is recorded in the same column location on the same record (line) for each case in the data file. No delimiter is required between variables. In fact, in many text data files generated by computer programs, data values may appear to run together without even spaces separating them. The column location determines which variable is being read.

Are variable names included at the top of your file? If the first row of the data file contains descriptive labels for each variable, you can use these labels as variable names. Values that don’t conform to variable naming rules are converted to valid variable names.
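The difference between the delimited and fixed-width arrangements can be sketched in Python. In this illustration the same three values are read once by splitting on a delimiter and once by column position; the helper names, the sample lines, and the zero-based column offsets are hypothetical.

```python
def read_delimited(line, delimiter=","):
    """Separate values wherever the delimiter appears."""
    return line.split(delimiter)


def read_fixed_width(line, columns):
    """Separate values by column location.

    columns is a list of (start, end) zero-based, half-open offsets.
    """
    return [line[start:end].strip() for start, end in columns]


delimited = read_delimited("101,East,120")
# The values run together; only the column locations separate them.
fixed = read_fixed_width("101East 120", [(0, 3), (3, 8), (8, 11)])
```

Both calls yield the same three values, but the fixed-width reader depends entirely on knowing where each variable begins and ends.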

Text Wizard Step 3: Delimited Files

Figure 3-17

Text Wizard Step 3 for delimited files

This step provides information about cases. A case is similar to a record in a database. For example, each respondent to a questionnaire is a case.

The first case of data begins on which line number? Indicates the first line of the data file that contains data values. If the top line(s) of the data file contain descriptive labels or other text that does not represent data values, this will not be line 1.

How are your cases represented? Controls how the Text Wizard determines where each case ends and the next one begins.

• Each line represents a case. Each line contains only one case. It is fairly common for each case to be contained on a single line (row), even though this can be a very long line for data files with a large number of variables. If not all lines contain the same number of data values, the number of variables for each case is determined by the line with the greatest number of data values. Cases with fewer data values are assigned missing values for the additional variables.
• A specific number of variables represents a case. The specified number of variables for each case tells the Text Wizard where to stop reading one case and start reading the next. Multiple cases can be contained on the same line, and cases can start in the middle of one line and be continued on the next line. The Text Wizard determines the end of each case based on the number of values read, regardless of the number of lines. Each case must contain data values (or missing values indicated by delimiters) for all variables, or the data file will be read incorrectly.

How many cases do you want to import? You can import all cases in the data file, the first n cases (n is a number you specify), or a random sample of a specified percentage. Since the random sampling routine makes an independent pseudo-random decision for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage.
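The two ways of representing cases in a delimited file can be sketched in Python. This is a simplified illustration: missing values are represented by `None`, the end of a line is treated as a value boundary in the second function, and both helper names are hypothetical.

```python
def cases_by_line(lines, delimiter=","):
    """Each line is one case; short lines are padded with missing values."""
    rows = [line.split(delimiter) for line in lines]
    width = max(len(row) for row in rows)  # line with the most values
    return [row + [None] * (width - len(row)) for row in rows]


def cases_by_count(lines, n_vars, delimiter=","):
    """A fixed number of values is one case, regardless of line breaks."""
    values = [v for line in lines for v in line.split(delimiter)]
    return [values[i:i + n_vars] for i in range(0, len(values), n_vars)]


per_line = cases_by_line(["1,2,3", "4,5", "6,7,8"])
per_count = cases_by_count(["1,2,3,4", "5,6"], n_vars=3)
```

In the first result the short line receives a missing value for the third variable. In the second, the first case ends in the middle of line 1 and the second case continues onto line 2.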

Text Wizard Step 3: Fixed-Width Files

Figure 3-18

Text Wizard Step 3 for fixed-width files

This step provides information about cases. A case is similar to a record in a database. For example, each respondent to a questionnaire is a case.

The first case of data begins on which line number? Indicates the first line of the data file that contains data values. If the top line(s) of the data file contain descriptive labels or other text that does not represent data values, this will not be line 1.

How many lines represent a case? Controls how the Text Wizard determines where each case ends and the next one begins. Each variable is defined by its line number within the case and its column location. You need to specify the number of lines for each case to read the data correctly.

How many cases do you want to import? You can import all cases in the data file, the first n cases (n is a number you specify), or a random sample of a specified percentage. Since the random sampling routine makes an independent pseudo-random decision for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage.

Text Wizard Step 4: Delimited Files

Figure 3-19

Text Wizard Step 4 for delimited files

This step displays the Text Wizard’s best guess on how to read the data file and allows you to modify how the Text Wizard will read variables from the data file.

Which delimiters appear between variables? Indicates the characters or symbols that

separate data values. You can select any combination of spaces, commas, semicolons, tabs, or other characters. Multiple, consecutive delimiters without intervening data values are treated as missing values.

What is the text qualifier? Characters used to enclose values that contain delimiter characters. For example, if a comma is the delimiter, values that contain commas will be read incorrectly unless there is a text qualifier enclosing the value, preventing the commas in the value from being interpreted as delimiters between values. CSV-format data files exported from Excel use a double quotation mark (") as a text qualifier. The text qualifier appears at both the beginning and the end of the value, enclosing the entire value.
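The effect of a text qualifier can be demonstrated with Python's standard `csv` module, which uses the same idea under the name `quotechar`. The sample data is hypothetical; the sketch compares a qualifier-aware read with a naive split on the delimiter.

```python
import csv
import io

# The second data row has two consecutive delimiters (an empty value).
text = 'ID,NAME,CITY\n1,"Smith, John",Chicago\n2,,Boston\n'

# Qualifier-aware read: the comma inside the quotes is part of the value.
rows = list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))

# Naive read of the first data row: the embedded comma splits the value.
naive = text.splitlines()[1].split(",")
```

The qualifier-aware read returns three values for every row, and the consecutive delimiters in the last row produce an empty value, which corresponds to a missing value. The naive split returns four pieces for the first data row.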

Text Wizard Step 4: Fixed-Width Files

Figure 3-20

Text Wizard Step 4 for fixed-width files

This step displays the Text Wizard’s best guess on how to read the data file and allows you to modify how the Text Wizard will read variables from the data file. Vertical lines in the preview window indicate where the Text Wizard currently thinks each variable begins in the file.

Insert, move, and delete variable break lines as necessary to separate variables. If multiple lines are used for each case, select each line from the drop-down list and modify the variable break lines as necessary.

Note: For computer-generated data files that produce a continuous stream of data values with no intervening spaces or other distinguishing characteristics, it may be difficult to determine where each variable begins. Such data files usually rely on a data definition file or some other written description that specifies the line and column location for each variable.

Text Wizard Step 5

Figure 3-21

Text Wizard Step 5

This step controls the variable name and the data format that the Text Wizard will use to read each variable and which variables will be included in the final data file.

Variable name. You can overwrite the default variable names with your own variable names. If you read variable names from the data file, the Text Wizard will automatically modify variable names that don’t conform to variable naming rules. Select a variable in the preview window and then enter a variable name.

Data format. Select a variable in the preview window and then select a format from the drop-down list. Shift-click to select multiple contiguous variables or Ctrl-click to select multiple noncontiguous variables.

Text Wizard Formatting Options

Formatting options for reading variables with the Text Wizard include:

Do not import. Omit the selected variable(s) from the imported data file.

Numeric. Valid values include numbers, a leading plus or minus sign, and a decimal indicator.

String. Valid values include virtually any keyboard characters and embedded blanks. For delimited files, you can specify the number of characters in the value, up to a maximum of 32,767. By default, the Text Wizard sets the number of characters to the longest string value encountered for the selected variable(s). For fixed-width files, the number of characters in string values is defined by the placement of variable break lines in step 4.

Date/Time. Valid values include dates of the general format dd-mm-yyyy, mm/dd/yyyy, dd.mm.yyyy, yyyy/mm/dd, hh:mm:ss, and a variety of other date and time formats. Months can be represented in digits, Roman numerals, or three-letter abbreviations, or they can be fully spelled out. Select a date format from the list.

Dollar. Valid values are numbers with an optional leading dollar sign and optional commas as thousands separators.

Comma. Valid values include numbers that use a period as a decimal indicator and commas as thousands separators.

Dot. Valid values include numbers that use a comma as a decimal indicator and periods as thousands separators.

Note: Values that contain invalid characters for the selected format will be treated as missing. Values that contain any of the specified delimiters will be treated as multiple values.

Text Wizard Step 6

Figure 3-22

Text Wizard Step 6

This is the final step of the Text Wizard. You can save your specifications in a file for use when importing similar text data files. You can also paste the syntax generated by the Text Wizard into a syntax window. You can then customize and/or save the syntax for use in other sessions or in production jobs.

Cache data locally. A data cache is a complete copy of the data file, stored in temporary disk space. Caching the data file can improve performance.

File Information

A data file contains much more than raw data. It also contains any variable definition information, including:

• Variable names
• Variable formats
• Descriptive variable and value labels

This information is stored in the dictionary portion of the data file. The Data Editor provides one way to view the variable definition information. You can also display complete dictionary information for the working data file or any other data file.

To Obtain Data File Information

E From the menus in the Data Editor window choose:

File

Display Data File Information

E For the currently open data file, choose Working File.

E For other data files, choose External File, and then select the data file.

The data file information is displayed in the Viewer.

Saving Data Files

Any changes that you make in a data file last only for the duration of the current session—unless you explicitly save the changes.

To Save Modified Data Files

E Make the Data Editor the active window (click anywhere in the window to make it

active).

E From the menus choose:

File

Save


The modified data file is saved, overwriting the previous version of the file.

Saving Data Files in Excel Format

You can save your data in one of three Microsoft Excel file formats. The choice of format depends on the version of Excel that will be used to open the data. The Excel application cannot open an Excel file from a newer version of the application. For example, Excel 5.0 cannot open an Excel 2000 document. However, Excel 2000 can easily read an Excel 5.0 document.

There are a few limitations to the Excel file format that don’t exist in SPSS. These limitations include:

• Variable information, such as missing values and variable labels, is not included in exported Excel files.
• When exporting to Excel 97 and later, an option is provided to include value labels instead of values.
• Because all Excel files are limited to 256 columns of data, only the first 256 variables are included in the exported file.
• Excel 4.0 and Excel 5.0/95 files are limited to 16,384 records, or rows of data. Excel 97–2000 files allow 65,536 records. If your data exceed these limits, a warning message is displayed and the data are truncated to the maximum size allowed by Excel.

Variable Types

The following table shows the variable type matching between the original data in SPSS and the exported data in Excel.

SPSS Variable Type    Excel Data Format
Numeric               0.00; #,##0.00;...
Comma                 0.00; #,##0.00;...
Dollar                $#,##0_);...
Date                  d-mmm-yyyy
Time                  hh:mm:ss
String                General

Saving Data Files in SAS Format

Special handling is given to various aspects of your data when saved as a SAS file. These cases include:

• Certain characters that are allowed in SPSS variable names are not valid in SAS, such as @, #, and $. These illegal characters are replaced with an underscore when the data are exported.
• SPSS variable labels containing more than 40 characters are truncated when exported to a SAS v6 file.
• Where they exist, SPSS variable labels are mapped to the SAS variable labels. If no variable label exists in the SPSS data, the variable name is mapped to the SAS variable label.
• SAS allows only one value for system-missing, whereas SPSS allows numerous system-missing values. As a result, all system-missing values in SPSS are mapped to a single system-missing value in the SAS file.

Save Value Labels

You have the option of saving the values and value labels associated with your data file to a SAS syntax file. For example, when the value labels for the cars.sav data file are exported, the generated syntax file contains:

libname library 'd:\spss\' ;
proc format library = library ;
   value ORIGIN /* Country of Origin */
      1 = 'American'
      2 = 'European'
      3 = 'Japanese' ;
   value CYLINDER /* Number of Cylinders */
      3 = '3 Cylinders'
      4 = '4 Cylinders'
      5 = '5 Cylinders'
      6 = '6 Cylinders'
      8 = '8 Cylinders' ;
   value FILTER__ /* cylrec=1|cylrec = 2 (FILTER) */
      0 = 'Not Selected'
      1 = 'Selected' ;
proc datasets library = library ;
   modify cars;
   format ORIGIN ORIGIN.;
   format CYLINDER CYLINDER.;
   format FILTER__ FILTER__.;
quit;

This feature is not supported for the SAS transport file.
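The special-handling rules for variable names and labels listed at the start of this section can be sketched in Python. This is an illustration of the stated rules only; the helper names are hypothetical, and only the three characters named in the text (@, #, and $) are replaced.

```python
import re


def sas_variable_name(spss_name):
    """Replace the characters @, #, and $ with an underscore."""
    return re.sub(r"[@#$]", "_", spss_name)


def sas_v6_label(spss_name, spss_label=None, max_len=40):
    """Use the SPSS label if one exists, otherwise the variable name.

    Labels longer than 40 characters are truncated for a SAS v6 file.
    """
    text = spss_label if spss_label else spss_name
    return text[:max_len]


name = sas_variable_name("sales$95")
long_label = sas_v6_label("origin", "Country of origin for each car in the sample file")
default_label = sas_v6_label("origin")
```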

Variable Types

The following table shows the variable type matching between the original data in SPSS and the exported data in SAS:

SPSS Variable Type     SAS Variable Type    SAS Data Format
Numeric                Numeric              12
Comma                  Numeric              12
Dot                    Numeric              12
Scientific Notation    Numeric              12
Date                   Numeric              (Date) for example, MMDDYY10, ...
Date (Time)            Numeric              Time18
Dollar                 Numeric              12
Custom Currency        Numeric              12
String                 Character

Saving Data Files in Other Formats

E Make the Data Editor the active window (click anywhere in the window to make it active).

E From the menus choose:

File

Save As...

E Select a file type from the drop-down list.

E Enter a filename for the new data file.

To write variable names to the first row of a spreadsheet or tab-delimited data file:

E Click Write variable names to spreadsheet in the Save Data As dialog box.

To save value labels instead of data values in Excel 97 format:

E Click Save value labels where defined instead of data values in the Save Data As dialog box.

To save value labels to a SAS syntax file (active only when a SAS file type is selected):

E Click Save value labels into a .sas file in the Save Data As dialog box.

Saving Data: Data File Types

You can save data in the following formats:

SPSS (*.sav). SPSS format.

• Data files saved in SPSS format cannot be read by versions of the software prior to version 7.5.
• When using data files with variable names longer than eight bytes in SPSS 10.x or 11.x, unique, eight-byte versions of variable names are used, but the original variable names are preserved for use in release 12.0 or later. In releases prior to SPSS 10, the original long variable names are lost if you save the data file.
• When using data files with string variables longer than 255 bytes in versions of SPSS prior to release 13.0, those string variables are broken up into multiple 255-byte string variables.

SPSS 7.0 (*.sav). SPSS 7.0 for Windows format. Data files saved in SPSS 7.0 format can be read by SPSS 7.0 and earlier versions of SPSS for Windows but do not include defined multiple response sets or Data Entry for Windows information.

SPSS/PC+ (*.sys). SPSS/PC+ format. If the data file contains more than 500 variables, only the first 500 will be saved. For variables with more than one defined user-missing value, additional user-missing values will be recoded into the first defined user-missing value.

SPSS Portable (*.por). SPSS portable format that can be read by other versions of SPSS and versions on other operating systems (for example, Macintosh or UNIX). Variable names are limited to eight bytes and are automatically converted to unique eight-byte names if necessary.

Tab-delimited (*.dat). ASCII text files with values separated by tabs.

Fixed ASCII (*.dat). ASCII text file in fixed format, using the default write formats for all variables. There are no tabs or spaces between variable fields.

Excel 2.1 (*.xls). Microsoft Excel 2.1 spreadsheet file. The maximum number of variables is 256, and the maximum number of rows is 16,384.

Excel 97 and later (*.xls). Microsoft Excel 97/2000/XP spreadsheet file. The maximum number of variables is 256, and the maximum number of rows is 65,536.

1-2-3 Release 3.0 (*.wk3). Lotus 1-2-3 spreadsheet file, release 3.0. The maximum number of variables that you can save is 256.

1-2-3 Release 2.0 (*.wk1). Lotus 1-2-3 spreadsheet file, release 2.0. The maximum number of variables that you can save is 256.

1-2-3 Release 1.0 (*.wks). Lotus 1-2-3 spreadsheet file, release 1A. The maximum number of variables that you can save is 256.

SYLK (*.slk). Symbolic link format for Microsoft Excel and Multiplan spreadsheet files. The maximum number of variables that you can save is 256.

dBASE IV (*.dbf). dBASE IV format.

dBASE III (*.dbf). dBASE III format.

dBASE II (*.dbf). dBASE II format.

SAS v6 for Windows (*.sd2). SAS v6 file format for Windows/OS2.

SAS v6 for UNIX (*.ssd01). SAS v6 file format for UNIX (Sun, HP, IBM).

SAS v6 for Alpha/OSF (*.ssd04). SAS v6 file format for Alpha/OSF (DEC UNIX).

SAS v7+ Windows short extension (*.sd7). SAS version 7–8 for Windows short filename format.

SAS v7+ Windows long extension (*.sas7bdat). SAS version 7–8 for Windows long filename format.

SAS v7+ for UNIX (*.ssd01). SAS v8 for UNIX.

SAS Transport (*.xpt). SAS transport file.

Saving Subsets of Variables

Figure 3-23

Save Data As Variables dialog box

For data saved as an SPSS data file, the Save Data As Variables dialog box allows you to select the variables that you want to be saved in the new data file. By default, all variables will be saved. Deselect the variables that you don’t want to save, or click Drop All and then select the variables that you want to save.

To Save a Subset of Variables

E Make the Data Editor the active window (click anywhere in the window to make it active).

E From the menus choose:

File

Save As...

E Select SPSS (*.sav) from the list of file types.

E Click Variables.

E Select the variables that you want to save.

Saving File Options

For spreadsheet and tab-delimited files, you can write variable names to the first row of the file.

Protecting Original Data

To prevent the accidental modification/deletion of your original data, you can mark the file as read-only.

E From the Data Editor menus choose:

File

Mark File

Read Only

If you make subsequent modifications to the data and then try to save the data file, you can save the data only with a different filename; so the original data are not affected.

You can change the file permissions back to read/write by selecting Mark File Read Write from the File menu.

Virtual Active File

The virtual active file enables you to work with large data files without requiring equally large (or larger) amounts of temporary disk space. For most analysis and charting procedures, the original data source is reread each time you run a different procedure. Procedures that modify the data require a certain amount of temporary disk space to keep track of the changes, and some actions always require enough disk space for at least one entire copy of the data file.

Figure 3-24

Temporary disk space requirements

Action                                                  Virtual Active File       Data Stored in Temporary Disk Space
GET FILE = 'v1-5.sav'. FREQUENCIES…                     v1 v2 v3 v4 v5            None
COMPUTE v6 = … RECODE v4… REGRESSION … /SAVE ZPRED.     v1 v2 v3 v4 v5 v6 zpre    v4 v6 zpre
SORT CASES BY… or CACHE                                 v1 v2 v3 v4 v5 v6 zpre    v1 v2 v3 v4 v5 v6 zpre

Actions that don’t require any temporary disk space include:

• Reading SPSS data files
• Merging two or more SPSS data files
• Reading database tables with the Database Wizard
• Merging an SPSS data file with a database table
• Running procedures that read data (for example, Frequencies, Crosstabs, Explore)

Actions that create one or more columns of data in temporary disk space include:

• Computing new variables
• Recoding existing variables
• Running procedures that create or modify variables (for example, saving predicted values in Linear Regression)

Actions that create an entire copy of the data file in temporary disk space include:

• Reading Excel files
• Running procedures that sort data (for example, Sort Cases, Split File)
• Reading data with GET TRANSLATE or DATA LIST commands
• Using the Cache Data facility or the CACHE command
• Launching other applications from SPSS that read the data file (for example, AnswerTree, DecisionTime)

Note: The GET DATA command provides functionality comparable to DATA LIST without creating an entire copy of the data file in temporary disk space. The SPLIT FILE command in command syntax does not sort the data file and therefore does not create a copy of the data file. This command, however, requires sorted data for proper operation, and the dialog box interface for this procedure will automatically sort the data file, resulting in a complete copy of the data file. (Command syntax is not available with the Student Version.)

Actions that create an entire copy of the data file by default:

• Reading databases with the Database Wizard
• Reading text files with the Text Wizard

The Text Wizard provides an optional setting to automatically cache the data. By default, this option is selected. You can turn it off by deselecting Cache data locally. For the Database Wizard you can paste the generated command syntax and delete the CACHE command.

Page 85

Creating a Data Cache

Although the virtual active file can vastly reduce the amount of temporary disk space required, the absence of a temporary copy of the “active” file means that the original data source has to be reread for each procedure. For large data files read from an external source, creating a temporary copy of the data may improve performance. For example, for data tables read from a database source, the SQL query that reads the information from the database must be reexecuted for any command or procedure that needs to read the data. Since virtually all statistical analysis procedures and charting procedures need to read the data, the SQL query is reexecuted for each procedure you run, which can result in a significant increase in processing time if you run a large number of procedures.

If you have sufficient disk space on the computer performing the analysis (either your local computer or a remote server), you can eliminate multiple SQL queries and improve processing time by creating a data cache of the active file. The data cache is a temporary copy of the complete data.

Note: By default, the Database Wizard automatically creates a data cache, but if you use the GET DATA command in command syntax to read a database, a data cache is not automatically created. (Command syntax is not available with the Student Version.)
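
The effect described above can be illustrated outside SPSS with a small Python sketch. It is only an analogy, not how SPSS implements its data cache: the sales table, the QueryReader and CachedReader classes, and the three "procedures" are all hypothetical names introduced for this example. The sketch counts how often the SQL query runs with and without a local copy of the rows.

```python
import sqlite3


def make_source():
    """Build a tiny in-memory database standing in for a remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (20.0,), (30.0,)])
    return conn


class QueryReader:
    """Re-executes the SQL query every time the data are read (no cache)."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self.executions = 0  # how many times the query has run

    def read(self):
        self.executions += 1
        return self.conn.execute(self.sql).fetchall()


class CachedReader:
    """Runs the query once, then serves later reads from a local copy."""

    def __init__(self, reader):
        self.reader = reader
        self._rows = None

    def read(self):
        if self._rows is None:
            self._rows = self.reader.read()
        return self._rows


def run_procedures(reader):
    """Three stand-in 'procedures', each of which reads the data once."""
    total = sum(amount for (amount,) in reader.read())
    rows = reader.read()
    mean = sum(amount for (amount,) in rows) / len(rows)
    maximum = max(amount for (amount,) in reader.read())
    return total, mean, maximum


uncached = QueryReader(make_source(), "SELECT amount FROM sales")
run_procedures(uncached)

inner = QueryReader(make_source(), "SELECT amount FROM sales")
run_procedures(CachedReader(inner))

print(uncached.executions, inner.executions)
```

Without the cache, the query runs once per procedure; with the cache, it runs once in total, at the cost of holding a complete copy of the rows.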

To Create a Data Cache

E From the menus choose:

File

Cache Data...

E Click OK or Cache Now.

OK creates a data cache the next time the program reads the data (for example, the next time you run a statistical procedure), which is usually what you want because it doesn’t require an extra data pass. Cache Now creates a data cache immediately, which shouldn’t be necessary under most circumstances. Cache Now is useful primarily for two reasons:

 A data source is “locked” and can’t be updated by anyone until you end your session, open a different data source, or cache the data.

 For large data sources, scrolling through the contents of the Data View tab in the Data Editor will be much faster if you cache the data.

To Cache Data Automatically

You can use the SET command to automatically create a data cache after a specified number of changes in the active data file. By default, the active data file is automatically cached after 20 changes in the active data file.

E From the menus choose:

File

New

Syntax

E In the syntax window, type SET CACHE n. (where n represents the number of changes in the active data file before the data file is cached).

E From the menus in the syntax window choose:

Run

All

Note: The cache setting is not persistent across sessions. Each time you start a new session, the value is reset to the default of 20.

Chapter 4

Distributed Analysis Mode

Distributed analysis mode allows you to use a computer other than your local (or desktop) computer for memory-intensive work. Since remote servers used for distributed analysis are typically more powerful and faster than your local computer, appropriate use of distributed analysis mode can significantly reduce computer processing time. Distributed analysis with a remote server can be useful if your work involves:

 Large data files, particularly data read from database sources.

 Memory-intensive tasks. Any task that takes a long time in local analysis mode might be a good candidate for distributed analysis.

Distributed analysis affects only data-related tasks, such as reading data, transforming data, computing new variables, and calculating statistics. It has no effect on tasks related to editing output, such as manipulating pivot tables or modifying charts.

Note: Distributed analysis is available only if you have both a local version and access to a licensed server version of the software installed on a remote server.

Distributed versus Local Analysis

Following are some guidelines for choosing distributed or local analysis mode:

Database access. Jobs that perform database queries may run faster in distributed mode if the server has superior access to the database or if the server is running on the same machine as the database engine. If the necessary database access software is available only on the server or if your network administrator does not permit you to download large data tables, you will be able to access the database only in distributed mode.

Ratio of computation to output. Commands that perform a lot of computation and produce small output results (for example, few and small pivot tables, brief text results, or few and simple charts) have the most to gain from running in distributed mode. The degree of improvement depends largely on the computing power of the remote server.

Small jobs. Jobs that run quickly in local mode will almost always run slower in distributed mode because of inherent client/server overhead.

Charts. Case-oriented charts, such as scatterplots, regression residual plots, and sequence charts, require raw data on your local computer. For large data files or database tables, this can result in slower performance in distributed mode because the data have to be sent from the remote server to your local computer. Other charts are based on summarized or aggregated data and should perform adequately because the aggregation is performed on the server.

Interactive graphics. Since it is possible to save raw data with interactive graphics (an optional setting), this can result in large amounts of data being transferred from the remote server to your local computer, significantly increasing the time it takes to save your results.

Pivot tables. Large pivot tables may take longer to create in distributed mode. This is particularly true for the OLAP Cubes procedure and tables that contain individual case data, such as those available in the Summarize procedure.

Text output. The more text that is produced, the slower it will be in distributed mode because this text is produced on the remote server and copied to your local computer for display. Text results have low overhead, however, and tend to transmit quickly.

Server Login

The Server Login dialog box allows you to select the computer that processes commands and runs procedures. This can be either your local computer or a remote server.

Figure 4-1

Server Login dialog box

You can add, modify, or delete remote servers in the list. Remote servers usually require a user ID and a password, and a domain name may also be necessary. Contact your system administrator for information about available servers, a user ID and password, domain names, and other connection information.

You can select a default server and save the user ID, domain name, and password associated with any server. You are automatically connected to the default server when you start a new session.

Adding and Editing Server Login Settings

Use the Server Login Settings dialog box to add or edit connection information for remote servers for use in distributed analysis mode.


Figure 4-2

Server Login Settings dialog box

Contact your system administrator for a list of available servers, port numbers for the servers, and additional connection information. Do not use the Secure Socket Layer unless instructed to do so by your administrator.

Server Name. A server “name” can be an alphanumeric name assigned to a computer (for example, hqdev001) or a unique IP address assigned to a computer (for example, 202.123.456.78).

Port Number. The port number is the port that the server software uses for communications.

Description. Enter an optional description to display in the servers list.

Connect with Secure Socket Layer. Secure Socket Layer (SSL) encrypts requests for distributed analysis when they are sent to the remote SPSS server. Before you use SSL, check with your administrator. SSL must be configured on your desktop computer and the server for this option to be enabled.


To Select, Switch, or Add Servers

E From the menus choose:

File

Switch Server...

To select a default server:

E In the server list, select the box next to the server that you want to use.

E Enter the user ID, domain name, and password provided by your administrator.

Note: You are automatically connected to the default server when you start a new session.

To switch to another server:

E Select the server from the list.

E Enter your user ID, domain name, and password (if necessary).

Note: When you switch servers during a session, all open windows are closed. You will be prompted to save changes before the windows are closed.

To add a server:

E Get the server connection information from your administrator.

E Click Add to open the Server Login Settings dialog box.

E Enter the connection information and optional settings and click OK.

To edit a server:

E Get the revised connection information from your administrator.

E Click Edit to open the Server Login Settings dialog box.

E Enter the changes and click OK.


Opening Data Files from a Remote Server

Figure 4-3

Open Remote File dialog box

In distributed analysis mode, the Open Remote File dialog box replaces the standard Open File dialog box.

 The list of available files, folders, and drives is dependent on what is available on or from the remote server. The current server name is indicated at the top of the dialog box.

 You will not have access to files on your local computer in distributed analysis mode unless you specify the drive as a shared device or the folders containing your data files as shared folders.

 If the server is running a different operating system (for example, you are running Windows and the server is running UNIX), you probably won’t have access to local data files in distributed analysis mode even if they are in shared folders.

Only one data file can be open at a time. The current data file is automatically closed when a new data file is opened. If you want to have multiple data files open at the same time, you can start multiple sessions.

To Open Data Files from a Remote Server

E If you aren’t already connected to the remote server, log in to the remote server.


Depending on the type of data file that you want to open, from the menus choose:

File

Open

Data...

File

Open Database

File

Read Text Data...

Saving Data Files from a Remote Server

Figure 4-4

Save Remote File dialog box

In distributed analysis mode, the Save Remote File dialog box replaces the standard Save File dialog box.

The list of available folders and drives is dependent on what is available on or from the remote server. The current server name is indicated at the top of the dialog box. You will not have access to folders on your local computer unless you specify the drive as a shared device and the folders as shared folders. If the server is running a different operating system (for example, you are running Windows and the server is running UNIX), you probably will not have access to local data files in distributed analysis mode even if they are in shared folders. Permissions for shared folders must include the ability to write to the folder if you want to save data files in a local folder.

To Save Data Files from a Remote Server

E Make the Data Editor the active window.

E From the menus choose:

File

Save (or Save As...)

Data File Access in Local and Distributed Analysis Mode

The view of data files, folders (directories), and drives for both your local computer and the network is based on the computer you are currently using to process commands and run procedures, which is not necessarily the computer in front of you.

Local analysis mode. When you use your local computer as your “server,” the view of data files, folders, and drives that you see in the file access dialog box (for opening data files) is similar to what you see in other applications or in the Windows Explorer. You can see all of the data files and folders on your computer and any files and folders on mounted network drives that you normally see.

Distributed analysis mode. When you use another computer as a “remote server” to run commands and procedures, the view of data files, folders, and drives represents the view from the perspective of the remote server computer. Although you may see familiar folder names such as Program Files and drives such as C, these are not the folders and drives on your computer; they are the folders and drives on the remote server.


Figure 4-5

Local and remote views

Local View

Remote View

In distributed analysis mode, you will not have access to data files on your local computer unless you specify the drive as a shared device or the folders containing your data files as shared folders. If the server is running a different operating system (for example, if you are running Windows and the server is running UNIX), you probably won’t have access to local data files in distributed analysis mode even if they are in shared folders.

Distributed analysis mode is not the same as accessing data files that reside on another computer on your network. You can access data files on other network devices in local analysis mode or in distributed analysis mode. In local mode, you access other devices from your local computer. In distributed mode, you access other network devices from the remote server.

If you’re not sure if you’re using local analysis mode or distributed analysis mode, look at the title bar in the dialog box for accessing data files. If the title of the dialog box contains the word Remote (as in Open Remote File), or if the text Remote Server: [server name] appears at the top of the dialog box, you’re using distributed analysis mode.

Note: This affects only dialog boxes for accessing data files (for example, Open Data, Save Data, Open Database, and Apply Data Dictionary). For all other file types (for example, Viewer files, syntax files, and script files), the local view is always used.

To Set Sharing Permissions for a Drive or Folder

E In My Computer, click the folder (directory) or drive that you want to share.

E On the File menu, click Properties.

E Click the Sharing tab, and then click Shared As.

For more information about sharing drives and folders, see the Help for your operating system.

Availability of Procedures in Distributed Analysis Mode

In distributed analysis mode, only procedures installed on both your local version and the version on the remote server are available. You cannot use procedures installed on the server that are not also installed on your local version, and you cannot use procedures installed on your local version that are not also installed on the remote server.

While the latter situation may be unlikely, it is possible that you may have optional components installed locally that are not available on the remote server. If this is the case, switching from your local computer to a remote server will result in the removal of the affected procedures from the menus, and the corresponding command syntax will result in errors. Switching back to local mode will restore all affected procedures.
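
The availability rule above amounts to a set intersection. The following Python sketch is only an illustration of that rule, not part of SPSS; the function names are hypothetical, and "Optional Component A" and "Optional Component B" are made-up placeholders for installed components.

```python
def available_procedures(local, server):
    """Procedures usable in distributed mode: installed on BOTH sides."""
    return set(local) & set(server)


def removed_when_switching(local, server):
    """Local procedures that disappear from the menus after switching
    to the remote server (installed locally but not on the server)."""
    return set(local) - set(server)


# Hypothetical installations, for illustration only.
local = {"Frequencies", "OLAP Cubes", "Optional Component A"}
server = {"Frequencies", "OLAP Cubes", "Optional Component B"}

print(sorted(available_procedures(local, server)))
print(sorted(removed_when_switching(local, server)))
```

Switching back to local mode corresponds to using the local set on its own again, which is why the affected procedures reappear.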


Using UNC Path Specifications

With the Windows NT server version of SPSS, relative path specifications for data files are relative to the current server in distributed analysis mode, not relative to your local computer. In practical terms, this means that a path specification such as c:\mydocs\mydata.sav does not point to a directory and file on your C drive; it points to a directory and file on the remote server’s hard drive. If the directory and/or file do not exist on the remote server, this will result in an error in command syntax, as in:

GET FILE='c:\mydocs\mydata.sav'.

If you are using the Windows NT server version of SPSS, you can use universal naming convention (UNC) specifications when accessing data files with command syntax. The general form of a UNC specification is:

\\servername\sharename\path\filename

 Servername is the name of the computer that contains the data file.

 Sharename is the folder (directory) on that computer that is designated as a shared folder.

 Path is any additional folder (subdirectory) path below the shared folder.

 Filename is the name of the data file.

For example:

GET FILE='\\hqdev001\public\july\sales.sav'.

If the computer does not have a name assigned to it, you can use its IP address, as in:

GET FILE='\\204.125.125.53\public\july\sales.sav'.

Even with UNC path specifications, you can access data files only from devices and folders designated as shared. When you use distributed analysis mode, this includes data files on your local computer.
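
To make the four parts of a UNC specification concrete, here is a minimal Python sketch that splits a specification into servername, sharename, path, and filename. It is an illustration only; split_unc is a hypothetical helper, not an SPSS function, and it assumes the simple \\servername\sharename\path\filename form described above.

```python
def split_unc(spec):
    """Split a UNC specification into its four named parts.

    Raises ValueError if the text does not start with two backslashes
    or lacks a server name, share name, or file name.
    """
    if not spec.startswith("\\\\"):
        raise ValueError("not a UNC specification: " + spec)
    parts = spec[2:].split("\\")
    # Need at least server, share, and file name, with no empty pieces.
    if len(parts) < 3 or "" in parts:
        raise ValueError("incomplete UNC specification: " + spec)
    return {
        "servername": parts[0],
        "sharename": parts[1],
        "path": "\\".join(parts[2:-1]),  # empty if the file sits in the share itself
        "filename": parts[-1],
    }


print(split_unc(r"\\hqdev001\public\july\sales.sav"))
print(split_unc(r"\\204.125.125.53\public\july\sales.sav"))
```

A drive-letter specification such as c:\mydocs\mydata.sav is rejected by this helper, which mirrors the point that such a path names a location on whichever computer is processing the command.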

UNIX servers. On UNIX platforms, there is no equivalent of the UNC path, and all directory paths must be absolute paths that start at the root of the server; relative paths are not allowed. For example, if the data file is located in /bin/spss/data and the current directory is also /bin/spss/data, GET FILE='sales.sav' is not valid; you must specify the entire path, as in:

GET FILE='/bin/spss/data/sales.sav'.
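
If you generate command syntax from a script, the absolute-path rule for UNIX servers can be checked before the syntax is run. The Python sketch below is a hypothetical pre-check, not an SPSS feature; require_absolute is a made-up name, and it relies only on the standard posixpath.isabs function.

```python
import posixpath


def require_absolute(spec):
    """Return spec unchanged if it is an absolute UNIX path; otherwise
    raise ValueError, since relative paths are not allowed on UNIX servers."""
    if not posixpath.isabs(spec):
        raise ValueError("relative path not allowed on a UNIX server: " + spec)
    return spec


# Build the command text only from a path that passes the check.
path = require_absolute("/bin/spss/data/sales.sav")
print("GET FILE='" + path + "'.")
```

Calling require_absolute("sales.sav") raises ValueError, matching the statement that a bare file name is not valid even when the current directory contains the file.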

Chapter 5

Data Editor

The Data Editor provides a convenient, spreadsheet-like method for creating and editing data files. The Data Editor window opens automatically when you start a session.

The Data Editor provides two views of your data:

 Data view. Displays the actual data values or defined value labels.

 Variable view. Displays variable definition information, including defined variable and value labels, data type (for example, string, date, and numeric), measurement level (nominal, ordinal, or scale), and user-defined missing values.

In both views, you can add, change, and delete information contained in the data file.

Data View

Figure 5-1

Data view

Many of the features of the Data view are similar to those found in spreadsheet applications. There are, however, several important distinctions:

 Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case.

 Columns are variables. Each column represents a variable or characteristic being measured. For example, each item on a questionnaire is a variable.

 Cells contain values. Each cell contains a single value of a variable for a case. The cell is the intersection of the case and the variable. Cells contain only data values. Unlike spreadsheet programs, cells in the Data Editor cannot contain formulas.

 The data file is rectangular. The dimensions of the data file are determined by the number of cases and variables. You can enter data in any cell. If you enter data in a cell outside the boundaries of the defined data file, the data rectangle is extended to include any rows and/or columns between that cell and the file boundaries. There are no “empty” cells within the boundaries of the data file. For numeric variables, blank cells are converted to the system-missing value. For string variables, a blank is considered a valid value.
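
The rectangular-file rule can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not SPSS code: enter_value and SYSMIS are hypothetical names, None stands in for the system-missing value, and treating newly created variables as numeric is an assumption made for the example.

```python
SYSMIS = None  # stand-in for the system-missing value (assumption)


def enter_value(data, var_types, row, col, value):
    """Enter a value at (row, col), extending the data rectangle if needed.

    data is a list of cases (each a list of values); var_types holds
    "numeric" or "string" for each variable. Both are modified in place.
    """
    # Add any missing columns; new variables are assumed to be numeric.
    while len(var_types) <= col:
        var_types.append("numeric")
        for case in data:
            case.append(SYSMIS)
    # Add any missing rows: numeric cells become system-missing,
    # string cells become a blank (which counts as a valid value).
    while len(data) <= row:
        data.append([SYSMIS if t == "numeric" else "" for t in var_types])
    data[row][col] = value
    return data


data = [[1.0, "a"], [2.0, "b"]]
var_types = ["numeric", "string"]
enter_value(data, var_types, 3, 2, 9.0)
for case in data:
    print(case)
```

Entering one value two rows and one column beyond the existing file yields a 4-by-3 rectangle with no empty cells: every intervening numeric cell is system-missing and every intervening string cell is a blank.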

Variable View

Figure 5-2
