For more information about SPSS® software products, please visit our Web site at http://www.spss.com or contact
SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651
Fax: (312) 651-3668
SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer
software. No m
the trademark and license rights in the software and the copyrights in the published materials.
The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the
Government is
clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.
General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of
their respec
Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the libraries are
04 by SPSS Inc.
ublication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means,
353-3
Preface
SPSS 13.0 is a comprehensive system for analyzing data. The Complex Samples
optional add-on module provides the additional analytic techniques described in this
manual. The Complex Samples add-on module must be used with the SPSS 13.0
Base system and is completely integrated into that system.
Installation
To install the Complex Samples add-on module, run the License Authorization
Wizard using the authorization code that you received from SPSS Inc. For more
information, see the installation instructions supplied with the SPSS Base system.
Compatibility
SPSS is designed to run on many computer systems. See the installation instructions
that came with your system for specific information on minimum and recommended
requirements.
Serial Numbers
Your serial number is your identification number with SPSS Inc. You will need
this serial number when you contact SPSS Inc. for information regarding support,
payment, or an upgraded system. The serial number was provided with your Base
system.
Customer Service
If you have any questions concerning your shipment or account, contact your local
office, listed on the SPSS Web site at http://www.spss.com/worldwide. Please have
your serial number ready for identification.
Training Seminars
SPSS Inc. provides both public and onsite training seminars. All seminars feature
hands-on workshops. Seminars will be offered in major cities on a regular basis. For
more information on these seminars, contact your local office, listed on the SPSS
Web site at http://www.spss.com/worldwide.
Technical Support
The services of SPSS Technical Support are available to registered customers.
Customers may contact Technical Support for assistance in using SPSS or for
installation help for one of the supported hardware environments. To reach Technical
Support, see the SPSS Web site at http://www.spss.com, or contact your local office,
listed on the SPSS Web site at http://www.spss.com/worldwide. Be prepared to
identify yourself, your organization, and the serial number of your system.
Additional Publications
Additional copies of SPSS product manuals may be purchased directly from SPSS
Inc. Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local
SPSS office, listed on the SPSS Web site at http://www.spss.com/worldwide. For
telephone orders in the United States and Canada, call SPSS Inc. at 800-543-2185.
For telephone orders outside of North America, contact your local office, listed
on the SPSS Web site.
The SPSS Statistical Procedures Companion, by Marija Norusis, has been
published by Prentice Hall. A new version of this book, updated for SPSS 13.0, is
planned. The SPSS Advanced Statistical Procedures Companion, also based on SPSS
13.0, is forthcoming. The SPSS Guide to Data Analysis for SPSS 13.0 is also in
development. Announcements of publications available exclusively through Prentice
Hall will be available on the SPSS Web site at http://www.spss.com/estore (select
your home country, and then click Books).
Tell Us Your Thoughts
Your comments are important. Please let us know about your experiences with SPSS
products. We especially like to hear about new and interesting applications using
the SPSS system. Please send e-mail to suggest@spss.com or write to SPSS Inc.,
Attn.: Director of Product Planning, 233 South Wacker Drive, 11th Floor, Chicago,
IL 60606-6412.
About This Manual
This manual documents the graphical user interface for the procedures included in
the Complex Samples add-on module. Illustrations of dialog boxes are taken from
SPSS for Windows. Dialog boxes in other operating systems are similar. Detailed
information about the command syntax for features in this module is provided in the
SPSS Command Syntax Reference, available from the Help menu.
Contacting SPSS
If you would like to be on our mailing list, contact one of our offices, listed on our
Web site at http://www.spss.com/worldwide.
Part 1: User's Guide
Chapter 1
Introduction to SPSS Complex Samples Procedures
An inherent assumption of analytical procedures in traditional software packages
is that the observations in a data file represent a simple random sample from the
population of interest. This assumption is untenable for an increasing number of
companies and researchers who find it both cost-effective and convenient to obtain
samples in a more structured way.
The SPSS Complex Samples option allows you to select a sample according to
a complex design and incorporate the design specifications into the data analysis,
thus ensuring that your results are valid.
Properties of Complex Samples
A complex sample can differ from a simple random sample in many ways. In a
simple random sample, individual sampling units are selected at random with equal
probability and without replacement (WOR) directly from the entire population. By
contrast, a given complex sample can have some or all of the following features:
Stratification. Stratified sampling involves selecting samples independently within
non-overlapping subgroups of the population, or strata. For example, strata may
be socioeconomic groups, job categories, age groups, or ethnic groups. With
stratification, you can ensure adequate sample sizes for subgroups of interest,
improve the precision of overall estimates, and use different sampling methods from
stratum to stratum.
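Independent selection within strata can be sketched in a few lines of Python. This is a minimal illustration, not the Complex Samples implementation: it assumes simple random sampling without replacement in every stratum, and the function name `stratified_srs`, the frame, and the per-stratum sample sizes are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_srs(units, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample without replacement separately in each stratum.

    units:         list of sampling units (the sampling frame)
    stratum_of:    function that maps a unit to its stratum key
    n_per_stratum: dict mapping each stratum key to its sample size
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Each stratum is sampled independently, so sample sizes may differ by stratum.
    return {h: rng.sample(members, n_per_stratum[h]) for h, members in strata.items()}


# Hypothetical frame: 100 units in "north", 20 in "south".
frame = [("north", i) for i in range(100)] + [("south", i) for i in range(20)]
sample = stratified_srs(frame, lambda u: u[0], {"north": 10, "south": 5}, seed=1)
```

Because the small "south" stratum has its own quota, it is guaranteed 5 sampled units regardless of how the "north" draw turns out, which is how stratification ensures adequate sample sizes for subgroups of interest.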
Clustering. Cluster sampling involves the selection of groups of sampling units, or
clusters. For example, clusters may be schools, hospitals, or geographical areas,
and sampling units may be students, patients, or citizens. Clustering is common in
multistage designs and area (geographic) samples.
Multiple stages. In multistage sampling, you select a first-stage sample based on
clusters. Then you create a second-stage sample by drawing subsamples from the
selected clusters. If the second-stage sample is based on subclusters, you can then add
a third stage to the sample. For example, in the first stage of a survey, a sample of
cities could be drawn. Then, from the selected cities, households could be sampled.
Finally, from the selected households, individuals could be polled. The Sampling and
Analysis Preparation wizards allow you to specify three stages in a design.
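The three-stage survey example above (cities, then households, then individuals) can be sketched as nested selections. This is an illustration under stated assumptions: every stage uses simple random sampling without replacement, the frame is a hypothetical nested dictionary, and `three_stage_sample` is not an SPSS function.

```python
import random


def three_stage_sample(frame, n_cities, n_households, n_people, seed=None):
    """Cities -> households -> individuals, simple random sampling WOR at each stage.

    frame is a hypothetical nested dict: {city: {household: [person, ...]}}.
    Each later stage only looks inside the units selected at the stage before it.
    """
    rng = random.Random(seed)
    selected = []
    for city in rng.sample(sorted(frame), n_cities):                    # stage 1
        households = frame[city]
        for hh in rng.sample(sorted(households), n_households):         # stage 2
            members = households[hh]
            for person in rng.sample(members, min(n_people, len(members))):  # stage 3
                selected.append((city, hh, person))
    return selected


# Hypothetical frame: 5 cities x 4 households x 3 people.
frame = {
    f"city{c}": {f"hh{h}": [f"person{c}{h}{i}" for i in range(3)] for h in range(4)}
    for c in range(5)
}
sample = three_stage_sample(frame, n_cities=2, n_households=2, n_people=1, seed=42)
```

With 2 cities, 2 households per city, and 1 person per household, the sample always contains 4 individuals drawn from exactly 2 cities.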
Nonrandom sampling. When selection at random is difficult to obtain, units can be
sampled systematically (at a fixed interval) or sequentially.
Unequal selection probabilities. When sampling clusters that contain unequal numbers
of units, you can use probability-proportional-to-size (PPS) sampling to make a
cluster’s selection probability equal to the proportion of units it contains. PPS
sampling can also use more general weighting schemes to select units.
Unrestricted sampling. Unrestricted sampling selects units with replacement (WR).
Thus, an individual unit can be selected for the sample more than once.
Sampling weights. Sampling weights are automatically computed while drawing a
complex sample and ideally correspond to the “frequency” that each sampling unit
represents in the target population. Therefore, the sum of the weights over the sample
should estimate the population size. Complex Samples analysis procedures require
sampling weights in order to properly analyze a complex sample. Note that these
weights should be used entirely within the Complex Samples option and should not
be used with other analytical procedures via the Weight Cases procedure, which treats
weights as case replications.
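The statement that the weights sum to an estimate of the population size can be checked directly in the simplest case. The sketch assumes simple random sampling without replacement within each stratum, so a unit's inclusion probability is n_h / N_h and its weight is the inverse, N_h / n_h; the stratum sizes and the function name are hypothetical.

```python
from fractions import Fraction


def srs_stratum_weights(pop_sizes, sample_sizes):
    """Sampling weight N_h / n_h (inverse inclusion probability) for each stratum,
    assuming simple random sampling without replacement within strata."""
    return {h: Fraction(pop_sizes[h], sample_sizes[h]) for h in pop_sizes}


pop_sizes = {"north": 100, "south": 20}    # hypothetical N_h
sample_sizes = {"north": 10, "south": 5}   # n_h
weights = srs_stratum_weights(pop_sizes, sample_sizes)

# Every sampled unit carries its stratum's weight, so summing over the
# sample gives 10 * 10 + 5 * 4 = 120, the population size.
estimated_population = sum(weights[h] * n for h, n in sample_sizes.items())
```

If the same weights were passed to Weight Cases, the 15 sampled cases would instead be treated as 120 replicated cases, which is the misuse the paragraph above warns against.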
Usage of Complex Samples Procedures
Your usage of Complex Samples procedures depends on your particular needs. The
primary types of users are those who:
Plan and carry out surveys according to complex designs, possibly analyzing the
sample later. The primary tool for surveyors is the Sampling Wizard.
Analyze sample data files previously obtained according to complex designs.
Before using the Complex Samples analysis procedures, you may need to use the
Analysis Preparation Wizard.
Regardless of which type of user you are, you need to supply design information to
Complex Samples procedures. This information is stored in a plan file for easy reuse.
A plan file contains complex sample specifications. There are two types of plan files:
Sampling plan. The specifications given in the Sampling Wizard define a sample
design that is used to draw a complex sample. The sampling plan file contains those
specifications. The sampling plan file also contains a default analysis plan that uses
estimation methods suitable for the specified sample design.
Analysis plan. This plan file contains information needed by Complex Samples
analysis procedures to properly compute variance estimates for a complex sample.
The plan includes the sample structure, estimation methods for each stage, and
references to required variables, such as sample weights. The Analysis Preparation
Wizard allows you to create and edit analysis plans.
There are several advantages to saving your specifications in a plan file, including:
A surveyor can specify the first stage of a multistage sampling plan and draw
first-stage units now, collect information on sampling units for the second stage,
and then modify the sampling plan to include the second stage.
An analyst can specify an analysis plan and refer to that plan from each Complex
Samples analysis procedure.
A designer of large-scale public use samples can publish the sampling plan file,
which simplifies the instructions for analysts and avoids the need for each analyst
to specify his or her own analysis plans.
Further Readings
For more information on sampling techniques, see the following texts:
Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley and Sons.
Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.
Kish, L. 1987. Statistical Design for Research. New York: John Wiley and Sons.
Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta, India: Statistical
Publishing Society.
Särndal, C., B. Swensson, and J. Wretman. 1992. Model Assisted Survey Sampling.
New York: Springer-Verlag.
Chapter 2
Sampling from a Complex Design
Figure 2-1
Sampling Wizard, Welcome step
The Sampling Wizard guides you through the steps for creating, modifying, or
executing a sampling plan file. Before using the Wizard, you should have a
well-defined target population, a list of sampling units, and an appropriate sample
design in mind.
Creating a New Sample Plan
E From the menus choose:
Analyze
Complex Samples
Select a Sample...
E Select Design a sample and choose a plan filename to save the sample plan.
E Click Next to continue through the Wizard.
E Optionally, in the Design Variables step, you can define strata, clusters, and input
sample weights. After you define these, click Next.
E Optionally, in the Sampling Method step, you can choose a method for selecting items.
If you select PPS Brewer or PPS Murthy, you can click Finish to draw the sample.
Otherwise, click Next and then:
E In the Sample Size step, specify the number or proportion of units to sample.
You can now click Finish to draw the sample. Optionally, in further steps, you can:
Choose output variables to save.
Add a second or third stage to the design.
Set various selection options, including which stages to draw samples from, the
random number seed, and whether to treat user-missing values as valid values of
design variables.
Choose where to save output data.
Paste your selections as command syntax.
Sampling Wizard: Design Variables
Figure 2-2
Sampling Wizard, Design Variables step
This step allows you to select stratification and clustering variables and to define input
sample weights. You can also specify a label for the stage.
Stratify By. The cross-classification of stratification variables defines distinct
subpopulations, or strata. Separate samples are obtained for each stratum. To improve
the precision of your estimates, units within strata should be as homogeneous as
possible for the characteristics of interest.
Clusters. Cluster variables define groups of observational units, or clusters. Clusters
are useful when directly sampling observational units from the population is
expensive or impossible; instead, you can sample clusters from the population and
then sample observational units from the selected clusters. However, the use of
clusters can introduce correlations among sampling units, resulting in a loss of
precision. To minimize this effect, units within clusters should be as heterogeneous
as possible for the characteristics of interest. You must define at least one cluster
variable in order to plan a multistage design. Clusters are also necessary in the use of
several different sampling methods. For more information, see “Sampling Wizard:
Sampling Method” on p. 9.
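The loss of precision from clustering is often summarized with Kish's approximate design effect, deff ≈ 1 + (m - 1)ρ, where m is the average number of sampled units per cluster and ρ is the intraclass correlation of the characteristic within clusters. This is a standard textbook approximation, not a quantity the Sampling Wizard reports; the function names and inputs below are hypothetical.

```python
def kish_design_effect(avg_cluster_size, icc):
    """Approximate variance inflation from cluster sampling: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n, avg_cluster_size, icc):
    """Number of simple-random-sample units giving roughly the same precision."""
    return n / kish_design_effect(avg_cluster_size, icc)


# 1,000 respondents in clusters of 20 with a modest within-cluster correlation.
deff = kish_design_effect(20, 0.05)            # 1 + 19 * 0.05 = 1.95
n_eff = effective_sample_size(1000, 20, 0.05)  # about 513
```

Heterogeneous clusters correspond to a small ρ, which keeps the design effect close to 1; that is the reasoning behind the advice to make units within clusters as heterogeneous as possible.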
Input Sample Weight. If the current sample design is part of a larger sample design,
you may have sample weights from a previous stage of the larger design. You
can specify a numeric variable containing these weights in the first stage of the
current design. Sample weights are computed automatically for subsequent stages
of the current design.
Stage Label. You can specify an optional string label for each stage. This is used in the
output to
help identify stagewise information.
Note: The source variable list has the same content across steps of the Wizard. In other
words, variables removed from the source list in a particular step are removed from
the list in all steps. Variables returned to the source list appear in the list in all steps.
Tree Controls for Navigating the Sampling Wizard
On the left side of each step in the Sampling Wizard is an outline of all the steps. You
can navigate the Wizard by clicking on the name of an enabled step in the outline.
Steps are enabled as long as all previous steps are valid—that is, if each previous step
has been given the minimum required specifications for that step. See the Help for
individual steps for more information on why a given step may be invalid.
Sampling Wizard: Sampling Method
Figure 2-3
Sampling Wizard, Method step
This step allows you to specify how to select cases from the working data file.
Method. Controls in this group are used to choose a selection method. Some sampling
types allow you to choose whether to sample with replacement (WR) or without
replacement (WOR). See the type descriptions for more information. Note that some
probability-proportional-to-size (PPS) types are available only when clusters have
been defined and that all PPS types are available only in the first stage of a design.
Moreover, WR methods are available only in the last stage of a design.
Simple Random Sampling. Units are selected with equal probability. They can be
selected with or without replacement.
Simple Systematic. Units are selected at a fixed interval throughout the sampling
frame (or strata, if they have been specified) and extracted without replacement.
A randomly selected unit within the first interval is chosen as the starting point.
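A simple systematic draw can be sketched as follows. The sketch assumes a sampling interval of N / n (possibly fractional) and draws the random start on a grid of 1/n inside the first interval so that all arithmetic stays in integers; the manual does not say how SPSS handles fractional intervals, so this is an illustration only, and the function name is hypothetical. For a stratified design the same function would be applied to each stratum's frame separately.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Equal-probability systematic sample of n units, without replacement.

    The sampling interval is k = N / n. The start r / n (r drawn uniformly
    from 0..N-1) falls in the first interval [0, k), and the i-th selection
    is the unit at position floor(r / n + i * k) = (r + i * N) // n.
    Every unit ends up with inclusion probability n / N.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    rng = random.Random(seed)
    r = rng.randrange(N)
    return [frame[(r + i * N) // n] for i in range(n)]


frame = list(range(100))
sample = systematic_sample(frame, 10, seed=3)   # interval 10: picks spaced exactly 10 apart
```

When N is not a multiple of n (for example, N = 10 and n = 3), the picks are spaced 3 or 4 positions apart but still contain n distinct units.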
Simple Sequential. Units are selected sequentially with equal probability and
without replacement.
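The manual does not spell out the sequential algorithm SPSS uses. One well-known way to select units sequentially with equal probability and without replacement is selection sampling (Knuth's Algorithm S), sketched below as a stand-in: a single pass over the frame keeps each unit with probability (units still needed) / (units not yet examined). The function name is hypothetical.

```python
import random


def sequential_sample(frame, n, seed=None):
    """Selection sampling: one ordered pass, exactly n units, each with probability n / N."""
    N = len(frame)
    if not 0 <= n <= N:
        raise ValueError("sample size must satisfy 0 <= n <= N")
    rng = random.Random(seed)
    selected = []
    for seen, unit in enumerate(frame):
        needed = n - len(selected)
        remaining = N - seen  # units not yet examined, including this one
        # Keep this unit with probability needed / remaining.
        if rng.randrange(remaining) < needed:
            selected.append(unit)
    return selected


sample = sequential_sample(list(range(50)), 7, seed=5)
```

The selected units come out in frame order, and the method never needs to look back at units it has already passed.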
PPS. This is a first-stage method that selects units at random with probability
proportional to size. Any units can be selected with replacement; only clusters
can be sampled without replacement.
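With-replacement PPS amounts to n independent draws, each picking unit i with probability size_i / total. A minimal sketch using the standard library follows; the sizes and the function name are hypothetical, and a unit that is hit twice simply appears twice, which is the situation the Index_ output variable (described under Output Variables) identifies.

```python
import random
from collections import Counter


def pps_with_replacement(sizes, n, seed=None):
    """n independent draws; each draw picks unit i with probability sizes[i] / sum(sizes)."""
    rng = random.Random(seed)
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


sizes = [50, 30, 20]                      # hypothetical measures of size
draws = pps_with_replacement(sizes, 10, seed=11)
hits = Counter(draws)                     # how many times each unit was selected
```

With 10 draws among 3 units, at least one unit must be selected more than once, which is the defining feature of WR selection.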
PPS Systematic. This is a first-stage method that systematically selects units with
probability proportional to size. They are selected without replacement.
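Systematic PPS can be pictured as laying the units end to end on a line, each occupying a length equal to its measure of size, and then taking n equally spaced points with a random start. The sketch assumes integer sizes, none larger than the interval total / n (so no unit can be hit twice), and keeps the arithmetic exact by scaling every position by n. It is illustrative rather than SPSS's code, and the name `pps_systematic` is hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, seed=None):
    """Return indices of n units selected systematically with probability
    proportional to size; unit i is selected with probability n * sizes[i] / total.

    Positions are multiplied by n so that the interval total / n becomes the
    integer `total`: unit i covers [n * C_(i-1), n * C_i) on the scaled line,
    and the selection points are r, r + total, r + 2 * total, ...
    """
    total = sum(sizes)
    if n <= 0 or any(n * m > total for m in sizes):
        raise ValueError("each size must be at most total / n")
    rng = random.Random(seed)
    r = rng.randrange(total)                      # random start in the first interval
    ends = [n * c for c in accumulate(sizes)]     # scaled right edge of each unit
    selected, j = [], 0
    for i in range(n):
        point = r + i * total
        while ends[j] <= point:                   # walk forward to the unit covering point
            j += 1
        selected.append(j)
    return selected


sizes = [50, 30, 20, 10, 40]   # hypothetical sizes; total 150, interval 50
sample = pps_systematic(sizes, 3, seed=7)
```

Unit 0 has size 50, exactly one interval, so it is selected in every run, consistent with its inclusion probability of 3 × 50 / 150 = 1.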
PPS Sequential. This is a first-stage method that sequentially selects units with
probability proportional to cluster size and without replacement.
PPS Brewer. This is a first-stage method that selects two clusters from each
stratum with probability proportional to cluster size and without replacement. A
cluster variable must be specified to use this method.
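For two clusters per stratum, the classic textbook Brewer procedure draws the first cluster with probability proportional to p_i(1 - p_i)/(1 - 2p_i), where p_i = size_i / total, and the second from the remaining clusters with probability p_j/(1 - p_i). The sketch below implements that published procedure for a single stratum and also computes the implied inclusion probabilities, which come out to exactly 2p_i. It is not SPSS's implementation; it requires every p_i < 1/2, and all names are hypothetical.

```python
import random


def brewer_first_draw_probs(p):
    """First-draw probabilities, proportional to p_i (1 - p_i) / (1 - 2 p_i)."""
    w = [pi * (1 - pi) / (1 - 2 * pi) for pi in p]
    total = sum(w)
    return [wi / total for wi in w]


def brewer_two(sizes, seed=None):
    """Select two distinct cluster indices with probability proportional to size."""
    total = sum(sizes)
    p = [m / total for m in sizes]
    if len(p) < 2 or any(pi >= 0.5 for pi in p):
        raise ValueError("need at least two clusters, each with p_i < 1/2")
    rng = random.Random(seed)
    units = range(len(p))
    i = rng.choices(units, weights=brewer_first_draw_probs(p))[0]
    # Second draw among the other clusters, with probability p_j / (1 - p_i).
    j = rng.choices(units, weights=[0.0 if k == i else pk for k, pk in enumerate(p)])[0]
    return i, j


def brewer_inclusion_probs(sizes):
    """Exact inclusion probabilities implied by the two draws: q_i + sum_j q_j p_i / (1 - p_j)."""
    total = sum(sizes)
    p = [m / total for m in sizes]
    q = brewer_first_draw_probs(p)
    return [q[i] + sum(q[j] * p[i] / (1 - p[j]) for j in range(len(p)) if j != i)
            for i in range(len(p))]


sizes = [10, 20, 30, 40]               # hypothetical cluster sizes in one stratum
pair = brewer_two(sizes, seed=1)
pis = brewer_inclusion_probs(sizes)    # approximately [0.2, 0.4, 0.6, 0.8], i.e. 2 * p_i
```

Because each inclusion probability is exactly twice the cluster's share of the total size, this is a genuine PPS design of fixed size 2.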
PPS Murthy. This is a first-stage method that selects two clusters from each
stratum with probability proportional to cluster size and without replacement. A
cluster variable must be specified to use this method.
PPS Sampford. This is a first-stage method that selects more than two clusters
from each stratum with probability proportional to cluster size and without
replacement. It is an extension of Brewer’s method. A cluster variable must be
specified to use this method.
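Sampford's method is usually described as a rejective procedure: the first unit is drawn with probability size_i / total, the remaining n - 1 units are drawn with replacement with probability proportional to π_i / (1 - π_i), where π_i = n * size_i / total is the target inclusion probability, and the whole sample is discarded and redrawn unless all n units are distinct. The sketch follows that textbook description for a single stratum; it is not SPSS's implementation, requires every π_i < 1, and the function name is hypothetical.

```python
import random


def sampford(sizes, n, seed=None):
    """Select n distinct cluster indices from one stratum by Sampford's rejective method."""
    total = sum(sizes)
    target = [n * m / total for m in sizes]        # desired inclusion probabilities
    if n < 1 or any(t >= 1 for t in target):
        raise ValueError("need n >= 1 and every n * size_i / total below 1")
    odds = [t / (1 - t) for t in target]
    rng = random.Random(seed)
    units = range(len(sizes))
    while True:
        draw = rng.choices(units, weights=sizes)              # first unit: size_i / total
        draw += rng.choices(units, weights=odds, k=n - 1)     # n - 1 more, with replacement
        if len(set(draw)) == n:                               # keep only all-distinct draws
            return sorted(draw)


sizes = [10, 20, 30, 40, 50, 60]   # hypothetical cluster sizes; total 210
sample = sampford(sizes, 3, seed=2)
```

The rejection step is what turns the with-replacement draws into a without-replacement sample; with n = 2 the method yields the same inclusion probabilities as Brewer's.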
Use WR estimation for analysis. By default, an estimation method is specified in
the plan file that is consistent with the selected sampling method. Selecting this option
allows you to use with-replacement estimation even if the sampling method implies WOR
estimation. This option is available only in stage 1.
Measure of Size (MOS). If a PPS method is selected, you must specify a measure of
size that defines the size of each unit. These sizes can be explicitly defined in a
variable or they can be computed from the data. Optionally, you can set lower and
upper bounds on the MOS, overriding any values found in the MOS variable or
computed from the data. These options are available only in stage 1.
Sampling Wizard: Sample Size
Figure 2-4
Sampling Wizard, Sample Size step
This step allows you to specify the number or proportion of units to sample within
the current stage. The sample size can be fixed or it can vary across strata. For the
purpose of specifying sample size, clusters chosen in previous stages can be used to
define strata.
Units. You can specify an exact sample size or a proportion of units to sample.
Value. A single value is applied to all strata. If Counts is selected as the unit
metric, you should enter a positive integer. If Proportions is selected, you should
enter a non-negative value. Unless sampling with replacement, proportion values
should also be no greater than 1.
Unequal values for strata. Allows you to enter size values on a per-stratum basis
via the Define Unequal Sizes dialog box.
Read values from variable. Allows you to select a numeric variable that contains
size values for strata.
If Proportions is selected, you have the option to set lower and upper bounds on
the number of units sampled.
Define Unequal Sizes
Figure 2-5
Define Unequal Sizes dialog box
The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.
Size Specifications grid. The grid displays the cross-classifications of up to five strata
or cluster variables—one stratum/cluster combination per row. Eligible grid variables
include all stratification variables from the current and previous stages and all cluster
variables from previous stages. Variables can be reordered within the grid or moved
to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values
to toggle the display of value labels and data values for stratification and cluster
variables in the grid cells. Cells that contain unlabeled values always show values.
Click Refresh Strata to repopulate the grid with each combination of labeled data
values for variables in the grid.
Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or
more variables to the Exclude list. These variables are not used to define sample sizes.
Sampling Wizard: Output Variables
Figure 2-6
Sampling Wizard, Output Variables step
This step allows you to choose variables to save when the sample is drawn.
Population size. The estimated number of units in the population for a given stage.
The root name for the saved variable is PopulationSize_.
Sample proportion. The sampling rate at a given stage. The root name for the saved
variable is SamplingRate_.
Sample size. The number of units drawn at a given stage. The root name for the
saved variable is SampleSize_.
Sample weight. The inverse of the inclusion probabilities. The root name for the
saved variable is SampleWeight_.
Some stagewise variables are generated automatically. These include:
Inclusion probabilities. The proportion of units drawn at a given stage. The root name
for the saved variable is InclusionProbability_.
Cumulative weight. The cumulative sample weight over stages previous to
and including the current one. The root name for the saved variable is
SampleWeightCumulative_.
Index. Identifies units selected multiple times within a given stage. The root name for
the saved variable is Index_.
Note: Saved variable root names include an integer suffix that reflects the stage
number—for example, PopulationSize_1_ for the saved population size for stage 1.
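The relationship among the saved weight variables can be sketched for a single sampled case. The sketch assumes, as is usual for multistage designs, that each stagewise weight is the inverse of the case's inclusion probability at that stage (conditional on the earlier stages) and that the cumulative weight is the product of the stagewise weights; it reuses the stage-suffixed root names listed above, but the function itself is hypothetical.

```python
from fractions import Fraction


def stagewise_weight_variables(inclusion_probs):
    """Map stage-suffixed variable names to values for one sampled case.

    inclusion_probs: the case's inclusion probability at each stage, in order,
    each conditional on the units selected at earlier stages.
    """
    values = {}
    cumulative = Fraction(1)
    for stage, prob in enumerate(inclusion_probs, start=1):
        prob = Fraction(prob)
        weight = 1 / prob                 # stagewise weight: inverse inclusion probability
        cumulative *= weight              # product over this and all earlier stages
        values[f"InclusionProbability_{stage}_"] = prob
        values[f"SampleWeight_{stage}_"] = weight
        values[f"SampleWeightCumulative_{stage}_"] = cumulative
    return values


# Stage 1 selects 1 city in 10; stage 2 selects 1 household in 4 within that city.
v = stagewise_weight_variables([Fraction(1, 10), Fraction(1, 4)])
```

Under these assumptions, each household in the sample stands for 10 × 4 = 40 households in the population.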
Sampling Wizard: Plan Summary
Figure 2-7
Sampling Wizard, Plan Summary step
This is the last step within each stage, providing a summary of the sample design
specifications through the current stage. From here, you can either proceed to the
next stage (creating it, if necessary) or set options for drawing the sample.
Sampling Wizard: Draw Sample Selection Options
This step allows you to choose whether to draw a sample. You can also control other
sampling options, such as the random seed and missing-value handling.
Draw sample. In addition to choosing whether to draw a sample, you can also choose
to execute part of the sampling design. Stages must be drawn in order—that is,
stage 2 cannot be drawn unless stage 1 is also drawn. When editing or executing a
plan, you cannot resample locked stages.
Seed. This allows you to choose a seed value for random number generation.
Include user-missing values. This determines whether user-missing values are valid. If
so, user-missing values are treated as a separate category.
Data already sorted. If your sample frame is presorted by the values of the stratification
variables, this option allows you to speed the selection process.
Sampling Wizard: Draw Sample Output Files
Figure 2-9
Sampling Wizard, Draw Sample, Output Files step
This step allows you to choose where to direct sampled cases, weight variables, joint
probabilities, and case selection rules.
Sample data. These options let you determine where sample output is written. It can
be added to the working data file or saved to an external file. If an external file is
specified, the sampling output variables and variables in the working data file for
the selected cases are saved to the file.
Joint probabilities. These options let you determine where joint probabilities are
written. Joint probabilities are produced if the PPS WOR, PPS Brewer, PPS
Sampford, or PPS Murthy method is selected and WR estimation is not specified.
Case selection rules. If you are constructing your sample one stage at a time, you may
want to save the case selection rules to a text file. They are useful for constructing the
subframe for subsequent stages.
Sampling Wizard: Finish
Figure 2-10
Sampling Wizard, Finish step
This is the final step. You can save the plan file and draw the sample now or paste
your selections into a syntax window.
When making changes to stages in the existing plan file, you can save the edited
plan to a new file or overwrite the existing file. When adding stages without making
changes to existing stages, the Wizard automatically overwrites the existing plan file.
If you want to save the plan to a new file, select Paste the syntax generated by the
Wizard into a syntax window and change the filename in the syntax commands.
Modifying an Existing Sample Plan
E From the menus choose:
Analyze
Complex Samples
Select a Sample...
E Select Edit a sample design and choose a plan file to edit.
E Click Next to continue through the Wizard.
E Review the sampling plan in the Plan Summary step, and then click Next.
Subsequent steps are largely the same as for a new design. See the Help for individual
steps for more information.
E Navigate to the Finish step, and specify a new name for the edited plan file or choose
to overwrite the existing plan file.
Optionally, you can:
Specify stages that have already been sampled.
Remove stages from the plan.
Sampling Wizard: Plan Summary
Figure 2-11
Sampling Wizard, Plan Summary step
This step allows you to review the sampling plan and indicate stages that have already
been sampled. If editing a plan, you can also remove stages from the plan.
Previously sampled stages. If an extended sampling frame is not available, you will
have to execute a multistage sampling design one stage at a time. Select which stages
have already been sampled from the drop-down list. Any stages that have been
executed are locked; they are not available in the Draw Sample Selection Options
step, and they cannot be altered when editing a plan.
Remove stages. You can remove stages 2 and 3 from a multistage design.
Running an Existing Sample Plan
E From the menus choose:
Analyze
Complex Samples
Select a Sample...
E Select Draw a sample and choose a plan file to run.
E Click Next to continue through the Wizard.
E Review the sampling plan in the Plan Summary step, and then click Next.
E The individual steps containing stage information are skipped when executing a
sample plan. You can now go on to the Finish step at any time.
Optionally, you can:
Specify stages that have already been sampled.
CSPLAN and CSSELECT Commands Additional Features
The SPSS command language also allows you to:
Specify custom names for output variables.
Control the output in the Viewer. For example, you can suppress the stagewise
summary of the plan that is displayed if a sample is designed or modified,
suppress the summary of the distribution of sampled cases by strata that is shown
if the sample design is executed, and request a case processing summary.
Choose a subset of variables in the working data file to write to an external
sample file.
See the SPSS Command Syntax Reference for complete syntax information.
Chapter 3
Preparing a Complex Sample for Analysis
Figure 3-1
Analysis Preparation Wizard, Welcome step
The Analysis Preparation Wizard guides you through the steps for creating or
modifying an analysis plan for use with the various Complex Samples analysis
procedures. Before using the Wizard, you should have a sample drawn according to a
complex design.
Creating a new plan is most useful when you do not have access to the sampling
plan file used to draw the sample (recall that the sampling plan contains a default
analysis plan). If you do have access to the sampling plan file used to draw the
sample, you can use the default analysis plan contained in the sampling plan file or
override the default analysis specifications and save your changes to a new file.
Creating a New Analysis Plan
E From the menus choose:
Analyze
Complex Samples
Prepare for Analysis...
E Select Create a plan file, and choose a plan filename to which you will save the
analysis plan.
E Click Next to continue through the Wizard.
E Specify the variable containing sample weights in the Design Variables step,
optionally defining strata and clusters.
You can now click Finish to save the plan. Optionally, in further steps you can:
Select the method for estimating standard errors in the Estimation Method step.
Specify the number of units sampled or the inclusion probability per unit in
the Size step.
Add a second or third stage to the design.
Paste your selections as command syntax.
Analysis Preparation Wizard: Design Variables
This step allows you to identify the stratification and clustering variables and define
sample weights. You can also provide a label for the stage.
Strata. The cross-classification of stratification variables defines distinct
subpopulations, or strata. Your total sample represents the combination of
independent samples from each stratum.
Clusters. Cluster variables define groups of observational units, or clusters. Samples
drawn in multiple stages select clusters in the earlier stages and then subsample units
from the selected clusters. When analyzing a data file obtained by sampling clusters
with replacement, you should include the duplication index as a cluster variable.
Sample Weights. You must provide sample weights in the first stage. Sample weights
are computed automatically for subsequent stages of the current design.
Stage Label. You can specify an optional string label for each stage. This is used in the
output to help identify stagewise information.
Note: The source variable list has the same contents across steps of the Wizard. In
other words, variables removed from the source list in a particular step are removed
from the list in all steps. Variables returned to the source list show up in all steps.
Tree Controls for Navigating the Analysis Wizard
At the left side of each step of the Analysis Wizard is an outline of all the steps. You
can navigate the Wizard by clicking on the name of an enabled step in the outline.
Steps are enabled as long as all previous steps are valid—that is, as long as each
previous step has been given the minimum required specifications for that step. For
more information on why a given step may be invalid, see the Help for individual
steps.
Analysis Preparation Wizard: Estimation Method
Figure 3-3
Analysis Preparation Wizard, Estimation Method step (stage 1)
This step allows you to specify an estimation method for the stage.
WR (sampling with replacement). WR estimation does not include a correction for
sampling from a finite population, since it assumes that the sample was taken from
an infinite population. When the population for the stage is much larger than the
sample, this is a reasonable assumption. WR estimation can be specified only in the
final stage of a design; the Wizard will not allow you to add another stage if you
select WR estimation.
Equal WOR (equal probability sampling without replacement). Equal WOR estimation
includes the finite population correction and assumes that units are sampled with
equal probability. Equal WOR can be specified in any stage of a design.
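The practical difference between WR and Equal WOR estimation is the finite population correction. For the simplest case, the standard error of a mean under simple random sampling, the sketch below multiplies the variance by (1 - n/N) when the population size N is supplied and omits the factor otherwise. This is the textbook formula, not the variance estimator used inside the Complex Samples procedures; the data and function name are hypothetical.

```python
from statistics import variance


def se_of_mean(sample, population_size=None):
    """Standard error of a sample mean under simple random sampling.

    population_size=None: no finite population correction (WR estimation).
    population_size=N:    multiply the variance by (1 - n / N) (Equal WOR estimation).
    """
    n = len(sample)
    var = variance(sample) / n
    if population_size is not None:
        var *= 1 - n / population_size
    return var ** 0.5


data = [2, 4, 4, 5, 7]                                 # hypothetical sample, variance 3.3
se_wr = se_of_mean(data)                               # sqrt(3.3 / 5), about 0.812
se_wor = se_of_mean(data, population_size=10)          # sqrt(0.5 * 3.3 / 5), about 0.574
se_big = se_of_mean(data, population_size=1_000_000)   # almost identical to se_wr
```

When N is much larger than n, the correction factor is close to 1, which is why WR estimation is a reasonable assumption in that case; when the whole population is sampled (n = N), the WOR standard error drops to 0.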
Unequal WOR (unequal probability sampling without replacement). In addition to using
the finite population correction, Unequal WOR accounts for sampling units (usually
clusters) selected with unequal probability. This estimation method is available
only in the first stage.
Analysis Preparation Wizard: Size
Figure 3-4
Analysis Preparation Wizard, Size step (stage 1)
This step is used to specify inclusion probabilities or population sizes for the current
stage. Sizes can be fixed or can vary across strata. For the purpose of specifying sizes,
clusters specified in previous stages can be used to define strata.
Units. You can specify exact population sizes or the probabilities with which units
were sampled.
Value. A single value is applied to all strata. If Population Sizes is selected as the
unit metric, you should enter a non-negative integer. If Inclusion Probabilities is
selected, you should enter a value between 0 and 1, inclusive.
Unequal values for strata. Allows you to enter size values on a per-stratum basis
via the Define Unequal Sizes dialog box.
Read values from variable. Allows you to select a numeric variable that contains
size values for strata.
Define Unequal Sizes
Figure 3-5
Define Unequal Sizes dialog box
The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.
Size Specifications grid. The grid displays the cross-classifications of up to five strata
or cluster variables—one stratum/cluster combination per row. Eligible grid variables
include all stratification variables from the current and previous stages and all cluster
variables from previous stages. Variables can be reordered within the grid or moved
to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values to toggle the display of value labels and data values for stratification and cluster variables in the grid cells. Cells that contain unlabeled values always show values.
Click Refresh Strata to repopulate the grid with each combination of labeled data
values for variables in the grid.
Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or more variables to the Exclude list. These variables are not used to define sample sizes.
Analysis Preparation Wizard: Stage Summary
Figure 3-6
Analysis Preparation Wizard, Plan Summary step (stage 1)
This is the last step within each stage, providing a summary of the analysis design
specifications through the current stage. From here, you can either proceed to the
next stage (creating it if necessary) or save the analysis specifications.
If you cannot add another stage, it is likely because:
No cluster variable was specified in the Design Variables step.
You selected WR estimation in the Estimation Method step.
This is the third stage of the analysis, and the Wizard supports a maximum of three stages.
Analysis Preparation Wizard: Finish
Figure 3-7
Analysis Preparation Wizard, Finish step
This is the final step. You can save the plan file now or paste your selections to a syntax window.
When making changes to stages in the existing plan file, you can save the edited plan to a new file or overwrite the existing file. When adding stages without making changes to existing stages, the Wizard automatically overwrites the existing plan file. If you want to save the plan to a new file, choose to paste the syntax generated by the Wizard into a syntax window, and change the filename in the syntax commands.
Modifying an Existing Analysis Plan
E From the menus choose:
Analyze
Complex Samples
Prepare for Analysis...
E Select Edit a plan file, and choose a plan filename to which you will save the analysis plan.
E Click Next to continue through the Wizard.
E Review the analysis plan in the Plan Summary step, and then click Next.
Subsequent steps are largely the same as for a new design. For more information, see the Help for individual steps.
E Navigate to the Finish step, and specify a new name for the edited plan file, or choose to overwrite the existing plan file.
Optionally, you can:
Remove stages from the plan.
Analysis Preparation Wizard: Plan Summary
Figure 3-8
Analysis Preparation Wizard, Plan Summary step
This step allows you to review the analysis plan and remove stages from the plan.
Remove Stages. You can remove stages 2 and 3 from a multistage design. Since a plan
must have at least one stage, you can edit but not remove stage 1 from the design.
Chapter 4
Complex Samples Plan
Complex Samples analysis procedures require analysis specifications from an analysis or sample plan file in order to provide valid results.
Figure 4-1
Complex Samples Plan dialog box
Plan. Specify the path of an analysis or sample plan file.
Joint Probabilities. In order to use Unequal WOR estimation for clusters drawn
using a PPS WOR method, you need to specify a separate file containing the joint
probabilities. This file is created by the Sampling Wizard during sampling.
Chapter 5
Complex Samples Frequencies
The Complex Samples Frequencies procedure produces frequency tables for selected variables and displays univariate statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Example. Using the Complex Samples Frequencies procedure, you can obtain univariate tabular statistics for vitamin usage among U.S. citizens, based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public use data.
Statistics. The procedure produces estimates of cell population sizes and table percentages, plus standard errors, confidence intervals, coefficients of variation, design effects, square roots of design effects, cumulative values, and unweighted counts for each estimate. Additionally, chi-square and likelihood-ratio statistics are computed for the test of equal cell proportions.
Data. Variables for which frequency tables are produced should be categorical. Subpopulation variables can be string or numeric, but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
Obtaining Complex Samples Frequencies
E From the menus choose:
Analyze
Complex Samples
Frequencies...
E Select a plan file and, optionally, select a custom joint probabilities file.
E Click Continue.
Figure 5-1
Frequencies dialog box
E Select at least one frequency variable.
Optionally, you can:
Specify variables to define subpopulations. Statistics are computed separately for
each subpopulation.
Complex Samples Frequencies Statistics
Figure 5-2
Frequencies Statistics dialog box
Cells. This group allows you to request estimates of the cell population sizes and table percentages.
Statistics. This group produces statistics associated with the population size or table percentage.
Standard error. The standard error of the estimate.
Confidence interval. A confidence interval for the estimate, using the specified level.
Coefficient of variation. The ratio of the standard error of the estimate to the estimate.
Unweighted count. The number of units used to compute the estimate.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Cumulative values. The cumulative estimate through each value of the variable.
Test of equal cell proportions. This produces chi-square and likelihood-ratio tests of the hypothesis that the categories of a variable have equal frequencies. Separate tests are performed for each variable.
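The coefficient of variation, design effect, and square root of design effect are simple functions of two standard errors: the one computed under the complex design and the one the same estimate would have under simple random sampling. The sketch below takes both as given (SPSS derives them from the plan; here they are hypothetical numbers) and applies the definitions above. The function name is hypothetical.

```python
import math
from typing import Dict


def design_summary(estimate: float, se_design: float, se_srs: float) -> Dict[str, float]:
    """Coefficient of variation and design-effect statistics for one estimate.

    se_design: standard error under the complex design (supplied, not computed here).
    se_srs: standard error the same estimate would have under simple random sampling.
    """
    deff = se_design ** 2 / se_srs ** 2  # ratio of the two variances
    return {
        "cv": se_design / estimate,    # standard error relative to the estimate
        "deff": deff,
        "sqrt_deff": math.sqrt(deff),  # equals se_design / se_srs
    }


# Hypothetical estimated proportion of 0.40 with design SE 0.03 and SRS SE 0.02.
print(design_summary(estimate=0.40, se_design=0.03, se_srs=0.02))
```

Here the design variance is 2.25 times the simple random sampling variance, so standard errors are 1.5 times larger than an analysis that ignored the design would suggest.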
Complex Samples Missing Values
Figure 5-3
Missing Values dialog box
Tables. This group determines which cases are used in the analysis.
Use all available data. Missing values are determined on a table-by-table basis. Thus, the cases used to compute statistics may vary across frequency or crosstabulation tables.
Use consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent across tables.
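The two Tables options amount to per-table versus listwise deletion. The sketch below treats `None` as missing (real SPSS data distinguish system-missing from user-missing values) and counts how many cases each table would use; the function and variable names are hypothetical, loosely borrowed from the NHIS examples.

```python
from typing import Dict, List, Optional

Case = Dict[str, Optional[str]]


def cases_per_table(cases: List[Case], tables: List[List[str]], consistent: bool) -> List[int]:
    """Count the cases each table would use.

    consistent=False: "use all available data"; a case is dropped only from
    tables whose own variables are missing for it.
    consistent=True: "consistent case base"; a case is dropped from every
    table if any analysis variable is missing.
    """
    all_vars = {v for table in tables for v in table}
    counts = []
    for table in tables:
        needed = all_vars if consistent else set(table)
        counts.append(sum(all(case.get(v) is not None for v in needed) for case in cases))
    return counts


cases = [
    {"smoking": "daily", "vitamins": "yes"},
    {"smoking": None, "vitamins": "no"},
    {"smoking": "never", "vitamins": None},
    {"smoking": "never", "vitamins": "yes"},
]
tables = [["smoking"], ["vitamins"]]
print(cases_per_table(cases, tables, consistent=False))  # [3, 3]
print(cases_per_table(cases, tables, consistent=True))   # [2, 2]
```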
Categorical Design Variables. This group determines whether user-missing values are valid or invalid.
Complex Samples Options
Figure 5-4
Options dialog box
Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.
Chapter 6
Complex Samples Descriptives
The Complex Samples Descriptives procedure displays univariate summary statistics for several variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Example. Using the Complex Samples Descriptives procedure, you can obtain univariate descriptive statistics for the activity levels of U.S. citizens based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public use data.
Statistics. The procedure produces means and sums, plus t tests, standard errors, confidence intervals, coefficients of variation, unweighted counts, population sizes, design effects, and square roots of design effects for each estimate.
Data. Measures should be scale variables. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
Obtaining Complex Samples Descriptives
E From the menus choose:
Analyze
Complex Samples
Descriptives...
E Select a plan file and, optionally, select a custom joint probabilities file.
E Click Continue.
Figure 6-1
Descriptives dialog box
E Select at least one measure variable.
Optionally, you can:
Specify variables to define subpopulations. Statistics are computed separately for
each subpopulation.
Complex Samples Descriptives Statistics
Figure 6-2
Descriptives Statistics dialog box
Summaries. This group allows you to request estimates of the means and sums of the measure variables. Additionally, you can request t tests of the estimates against a specified value.
Statistics. This group produces statistics associated with the mean or sum.
Standard error. The standard error of the estimate.
Confidence interval. A confidence interval for the estimate, using the specified
level.
Coefficient of variation. The ratio of the standard error of the estimate to the estimate.
Unweighted count. The number of units used to compute the estimate.
Population size. The estimated number of units in the population.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex
design, where values further from 1 indicate greater effects.
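Means and sums from a complex sample are weighted. A common estimator of a population total multiplies each value by its sampling weight (the inverse of its inclusion probability) and sums; the mean divides that total by the estimated population size, the sum of the weights. The sketch below assumes that estimator with hypothetical single-stage weights, and takes the standard error as given, since SPSS derives it from the full design. The t statistic for the test against a specified value is then (estimate - test value) / standard error. All names and numbers are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_total_and_mean(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted estimates of a population total and a population mean.

    Each weight is the number of population units that a sampled case represents.
    """
    total = sum(w * y for w, y in zip(weights, values))
    population_size = sum(weights)  # estimated number of units in the population
    return total, total / population_size


def t_statistic(estimate: float, test_value: float, standard_error: float) -> float:
    """t statistic for testing an estimate against a specified value."""
    return (estimate - test_value) / standard_error


minutes = [30.0, 45.0, 0.0, 60.0]       # hypothetical daily activity minutes
weights = [100.0, 250.0, 150.0, 500.0]  # hypothetical sampling weights
total, mean = weighted_total_and_mean(minutes, weights)
print(total, mean)  # 44250.0 44.25
print(t_statistic(mean, test_value=30.0, standard_error=5.0))  # 2.85
```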
Complex Samples Descriptives Missing Values
Figure 6-3
Descriptives Missing Values dialog box
Statistics for Measure Variables. This group determines which cases are used in the analysis.
Use all available data. Missing values are determined on a variable-by-variable basis; thus, the cases used to compute statistics may vary across measure variables.
Ensure consistent case base. Missing values are determined across all variables; thus, the cases used to compute statistics are consistent.
Categorical Design Variables. This group determines whether user-missing values are
valid or invalid.
Complex Samples Options
Figure 6-4
Options dialog box
Subpopulation Display. You can choose to have subpopulations displayed in the same
table or in separate tables.
Chapter 7
Complex Samples Crosstabs
The Complex Samples Crosstabs procedure produces crosstabulation tables for pairs of selected variables and displays two-way statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Example. Using the Complex Samples Crosstabs procedure, you can obtain cross-classification statistics for smoking frequency by vitamin usage of U.S. citizens, based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public-use data.
Statistics. The procedure produces estimates of cell population sizes and row, column, and table percentages, plus standard errors, confidence intervals, coefficients of variation, expected values, design effects, square roots of design effects, residuals, adjusted residuals, and unweighted counts for each estimate. The odds ratio, relative risk, and risk difference are computed for 2-by-2 tables. Additionally, Pearson and likelihood-ratio statistics are computed for the test of independence of the row and column variables.
Data. Row and column variables should be categorical. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design
that should be analyzed according to the specifications in the file selected in the
Complex Samples Plan dialog box.
Obtaining Complex Samples Crosstabs
E From the menus choose:
Analyze
Complex Samples
Crosstabs...
E Select a plan file and, optionally, select a custom joint probabilities file.
E Click Continue.
Figure 7-1
Crosstabs dialog box
E Select at least one row variable and one column variable.
Optionally, you can:
Specify variables to define subpopulations. Statistics are computed separately for
each subpopulation.
Complex Samples Crosstabs Statistics
Figure 7-2
Crosstabs Statistics dialog box
Cells. This group allows you to request estimates of the cell population size and row,
column, and table percentages.
Statistics. This group produces statistics associated with the population size and row,
column, and table percentages.
Standard error. The standard error of the estimate.
Confidence interval. A confidence interval for the estimate, using the specified
level.
Coefficient of variation. The ratio of the standard error of the estimate to the
estimate.
Expected values. The expected value of the estimate, under the hypothesis of independence of the row and column variables.
Unweighted count. The number of units used to compute the estimate.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Residuals. The residual for a cell is the observed value minus the expected value, where the expected value is the number of cases that you would expect in the cell if there were no relationship between the two variables. A positive residual indicates that there are more cases in the cell than there would be if the row and column variables were independent.
Adjusted residuals. The residual for a cell (observed minus expected value) divided by an estimate of its standard error. The resulting standardized residual is expressed in standard deviation units above or below the mean.
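For an unweighted table, the expected values and residuals described above can be computed directly from the margins. The sketch below uses simple-random-sampling formulas, including Haberman's standard error for the adjusted residual; the Complex Samples procedure instead estimates these quantities from weighted data and the design, so treat this only as an illustration of the definitions. The counts and function names are hypothetical.

```python
import math
from typing import List, Tuple

Table = List[List[float]]


def margins(table: Table) -> Tuple[List[float], List[float], float]:
    """Row totals, column totals, and grand total of a two-way table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols, sum(rows)


def expected_counts(table: Table) -> Table:
    """Expected counts if the row and column variables were independent."""
    rows, cols, n = margins(table)
    return [[r * c / n for c in cols] for r in rows]


def residuals(table: Table) -> Table:
    """Observed minus expected count for each cell."""
    expected = expected_counts(table)
    return [[o - e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


def adjusted_residuals(table: Table) -> Table:
    """Residuals divided by their simple-random-sampling standard error
    (Haberman's formula); a design-based analysis would use a different error."""
    rows, cols, n = margins(table)
    expected = expected_counts(table)
    res = residuals(table)
    return [[res[i][j] / math.sqrt(expected[i][j] * (1 - rows[i] / n) * (1 - cols[j] / n))
             for j in range(len(cols))]
            for i in range(len(rows))]


observed = [[10.0, 20.0], [30.0, 40.0]]  # hypothetical unweighted counts
print(expected_counts(observed))  # [[12.0, 18.0], [28.0, 42.0]]
print(residuals(observed))        # [[-2.0, 2.0], [2.0, -2.0]]
print(adjusted_residuals(observed))
```

For a 2-by-2 table under these formulas, every adjusted residual has the same magnitude, and its square equals the Pearson chi-square statistic (about 0.79 here).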
Summaries for 2-by-2 Tables. This group produces statistics for tables in which the row and column variable each have two categories. Each is a measure of the strength of the association between the presence of a factor and the occurrence of an event.
Odds ratio. The ratio of the odds of the event in the presence of the factor to the odds of the event in its absence. The odds ratio can be used as an estimate of relative risk when the event is rare.
Relative risk. The ratio of the risk of an event in the presence of the factor to the risk of the event in the absence of the factor.
Risk difference. The difference between the risk of an event in the presence of the factor and the risk of the event in the absence of the factor.
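The three 2-by-2 summaries can be written out directly for a table of counts. The sketch below assumes a layout with the factor in the rows and the event in the columns, uses hypothetical unweighted counts, and ignores the sampling design (in the procedure, the cell values would be weighted estimates). The function name is hypothetical.

```python
from typing import Dict


def two_by_two_summaries(a: float, b: float, c: float, d: float) -> Dict[str, float]:
    """Association measures for a 2-by-2 table laid out as

                     event   no event
    factor present     a        b
    factor absent      c        d
    """
    risk_present = a / (a + b)  # risk of the event when the factor is present
    risk_absent = c / (c + d)   # risk of the event when the factor is absent
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_present / risk_absent,
        "risk_difference": risk_present - risk_absent,
    }


print(two_by_two_summaries(a=20, b=80, c=10, d=90))
# odds ratio 2.25, relative risk 2.0, risk difference 0.1
print(two_by_two_summaries(a=2, b=998, c=1, d=999))
# rare event: odds ratio about 2.002, relative risk 2.0
```

The second call shows why the odds ratio approximates the relative risk only when the event is rare: with a 20% and 10% risk the two measures differ noticeably, while with a 0.2% and 0.1% risk they nearly coincide.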
Test of independence of rows and columns. This produces chi-square and likelihood-ratio tests of the hypothesis that a row and column variable are independent. Separate tests are performed for each pair of variables.
Complex Samples Missing Values
Figure 7-3
Missing Values dialog box
Tables. This group determines which cases are used in the analysis.
Use all available data. Missing values are determined on a table-by-table basis. Thus, the cases used to compute statistics may vary across frequency or crosstabulation tables.
Use consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent across tables.
Categorical Design Variables. This group determines whether user-missing values are valid or invalid.
Complex Samples Options
Figure 7-4
Options dialog box
Subpopulation Display. You can choose to have subpopulations displayed in the same
table or in separate tables.
Chapter 8
Complex Samples Ratios
The Complex Samples Ratios procedure displays univariate summary statistics for ratios of variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Example. Using the Complex Samples Ratios procedure, you can obtain descriptive statistics for the ratio of current property value to last assessed value, based on the results of a statewide survey carried out according to a complex design and with an appropriate analysis plan for the data.
Statistics. The procedure produces ratio estimates, t tests, standard errors, confidence intervals, coefficients of variation, unweighted counts, population sizes, design effects, and square roots of design effects.
Data. Numerators and denominators should be positive-valued scale variables. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
Obtaining Complex Samples Ratios
E From the menus choose:
Analyze
Complex Samples
Ratios...
E Select a plan file and, optionally, select a custom joint probabilities file.
E Click Continue.
Figure 8-1
Ratios dialog box
E Select at least one numerator variable and denominator variable.
Optionally, you can:
Specify variables to define subgroups for which statistics are produced.
Complex Samples Ratios Statistics
Figure 8-2
Ratios Statistics dialog box
Statistics. This group produces statistics associated with the ratio estimate.
Standard error. The standard error of the estimate.
Confidence interval. A confidence interval for the estimate, using the specified level.
Coefficient of variation. The ratio of the standard error of the estimate to the estimate.
Unweighted count. The number of units used to compute the estimate.
Population size. The estimated number of units in the population.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
t-test. You can request t tests of the estimates against a specified value.
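A natural way to estimate a ratio from a complex sample is to divide two weighted totals. The sketch below assumes this ratio-of-totals form, with hypothetical single-stage sampling weights and property values loosely modeled on the statewide survey example; it computes only the point estimate, not the design-based standard error. The function name is hypothetical.

```python
from typing import Sequence


def ratio_estimate(numerators: Sequence[float], denominators: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Ratio of two weighted totals: sum(w * y) / sum(w * x)."""
    numerator_total = sum(w * y for w, y in zip(weights, numerators))
    denominator_total = sum(w * x for w, x in zip(weights, denominators))
    return numerator_total / denominator_total


current_value = [220.0, 150.0, 310.0]   # hypothetical current values (thousands)
assessed_value = [200.0, 150.0, 250.0]  # hypothetical last assessed values (thousands)
weights = [40.0, 25.0, 10.0]            # hypothetical sampling weights
print(ratio_estimate(current_value, assessed_value, weights))  # 15650 / 14250, about 1.098
```

This is not the same as averaging the per-property ratios (1.10, 1.00, and 1.24 here): the ratio of totals is equivalent to averaging those ratios with weights w * x, so cases with larger sampling weights or larger denominators count for more.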
Complex Samples Ratios Missing Values
Figure 8-3
Ratios Missing Values dialog box
Ratios. This group determines which cases are used in the analysis.
Use all available data. Missing values are determined on a ratio-by-ratio basis. Thus, the cases used to compute statistics may vary across numerator-denominator pairs.
Ensure consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent.
Categorical Design Variables. This group determines whether user-missing values are
valid or invalid.
Complex Samples Options
Figure 8-4
Options dialog box
Subpopulation Display. You can choose to have subpopulations displayed in the same
table or in separate tables.
Chapter 9
Complex Samples General Linear Model
The Complex Samples General Linear Model (CSGLM) procedure performs linear regression analysis, as well as analysis of variance and covariance, for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Example. A grocery store chain surveyed a set of customers concerning their purchasing habits, according to a complex design. Given the survey results and how much each customer spent in the previous month, the store wants to see if the frequency with which customers shop is related to the amount they spend in a month, controlling for the gender of the customer and incorporating the sampling design.
Statistics. The procedure produces estimates, standard errors, confidence intervals, t tests, design effects, and square roots of design effects for model parameters, as well as the correlations and covariances between parameter estimates. Measures of model fit and descriptive statistics for the dependent and independent variables are also available. Additionally, you can request estimated marginal means for levels of model factors and factor interactions.
Data. The dependent variable is quantitative. Factors are categorical. Covariates are quantitative variables that are related to the dependent variable. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
E Select a plan file and, optionally, select a custom joint probabilities file.
E Click Continue.
Figure 9-1
General Linear Model dialog box
E Select a dependent variable.
Optionally, you can:
Select variables for Factors and Covariates, as appropriate for your data.
Specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable.
Figure 9-2
Model dialog box
Specify Model Effects. By default, the procedure builds a main-effects model using the
factors and covariates specified in the main dialog box. Alternatively, you can build a
custom model that includes interaction effects and nested terms.
Non-Nested Terms
For the selected factors and covariates:
Interaction. Creates the highest-level interaction term for all selected variables.
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way. Creates all possible five-way interactions of the selected variables.
Nested Terms
You can build nested terms for your model in this procedure. Nested terms are
useful for modeling the effect of a factor or covariate whose values do not interact
with the levels of another factor. For example, a grocery store chain may follow the
spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect.
Additionally, you can include interaction effects or add multiple levels of nesting to the nested term.
Limitations. Nested terms have the following restrictions:
All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid.
All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
No effect can be nested within a covariate. Thus, if A is a factor and X is a
covariate, then specifying A(X) is invalid.
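The All 2-way through All 5-way options amount to enumerating combinations of the selected variables, and the limitations above are simple uniqueness checks. The sketch below uses the A*B and A(B) notation from this section; the helper names are hypothetical, and the checks cover only the three listed rules, assuming a nested term is written as an effect nested within a single variable.

```python
from itertools import combinations
from typing import List, Sequence, Set


def all_k_way(variables: Sequence[str], k: int) -> List[str]:
    """Every k-way interaction of the selected variables, written A*B style."""
    return ["*".join(combo) for combo in combinations(variables, k)]


def interaction_is_valid(term: str) -> bool:
    """Factors within an interaction must be unique, so A*A is rejected."""
    factors = term.split("*")
    return len(factors) == len(set(factors))


def nesting_is_valid(effect: str, within: str, covariates: Set[str]) -> bool:
    """Check a nested term effect(within): within must not be a covariate
    and must not repeat a factor of the effect."""
    return within not in covariates and within not in effect.split("*")


selected = ["A", "B", "C"]
print(all_k_way(selected, 1))              # main effects: ['A', 'B', 'C']
print(all_k_way(selected, 2))              # ['A*B', 'A*C', 'B*C']
print(all_k_way(selected, len(selected)))  # highest-level interaction: ['A*B*C']
print(interaction_is_valid("A*A"))         # False
print(nesting_is_valid("A", "X", {"X"}))   # False: nested within a covariate
```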
Intercept. The intercept is usually included in the model. If you can assume the
data pass through the origin, you can exclude the intercept. Even if you include the
intercept in the model, you can choose to suppress statistics related to it.
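The point estimates of a sampling-weighted linear regression can be obtained by weighted least squares, with the sampling weights as case weights; the design mainly affects the standard errors rather than the coefficients. The sketch below assumes a single covariate and hypothetical data loosely based on the grocery example, and shows what excluding the intercept does when the data do not pass through the origin. It does not compute standard errors, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_line(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares intercept and slope for y = b0 + b1*x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def weighted_slope_through_origin(x: Sequence[float], y: Sequence[float],
                                  w: Sequence[float]) -> float:
    """Weighted least-squares slope for y = b1*x, that is, with the intercept excluded."""
    return (sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
            / sum(wi * xi * xi for wi, xi in zip(w, x)))


visits = [1.0, 2.0, 3.0, 4.0]   # hypothetical shopping visits per month
spend = [3.0, 5.0, 7.0, 9.0]    # hypothetical monthly spending, exactly 1 + 2*visits
weights = [1.0, 2.0, 1.0, 2.0]  # hypothetical sampling weights
print(weighted_line(visits, spend, weights))                  # close to (1.0, 2.0)
print(weighted_slope_through_origin(visits, spend, weights))  # 2.32
```

Because these data follow spend = 1 + 2*visits exactly, the model with an intercept recovers a slope of 2, while forcing the line through the origin inflates the slope to 2.32. Exclude the intercept only when the assumption that the data pass through the origin is justified.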
Complex Samples General Linear Model Statistics
Figure 9-3
General Linear Model Statistics dialog box
Model Parameters. This group allows you to control the display of statistics related to the model parameters.
Estimate. Displays estimates of the coefficients.
Standard error. Displays the standard error for each coefficient estimate.
Confidence interval. Displays a confidence interval for each coefficient estimate. The confidence level for the interval is set in the Options dialog box.
t-test. Displays a t test of each coefficient estimate. The null hypothesis for each test is that the value of the coefficient is 0.
Covariances of parameter estimates. Displays an estimate of the covariance matrix for the model coefficients.
Correlations of parameter estimates. Displays an estimate of the correlation matrix for the model coefficients.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Model fit. Displays R² and root mean squared error statistics.
Population means of dependent variable and covariates. Displays summary information about the dependent variable, covariates, and factors.
Sample design information. Displays summary information about the sample, including the unweighted count and the population size.
Complex Samples Hypothesis Tests
Figure 9-4
Hypothesis Tests dialog box
Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose among F, adjusted F, chi-square, and adjusted chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics. If based on the sampling design, the value is the difference between the number of primary sampling units and the number of strata in the first stage of sampling. Alternatively, you can set a custom value for the degrees of freedom by specifying a positive integer.
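The design-based degrees of freedom are easy to count by hand. The sketch below assumes a hypothetical list with one (stratum, PSU) pair per case from the first stage, where PSU labels are unique only within their stratum, and returns the number of distinct PSUs minus the number of strata. The function name and labels are hypothetical.

```python
from typing import Iterable, Tuple


def design_degrees_of_freedom(first_stage: Iterable[Tuple[str, str]]) -> int:
    """Number of first-stage PSUs minus number of first-stage strata."""
    psus = set(first_stage)  # a PSU is identified by its (stratum, label) pair
    strata = {stratum for stratum, _ in psus}
    return len(psus) - len(strata)


cases = [("north", "c1"), ("north", "c1"), ("north", "c2"),
         ("south", "c1"), ("south", "c3"), ("south", "c4")]
print(design_degrees_of_freedom(cases))  # 5 PSUs - 2 strata = 3
```

Giving every case the same stratum label, as for an unstratified design, yields the number of PSUs minus 1; treating an unstratified design as a single stratum is an assumption of this sketch.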
Adjustment for Multiple Comparisons. When performing many hypothesis tests, you
run the risk of increased overall Type I error (the probability of incorrectly rejecting
a null hypothesis). This group allows you to choose the method of adjusting the significance level.
Least significant difference. This method does not control the overall probability of rejecting the hypotheses that some linear contrasts are different from the null hypothesis values.
Sequential Sidak. This is a sequentially step-down rejective Sidak procedure that is much less conservative than the single-step Sidak method in terms of rejecting individual hypotheses but maintains the same overall significance level.
Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative than the single-step Bonferroni method in terms of rejecting individual hypotheses but maintains the same overall significance level.
Sidak. This method provides tighter bounds than the Bonferroni approach.
Bonferroni. This method adjusts the observed significance level for the fact that multiple contrasts are being tested.
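The adjustment methods can be compared on a small set of p values. The sketch below uses textbook adjusted-p-value forms (Bonferroni: m * p; Sidak: 1 - (1 - p)^m; the sequential versions apply these step-down, in Holm's style), so SPSS's own computations may differ in detail. The p values and function names are hypothetical. Least significant difference corresponds to applying no adjustment at all.

```python
from typing import Callable, List, Sequence


def bonferroni(p: Sequence[float]) -> List[float]:
    """Multiply each p value by the number of tests (capped at 1)."""
    m = len(p)
    return [min(1.0, m * pi) for pi in p]


def sidak(p: Sequence[float]) -> List[float]:
    """1 - (1 - p)^m, never larger than the Bonferroni value."""
    m = len(p)
    return [1.0 - (1.0 - pi) ** m for pi in p]


def step_down(p: Sequence[float], adjust: Callable[[float, int], float]) -> List[float]:
    """Sequential (step-down) version of a single-test adjustment.

    The smallest p value is adjusted as if all m hypotheses remained, the next
    as if m - 1 remained, and so on; a running maximum keeps the order monotone.
    """
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])  # smallest p value first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, adjust(p[i], m - rank))
        adjusted[i] = min(1.0, running_max)
    return adjusted


def sequential_bonferroni(p: Sequence[float]) -> List[float]:
    return step_down(p, lambda pi, k: k * pi)


def sequential_sidak(p: Sequence[float]) -> List[float]:
    return step_down(p, lambda pi, k: 1.0 - (1.0 - pi) ** k)


raw = [0.01, 0.04, 0.03]  # hypothetical unadjusted p values for three contrasts
print(bonferroni(raw))             # approximately [0.03, 0.12, 0.09]
print(sequential_bonferroni(raw))  # approximately [0.03, 0.06, 0.06]
```

The sequential versions never exceed their single-step counterparts, which is what makes them less conservative while controlling the same overall significance level.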
Complex Samples General Linear Model Estimated Means
Figure 9-5
General Linear Model Estimated Means dialog box
The Estimated Means dialog box allows you to display the model-estimated marginal means for levels of factors and factor interactions specified in the Model subdialog box. You can also request that the overall population mean be displayed.
Term. Estimated means are computed for the selected factors and factor interactions.
Contrast. The contrast determines how hypothesis tests are set up to compare the
estimated means.
Simple. Compares the mean of each level to the mean of a specified level. This
type of contrast is useful when there is a control group.
Deviation. Compares the mean of each level (except a reference category) to the
mean of all of the levels (grand mean). The levels of the factor can be in any order.
Difference. Compares the mean of each level (except the first) to the mean of
previous levels. They are sometimes called reverse Helmert contrasts.
Helmert. Compares the mean of each level of the factor (except the last) to the
mean of subsequent levels.
Repeated. Compares the mean of each level (except the last) to the mean of the subsequent level.
Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on. The first degree of freedom contains the linear effect across all categories; the second degree of freedom, the quadratic effect; and so on. These contrasts are often used to estimate polynomial trends.
Reference Category. The simple and deviation contrasts require a reference category, or factor level against which the others are compared.
Complex Samples General Linear Model Save
Figure 9-6
General Linear Model Save dialog box
Save Variables. This group allows you to save the model predicted values and
residuals as new variables in the working file.
Export Model as SPSS data. Writes an SPSS data file containing a covariance (or correlation, if selected) matrix of the parameter estimates in the model. Also, for each dependent variable, there will be a row of parameter estimates, a row of standard errors, a row of significance values for the t statistics corresponding to the parameter estimates, and a row of sampling design degrees of freedom. You can use this matrix file in other procedures that read an SPSS matrix file.
Export Model as XML. Saves the parameter estimates and the parameter covariance matrix, if selected, in XML (PMML) format. SmartScore and the server version of SPSS (a separate product) can use this model file to apply the model information to other data files for scoring purposes.
Complex Samples General Linear Model Options
Figure 9-7
General Linear Model Options dialog box
User-Missing Values. All design variables, as well as the dependent variable and any covariates, must have valid data. Cases with invalid data for any of these variables are deleted from the analysis. These controls allow you to decide whether user-missing values are treated as valid among the strata, cluster, subpopulation, and factor variables.
Confidence Interval. This is the confidence interval level for coefficient estimates and estimated marginal means. Specify a value greater than or equal to 50 and less than 100.
CSGLM Command Additional Features
The SPSS command language also allows you to:
Specify custom tests of effects versus a linear combination of effects or a value (using the CUSTOM subcommand).
Fix covariates at values other than their means when computing estimated marginal means (using the EMMEANS subcommand).
Specify a metric for polynomial contrasts (using the EMMEANS subcommand).
Specify a tolerance value for checking singularity (using the CRITERIA subcommand).
Create user-specified names for saved variables (using the SAVE subcommand).
Produce a general estimable function table (using the PRINT subcommand).
See the SPSS Command Syntax Reference for complete syntax information.
Chapter 10
Complex Samples Logistic Regression
The Complex Samples Logistic Regression procedure performs logistic regression analysis on a binary or multinomial dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Example. A loan officer has collected past records of customers given loans at several different branches, according to a complex design. While incorporating the sample design, the officer wants to see if the probability with which a customer defaults is related to age, employment history, and amount of credit debt.
Statistics. The procedure produces estimates, exponentiated estimates, standard errors, confidence intervals, t tests, design effects, and square roots of design effects for model parameters, as well as the correlations and covariances between parameter estimates. Pseudo R² statistics, classification tables, and descriptive statistics for the dependent and independent variables are also available.
Data. The dependent variable is categorical. Factors are categorical. Covariates are quantitative variables that are related to the dependent variable. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
Select a plan file and, optionally, select a custom joint probabilities file.
Figure 10-1
Logistic Regression dialog box
Select a dependent variable.
Optionally, you can:
Select variables for factors and covariates, as appropriate for your data.
Specify a variable to define a subpopulation. The analysis is performed only for
the selected category of the subpopulation variable.
Complex Samples Logistic Regression Reference Category
Figure 10-2
Logistic Regression Reference Category dialog box
By default, the Complex Samples Logistic Regression procedure makes the
highest-valued category the reference category. This dialog box allows you to specify
the highest, lowest, or a custom category as the reference category.
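The reference category determines how a multinomial model is parameterized. The following minimal Python sketch assumes the usual baseline-category (generalized) logit form: one linear predictor per non-reference category, with the reference category's predictor fixed at 0. The function names are hypothetical and illustrate the idea only. They do not reproduce the procedure's internals.

```python
import math
from typing import Dict, Hashable, List, Optional


def choose_reference(categories: List[Hashable], rule: str = "highest",
                     custom: Optional[Hashable] = None) -> Hashable:
    """Pick the reference category: 'highest' (the default), 'lowest', or 'custom'."""
    if rule == "highest":
        return max(categories)
    if rule == "lowest":
        return min(categories)
    if rule == "custom":
        if custom not in categories:
            raise ValueError("custom reference must be one of the categories")
        return custom
    raise ValueError(f"unknown rule: {rule}")


def category_probabilities(eta: Dict[Hashable, float],
                           reference: Hashable) -> Dict[Hashable, float]:
    """Baseline-category logit probabilities.

    eta holds the linear predictor for each NON-reference category. The
    reference category's linear predictor is fixed at 0, so exp(0) = 1.
    """
    scores = dict(eta)
    scores[reference] = 0.0
    denom = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / denom for c, s in scores.items()}
```

If every non-reference linear predictor is 0, all categories are equally likely. Changing the reference category changes the coefficients but not the fitted probabilities.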
Complex Samples Logistic Regression Model
Figure 10-3
Logistic Regression Model dialog box
Specify Model Effects. By default, the procedure builds a main-effects model using the
factors and covariates specified in the main dialog box. Alternatively, you can build a
custom model that includes interaction effects and nested terms.
Non-Nested Terms
For the selected factors and covariates:
Interaction. Creates the highest-level interaction term for all selected variables.
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way. Creates all possible five-way interactions of the selected variables.
Nested Terms
You can build nested terms for your model in this procedure. Nested terms are
useful for modeling the effect of a factor or covariate whose values do not interact
with the levels of another factor. For example, a grocery store chain may follow the
spending habits of its customers at several store locations. Since each customer
frequents only one of these locations, the Customer effect can be said to be nested
within the Store location effect.
Additionally, you can include interaction effects or add multiple levels of nesting
to the nested term.
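The non-nested term builders listed under Non-Nested Terms amount to enumerating combinations of the selected variables. The following Python sketch joins variable names with `*`, the interaction notation this chapter uses (as in A*A). The function names are hypothetical.

```python
from itertools import combinations
from typing import List


def main_effects(variables: List[str]) -> List[str]:
    """One main-effects term per selected variable."""
    return list(variables)


def full_interaction(variables: List[str]) -> List[str]:
    """The single highest-level interaction of all selected variables."""
    return ["*".join(variables)]


def all_k_way(variables: List[str], k: int) -> List[str]:
    """All possible k-way interactions of the selected variables."""
    return ["*".join(combo) for combo in combinations(variables, k)]
```

With three selected variables A, B, and C, "All 2-way" yields A*B, A*C, and B*C, and "Interaction" yields the single term A*B*C. A nested term such as Customer(Store) is not a combination of this kind and is built separately.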
Limitations. Nested terms have the following restrictions:
All factors within an interaction must be unique. Thus, if A is a factor, then
specifying A*A is invalid.
All factors within a nested effect must be unique. Thus, if A is a factor, then
specifying A(A) is invalid.
No effect can be nested within a covariate. Thus, if A is a factor and X is a
covariate, then specifying A(X) is invalid.
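The three restrictions can be checked mechanically. The following is a minimal sketch, assuming that a term is represented as the variables crossed in an interaction plus the variables it is nested within (so A*B(C) has crossed A, B and within C). The `Term` class and `term_errors` function are hypothetical. Because the rules speak only of factors, the sketch assumes that a covariate may repeat in an interaction, as in age*age.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Term:
    crossed: Tuple[str, ...]        # e.g. ("A", "B") for A*B
    within: Tuple[str, ...] = ()    # e.g. ("C",) for A*B(C)


def term_errors(term: Term, factors: Set[str], covariates: Set[str]) -> List[str]:
    """Return the restrictions violated by a model term (empty if valid)."""
    errors = []
    crossed_factors = [v for v in term.crossed if v in factors]
    if len(crossed_factors) != len(set(crossed_factors)):
        errors.append("factors within an interaction must be unique")
    nested_factors = crossed_factors + [v for v in term.within if v in factors]
    if term.within and len(nested_factors) != len(set(nested_factors)):
        errors.append("factors within a nested effect must be unique")
    if any(v in covariates for v in term.within):
        errors.append("no effect can be nested within a covariate")
    return errors
```

Each example from the Limitations list, A*A, A(A), and A(X), produces exactly one error, while Customer(Store) passes.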
Intercept. The intercept is usually included in the model. If you can assume the
data pass through the origin, you can exclude the intercept. Even if you include the
intercept in the model, you can choose to suppress statistics related to it.
Complex Samples Logistic Regression Statistics
Figure 10-4
Logistic Regression Statistics dialog box
Model Fit. Controls the display of statistics that measure the overall model
performance.
Pseudo R-square. The R² statistic from linear regression does not have an exact
counterpart among logistic regression models. There are, instead, multiple
measures that attempt to mimic the properties of the R² statistic.
Classification table. Displays the tabulated cross-classifications of the observed
category by the model-predicted category on the dependent variable.
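To make the Model Fit options concrete, here is a minimal sketch of three widely used pseudo R-square measures (Cox and Snell, Nagelkerke, and McFadden) and of a classification table. The sketch works from unweighted log-likelihoods and counts. This section does not say which measures the procedure reports or how sampling weights enter them, so treat the formulas as illustrations. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def pseudo_r2(ll_model: float, ll_null: float, n: int) -> Dict[str, float]:
    """Common pseudo R-square measures computed from log-likelihoods.

    ll_model -- log-likelihood of the fitted model
    ll_null  -- log-likelihood of the intercept-only model
    n        -- sample size (unweighted, for illustration)
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)   # upper bound of Cox and Snell
    return {
        "Cox and Snell": cox_snell,
        "Nagelkerke": cox_snell / max_cox_snell,        # rescaled so the maximum is 1
        "McFadden": 1.0 - ll_model / ll_null,
    }


def classification_table(observed: List[Hashable],
                         predicted: List[Hashable]) -> Dict[Tuple[Hashable, Hashable], int]:
    """Counts of (observed category, model-predicted category) pairs."""
    return dict(Counter(zip(observed, predicted)))
```

Nagelkerke's measure divides Cox and Snell's by its maximum attainable value, which is why it can reach 1 while Cox and Snell's cannot.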
Parameters. This group allows you to control the display of statistics related to the
model parameters.
Estimate. Displays estimates of the coefficients.
Exponentiated estimate. Displays the base of the natural logarithm raised to the
power of the estimates of the coefficients. While the estimate has nice properties
for statistical testing, the exponentiated estimate, or exp(B), is easier to interpret.
Standard error. Displays the standard error for each coefficient estimate.
Confidence interval. Displays a confidence interval for each coefficient estimate.
The confidence level for the interval is set in the Options dialog box.
t-test. Displays a t test of each coefficient estimate. The null hypothesis for each
test is that the value of the coefficient is 0.
Covariances of parameter estimates. Displays an estimate of the covariance matrix
for the model coefficients.
Correlations of parameter estimates. Displays an estimate of the correlation matrix
for the model coefficients.
Design effect. The ratio of the variance of the estimate to the variance obtained
by assuming that the sample is a simple random sample. This is a measure of
the effect of specifying a complex design, where values further from 1 indicate
greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex
design, expressed on the scale of standard errors rather than variances, where values
further from 1 indicate greater effects.
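Most of the parameter statistics above follow from a coefficient estimate and two standard errors. The following minimal sketch assumes that the caller supplies both the complex-design standard error and the simple-random-sampling standard error, plus a critical value `t_crit`. One choice for `t_crit` would be the t quantile for the Options confidence level with the sampling design degrees of freedom. The sketch does not compute that quantile. Exponentiating the interval endpoints is one common way to express the interval on the exp(B) scale. The function name is hypothetical.

```python
import math
from typing import Dict


def coefficient_summary(b: float, se: float, se_srs: float, t_crit: float) -> Dict[str, object]:
    """Summary statistics for one coefficient estimate.

    b      -- coefficient estimate B
    se     -- standard error under the complex design
    se_srs -- standard error assuming a simple random sample
    t_crit -- critical t value supplied by the caller
    """
    t = b / se                                   # t statistic for H0: coefficient = 0
    lower, upper = b - t_crit * se, b + t_crit * se
    deff = se ** 2 / se_srs ** 2                 # ratio of variances
    return {
        "B": b,
        "exp(B)": math.exp(b),
        "t": t,
        "CI": (lower, upper),
        "exp(CI)": (math.exp(lower), math.exp(upper)),
        "deff": deff,
        "sqrt_deff": math.sqrt(deff),            # equals se / se_srs
    }
```

Here a complex-design standard error twice the simple-random-sampling one gives a design effect of 4 and a square root of 2.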
Summary statistics for model variables. Displays summary information about the
dependent variable, covariates, and factors.
Sample design information. Displays summary information about the sample,
including the unweighted count and the population size.
Complex Samples Hypothesis Tests
Figure 10-5
Hypothesis Tests dialog box
Test Statistic. This group allows you to select the type of statistic used for testing
hypotheses. You can choose among F, adjusted F, chi-square, and adjusted
chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design
degrees of freedom used to compute p values for all test statistics. If based on the
sampling design, the value is the difference between the number of primary sampling
units and the number of strata in the first stage of sampling. Alternatively, you can set
a custom degrees of freedom by specifying a positive integer.
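The design-based degrees of freedom described above can be sketched as follows. The sketch assumes that each case carries its first-stage stratum and primary sampling unit (PSU) labels, and that PSU labels may repeat across strata, so a PSU is identified by its (stratum, PSU) pair. The function name is hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def sampling_design_df(first_stage: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Degrees of freedom = (number of first-stage PSUs) - (number of strata).

    first_stage yields one (stratum, psu) pair per case.
    """
    pairs = set(first_stage)                        # distinct PSUs within strata
    strata = {stratum for stratum, _ in pairs}
    return len(pairs) - len(strata)
```

For example, two strata containing two and three PSUs give 5 - 2 = 3 degrees of freedom, however many cases each PSU contributes.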
Adjustment for Multiple Comparisons. When performing many hypothesis tests, you
run the risk of increased overall Type I error (the probability of incorrectly rejecting
a null hypothesis). This group allows you to choose the method of adjusting the
significance level.
Least significant difference. This method does not control the overall probability
of rejecting the hypotheses that some linear contrasts are different from the
null hypothesis values.
Sequential Sidak. This is a sequentially step-down rejective Sidak procedure
that is much less conservative than the single-step Sidak method in terms of
rejecting individual hypotheses but maintains the same overall significance level.
Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni
procedure that is much less conservative than the single-step Bonferroni method in
terms of rejecting individual hypotheses but maintains the same overall significance
level.
Sidak. This method provides tighter bounds than the Bonferroni approach.
Bonferroni. This method adjusts the observed significance level for the fact that
multiple contrasts are being tested.
Complex Samples Logistic Regression Odds Ratios
Figure 10-6
Logistic Regression Odds Ratios dialog box
The Odds Ratios dialog box allows you to display the model-estimated odds ratios for
specified factors and covariates. A separate set of odds ratios is computed for each
category of the dependent variable except the reference category.
Factors. For each selected factor, displays the ratio of the odds at each category of the
factor to the odds at the specified reference category.
Covariates. For each selected covariate, displays the ratio of the odds at the covariate’s
mean value plus the specified units of change to the odds at the mean.
When computing odds ratios for a factor or covariate, the procedure fixes all other
factors at their highest levels and all other covariates at their means. If a factor or
covariate interacts with other predictors in the model, then the odds ratios depend not
only on the change in the specified variable but also on the values of the variables
with which it interacts. If a specified covariate interacts with itself in the model (for
example, age*age), then the odds ratios depend on both the change in the covariate
and the value of the covariate.
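Because the log-odds are linear in the coefficients, an odds ratio is the exponential of a difference in linear predictors. The following minimal sketch assumes that the caller has already fixed all other predictors as described above and passes the log-odds as a function of the one covariate. The function names and the example coefficients are hypothetical.

```python
import math
from typing import Callable


def covariate_odds_ratio(eta: Callable[[float], float], at: float, change: float) -> float:
    """Odds ratio for moving a covariate from `at` to `at + change`.

    eta returns the log-odds as a function of the covariate, with every
    other predictor already held fixed by the caller.
    """
    return math.exp(eta(at + change) - eta(at))


def factor_odds_ratio(coef_category: float, coef_reference: float) -> float:
    """Odds ratio of a factor category relative to the reference category."""
    return math.exp(coef_category - coef_reference)


# Hypothetical models: a linear term only, and one with an age*age term.
linear = lambda age: -1.0 + 0.05 * age
quadratic = lambda age: -1.0 + 0.05 * age - 0.001 * age * age
```

With the linear model, a 10-unit change gives exp(0.5) at any starting value. With the quadratic model, the same change gives exp(-0.2) starting from 30 but exp(-0.6) starting from 50, which is the dependence on the covariate's value described above.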
Complex Samples Logistic Regression Save
Figure 10-7
Logistic Regression Save dialog box
Save Variables. This group allows you to save the model-predicted category and
predicted probabilities as new variables in the working file.
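The following is a minimal sketch of what saving these variables amounts to, assuming that the predicted category is the category with the highest predicted probability. The variable names `pred_category` and `pred_prob_<category>` are hypothetical, not the names the procedure assigns. Ties go to the first category in the dictionary's order.

```python
from typing import Any, Dict, Hashable, List


def add_predictions(cases: List[Dict[str, Any]],
                    probabilities: List[Dict[Hashable, float]]) -> None:
    """Attach the predicted category and predicted probabilities to each case.

    probabilities[i] maps each dependent-variable category to its predicted
    probability for cases[i]. Cases are modified in place.
    """
    for case, probs in zip(cases, probabilities):
        case["pred_category"] = max(probs, key=probs.get)   # most probable category
        for category, p in probs.items():
            case[f"pred_prob_{category}"] = p
```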
Export Model as SPSS data. Writes an SPSS data file containing a covariance (or
correlation, if selected) matrix of the parameter estimates in the model. Also, for each
dependent variable, there will be a row of parameter estimates, a row of standard
errors, a row of significance values for the t statistics corresponding to the parameter
estimates, and a row of sampling design degrees of freedom. You can use this matrix
file in other procedures that read an SPSS matrix file.
Export Model as XML. Saves the parameter estimates and the parameter covariance
matrix, if selected, in XML (PMML) format. SmartScore and the server version of
SPSS (a separate product) can use this model file to apply the model information
to other data files for scoring purposes.
Complex Samples Logistic Regression Options
Figure 10-8
Logistic Regression Options dialog box
Estimation. This group gives you control of various criteria used in the model
estimation.
Maximum Iterations. The maximum number of iterations the algorithm will
execute. Specify a non-negative integer.
Maximum Step-Halving. At each iteration, the step size is reduced by a factor
of 0.5 until the log-likelihood increases or maximum step-halving is reached.
Specify a positive integer.
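The step-halving rule can be sketched as a simple line search. This illustration rests on stated assumptions and is not the procedure's actual algorithm: `step` is whatever update the fitting algorithm proposes (for example, a Newton-Raphson step), `loglik` is a caller-supplied log-likelihood function, and the full step is assumed not to count as a halving. The function name is hypothetical.

```python
from typing import Callable, List, Optional

Vector = List[float]


def step_halving_update(params: Vector, step: Vector,
                        loglik: Callable[[Vector], float],
                        max_halvings: int) -> Optional[Vector]:
    """Try params + s * step with s = 1, 0.5, 0.25, ... until the log-likelihood increases.

    Returns the accepted parameters, or None if no tried step size increased
    the log-likelihood within max_halvings halvings.
    """
    current = loglik(params)
    scale = 1.0
    for _ in range(max_halvings + 1):       # the full step plus up to max_halvings halvings
        candidate = [p + scale * d for p, d in zip(params, step)]
        if loglik(candidate) > current:
            return candidate
        scale *= 0.5                        # reduce the step size by a factor of 0.5
    return None
```

For example, with log-likelihood -(b - 1)² starting at b = 0, a proposed step of 4 overshoots. Halving twice (to 1) reaches the maximum, so the search succeeds when at least two halvings are allowed and fails with only one.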