HP HP-UX Performance Tools User Manual

HP MLIB User’s Guide
Seventh Edition
HP Part No. B6061-96027 and B6061-96028
HP MLIB
December 2004
Printed in USA
Edition: Seventh
Document Numbers: B6061-96027 and B6061-96028 Remarks: Released Dcember 2004 with HP MLIB software version9.0.
User’s Guide split into two volumes with this release.
Edition: Sixth
Document Number: B6061-96023 Remarks: Released September 2003 with HP MLIB software version 8.50.
Edition: Fifth
Document Number: B6061-96020 Remarks: Released September 2002 with HP MLIB software version 8.30.
Edition: Fourth
Document Number: B6061-96017 Remarks: Released June 2002 with HP MLIB software version 8.20.
Edition: Third
Document Number: B6061-96015 Remarks: Released September 2001 with HP MLIB software version 8.10.
Edition: Second
Document Number: B6061-96012 Remarks: Released June 2001 with HP MLIB software version 8.00.
Edition: First
Document Number: B6061-96010 Remarks: Released December 1999 with HP MLIB software version
B.07.00. This HP MLIB User’s Guide contains both HP VECLIB and HP LAPACK information. This document supercedes both the HP MLIB VECLIB User’s Guide, fifth edition (B6061-96006) and the HP MLIB LAPACK User’s Guide, sixth edition (B6061-96005).
Notice
Copyright 1979-2004 Hewlett-Packard Development Company. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.
The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material,
including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance or use of this material.
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
VECLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
LAPACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
ScaLAPACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
SuperLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
SOLVERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
VMATH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Purpose and audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
Notational conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
Documentation resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii

Part 1

1 Introduction to VECLIB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Accessing VECLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Compiling and linking (VECLIB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Problem with +ppu compatibility and duplicated symbols . . . . . . . . . . . . . . . . . . . . 14
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Parallel processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17
Linking for parallel or non parallel processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Controlling VECLIB parallelism at runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Performance benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
OpenMP-based nested parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
v
Message passing-based nested parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Default CPS library stack is too small for MLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Default Pthread library stack is too small for MLIB . . . . . . . . . . . . . . . . . . . . . . . . . 22
Roundoff effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Data types and precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
VECLIB naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Data type and byte length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Operator arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Low-level subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
High-level subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29
HP MLIB man pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2 Basic Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .32
Associated documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
What you need to know to use vector subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
BLAS storage conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
BLAS indexing conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Operator arguments in the BLAS Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Representation of a permutation matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Representation of a Householder matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Subprograms for basic vector operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes 40
ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN Index of minimum of magnitudes . . 43
ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements. . . . . . . . 46
ISMAX/IDMAX/IIMAX Index of maximum element of vector . . . . . . . . . . . . . . . . . 49
ISMIN/IDMIN/IIMIN Index of minimum element of vector. . . . . . . . . . . . . . . . . . . 51
ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx Search vector for element . . . . . . . . . . . . 53
SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes . . . . . . . . 56
SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN Minimum of magnitudes . . . . . . . . . . 59
vi Table of Contents
SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes . . . . . . . . . . . . 62
SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC Elementary vector operation 65
SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation . . . . . . . 68
SCLIP/DCLIP/ICLIP Two sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
SCLIPL/DCLIPL/ICLIPL Left sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
SCLIPR/DCLIPR/ICLIPR Right sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . 77
SCOPY/DCOPY/ICOPY/CCOPY/CCOPYC/ZCOPY/ZCOPYC Copy vector . . . . . . . 80

SDOT/DDOT/CDOTC/CDOTU/ZDOTC/ZDOTU Dot product . . . . . . . . . . . . . . . . . 84

SDOTI/DDOTI/CDOTCI/CDOTUI/ZDOTCI/ZDOTUI Sparse dot product . . . . . . . 88

SFRAC/DFRAC Extract fractional parts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
SGTHR/DGTHR/IGTHR/CGTHR/ZGTHR Gather sparse vector . . . . . . . . . . . . . . 94
SGTHRZ/DGTHRZ/IGTHRZ/CGTHRZ/ZGTHRZ Gather and zero sparse vector . 96
SLSTxx/DLSTxx/ILSTxx/CLSTxx/ZLSTxx List selected vector elements . . . . . . . 99
SMAX/DMAX/IMAX Maximum of vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
SMIN/DMIN/IMIN Minimum of vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
SNRM2/DNRM2/SCNRM2/DZNRM2 Euclidean norm . . . . . . . . . . . . . . . . . . . . . 107
SNRSQ/DNRSQ/SCNRSQ/DZNRSQ Euclidean norm squared . . . . . . . . . . . . . . . 110
SRAMP/DRAMP/IRAMP Generate linear ramp. . . . . . . . . . . . . . . . . . . . . . . . . . . 112

SROT/DROT/CROT/CSROT/ZROT/ZDROT Apply Givens rotation . . . . . . . . . . . 114

SROTG/DROTG/CROTG/ZROTG Construct Givens rotation . . . . . . . . . . . . . . . . 118
SROTI/DROTI Apply sparse Givens rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
SROTM/DROTM Apply modified Givens rotation . . . . . . . . . . . . . . . . . . . . . . . . . 123
SROTMG/DROTMG Construct modified Givens rotation . . . . . . . . . . . . . . . . . . . 127
SRSCL/DRSCL/CRSCL/CSRSCL/ZRSCL/ZDRSCL Scale vector. . . . . . . . . . . . . . 130
SSCAL/DSCAL/CSCAL/CSSCAL/CSCALC/ZSCAL/ZDSCAL/ZSCALC Scale vector . .
133
SSCTR/DSCTR/ISCTR/CSCTR/ZSCTR Scatter sparse vector. . . . . . . . . . . . . . . . 136
SSUM/DSUM/ISUM/CSUM/ZSUM Vector sum . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
SSWAP/DSWAP/ISWAP/CSWAP/ZSWAP Swap two vectors . . . . . . . . . . . . . . . . . 141

SWDOT/DWDOT/CWDOTC/CWDOTU/ZWDOTC/ZWDOTU Weighted dot product .

145
SZERO/DZERO/IZERO/CZERO/ZZERO Clear vector . . . . . . . . . . . . . . . . . . . . . . 150
F_SAMAX_VAL/F_DAMAX_VAL/F_CAMAX_VAL/F_ZAMAX_VAL Maximum
absolute value and location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
F_SAMIN_VAL/F_DAMIN_VAL/F_CAMIN_VAL/F_ZAMIN_VAL Minimum absolute
value and location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
F_SAPPLY_GROT/F_DAPPLY_GROT/F_CAPPLY_GROT/F_ZAPPLY_GROT Apply
Table of Contents v i i
plane rotation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
F_SAXPBY/F_DAXPBY/F_CAXPBY/F_ZAXPBY Scaled vector accumulation . . . 161 F_SAXPY_DOT/F_DAXPY_DOT/F_CAXPY_DOT/F_ZAXPY_DOT Combine AXPY
and DOT routines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
F_SCOPY/F_DCOPY/F_CCOPY/F_ZCOPY Copy vector. . . . . . . . . . . . . . . . . . . . . 167
F_SDOT/F_DDOT/F_CDOT/F_ZDOT Add scaled dot product . . . . . . . . . . . . . . . . 169
F_SFPINFO/F_DFPINFO Environmental inquiry . . . . . . . . . . . . . . . . . . . . . . . . . 172
F_SGEN_GROT/F_DGEN_GROT/F_CGEN_GROT/F_ZGEN_GROT Generate
Givens rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
F_SGEN_HOUSE/F_DGEN_HOUSE/F_CGEN_HOUSE/F_ZGEN_HOUSE Generate
Householder transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
F_SGEN_JROT/F_DGEN_JROT/F_CGEN_JROT/F_ZGEN_JROT Generate Jacobi
rotation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
F_SMAX_VAL/F_DMAX_VAL Maximum value and location. . . . . . . . . . . . . . . . . 181
F_SMIN_VAL/F_DMIN_VAL Minimum value and location. . . . . . . . . . . . . . . . . . 183
F_SNORM/F_DNORM Norm of a vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
F_SPERMUTE/F_DPERMUTE/F_CPERMUTE/F_ZPERMUTE Permute vector 187
F_SRSCALE/F_DRSCALE/F_CRSCALE/F_ZRSCALE Reciprocal Scale . . . . . . . 190
F_SSORT/F_DSORT Sort vector entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
F_SSORTV/F_DSORTV Sort vector and return index vector. . . . . . . . . . . . . . . . . 193
F_SSUM/F_DSUM/F_CSUM/F_ZSUM Sum of entries of a vector. . . . . . . . . . . . . 195
F_SSUMSQ/F_DSUMSQ/F_CSUMSQ/F_ZSUMSQ Sum of squares . . . . . . . . . . . 197
F_SSWAP/F_DSWAP/F_CSWAP/F_ZSWAP Interchange vectors . . . . . . . . . . . . . 200
F_SWAXPBY/F_DWAXPBY/F_CWAXPBY/F_ZWAXPBY Scaled vector addition. 202
3 Basic Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Associated documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
What you need to know to use these subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Subroutine naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Operator arguments in the BLAS Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Subprograms for basic matrix operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
SGBMV/DGBMV/CGBMV/ZGBMV Matrix-vector multiply. . . . . . . . . . . . . . . . . . 212
SGECPY/DGECPY/CGECPY/ZGECPY Copy general matrix . . . . . . . . . . . . . . . . 219
SGEMM/DGEMM/CGEMM/ZGEMM Matrix-matrix multiply . . . . . . . . . . . . . . . 222
viii Table of Contents
DGEMMS/ZGEMMS Strassen matrix-matrix multiply. . . . . . . . . . . . . . . . . . . . . 227
SGEMV/DGEMV/CGEMV/ZGEMV Matrix-vector multiply . . . . . . . . . . . . . . . . . 232
SGER/DGER/CGERC/CGERU/ZGERC/ZGERU Rank-1 update . . . . . . . . . . . . . . 237
SGETRA/DGETRA/CGETRA/ZGETRA In-place transpose of a general square matrix
241
SSBMV/DSBMV/CHBMV/ZHBMV Matrix-vector multiply. . . . . . . . . . . . . . . . . . 244
SSPMV/DSPMV/CHPMV/ZHPMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 249
SSPR/DSPR/CHPR/ZHPR Rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
SSPR2/DSPR2/CHPR2/ZHPR2 Rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
SSYMM/DSYMM/CHEMM/CSYMM/ZHEMM/ZSYMM Matrix-matrix multiply 265
SSYMV/DSYMV/CHEMV/ZHEMV Matrix-vector multiply. . . . . . . . . . . . . . . . . . 270
SSYR/DSYR/CHER/ZHER Rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
SSYR2/DSYR2/CHER2/ZHER2 Rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
SSYR2K/DSYR2K/CHER2K/CSYR2K/ZHER2K/ZSYR2K Rank-2k update . . . . . 284
SSYRK/DSYRK/CHERK/CSYRK/ZHERK/ZSYRK Rank-k update . . . . . . . . . . . . 289
STBMV/DTBMV/CTBMV/ZTBMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 294
STBSV/DTBSV/CTBSV/ZTBSV Solve triangular band system . . . . . . . . . . . . . . . 301
STPMV/DTPMV/CTPMV/ZTPMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 308
STPSV/DTPSV/CTPSV/ZTPSV Solve triangular system . . . . . . . . . . . . . . . . . . . . 313
STRMM/DTRMM/CTRMM/ZTRMM Triangular matrix-matrix multiply . . . . . . 318
STRMV/DTRMV/CTRMV/ZTRMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 323
STRSM/DTRSM/CTRSM/ZTRSM Solve triangular systems . . . . . . . . . . . . . . . . . 327
STRSV/DTRSV/CTRSV/ZTRSV Solve triangular system . . . . . . . . . . . . . . . . . . . 332
XERBLA Error handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
F_CHBMV/F_ZHBMV Hermitian banded matrix-vector multiply . . . . . . . . . . . . 340
F_CHEMV/F_ZHEMV Hermitian matrix-vector multiply. . . . . . . . . . . . . . . . . . . 342
F_CHER/F_ZHER Hermitian rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
F_CHER2/F_ZHER2 Hermitian rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
F_CHPMV/F_ZHPMV Hermitian packed matrix-vector multiply. . . . . . . . . . . . . 348
F_CHPR/F_ZHPR Hermitian rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
F_CHPR2/F_ZHPR2 Hermitian rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
F_SFPINFO/F_DFPINFO Environmental inquiry. . . . . . . . . . . . . . . . . . . . . . . . . 354
F_SGBMV/F_DGBMV/F_CGBMV/F_ZGBMV General band matrix-vector multiply .
355
F_SGE_COPY/F_DGE_COPY/F_CGE_COPY/F_ZGE_COPY Matrix copy. . . . . . 358
F_SGE_TRANS/F_DGE_TRANS/F_CGE_TRANS/F_ZGE_TRANS Matrix
transposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Table of Contents i x
F_SGEMM/F_DGEMM/F_CGEMM/F_ZGEMM General matrix-matrix multiply 362 F_SGEMV/F_DGEMV/F_CGEMV/F_ZGEMV General matrix-vector multiply . . 365 F_SGEMVER/F_DGEMVER/F_CGEMVER/F_ZGEMVER Multiple matrix-vector
multiply, rank 2 update. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
F_SGEMVT/F_DGEMVT/F_CGEMVT/F_ZGEMVT Multiple matrix-vector multiply
372
F_SGER/F_DGER/F_CGER/F_ZGER General rank-1 update . . . . . . . . . . . . . . . . 375
F_SSBMV/F_DSBMV/F_CSBMV/F_ZSBMV Symmetric band matrix-vector multiply
378 F_SSPMV/F_DSPMV/F_CSPMV/F_ZSPMV Symmetric packed matrix-vector
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
F_SSPR/F_DSPR/F_CSPR/F_ZSPR Symmetric packed rank-1 update. . . . . . . . . 384
F_SSPR2/F_DSPR2/F_CSPR2/F_ZSPR2 Symmetric rank-2 update . . . . . . . . . . . 386
F_SSYMV/F_DSYMV/F_CSYMV/F_ZSYMV Symmetric matrix-vector multiply . 389
F_SSYR/F_DSYR/F_CSYR/F_ZSYR Symmetric rank-1 update . . . . . . . . . . . . . . . 392
F_SSYR2/F_DSYR2/F_CSYR2/F_ZSYR2 Symmetric rank-2 update . . . . . . . . . . . 394
F_STBMV/F_DTBMV/F_CTBMV/F_ZTBMV Triangular banded matrix-vector
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
F_STBSV/F_DTBSV/F_CTBSV/F_ZTBSV Triangular banded solve . . . . . . . . . . . 400
F_STPMV/F_DTPMV/F_CTPMV/F_ZTPMV Triangular packed matrix-vector
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
F_STPSV/F_DTPSV/F_CTPSV/F_ZTPSV Triangular packed solve. . . . . . . . . . . . 406
F_STRMV/F_DTRMV/F_CTRMV/F_ZTRMV Triangular matrix-vector multiply 408 F_STRMVT/F_DTRMVT/F_CTRMVT/F_ZTRMVT Multiple triangular matrix-vector
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
F_STRSM/F_DTRSM/F_CTRSM/F_ZTRSM Triangular solve . . . . . . . . . . . . . . . . 414
F_STRSV/F_DTRSV/F_CTRSV/F_ZTRSV Triangular solve . . . . . . . . . . . . . . . . . 417
4 Sparse BLAS Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Associated documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
What you need to know to use these subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Subroutine naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Sparse matrix storage formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
Operator arguments in the Sparse BLAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
x Table of Contents
Common arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
SM arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
Order of arguments for args(A) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
SBCOMM/DBCOMM/CBCOMM/ZBCOMM Block coordinate matrix-matrix multiply
441 SBDIMM/DBDIMM/CBDIMM/ZBDIMM Block diagonal matrix-matrix multiply 445 SBDISM/DBDISM/CBDISM/ZBDISM Block diagonal format triangular solve . . 449 SBELMM/DBELMM/CBELMM/ZBELMM Block Ellpack matrix-matrix multiply . . .
453 SBELSM/DBELSM/CBELSM/ZBELSM Block Ellpack format triangular solve . 457 SBSCMM/DBSCMM/CBSCMM/ZBSCMM Block sparse column matrix-matrix
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
SBSCSM/DBSCSM/CBSCSM/ZBSCSM Block sparse column format triangular solve
465 SBSRMM/DBSRMM/CBSRMM/ZBSRMM Block sparse row matrix-matrix multiply
469 SBSRSM/DBSRSM/CBSRSM/ZBSRSM Block sparse row format triangular solve . . .
473 SCOOMM/DCOOMM/CCOOMM/ZCOOMM Coordinate matrix-matrix multiply 477 SCSCMM/DCSCMM/CCSCMM/ZCSCMM Compressed sparse column matrix-matrix
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
SCSCSM/DCSCSM/CCSCSM/ZCSCSM Compressed sparse column format triangular
solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
SCSRMM/DCSRMM/CCSRMM/ZCSRMM Compressed sparse row matrix-matrix
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
SCSRSM/DCSRSM/CCSRSM/ZCSRSM Compressed sparse row format triangular
solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
SDIAMM/DDIAMM/CDIAMM/ZDIAMM Diagonal matrix-matrix multiply . . . . 497
SDIASM/DDIASM/CDIASM/ZDIASM Diagonal format triangular solve. . . . . . . 501
SELLMM/DELLMM/CELLMM/ZELLMM Ellpack matrix-matrix multiply . . . . 505
SELLSM/DELLSM/CELLSM/ZELLSM Ellpack format triangular solve. . . . . . . 509
SJADMM/DJADMM/CJADMM/ZJADMM Jagged diagonal matrix-matrix multiply .
513 SJADSM/DJADSM/CJADSM/ZJADSM Jagged diagonal format triangular solve 517
SSKYMM/DSKYMM/CSKYMM/ZSKYMM Skyline matrix-matrix multiply . . . . 521
SSKYSM/DSKYSM/CSKYSM/ZSKYSM Skyline format triangular solve . . . . . . 525
SVBRMM/DVBRMM/CVBRMM/ZVBRMM Variable block row matrix-matrix
multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
Table of Contents x i
SVBRSM/DVBRSM/CVBRSM/ZVBRSM Variable block row format triangular solve
533
xii Table of Contents
Figures
List of Figures xiii
xiv List of Figures
Tables
Table 1-1 VECLIB and VECLIB8 Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Table 1-2 Compiler Defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Table 1-3 VECLIB Naming Convention—Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Table 1-4 BLAS Standard Operator Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Table 2-1 FPINFO return values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Table 3-1 Extended BLAS Naming Convention—Data Type . . . . . . . . . . . . . . . . . . . . . 207
Table 3-2 Extended BLAS Naming Convention—Matrix Form . . . . . . . . . . . . . . . . . . . 208
Table 3-3 Extended BLAS Naming Convention—Computation . . . . . . . . . . . . . . . . . . . 208
Table 3-4 Extended BLAS Naming Convention—Subprogram Names . . . . . . . . . . . . . 209
Table 4-1 Sparse BLAS Naming Convention—Data Type . . . . . . . . . . . . . . . . . . . . . . . . 423
Table 4-2 Sparse BLAS Naming Convention—Matrix Form . . . . . . . . . . . . . . . . . . . . . . 424
Table 4-3 4 x 5 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
Table 4-4 COO Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
Table 4-5 CSC Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
Table 4-6 MSC Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
Table 4-7 CSR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Table 4-8 MSR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
Table 4-9 5 x 4 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Table 4-10 DIA Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Table 4-11 ELL Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
Table 4-12 JAD Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Table 4-13 JAD Row-Permuted Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Table 4-14 5 x 5 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Table 4-15 SKY Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
Table 4-16 4 x 6 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
Table 4-17 BCO Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
Table 4-18 BSC Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
List of Tables xiii
Table 4-19 BSR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Table 4-20 6 x 6 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Table 4-21 VBR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
xiv List of Tables

VECLIB

VECLIB

Preface

Hewlett-Packard’s high-performance math libraries (HP MLIB) help you speed development of applications and shorten execution time of long-running technical applications.
HP MLIB is a collection of subprograms optimized for use on HP servers and workstations, providing mathematical software and computational kernels for engineering and scientific applications. HP MLIB can be used on systems ranging from single-processor workstations to multiprocessor high-end servers. HP MLIB is optimized for HP PA-RISC 2.0, Itanium 2, and Opteron processors. HP MLIB has six components; VECLIB, LAPACK, ScaLAPACK, SuperLU, SOLVERS, and VMATH.
HP VECLIB contains robust callable subprograms. Together with a subset of the BLAS Standard subroutines, HP MLIB supports the legacy BLAS, a collection of routines for the solution of sparse symmetric systems of equations, a collection of commonly used Fast Fourier Transforms (FFTs), and convolutions. Although VECLIB was designed for use with Fortran programs, C programs can call VECLIB subprograms, as described in Appendix A. Refer to Part 1 of this manual for HP VECLIB information.
Throughout this document there are references to legacy BLAS and BLAS Standard routines. Legacy BLAS routines include Basic Linear Algebra Subprograms (BLAS), that is the level 1, 2, and 3 BLAS, as well as the Sparse BLAS.
A BLAS standardization effort began with a BLAS Technical (BLAST) Forum meeting in November 1995 at the University of Tennessee. The efforts of the BLAST Forum resulted in a BLAS Standard specification in 1999. BLAS Standard routines refer to routines as defined by this BLAS Standard specification. HP MLIB supports a subset of BLAS Standard routines. Refer to Chapter 2, “Basic Vector Operations,” and Chapter 3, “Basic Matrix Operations,” for details about supported subprograms.
Preface xv

LAPACK

LAPACK
HP Linear Algebra Package (LAPACK) is a collection of subprograms that provide mathematical software for applications involving linear equations, least squares, eigenvalue problems, and the singular value decomposition. LAPACK is designed to supersede the linear equation and eigenvalue packages, LINPACK and EISPACK. The National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy supported the development of the public-domain version of LAPACK, from which the HP version was derived.
HP LAPACK fully conforms with public domain version 3.0 of LAPACK in all user-visible usage conventions. Refer to Part 2 of this manual for information specific to HP LAPACK. To supplement the HP specific information provided in Part 2 of this document, refer to the standard LAPACK Users’ Guide. You can access the latest edition of the LAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/lapack/lug/index..html

ScaLAPACK

ScaLAPACK is a library of high-performance linear algebra routines capable of solving systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. ScaLAPACK can also handle many associated computations such as matrix factorizations or estimating condition numbers.
ScaLAPACK is a public domain software that was developed by Oak Ridge National Laboratory. It is designed for distributed computing and uses the Message Passing Interface (MPI) for parallelism. This implementation provides a version of ScaLAPACK tuned on HP servers and built with HP’s MPI. The ScaLAPACK library routines are written in Fortran 77 and are callable from Fortran 90 and C routines. Unlike other MLIB libraries, there is not a version of ScaLAPACK that assumes all integers are 8 bytes in length.
xvi HP MLIB VECLIB User’s Guide
Refer to Part 3 of this manual for information specific to HP ScaLAPACK. To supplement the HP specific information provided in Part 3 of this document, refer to the standard ScaLAPACK Users’ Guide. You can access the latest edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/scalapack/slug/index..html

SuperLU

This implementation provides the Distributed SuperLU library designed for distributed memory parallel computers. It was based on the public-domain SuperLU_DIST, which was developed at the Lawrence Berkeley National Lab and the University of California at Berkeley.
The library contains a set of subroutines to solve a sparse linear system. The library is implemented in ANSI C, using HP Message Passing Interface (MPI) for communication. The library includes routines to handle both real and complex matrices using double precision. The parallel routine names for the double-precision real version start with the letters “pd” (e.g. pdgstrf). The parallel routine names for the double-precision complex version start with letters “pz” (e.g. pzgstrf). Unlike other MLIB libraries, there is not a version of SuperLU that assumes all integers are 8 bytes in length.
The routines can be called directly from C applications. They may also be called from Fortran applications, however, “bridge” routines (in C) must be supplied. For details about creating bridge routines, please refer to Section 2.9.2 in the SuperLU User’s Guide available at:
http://www.nersc.gov/~xiaoye/SuperLU
SuperLU

SOLVERS

Solvers is a collection of direct sparse linear system solvers and graph partitioning routines. Symmetric systems can be solved using SMP parallelism and structurally-symmetric systems can be solved using out-of-core functionality. These routines have been optimized for use on Hewlett-Packard servers. Features are:
Preface xvii

VMATH

• Sparse symmetric and structurally-symmetric linear equation solutions.
• Sparse symmetric ordinary and generalized eigensystem solutions.
• Out-of-core symmetric and structurally-symmetric linear equation and eigensystems solutions.
• Full METIS functionality This implementation provides the METIS Version 4.0.1 library. It is based
on the public-domain METIS, which was developed at the University of Minnesota, Department of Computer Science, and the Army HPC Research Center. The library contains a set of subroutines for graph partitioning, mesh partitioning, and sparse matrix reordering, as well as auxiliary routines. HP MLIB contains the full METIS functionality as that in the public domain METIS, however, the routine names are different. HP MLIB METIS routine names have been prepended with mlib_ to avoid name conflict on applications and libraries that contain their own local version of METIS.
For more information about METIS, please refer to: http://www-users.cs.umn.edu/~karpis/metis/metis/index.html
VMATH
VMATH is a library of vector math routines corresponding to many of the widely used scalar math routines available with C, C++, and Fortran90.
VMATH is intended for computationally intensive mathematical applications amenable to a vector programming style.
VMATH provides two libraries: VMATH, whose interface uses 4-byte integers; and VMATH8, whose interface uses 8-byte integers and is otherwise equivalent to VMATH. VMATH routines come with both Fortran and C interfaces.
For more detailed information on VMATH as well as subprogram specifications, please refer to the VMATH chapter in Part 5 of this book. The VMATH man pages provide a man page for each subprogram.
xviii HP MLIB VECLIB User’s Guide

Purpose and audience

This guide describes the MLIB software library and shows how to use it. This library provides mathematical software and computational kernels for applications.
The HP MLIB User’s Guide addresses experienced programmers who:
• Convert, develop, or optimize programs for use on HP servers and workstations
• Optimize existing software to improve performance and increase productivity
Purpose and audience
Preface xix

Organization

Organization
The HP MLIB User’s Guide describes HP MLIB VECLIB in Part 1, HP MLIB LAPACK in Part 2, HP MLIB ScaLAPACK in Part 3, and HP MLIB Distributed SuperLU in Part 4.
To learn fundamental information necessary for using the VECLIB library, read Chapter 1 and the introductory sections of the other chapters. These sections of background information will help you efficiently use the library subprograms.
To learn more about the subject of any chapter, refer to the literature cited in the “Associated documentation” section of each chapter.
Part 1 of this document is organized as follows:
• Chapter 1 introduces general concepts about VECLIB
• Chapter 2 describes basic vector operations included in VECLIB
• Chapter 3 describes basic matrix operations included in VECLIB
• Chapter 4 describes sparse BLAS operations
• Chapter 5 describes the discrete Fourier transforms in VECLIB
• Chapter 6 describes subprograms to compute convolutions and correlations of data sets
• Chapter 7 describes miscellaneous subprograms to produce random numbers, sort the elements of a vector in ascending or descending order, measure time, allocate dynamic memory, and report errors
Part 2 of this document is organized as follows:
• Chapter 8 describes information specific to Hewlett-Packard’s implementation of LAPACK
• Chapter 9 describes selected LAPACK auxiliary subprograms
Part 3 of this document is organized as follows:
• Chapter 10 describes ScaLAPACK functionality
Part 4 of this document is organized as follows:
• Chapter 11 describes Distributed SuperLU functionality
Part 5 of this document is organized as follows:
• Chapter 12 describes VMATH functionality
xx HP MLIB VECLIB User’s Guide
Organization
Part 6 of this document is organized as follows:
• Chapter 13 explains sparse symmetric linear equation subprograms
• Chapter 14 describes METIS subprograms
• Chapter 15 describes sparse symmetric eigenvalue subprograms
• Chapter 16 describes BCSLIB-EXT functionality Supplemental material is provided as follows:
• Appendix A describes how to call VECLIB and LAPACK subprograms from within C programs
• Appendix B describes LINPACK subprograms available in HP MLIB
• Appendix C lists parallelized subprograms in VECLIB and LAPACK
• An index is included at the back of the manual
Supplemental information for Part 2 of this HP MLIB User’s Guide, is found in the LAPACK Users’ Guide.
The LAPACK Users’ Guide is a publication from the Society for Industrial and Applied Mathematics that provides an introduction to the design of LAPACK as well as complete specifications for all the driver and computational routines. You can access the latest edition of the LAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/lapack/lug Supplemental information for Part 3 of this HP MLIB User’s Guide, is found in
the ScaLAPACK User’s Guide. The ScaLAPACK Users’ Guide is a publication from the Society for Industrial
and Applied Mathematics that provides an informal introduction to the design of ScaLAPACK as well as a detailed description of its contents and a reference manual. You can access the latest edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/scalapack/slug Supplemental information for Part 4 of this HP MLIB User’s Guide, is found in
the SuperLU User’s Guide, which is only available online at: http://www.nersc.gov/~xiaoye/SuperLU
Preface xxi

Notational conventions

Notational conventions
The following conventions are used in this manual:
Italics
Italics within text indicate mathematical entities used
or manipulated by the program: for example, solve the
n-by-n system of linear equations Ax = b. Italics within command lines indicate generic
commands, file names, or subprogram names. Substitute actual commands, file names, or subprograms for the italicized words. For example, the command line
f90 prog_name.o
instructs you to type the command f90, followed by the name of a program or subprogram object file.
UPPERCASE BOLDFACE
UPPERCASE BOLDFACE
Fortran statements indicates Fortran keywords and subprogram names that must be typed just as they appear. For example, CALL DAXPY.
within text and in prototype
lowercase boldface
xxii HP MLIB VECLIB User’s Guide
lowercase boldface
generic variable or array names. You should substitute actual variable or array names. The italicized mathematical entities and the lowercase boldface variable and array names usually correspond. For example, A is a matrix and a is the Fortran array containing the matrix:
CALL DAXPY (n, a, x, incx, y, incy) lowercase boldface
ASCII characters that must be typed just as they appear. For example, the command line
f90 prog_name.o
instructs you to type the command f90, followed by the name of a program or subprogram object file.
within text indicates Fortran
within command lines indicates
UPPERCASE MONOSPACE
UPPERCASE MONOSPACE indicates Fortran programs.
Brackets ( [ ] )
Square brackets in command examples designate optional entries.
NOTE
A NOTE highlights important supplemental information.

Documentation resources

The HP MLIB User’s Guide, the LAPACK Users’ Guide, and the ScaLAPACK Users’ Guide are available in hardcopy and online formats.
For the HP MLIB User’s Guide, refer to: http:// www.hp.com/go/mlib
Documentation resources
For the latest edition of the LAPACK Users’ Guide, refer to: http://www.netlib.org/lapack/lug
For the latest edition of the ScaLAPACK User’s Guide, refer to: http://www.netlib.org/scalapack/slug
For the latest edition of the SuperLU User’s Guide, refer to: http://www.nersc.gov/~xiaoye/SuperLU
The following documents provide supplemental information:
LAPACK Users’ Guide Philadelphia, PA: Society for Industrial and Applied Mathematics, 1995. This guide provides information on the subprograms provided with the LAPACK library.
• ScaLAPACK Users’ Guide Philadelphia, PA: Society for Industrial and Applied Mathematics. This guide provides information on the subprograms provided with the ScaLAPACK library.
Parallel Programming Guide for HP-UX Systems. Describes efficient shared-memory parallel programming techniques using HP compilers.
Fortran/9000 Programmer’s Guide. Describes features and requirements in terms of the tasks a programmer might perform. These tasks include how to compile, link, run, debug, and optimize programs.
Preface xxiii
Documentation resources
HP-UX Floating-Point Guide. Describes how floating-point arithmetic is implemented on HP 9000 systems and discusses how floating-point behavior affects the programmer.
HP Fortran 90 Programmer’s Guide. Provides extensive usage information (including how to compile and link), suggestions and tools for migrating to HP Fortran 90, and how to call C and HP-UX routines for HP Fortran 90.
HP Fortran 90 Programmer’s Reference. Provides complete Fortran 90 language reference information. It also covers compiler options, compiler directives, and library information.
HP C Programming Guide. Contains detailed discussions of selected C topics.
HP C/HP-UX Reference Manual. Presents reference information on the C programming language as implemented by HP.
HP aC++ Online Programmer’s Guide. Presents reference and tutorial information on aC++. (This manual is accessed by specifying aCC with the +help command-line option.)
HP MPI User’s Guide. Describes how to use HP MPI (Message Passing Interface), a library of C- and Fortran-callable routines used for message-passing programming.
• Software Optimization for High Performance Computing: Creating Faster Applications: Provides state-of-the-art solutions for every key aspect of
software performance; both code-based and algorithm-based.
HP-UX Assembly Language Reference Manual. Describes the HP-UX assembler for the PA-RISC processor.
HP PA-RISC 2.0 Architecture Reference. Describes the architecture of the PA-RISC 2.0 processor.
PA-RISC Procedure Calling Conventions Reference. Describes the conventions for creating PA-RISC assembly language procedure calls.
IA-64 and Elementary Functions. Describes the architecture of the IA-64 processor and how to implement elementary functions.
• METIS documentation is available online at http://www-users.cs.umn.edu/~karypis/metis/metis/index.html.
• OpenMP documentation is available online at http://www.openmp.org.
xxiv HP MLIB VECLIB User’s Guide
Part 1
HP VECLIB

1 Introduction to VECLIB

Overview

VECLIB, a component of HP MLIB, is a collection of subprograms optimized for use on Hewlett-Packard servers and workstations, providing mathematical software and computational kernels for engineering and scientific applications. This library contains subprograms for:
• Dense vector operations, including the Level 1 BLAS
• Sparse vector and matrix operations, including the Sparse BLAS
• BLAS 3 routine DGEMM is highly tuned
• Matrix operations, including the Level 2 and Level 3 BLAS
• CXML Blas extenstions Compaq Extended Math Library (CXML) is a collection of routines that
performs numerically intensive operations that occur frequently in engineering and scientific computing, such as linear algebra and signal processing.
HP MLIB adds support for the unique CXML extensions to the legacy BLAS and for the array math functions in CXML. However, the additional support of XCML is not exhaustive. For more information about CXML see the CXML Reference Manual part number AA-PV6VE-TE.
• Discrete Fourier transforms
• Convolution and correlation
• Miscellaneous tasks, such as sorting and generating random numbers
Overview
VECLIB provides two sets of libraries: VECLIB and VECLIB8. To determine if a subprogram is included in VECLIB8, refer to the VECLIB8 section under each subprogram specification in the following chapters.
Although VECLIB was designed for use with Fortran programs, C programs can call VECLIB subprograms. Refer to Appendix A, “Calling MLIB routines from C,” for details. Examples are in Fortran, unless otherwise indicated.
Except for subprograms described in Appendix B, “LINPACK Subprograms,” LINPACK, EISPACK, and SKYLINE subprograms are not included in the
Chapter 1 Introduction to VECLIB 1

Chapter objectives

Hewlett-Packard scientific libraries. Refer to the Appendix, “Converting from LINPACK or EISPACK” in the LAPACK Users’ Guide, for assistance converting programs that currently call LINPACK or EISPACK routines to call LAPACK or VECLIB routines instead.
This chapter provides information necessary for efficient use of VECLIB and includes discussions of:
• Standardization
• Accessing VECLIB
• Optimization
• Parallel processing
• Roundoff effects
• Data types and precision
• VECLIB naming convention
• Data type and byte length
• Operator arguments
• Error handling
• HP MLIB man pages
• Troubleshooting
Chapter objectives
After reading this chapter you will:
• Know how to access VECLIB library subprograms
• Understand how VECLIB works in a parallel computing environment
• Understand VECLIB naming conventions
• Understand roundoff effects
• Understand how VECLIB handles errors
• Know how to access the online HP MLIB man pages
• Know what to do if you experience trouble using VECLIB subprograms
2 HP MLIB User’s Guide

Standardization

VECLIB conforms to a variety of existing standards. For example, it includes Basic Linear Algebra Subprograms (BLAS), levels 1, 2, and 3, and Sparse BLAS.
These products are available with standardized user interfaces on computers ranging from microcomputers to supercomputers. Because VECLIB conforms to these standards, it is a software bridge from other computers to Hewlett-Packard servers and workstations. However, even though the user interface is standardized, internal workings of HP VECLIB subprograms have been specialized for supported computers.
HP MLIB supports a subset of the BLAS Standard routines. BLAS standardization efforts, supported by software and hardware vendors and university groups, began with a BLAS Technical (BLAST) Forum meeting in November 1995 at the University of Tennessee. The efforts of the BLAST Forum resulted in a BLAS Standard specification in 1999. Refer to Chapter 2, “Basic Vector Operations,” and Chapter 3, “Basic Matrix Operations,” for details of supported subprograms.
Standardization

Accessing VECLIB

The VECLIB and VECLIB8 libraries are available as archive and shared libraries. They consist of compiled subprograms ready for you to incorporate into your programs with the linker. Include the appropriate declarations and CALL statements in your Fortran source program and specify that VECLIB or VECLIB8 be used as an object library at link time. Refer to the LAPACK chapter for details about accessing the LAPACK library, The ScaLAPACK chapter for details about accessing the ScaLAPACK library, and the Distributed SuperLU chapter for details about accessing the Distributed SuperLU library.
MLIB libraries are installed in the /opt/mlib/ directory. The entire path depends on your system type, as follows:
Chapter 1 Introduction to VECLIB 3
Accessing VECLIB

Table 1-1 VECLIB and VECLIB8 Libraries

Processor
Type
PA 2.0
Itanium 2
Itanium 2
Itanium 2
OS
Version
HP-UX
11i or later
HP-UX
11i V1.6 or
later
Red Hat
Linux
Enterprise 3
Intel V8.0
Compiler
Red Hat
Linux
Enterprise 3
Intel IA-32
V8.0
Compiler
Address
Width
32-bit /opt/mlib/lib/pa2.0
64-bit /opt/mlib/lib/pa20_64
32-bit /opt/mlib/lib/hpux32
64-bit /opt/mlib/lib/hpux64
64-bit /opt/mlib/intel_8.0/hpmpi_2.1/lib/64
32-bit /opt/mlib/intel_8.0/hpmpi_2.1/lib/32
Installation Directory Libraries
libveclib.a libveclib.sl
libveclib.a libveclib.sl libveclib8.a libveclib8.sl
libveclib.a libveclib.so
libveclib.a libveclib.so libveclib8.a libveclib8.so
libveclib.a libveclib.so libveclib8.a libveclib8.so
libveclib.a libveclib.so libveclib8.a libveclib8.so
Itanium 2
4 HP MLIB User’s Guide
HP XC6000
Intel V8.0
Compiler
64-bit /opt/mlib/intel_8.0/hpmpi_2.1/lib/64
libveclib.a libveclib.so libveclib8.a libveclib8.so
Accessing VECLIB
Processor
Type
Itanium 2
Itanium 2
Itanium 2
Only 64-bit addressing is supported in the VECLIB8 library. The file name of the archive VECLIB and VECLIB8 libraries are libveclib.a
and libveclib8.a respectively. The VECLIB shared library for PA is libveclib.sl and is libveclib.so for all other operating systems.
Performance of your applications is better when you use archive libraries. However, if you need to keep executable files to a minimum size, you can use shared libraries. See “Linking with libisamstub.a” on page 8 if your application calls Fortran77 routines from C language code, and you are linking with the VECLIB archive library.
The HP VECLIB library contains parallelized subprograms. Refer to Appendix C, “Parallelized Subprograms,” for details. If your program is run on an HP-UX system with multiple processors, see “Parallel processing” on page 17 for additional information about linking your program.
OS
Version
HP XC6000
Intel V7.1
Compiler
HP XC4000
PGI V5.1 Compiler
Opteron PGI V5.1 Compiler
Address
Width
64-bit /opt/mlib/intel_7.1/hpmpi_2.1/lib/64
64-bit /opt/mlib/intel_5.1/hpmpi_2.1/lib/64
64-bit /opt/mlib/intel_5.1/hpmpi_2.1/lib/64
Installation Directory Libraries
libveclib.a libveclib.so libveclib8.a libveclib8.so
libveclib.a libveclib.so libveclib8.a libveclib8.so
libveclib.a libveclib.so libveclib8.a libveclib8.so

Compiling and linking (VECLIB)

There are several ways to link your program with the version of VECLIB tuned for the machine on which the program runs. By default, the lveclib option on the f90, cc, or c89 command line that links your program selects the library that corresponds to 32-bit addressing on the machine on which you link.
cc is the HP C compiler and c89 is the HP POSIX-conforming C compiler. The
remainder of this book refers to the cc and f90 compilers. cc and f90 examples also apply to c89.
Chapter 1 Introduction to VECLIB 5
Accessing VECLIB
When you use the –aarchive_shared flag on your compiler command line for HP-UX, it ensures that the compiler links the archive library. If the archive library is not available, then it links the shared library. If you omit
–aarchive_shared and –ashared_archive, the linker defaults to linking the
shared library. Link with –Bstatic on Linux systems. If your program uses subprograms from VECLIB, LAPACK, ScaLAPACK,
Distributed SuperLU, SOLVERS and VMATH, specify llapack, lveclib,
lscalapack, −lsuperlu_dist, −lsolvers, and −lvmath or −llapack8, −lveclib8,
lscalapack8, −lsuperlu_dist8, −lsolvers8 and −lvmath8 on the compiler
command line.
NOTE Do not mix subprograms from the two types of 64-bit address libraries
(those with 32-bit integers and those with 64-bit integers) in the same program.
Each of the VECLIB and LAPACK library files is complete in itself, meaning that you do not need to link one library because you have called subprograms from another. This is because various subprograms are included in both libraries. For example, VECLIB subroutine SGEMV is called by several LAPACK subprograms, and therefore, is included in the LAPACK library. Thus, in general, you have to link only the library or libraries you need.
For PA-Based Systems
1. To link a program that uses VECLIB for use on the same machine, use one of the following commands:
f90 [options] file ... –Wl,–aarchive_shared lveclib
[options] file ... –Wl,–aarchive_shared lveclib lcl lm
cc
[options] file ... –Wl,–aarchive_shared lveclib lcl lm
aCC
2. Specify the entire path of the library file on the compiler command line that links your program. For example, to link your program with VECLIB for use with 32- or 64-bit addressing on a PA-based system, use one of the following:
f90 [options] file ... /opt/mlib/lib/[pa2.0|pa20_64]/libveclib.a
[options] file ... /opt/mlib/lib/[pa2.0|pa20_64]/libveclib.a lcl lm
cc
[options] file ... /opt/mlib/lib/[pa2.0|pa20_64]/libveclib.a lcl lm
aCC
Replace libveclib.a with libveclib.sl on your compiler command line if you want to link the shared library on a PA-based system.
3. Use the lveclib option on the compiler command line that links your program, preceded by:
Wl,–aarchive_shared,L/opt/mlib/lib/[pa2.0|pa20_64]
6 HP MLIB User’s Guide
Accessing VECLIB
For example, the command lines in Method 2 for PA could be written:
f90 [options] file ... −Wl,–aarchive_shared,L/opt/mlib/lib/[pa2.0|pa20_64]
lveclib
cc [options] file ... −Wl,–aarchive_shared,L/opt/mlib/lib/[pa2.0|pa20_64]
lveclib lcl −lm [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64]
aCC
lveclib lcl −lm
4. Set the LDOPTS environment variable to include:
–aarchive_shared,L/opt/mlib/lib/[pa2.0|pa20_64]
For example:
setenv LDOPTS “–aarchive_shared,–L/opt/mlib/lib/pa2.0”
Then use the lveclib option on the compiler command line that links your program:
f90 [options] file ... lveclib
[options] file ... lveclib lcl lm
cc
[options] file ... lveclib lcl lm
aCC
NOTE An LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line.
5. To link with the VECLIB8 libraries on a PA-based system, use the +DA2.0W option to specify memory addresses are 64-bit.
Compile Fortran applications with the +i8, +autodbl, or +autodbl4 option where:
+i8 promotes 4-byte integer of logical constants, intrinsics, and user
variables (declared as integer or logical) to 8-byte quantities.
+autodbl promotes all integer, logical, and real items to 8 bytes, and all
double-precision and complex items to 16 bytes.
+autodbl4 promotes all integer logical, and real items to 8 bytes, and
complex items to 16 bytes. The +autodbl4 option does not promote the size of double-precision and double-complex items.
f90 +DA2.0W +i8 [options] file ... Wl,–aarchive_shared lveclib8 cc +DA2.0W aCC +DA2.0W
[options] file ... Wl,–aarchive_shared lveclib8 lcl lm
[options] file ... Wl,–aarchive_shared lveclib8 lcl lm
Chapter 1 Introduction to VECLIB 7
Accessing VECLIB
Linking with libisamstub.a
C language codes that call Fortran77 routines from the BLAS Standard, the sparse linear equation system, or the sparse eigenvalue system, must explicitly link the ISAM (Indexed Sequential Access Method) stubs library into the program. For example,
cc [options] file ... –Wl,–aarchive_shared,–L/opt/fortran/lib/libisamstub.a
lveclib
lcl −lm
This only applies if you are linking with the VECLIB archive library. This option is only valid for 32-bit PA systems.
For Itanium-Based HP-UX Systems
1. To link a program that uses VECLIB for use on the same machine, use one of the following commands:
f90 [options] file ... –Wl,–aarchive_shared lveclib
[options] file ... –Wl,–aarchive_shared lveclib lcl lm
cc
[options] file ... –Wl,–aarchive_shared lveclib lcl lm
aCC
2. Specify the entire path of the library file on the compiler command line that links your program. To link your program with VECLIB for use with 32- or 64-bit addressing on an HP-UX system, use one of the following:
f90 [options] file ... /opt/mlib/lib/[hpux32|hpux64]/libveclib.a
[options] file ... /opt/mlib/lib/[hpux32|hpux64]/libveclib.a lcl lm
cc
[options] file ... /opt/mlib/lib/[hpux32|hpux64]/libveclib.a lcl lm
aCC
Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an Itanium-based system.
3. Use the lveclib option on the compiler command line that links your program, preceded by:
Wl,–aarchive_shared,L/opt/mlib/lib/[hpux32|hpux64]
For example, the command lines in Method 2 for HP-UX could be written:
f90 [+DD32|+DD64] [options] file...Wl,–aarchive_shared,L/opt/mlib/lib/ [hpux32|hpux64]
cc [+DD32|+DD64] [hpux32|hpux64]
aCC [+DD32|+DD64] [hpux32|hpux64]
4. Set the LDOPTS environment variable to include:
lveclib [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/
lveclib −lcl −lm [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/
lveclib −lcl −lm
8 HP MLIB User’s Guide
Accessing VECLIB
–aarchive_shared,L/opt/mlib/lib/[hpux32|hpux64]
For example:
setenv LDOPTS “–aarchive_shared,–L/opt/mlib/lib/hpux32”
Then use the lveclib option on the compiler command line that links your program:
f90 [options] file ... lveclib cc [options] file ... lveclib lcl lm
[options] file ... lveclib lcl lm
aCC
NOTE An LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line.
5. To link with the VECLIB8 libraries on an HP-UX system, use the +DD64 option to specify memory addresses are 64-bit.
Compile Fortran applications with the +i8, +autodbl, or +autodbl4 option where:
+i8 promotes 4-byte integer of logical constants, intrinsics, and user
variables (declared as integer or logical) to 8-byte quantities.
+autodbl promotes all integer, logical, and real items to 8 bytes, and all
double-precision and complex items to 16 bytes.
+autodbl4 promotes all integer logical, and real items to 8 bytes, and
complex items to 16 bytes. The +autodbl4 option does not promote the size of double-precision and double-complex items.
f90 +DD64 +i8 [options] file ... Wl,–aarchive_shared lveclib8 cc +DD64 aCC +DD64
[options] file ... Wl,–aarchive_shared lveclib8 lcl lm
[options] file ... Wl,–aarchive_shared lveclib8 lcl lm
For Itanium-Based Red Hat Linux Systems
1. To link a program that uses VECLIB for use on the same machine, use one of the following commands:
ifort [options] file ... –Wl,–Bstatic lveclib openmp
[options] file ... –Wl,–Bstatic lveclib openmp
icc
Link with -Bdynamic if you want the archive library on a linux system.
Chapter 1 Introduction to VECLIB 9
Accessing VECLIB
NOTE When you use the Intel V8.0 C compiler to link the SOLVERS library,
you may require one or more of -lifcore, -lifport, or -ldl.
2. Specify the entire path of the library file on the compiler command line that links your program. For example, to link your program with VECLIB for use with 32- or 64-bit addressing on a Linux system, use one of the following:
ifort [options] file ... /opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/libveclib.a
openmp icc
[options] file ... /opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/libveclib.a
openmp
Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on a Linux system.
When you use the C compiler for linking, you may require one or more of
-1CEPCF90, -1F90, -1IEPCF90, -1PEPCF90, or -1POSF90.
3. Use the lveclib option on the compiler command line that links your program, preceded by:
Wl,–Bstatic,L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]
For example, the command lines in Method 2 for Linux could be written:
ifort [options] file...Wl,–Bstatic,L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]
lveclib openmp
icc
[options] file ... −Wl,–Bstatic,L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]
lveclib openmp
When you use the C compiler for linking, you may require one or more of
-1CEPCF90, -1F90, -1IEPCF90, -1PEPCF90, or -1POSF90.
4. Set the LDOPTS environment variable to include:
–Bstatic,L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]
For example:
setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/linux”
Then use the lveclib option on the compiler command line that links your program:
ifort [options] file ... lveclib openmp
[options] file ... lveclib openmp
icc
NOTE An LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment
10 HP MLIB User’s Guide
variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line.
5. Use the following commands to link your programs for use with the VECLIB8 library on a Red Hat Linux system.
ifort i8 [options] file ... -L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/lveclib8
openmp
icc
[options] file ... −L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/lveclib8
openmp
When you use the C compiler for linking, you may require one or more of
-1CEPCF90, -1F90, -1IEPCF90, -1PEPCF90, or -1POSF90.
For XC6000 Systems with the Intel V8.0 Compiler
1. Use one of the following compile line commands to link VECLIB:
ifort [options] file ... –L/opt/mlib/intel_8.0/hpmpi_2.1/lib/64 lveclib openmp
[options] file ... –L/opt/mlib/intel_8.0/hpmpi_2.1/lib/64 lveclib openmp
icc
2. Specify the entire path of the library file on the compiler command line that links your program. For example, to link your program with VECLIB for use with 64-bit addressing on an XC6000 system with the Intel V8.0 compiler, use one of the following:
ifort [options] file ... /opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64/libveclib.a
openmp
Accessing VECLIB
icc
[options] file ... /opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64/libveclib.a
openmp
Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an XC6000 system.
3. Use the lveclib option on the compiler command line that links your program, preceded by:
Wl,–Bstatic,L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64
For example, the command lines in Method 2 for XC6000 could be written:
ifort [options] file...Wl,–Bstatic,L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64
lveclib openmp
icc
[options] file ... −Wl,–Bstatic,L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64
lveclib openmp
4. Set the LDOPTS environment variable to include:
–Bstatic,L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64
Chapter 1 Introduction to VECLIB 11
Accessing VECLIB
For example:
setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64”
Then use the lveclib option on the compiler command line that links your program:
ifort [options] file ... lveclib openmp
[options] file ... lveclib openmp
icc
NOTE An LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line.
For XC6000 Systems with the Intel V7.1 Compiler
1. Use one of the following compile line commands to link VECLIB:
efc [options] file ... –L/opt/mlib/intel_7.1/hpmpi_2.1/lib/64 lveclib openmp
[options] file ... –L/opt/mlib/intel_7.1/hpmpi_2.1/lib/64 lveclib openmp
icc
2. Specify the entire path of the library file on the compiler command line that links your program. For example, to link your program with VECLIB for use with 64-bit addressing on an XC6000 system with the Intel V7.1 compiler, use one of the following:
efc [options] file ... /opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64/libveclib.a
openmp icc
[options] file ... /opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64/libveclib.a
openmp
Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an XC6000 system.
3. Use the lveclib option on the compiler command line that links your program, preceded by:
Wl,–Bstatic,L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64
For example, the command lines in Method 2 for XC6000 could be written:
efc [options] file...Wl,–Bstatic,L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64
lveclib openmp
icc
lveclib openmp
4. Set the LDOPTS environment variable to include:
12 HP MLIB User’s Guide
[options] file ...Wl,–Bstatic,L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64
Accessing VECLIB
–Bstatic,L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64
For example:
setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64”
Then use the lveclib option on the compiler command line that links your program:
efc [options] file ... lveclib openmp icc [options] file ... lveclib openmp
NOTE An LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line.
For XC4000 Systems with the PGI V5.1 Compiler
1. Use one of the following compile line commands to link VECLIB:
pgf90 [options] file ... –L/opt/mlib/pgi_5.1/hpmpi_2.1/lib/64 lveclib mp
[options] file ... –L/opt/mlib/pgi_5.1/hpmpi_2.1/lib/64 lveclib mp
pgcc
lpgf90 lpgf90_rpml lpgf902 lpgf90rtl lpgftnrtl
2. Specify the entire path of the library file on the compiler command line that links your program. For example, to link your program with VECLIB for use with 64-bit addressing on an XC4000 system with the PGI V5.1 compiler, use one of the following:
pgf90 [options] file ... /opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64/libveclib.a mp
[options] file ... /opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64/libveclib.a mp
pgcc
lpgf90 lpgf90_rpml lpgf902 lpgf90rtl lpgftnrtl
Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an XC4000 system.
3. Use the lveclib option on the compiler command line that links your program, preceded by:
Wl,–Bstatic,L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64
For example, the command lines in Method 2 for XC4000 could be written:
pgf90 [options] file...Wl,–Bstatic,L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64
lveclib mp
pgcc
[options] file ...−Wl,–Bstatic,L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64
lveclib mp
lpgf90 lpgf90_rpml lpgf902 lpgf90rtl lpgftnrtl
Chapter 1 Introduction to VECLIB 13
Accessing VECLIB
4. Set the LDOPTS environment variable to include:
–Bstatic,L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64
For example:
setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/pgi5.1/hpmpi_2.1/lib/64”
Then use the lveclib option on the compiler command line that links your program:
pgf90 [options] file ... lveclib mp
[options] file ... lveclib mp lpgf90 lpgf90_rpml lpgf902 lpgf90rtl
pgcc
lpgftnrtl
NOTE An LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line.

Problem with +ppu compatibility and duplicated symbols

All MLIB subprograms documented in the HP MLIB User’s Guide have two entry points: one is compatible with the Fortran compiler option +noppu (no postpend underbar) and the second is compatible with the Fortran compiler option +ppu (postpend underbar). For example, the MLIB BLAS error handler XERBLA has entry points xerbla and xerbla_. Table 1-1 shows the Fortran 90 compiler defaults.
Table 1-2 Compiler Defaults
Different compiler defaults can cause duplicate symbol definitions at link time as the following examples illustrate.
PA-RISC processors
Suppose you write your own Fortran version of the XERBLA subroutine. You compile on PA-RISC with +DA2.0W and the compiler turns on +ppu (See Table 1-1), so your xerbla.o object file has an entry point xerbla_. However, the PA-RISC MLIB libraries were compiled with +noppu and the entry points with underbars were added as synonyms to the entry points without underbars. Hence, the various MLIB subprograms that call XERBLA reference them as xerbla (without the postpended underbar). Therefore, your xerbla_ does not satisfy their CALL XERBLA, so the linker accesses xerbla.o from the library.
14 HP MLIB User’s Guide
32-bit Addressing 64-bit Addressing
PA-RISC +noppu +ppu
Itanium +ppu +ppu
Accessing VECLIB
This file includes both entry points xerbla and xerbla_, and xerbla_ conflicts with your XERBLA subroutine.
Use one of the following workarounds on PA-RISC if you are compiling your own Fortran version of XERBLA:
• Compile your XERBLA with +noppu. This implies that you also compile any of your program that calls XERBLA with +noppu, and that you apply this rule recursively.
• Use f90 ALIAS directives:
% !$HP$ ALIAS xerbla='xerbla'
This prevents the compiler from postpending the underbar onto the entry point and external references for XERBLA.
This problem is not confined to the XERBLA subroutine. It occurs on PA-RISC when the following conditions are met:
• Any subprogram with the same name as a user-visible MLIB subprogram is compiled in your program with +ppu
• Another MLIB subprogram used by your program also calls that subprogram
Furthermore, the reverse can occur in the PA-RISC 32-bit libraries if you compile with the +ppu compiler option.
Itanium processors
If you compile on Itanium using the compiler option +noppu your xerbla.o object has an entry point xerbla (without the postpended underbar). However the Itanium MLIB libraries were compiled with +ppu option (following the default behavior described in Table 1-1) and the entry points without underbars were added as synonyms to the entry points with underbars. Hence the various MLIB Fortran source subprograms that call XERBLA reference them as xerbla_. Therefore, your xerbla (without the postpended underbar) does not satisfy the Fortran calls to XERBLA, so the linker accesses xerbla.o from the library. This file contains both entry points xerbla and xerbla_, and xerbla conflicts with your XERBLA subroutine.
Moreover, the various MLIB C source subprograms that call XERBLA still reference them as xerbla (without the postpended underbar). It means that both xerbla and xerbla_ entry points must be provided in your xerbla.o.
Use one of the following workarounds on Itanium if you are compiling your own Fortran version of XERBLA:
• Compile your Fortran source for XERBLA with +noppu and provide both xerbla and xerbla_ entry points:
Chapter 1 Introduction to VECLIB 15
Accessing VECLIB
SUBROUTINE XERBLA(SRNAME,INFO) ENTRY XERBLA_SRNAME,INFO)
...
• Use f90 ALIAS directives and provide both xerbla and xerbla_ entry points:
!$HP$ ALIAS xerbla='xerbla' !$HP$ ALIAS xerbla_='xerbla_' SUBROUTINE XERBLA(SRNAME,INFO) ENTRY XERBLA_(SRNAME,INFO)
...
The ALIAS directives prevent the compiler from postpending the underbar onto the entry points and external references to XERBLA.
Use the following workaround on Itanium if you are compiling your own C version of XERBLA:
• Undefine any previous aliasing from include files and provide both xerbla and xerbla_ entry points explicitly:
#undef xerbla #undef xerbla_
void xerbla (char*name, int*iarg, size_t len_name){ ... }
void xerbla_ (char*name, int*iarg, size_t len_name){ xerbla (name, iarg, len_name); }
This problem is not confined to the XERBLA subroutine. It occurs on Itanium when the following conditions are met:
• Any subprogram with the same name as a user-visible MLIB
subprogram is compiled in your program with
+noppu
• Another MLIB subprogram used by your program also calls that
subprogram
PA-RISC and Itanium processors
Also note that all the workarounds listed for Itanium are more generic and can also be used for PA-RISC. Therefore, if you are coding your own version of a MLIB routine called "mlib_routine" on PA-RISC or Itanium, a Fortran version might be implemented as:
!$HP$ ALIAS mlib_routine='mlib_routine' !$HP$ ALIAS mlib_routine_='mlib_routine_'
16 HP MLIB User’s Guide
SUBROUTINE mlib_routine(...) ENTRY mlib_routine_(...) ...
And a C version might be:
#undef mlib_routine #undif mlib_routine_
void mlib_routine (...){ ... }
void mlib_routine_(...){ mlib_routine(...); }

Optimization

The key computational kernels in VECLIB have been optimized to take full advantage of both PA-RISC and Itanium tightly integrated architectures. Optimizations include:
Optimization
• Instruction scheduling to maximize the number of machine instructions executed per clock cycle
• Algorithm restructuring to increase the number of computations per memory access
• Cache management to decrease the number of data cache misses
Refer to “Accessing VECLIB” on page 3 for instructions on linking the desired library with your program.

Parallel processing

Parallel processing is available on multi-processor HP platforms running the HP-UX 11i or greater operating system and Linux. These systems can divide a single computational process into small streams of execution, called threads. The result is that you can have more than one processor executing on behalf of the same process. Also, refer to Appendix C, “Parallelized Subprograms,” for a list of HP VECLIB parallelized subprograms.
Chapter 1 Introduction to VECLIB 17
Parallel processing
You can enable or disable parallel processing at link time or at runtime. A program does not use parallelism in VECLIB unless parallel processing is enabled at both link time and at runtime.

Linking for parallel or non parallel processing

To enable parallel processing at link time, your link step must produce a multithreaded executable. On HP-UX systems, use the + compiler options to get a multithreaded executable when you link with the HP Fortran compiler; use +O3 and +Oopenmp when you link with the HP C compiler; and use +Oopenmp when you link with the HP aC++ compiler:
f90 [options including +O3 +Oparallel] file ... –Wl,–aarchive_sharedlveclib
[options including +O3 +Oopenmp] file ... –Wl,–aarchive_sharedlveclib lcl
cc
lm [options including +Oopenmp] file ... –Wl,–aarchive_sharedlveclib lcl
aCC
lm
To disable VECLIB’s automatic parallelism at link time, omit the +Oparallel and +Oopenmp options:
f90 [options] file ... –Wl,–aarchive_sharedlveclib
[options] file ... –Wl,–aarchive_sharedlveclib lcl lm
cc
[options] file ... –Wl,–aarchive_sharedlveclib lcl lm
aCC
VECLIB for Linux is always multithreaded enabled.
O3 and +Oparallel

Controlling VECLIB parallelism at runtime

When you enable parallelism at link time, three methods are available at runtime to specify the extent of parallel processing in MLIB.
• Use MLIB_NUMBER_OF_THREADS, a shell environment variable that
allows you to enable parallelism within MLIB subprograms and to specify the maximum number of threads that can be used in parallel regions.
Not setting MLIB_NUMBER_OF_THREADS has the same result as setting it to 1; that is, parallel processing is disabled within MLIB subroutines. Setting MLIB_NUMBER_OF_THREADS to the number of CPUs in the system, or greater, allows parallelized MLIB subprograms to use as many CPUs as are available to the process.
The following command lines show the C shell syntax and Korn shell syntax to use when setting the variable to eight processors:
For C shell:
setenv MLIB_NUMBER_OF_THREADS 8
18 HP MLIB User’s Guide
Parallel processing
For Korn shell:
export MLIB_NUMBER_OF_THREADS=8
MLIB_NUMBER_OF_THREADS is examined on the first call to a parallelized VECLIB subprogram to establish the default parallel action within VECLIB.
Use the subroutine MLIB_SETNUMTHREADS to restore VECLIB parallel processing to its run-time default that was specified by MLIB_NUMBER_OF_THREADS. Refer to the mlib_setnumthreads(3m) man page for usage information.
• Use the subroutine MLIB_SETNUMTHREADS. You can call this subroutine at any time to set the maximum number of
parallel threads used in subsequent VECLIB or LAPACK calls. The specified value overrides the absence of the MLIB_NUMBER_OF_THREADS environment variable or any value assigned to it.
• To modify the effect of the MLIB_NUMBER_OF_THREADS and MLIB_SETNUMTHREADS, you can also use MP_NUMBER_OF_THREADS at runtime. When MP_NUMBER_OF_THREADS is set to a value smaller than MLIB_NUMBER_OF_THREADS and MLIB_SETNUMTHREADS, the value of MP_NUMBER_OF_THREADS takes precedence and limits the amount of parallelism to MP_NUMBER_OF_THREADS.
These controls set the maximum amount of SMP parallelism that your program can use, and the VECLIB-specific mechanisms offer finer control within that maximum. Refer to “Performance benefits” below for more information.
Performance benefits
If VECLIB parallelism is enabled, each parallelized VECLIB subprogram determines at runtime if multiple processors are available. If multiple processors are available, VECLIB detects whether the program is already using multiple threads. If the application calling VECLIB is not running multiple threads, VECLIB will attempt to use all available threads (limited by MLIB_NUMBER_THREADS). If the application calling VECLIB is already running multiple threads, VECLIB will attempt to use the remaining threads without over-subscribing. This is a form of nested parallelism.
If you are using an HP server with multiple processors, you can realize the performance benefits of parallel processing in three ways:
• Call any parallelized VECLIB subprogram. Let it use parallelism internally if it determines that it is appropriate to do so based on such factors as problem size, system configuration, and user environment.
Chapter 1 Introduction to VECLIB 19
Parallel processing
• Call VECLIB subprograms in a parallelized loop or region. VECLIB
• Use the Message Passing Interface (MPI) explicit parallel model. Refer to
VECLIB subprograms are reentrant, meaning that they may be called several times in parallel to do independent computations without one call interfering with another. You can use this feature to call VECLIB subprograms in a parallelized loop or region.
The compiler does not automatically parallelize loops containing a function reference or subroutine call. You can force it to parallelize such a loop by defining OpenMP parallel regions.
For example, the following Fortran code makes parallel calls to subprogram DAXPY:
supports nested parallelism where the outer parallelism is implemented through OpenMP while the inner parallelism is implemented with VECLIB SMP parallelism. To use this mechanism, you must be familiar with the techniques of parallel processing. Refer to the Parallel Programming Guide for HP-UX Systems for details.
the HP MPI User’s Guide or the MPI(1) man page for details.
NTHREADS = 4 C$OMP PARALLEL DO NUM_THREADS(NTHREADS) DO J=1, N CALL DAXPY (N-I,A(I,J),A(I+1,I),1,A(I+1,J),1) ENDO C$OMP END PARALLEL DO
While optimizing a parallel program, you may want to make parallel calls to a VECLIB subprogram to execute independent operations where the call statements are not in a loop. OpenMP supports the PARALLEL and END PARALLEL directions that define a block of code that is to be executed by multiple threads in parallel.

OpenMP-based nested parallelism

Nested parallelism can be achieved when calling VECLIB parallelized subprograms from an OpenMP parallel region. (See “Parallelized subprograms in VECLIB” on page 1104.) Consider the following code running on an HP platform with at least four processors:
... call omp_set_nested (.true.) c$omp parallel NUM_THREADS(2) myid = omp_get_thread_num
if (myid.eq.0) then call dgemm(‘n’, ‘n’, m, m, m, alpha, a, lda, b, ldb, beta, c,ldc)
else
call dgemm(‘n’, ‘n’, m, m, m, alpha, d, ldd, e, lde, beta,
f,ldf)
20 HP MLIB User’s Guide
Parallel processing
endif c$omp end parallel call omp_set_nested(.false.) ...
Using MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one OpenMP thread for
C αAB βC+=
and another for
F αDE βF+=
Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel.
If a parallel VECLIB subprogram is called from a parallelized loop or region, VECLIB will automatically avoid over-subscription of the CPUs. The number of threads spawned by each call to a parallelized VECLIB subroutine on a nested parallel region is limited by:
• MLIB_NUMBER_OF_THREADS
• The number of threads still available in the system
• will never be larger than four. Specifically:
MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)

Message passing-based nested parallelism

Nested parallelism can be achieved when calling VECLIB parallelized subprograms from an MPI process. (See “Parallelized subprograms in VECLIB” on page 1104.) Consider the following code:
... call mpi_init (ierr) call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
if (myid.eq.0) then call dgemm(‘n’, ‘n’, m, m, m, alpha, a, lda, b, ldb, beta, c,ldc)
else
call dgemm(‘n’, ‘n’, m, m, m, alpha, d, ldd, e, lde, beta,
f,ldf)
endif
...
Chapter 1 Introduction to VECLIB 21
Parallel processing
Assume the application started on two MPI processes. Using MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one MPI process for
C αAB βC+=
and another for
F αDE βF+=
Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel.

Default CPS library stack is too small for MLIB

In libcps, the HP Compiler Parallel Support library, a CPS thread has a default stack size of 8M bytes. For performance reasons, several subprograms in HP MLIB use the stack for temporary arrays that exceed the default value. Using the default CPS stack size, these routines overwrite neighboring stacks, resulting in errors that are difficult to diagnose.
The solution is to change the CPS thread stacksize attribute to a value that is large enough to accommodate all the MLIB subprograms the thread may encounter. Currently, 8 MB*(the number of threads) should be sufficient for all MLIB subprograms.
The environment variable CPS_STACK_SIZE expects values in K bytes. Setting the stack size as follows would be sufficient for programs that execute on two threads:
For C shell:
% setenv CPS_STACK_SIZE 16384
For Korn shell:
% export CPS_STACK_SIZE=16384
Default Pthread library stack is too small for MLIB
The stack allocated for each new thread created using direct pthread calls to “pthread_create” might not be large enough for HP MLIB. Several subprograms in HP MLIB use the stack for storing temporary work arrays and improve performance. If the stack size is not large enough, these routines overwrite neighboring stacks, resulting in errors that are difficult to diagnose.
22 HP MLIB User’s Guide
Currently, 8 MB*(the number of threads) should be sufficient for all MLIB subprograms. If your application launchs threads directly from pthread library calls, set the minimum allocated stack size to match HP MLIB needs on each new thread. Setting the stack size on HP-UX as follows would be sufficient for programs that execute on two threads:
int stacksize = 8388608;
(...)
pthread_attr_setstacksize(&thread_attr, stacksize);
pthread_create(&thread_id, &thread_attr, thread_func, &thread_parm);
(...)

Roundoff effects

VECLIB subprograms may use a different arithmetic order of evaluation than other implementations. Different roundoff characteristics may result. Accuracy of results is usually similar to other implementations, so using VECLIB should not affect the accumulation of roundoff errors in a complete application. If it does, examine the mathematical analysis of the problem to determine if it is ill-conditioned. Ill-conditioned means that the small roundoff errors that are inadvertently introduced into any computation are magnified out of proportion to the desired result. Similarly, if results with and without VECLIB differ materially, both sets of answers are probably inaccurate and you should investigate further. If the program correctly applies stable computational algorithms, the problem itself is probably ill-posed.
Roundoff effects

Data types and precision

In general, VECLIB provides the same range of functionality for both real and complex data. For most computations, there are matching subprograms for real and complex data, but there are a few exceptions.
For example, corresponding to the subprograms for real dot products, there are subprograms for complex dot products in both the conjugated and unconjugated forms because both types of complex dot products occur. However, there is no complex analogue of the subprograms for solving a real symmetric sparse linear system.
Chapter 1 Introduction to VECLIB 23

VECLIB naming convention

Matching subprograms for real and complex data have been coded to maintain a close correspondence between the two. However, in some areas, the correspondence is necessarily weaker, and this has not been possible.
Subprograms in VECLIB are provided in both 32-bit and 64-bit addressing versions, except in the VECLIB8 library where only 64-bit addressing is supported.
VECLIB naming convention
The name of each VECLIB subprogram is a coded specification of its function within the limits of standard Fortran 77 6-character names. Usually, the first character of a subprogram name, denoted by T, shows the predominant data type according to Table 1-3:

Table 1-3 VECLIB Naming Convention—Data Type

T Data Type VECLIB VECLIB8
S Single Precision REAL*4 REAL*4 D Double Precision REAL*8 REAL*8
I Integer INTEGER*4 INTEGER*8 C Complex COMPLEX*8 COMPLEX*8 Z Double Complex COMPLEX*16 COMPLEX*16
Basic Linear Algebra Subprograms, that is, level 1, 2, and 3 BLAS, as well as the Sparse BLAS use this naming convention. For example, subprograms that compute the sum of vector elements are named according to data type: SSUM, DSUM, ISUM, CSUM, and ZSUM. Some function subprograms use two of these letters, the first describing the data type of the function and the second indicating the type of data on which it operates.
Fortran 77 BLAS Standard subprograms use a similar convention, with F_ prepended to each routine name.
For example, the legacy BLAS single-precision, triangular-solve routine is named STRSM and its BLAS Standard counterpart is named F_STRSM. BLAS Standard subprograms that compute the sum of vector elements are named F_SSUM, F_DSUM, F_CSUM, and F_ZSUM. Refer to “Legacy BLAS routines” on page 211 and “BLAS Standard routines” on page 339 for more information.
C BLAS Standard subprograms begin with C_ prepended to each routine name.
24 HP MLIB User’s Guide

Data type and byte length

There is a relationship between the data type of a subprogram, designated by the first character of its name (refer to T in Table 1-3), and the byte lengths of its arguments. This relationship is shown in Table :

MLIB Argument Lengths

MLIB Argument Lengths
Data type and byte length
T
S4 8 4 8 D4 8 8 16 C4 8 4 8
Z4 8 8 16
VECLIB INTEGER LOGICAL

Operator arguments

Some BLAS routines take input-only arguments called operators that allow for the specification of multiple related operations to be performed by a single function. Operator arguments used by the BLAS Standard routines are NORM,
SORT, SIDE, UPLO, TRANS, CONJ, DIAG
defined as follows:
norm Used by routines computing the norm of a vector or
sort Used by sorting routines.There are two valid values
side Used by functions computing the product of two
VECLIB8 INTEGER LOGICAL
, and JROT. Their meanings are
matrix. There are seven valid values that specify the norm to be computed, namely one-norm, real one-norm, infinity-norm and real infinity-norm for vectors and matrices, two-norm for vectors, and Frobenius-norm, max-norm, and real max-norm for matrices.
that specify whether the data should be specified in increasing or decreasing order.
matrices A and B. There are two valid values that specify whether A*B or B*A should be computed.
REAL COMPLEX
Chapter 1 Introduction to VECLIB 25
Operator arguments
uplo Refers to triangular matrices. There are two valid
values to specify whether a matrix is upper or lower triangular.
trans Used by routines applying a matrix, say A, to another
vector or another matrix. There are three valid values to specify whether the matrix (A), its transpose (AT), or
its conjugate transpose (A*) should be applied. op(A)
T
refers to A, A
, or A* depending on the input value of the trans operator argument. Some BLAS routines have more than one
trans operator. For example, a
general matrix multiply operation can be specified as
CopA()op B()
where A, B, and C are general matrices. A trans argument is needed for each of the input matrices A and B. These arguments are denoted transA and
transB.
conj Used by complex routines operating with x or
x.
diag Refers to triangular matrices. Two values are valid to
specify whether the triangular matrix has unit-diagonal or not.
jrot Used by the routine to generate Jacobi rotations. There
are three valid values to specify whether the rotation is an inner rotation, an outer rotation, or a sorted rotation.
For BLAS Standard routines, specify an operator argument with a named constant value. Table 1-4 on page 27 lists the operator arguments and their associated named constants.
The actual numeric value assigned to the named constant is defined in the appropriate language include file. For example, the f77blas.h include file defines assigned values for Fortran 77.
Table 1-4 lists the operator arguments and their associated named constants.
26 HP MLIB User’s Guide

Table 1-4 BLAS Standard Operator Arguments

Operator arguments
Operator
Argument
norm
sort blas_increasing_order sort in increasing order
side blas_left_side operate on the left hand side
uplo blas_upper reference upper triangle only
transx blas_trans
conj blas_conj
diag blas_non_unit_diag non-unit triangular
jrot blas_jrot_inner
Named Constant Meaning
blas_one_norm 1-norm
blas_real_one_norm real 1-norm
blas_two_norm 2-norm
blas_frobenius_norm Frobenius-norm
blas_real_inf_norm real infinity-norm
blas_max_norm max-norm
blas_real_max_norm real-norm
blas_decreasing_order sort in decreasing order
blas_right_side operate on the right hand side
blas_conj_trans
blas_unit_diag unit triangular
blas_jrot_outer
blas_jrot_sorted
blas_inf_norm infinity-norm
blas_lower reference lower triangle only
operate with x
blas_no_trans operate with x
1
-------
x
x
1
-------
c
2
2
operate with
operate with
blas_no_conj operate with x
inner rotation
0 c
outer rotation
sorted rotation
≤≤
abs a() abs b()
T
Chapter 1 Introduction to VECLIB 27

Error handling

Error handling
VECLIB subprograms are divided into two classes according to the way they detect and report usage errors:
• Low-level subprograms
• High-level subprograms

Low-level subprograms

Low-level subprograms are only minimally capable of detecting or handling errors. These subprograms attempt to do what is reasonable when a usage error occurs, but they do not warn you that something is wrong. For example, SDOT, which computes the dot product of two dense real vectors of length N, gives the mathematically proper result, zero, when N 0. This is considered mathematically correct because, by definition, an empty sum is zero. Because SDOT conforms to the published BLAS argument list and usage, however, SDOT does not notify you that you may have made a mistake. Issuing a warning would add severe usage conflicts with codes that already use the BLAS.
Because these low-level subprograms do what is reasonable, you can simplify a program and perhaps speed it up by using VECLIB subprograms without checking for special cases. For example, relying on SDOT having a zero result for N 0 may eliminate an IF statement that would take the same branch almost every time through a loop.

High-level subprograms

High-level subprograms detect and report errors. These subprograms usually do more work than the low-level subprograms, so the relative expense of checking and reporting errors may be small. Some possible errors are:
• Argument value errors, such as negative matrix order
• Computational errors, such as a singular matrix
• Computational problems, such as ill-conditioning When a high-level subprogram detects an error, it responds with a success/error
code, usually with the output argument ier. The convention used for ier is:
ier = 0: Successful exit
ier < 0: Invalid value of an argument—computation not completed
28 HP MLIB User’s Guide

Troubleshooting

ier > 0: Failure during the computation Some VECLIB subprograms do not have a success/error code in their argument
lists, but instead call another VECLIB subprogram to process the error condition. MLIB provides the following error handlers:
• XERBLA
• XERVEC
• F_BLASERROR Refer to the documentation for individual VECLIB subprogram to determine if
one of these error handlers is used. For example, all BLAS Standard subprograms (those subprograms whose names begin with F_) use F_BLASERROR.
The standard versions of XERBLA, XERVEC, and F_BLASERROR write an error message onto the standard error file. Execution is then terminated with a nonzero exit status. You can supply a version of the error handler that alters this action. Refer to “XERBLA” on page 337, “XERVEC” on page 623, and “F_BLASERROR” on page 605 for more information about these routines.
Routine-specific error conditions are listed in the respective subprogram documentation.
Troubleshooting
The following are suggestions to help you find the cause of a problem:
1. Verify that the subprogram usage in the program matches the subprogram specifications in the documentation. Pay attention to the number of arguments in the CALL statement and to the declarations of arrays and integer constants or variables that describe them. Write out all the arguments immediately before and after the CALL statement.
2. Make sure there really is a problem. For example, if you get an apparently incorrect answer, check if the answer satisfies the problem as defined in the program. For problems with more than one answer, VECLIB may produce a different answer or give the answers in a different order than expected. If the problem is ill-conditioned, VECLIB may not be able to compute a reliable answer. Error messages often suggest the cause of the problem.
3. Isolate the problem. If possible, write a small test program that encounters the same difficulty. For example, write the data causing the problem from
Chapter 1 Introduction to VECLIB 29

HP MLIB man pages

the original program into the small one. In this way, you eliminate extraneous code from suspicion. If the problem area is large, try to pare it to a manageable size. For example, if a 50-by-50 linear system fails, try to produce a 2-by-2 system that fails in the same way.
HP MLIB man pages
The HP MLIB man pages contain online documentation that includes information from the HP MLIB User’s Guide.
The HP MLIB man pages are installed in the directory /opt/mlib/share/man. Set this path in your MANPATH environment variable to access man pages for VECLIB and LAPACK.
HP VECLIB man pages provide:
• An introduction to VECLIB (veclib (3m))
• An introduction to each group of subprograms, for example, blas2(3m)
• A man page for each subprogram The HP MLIB User’s Guide has more detailed information than the man pages
because of the limited number of fonts supported and the difficulty of presenting mathematical equations in the man(1) system.
For further explanation and a table of contents of reference entries for VECLIB, refer to the veclib(3) man page.
30 HP MLIB User’s Guide

2 Basic Vector Operations

Overview

This chapter explains how to use the VECLIB vector subprograms that serve as building blocks for many user programs. It describes subprograms for performing dense and sparse vector operations. This set of VECLIB subprograms includes:
• Basic Linear Algebra Subprograms
• Sparse BLAS
• Hewlett-Packard extensions to the BLAS
• BLAS Standard
The term BLAS, as used in this section, refers to all of the above listed BLAS and the Hewlett-Packard extensions to the BLAS.
BLAS standardization efforts, supported by software and hardware vendors and university groups, began with a BLAS Technical (BLAST) Forum meeting in November 1995 at the University of Tennessee. The efforts of the BLAST Forum resulted in a BLAS Standard specification in 1999.
HP MLIB 8.5 supports a subset of the BLAS Standard routines. This chapter describes dense and banded BLAS Standard functionality in “BLAS Standard routines” on page 152.
Overview
This section discusses commonly used or computationally expensive operations of linear algebra. Even though you can code most of these operations in fewer than 10 lines of Fortran or C, using VECLIB subprograms can improve program performance as well as program modularity and readability. However, in some situations, you can achieve better computational performance by entering Fortran or C code than by calling one of these subprograms.
Chapter 2 Basic Vector Operations 31

Chapter objectives

Chapter objectives
After reading this chapter you will:
• Understand BLAS storage conventions
• Know how to specify array sections
• Know how to handle backward storage
• Know how to use increment (also called stride) arguments
• Understand the vector subprograms included with HP VECLIB, both Legacy BLAS and BLAS Standard subprograms

Associated documentation

The following documents provide supplemental material for this chapter: Dodson, D.S., R.G. Grimes, and J.G. Lewis. “Sparse Extensions to the Fortran
Basic Linear Algebra Subprograms.” ACM Transactions on Mathematical Software. June, 1991. Vol. 17, No. 2.
Dodson, D.S., R.G. Grimes, and J.G. Lewis. “Algorithm 692: Model Implementation and Test Package for the Sparse Basic Linear Algebra Subprograms.” ACM Transactions on Mathematical Software. June, 1991. Vol. 17, No. 2.
Lawson, C., R. Hanson, D. Kincaid, and F. Krogh. “Basic Linear Algebra Subprograms for Fortran Usage.” ACM Transactions on Mathematical Software. September, 1979. Vol. 5, No. 3.
32 HP MLIB User’s Guide

What you need to know to use vector subprograms

What you need to know to use vector subprograms
The following sections describe overall considerations for using vector subprograms:
• BLAS storage conventions
Fortran storage of arraysFortran array argument association
• BLAS indexing conventions
Forward storageBackward storageIncrement arguments
• Operator arguments in the BLAS Standard
• Representation of a permutation matrix
• Representation of a Householder matrix

BLAS storage conventions

The Basic Linear Algebra Subprograms (BLAS) were developed to enhance the portability of published linear algebra codes. In particular LAPACK, the high-level public-domain linear equation package, uses the BLAS. When your routines use the VECLIB BLAS, you increase the efficiency of those routines on your Hewlett-Packard servers.
You need not limit your use of the VECLIB BLAS to linear algebra codes. Because these subprograms are portable, modular, and efficient, you can incorporate them into your programs. To realize the full power of the BLAS, you should understand storage and indexing conventions.
Chapter 2 Basic Vector Operations 33
What you need to know to use vector subprograms
Fortran storage of arrays
Two-dimensional arrays in Fortran are stored by columns. Consider the following specifications:
DIMENSION A(N1,N2),B(N3) EQUIVALENCE (A,B)
where N3 = N1 X N2. Then A(I,J) is associated with the same memory location as B(K) where
K = I + (J1) × N1
Successive elements of a column of A are adjacent in memory, while successive elements of a row of A are stored with a difference of N1 storage units between them. Remember that the size of a storage unit depends on the data type.
Fortran array argument association
When a Fortran subprogram is called with an array element as an argument, the value is not passed. Instead, the subprogram receives the address in memory of the element. Consider the following code segment:
REAL A(10,10) J = 3 L = 10 CALL SUBR (A(1,J),L) . . . SUBROUTINE SUBR (X,N) REAL X(N) . . .
SUBR is given the address of the first element of the third column of A. Because it treats that argument as a one-dimensional array, successive elements X(1), X(2),..., occupy the same memory locations as the successive elements of the
third column of A, that is, A(1,3), A(2,3),... Hence, the entire third column of A is available to the subprogram.
34 HP MLIB User’s Guide

BLAS indexing conventions

This section describes handling stride arguments and forward and backward storage.
A vector in the BLAS is defined by three quantities:
• Vector length
• Array or starting element within an array
• Increment, sometimes called stride—defines the number of storage units between successive vector elements
Forward storage
Suppose that X is a real array. Let N be the vector length and let INCX be the increment. Suppose that a vector x with components xi, i= 1, 2, ..., N, is stored
in X. If INCX 0, then xi is stored in X(1 + (I1) × INCX). This is forward storage starting from X(1) with stride
equal to INCX, ending with X(1 + (N1) × INCX). Thus, if N = 4 and INCX = 2, the vector components x1, x2, x3, and x4 are stored in the array
elements X(1), X(3), X(5), and X(7), respectively.
What you need to know to use vector subprograms
Backward storage
Some BLAS subprograms permit the backward storage of vectors, which is specified by using a negative INCX. If INCX < 0, then xi is stored in X(1 + (NI) × |INCX|) or equivalently in X(1 (NI) × INCX). This is backward storage starting from X(1 (N1) × INCX) with stride equal to INCX, ending with X(1). Thus, if N = 4 and INCX = 2, the vector components
x1, x2, x3, and x4 are stored in the array elements
X(7), X(5), X(3), and X(1), respectively.
INCX = 0 is only permitted by COPY routines in the legacy BLAS subprograms.
When INCX = 0 is allowed, it means that x is a vector of length N, whose components all equal the value of X(1).
The notation (N,X,INCX) describes a BLAS vector. For example, if X is an array of dimension N, then (N,X,1) represents forward storage and (N,X,1) represents backward storage. If A is an M-by-N array, then (M,A(1,J),1) represents column J and (N,A(I,1),M) represents row I. Finally, if an M-by-N matrix is embedded in the upper left-hand corner of an array B of size LDB by NMAX, then column J is (M,B(1,J),1) and row is I(N,B(I,1),LDB).
Chapter 2 Basic Vector Operations 35
What you need to know to use vector subprograms
Increment arguments
The following examples illustrate how to use increment arguments to perform different operations with the same subprogram. These examples use the function F_SDOT with the following usage:
SUBROUTINE F_SDOT (CONJ, N, ALPHA, X, INCX, BETA, Y, INCY, R) INTEGER CONJ, INCX, INCY, N REAL*4 ALPHA, BETA, R, X(*), Y(*)
This usage adds the scaled dot product of the vectors X( * ) and Y( * ) to a scaled scalar R.
Example 1
Compute the dot product
R = X(1)*Y(1) + X(2)*Y(2) + X(3)*Y(3) + X(4)*Y(4):
REAL*4 ALPHA,BETA,R,X(*),Y(*) SUBROUTINE F_SDOT (CONJ, 4, 1.0, X, 1, 0.0, Y, 1)
Example 2
Compute the dot product
T = X(1)*Y(4) + X(2)*Y(3) + X(3)*Y(2) + X(4)*Y(1):
REAL*4 ALPHA,BETA,R,X(*),Y(*) SUBROUTINE F_SDOT (CONJ, 4, 1.0, X, 1, 0.0, Y, -1)
Example 3
Compute the dot product
Y(2) = A(2,1)*X(1) + A(2,2)*X(2) + A(2,3)*X(3)
This is the dot product of the second row of an M-by-3 matrix A, stored in a 10-by-3 array, with a 3-vector X:
PARAMETER (LDA = 10) REAL*4 SDOT,A(LDA,3),X(3),Y(LDA) N = 3
Y(2) = SDOT (N, A(2,1),LDA, X,1)
Operator arguments in the BLAS Standard
Some routines in the BLAS Standard take input-only arguments called operators. Operators allow for the specification of multiple related operations to be performed by a single function. The BLAS Standard specifies the type and the valid values these arguments should have according to the specific programming language.
Operator arguments used by the BLAS Standard routines are NORM, SORT,
SIDE, UPLO, TRANS, CONJ, DIAG,
on page 25 for explanations of the valid operator values.
36 HP MLIB User’s Guide
and JROT. Refer to “Operator arguments”
In BLAS Standard routines, you specify an operator argument with a named constant value. The actual numeric value assigned to the named constant is defined in the appropriate language’s include file. Operator arguments are represented in the Fortran 77 interface as INTEGERs. This specification is different from the legacy BLAS, where operator arguments are defined as CHARACTER*1.
Refer to individual routines in “BLAS Standard routines” on page 152 for the named constants you can use to specify operator arguments for basic vector subprograms.

Representation of a permutation matrix

This section explains how the BLAS Standard represents a permutation matrix.
An n-by-n permutation matrix P is represented as a product of at most n interchange permutations. An interchange permutation E is a permutation obtained by swapping two rows of the identity matrix. An efficient way to represent a general permutation matrix P is with an integer vector p of length n. In other words, P = En...E1 and each Ei is the identity with rows i and p interchanged:
What you need to know to use vector subprograms
i
For i = n to 1 and incp < 0,
For i = 1 to n and incp > 0,
xi() xpi()()
xi() xpi()()
Chapter 2 Basic Vector Operations 37
What you need to know to use vector subprograms

Representation of a Householder matrix

This section explains how the BLAS Standard represents a Householder matrix.
An elementary reflector (or elementary Householder matrix) H of order n is a unitary matrix of the form
HIτυυ
where τ is a scalar, and υ is an n-vector, with
2
τ
υ is often referred to as the Householder vector. Often, υ can have leading or
trailing zero elements, but for the purposes of this discussion, assume that H has no special structure.
This representation sets υ1 = 1, meaning that υ1 need not be stored. In real arithmetic, except that τ = 0 implies H = I. This representation agrees with that used in LAPACK.
In complex arithmetic, τ may be complex, and satisfies
1 τ 2,≤≤
1 Re τ() 2 and τ 1–1≤≤
H
=
2
υ
2Re τ()=
2
Thus a complex H is not Hermitian, as with other representations, but it is unitary. The latter is the important property. The advantage of allowing τ to be complex is that, given an arbitrary complex vector x, H can be computed so that
with real β. This is useful, for example, when reducing a complex Hermitian matrix to a real symmetric tridiagonal matrix, or a complex rectangular matrix to real bidiagonal form.
38 HP MLIB User’s Guide
H∗x β 10… 0,, ,()
=
T

Subprograms for basic vector operations

Subprograms for basic vector operations
The following sections in this chapter describe the vector subprograms included with VECLIB:
• Legacy BLAS routines
• BLAS Standard routines
Note that the specification for operator arguments is different in legacy BLAS routines than in BLAS Standard routines. Operator arguments are represented in the BLAS Standard Fortran 77 interface as INTEGERs; in the legacy BLAS they are defined as CHARACTER*1.
In BLAS Standard routines, you specify an operator argument with a named constant value. Refer to the individual routines in “BLAS Standard routines” on page 152 for the named constants you can use to specify operator arguments. The actual numeric value assigned to the named constant is defined in the f77blas.h include file.
Chapter 2 Basic Vector Operations 39
ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes

Legacy BLAS routines

Name ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX

Index of maximum of magnitudes
Purpose Given a real or integer vector x of length n, ISAMAX, IDAMAX, or IIAMAX
determines the index of the element of the vector of maximum magnitude. Specifically, the subprograms determine the smallest index i such that

x
x
=
max
i
Given a complex vector x of length n, ICAMAX or IZAMAX determines the smallest index i:
Re xi() Im xi()+ max Re xj() Im xj()+:j 12n,, ,=()=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual definition of complex magnitude is
j
: j 12n,, ,=

Re xi()2Im xi()
 
This definition is not used because of computational speed. If the index i is used for pivot selection in matrix factorization, no significant difference in numerical stability should result.
The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 i, ISAMAX, n, incx REAL*4 x(lenx)
i = ISAMAX(n, x, incx)
INTEGER*4 i, IDAMAX, n, incx REAL*8 x(lenx)
i = IDAMAX(n, x, incx)
INTEGER*4 i, IIAMAX, n, incx, x(lenx) i = IIAMAX(n, x, incx)
40 HP MLIB User’s Guide
12
+
2
Index of maximum of magnitudes ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX
INTEGER*4 i, ICAMAX, n, incx COMPLEX*8 x(lenx)
i = ICAMAX(n, x, incx)
INTEGER*4 i, IZAMAX, n, incx COMPLEX*16 x(lenx)
i = IZAMAX(n, x, incx)
VECLIB8:
INTEGER*8 i, ISAMAX, n, incx REAL*4 x(lenx)
i = ISAMAX(n, x, incx)
INTEGER*8 i, IDAMAX, n, incx REAL*8 x(lenx)
i = IDAMAX(n, x, incx)
INTEGER*8 i, IIAMAX, n, incx, x(lenx) i = IIAMAX(n, x, incx)
INTEGER*8 i, ICAMAX, n, incx COMPLEX*8 x(lenx)
i = ICAMAX(n, x, incx)
INTEGER*8 i, IZAMAX, n, incx COMPLEX*16 x(lenx)
i = IZAMAX(n, x, incx)
Input n Number of elements of vector x to be used. If n 0, the
subprograms do not reference x.
x Array of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1) ×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the
element of x of maximum magnitude.
Chapter 2 Basic Vector Operations 41
ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes
Fortran Equivalent
INTEGER*4 FUNCTION ISAMAX (N,X,INCX) REAL*4 X(*),TEMP,XMAX ISAMAX = 1 IF ( N .GT. 1 ) THEN XMAX = ABS ( X(1) ) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N TEMP = ABS ( X(IX) ) IF ( TEMP .GT. XMAX ) THEN ISAMAX = I XMAX = TEMP END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISAMAX = 0 END IF RETURN END
Example Locate the largest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDAMAX,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDAMAX (N,X,INCX)
42 HP MLIB User’s Guide
Index of minimum of magnitudes ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN

Name ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN

Index of minimum of magnitudes
Purpose Given a real or integer vector x of length n, ISAMIN, IDAMIN, or IIAMIN
determines the index of element of the vector of minimum magnitude. Specifically, the subprograms determine the smallest index i such that
x
min xj: j 12… n,, ,=()=
i
Given a complex vector x of length n, ICAMIN or IZAMIN determines the smallest index i:
 
|Re(xi)|+|Im(xi)| min
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual definition of complex magnitude is
=

)|+|Im(xj)| : j 1, 2, ..., n=
|Re(x
j

Re x
()2Im xi()

i

This definition is not used because of computational speed. The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 i, ISAMIN, n, incx REAL*4 x(lenx)
i = ISAMIN(n, x, incx)
INTEGER*4 i, IDAMIN, n, incx REAL*8 x(lenx)
i = IDAMIN(n, x, incx)
INTEGER*4 i, IIAMIN, n, incx, x(lenx) i = IIAMIN(n, x, incx)
INTEGER*4 i, ICAMIN, n, incx COMPLEX*8 x(lenx)
i = ICAMIN(n, x, incx)
INTEGER*4 i, IZAMIN, n, incx COMPLEX*16 x(lenx)
i = IZAMIN(n, x, incx)
12
+
2
Chapter 2 Basic Vector Operations 43
ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN Index of minimum of magnitudes
VECLIB8:
INTEGER*8 i, ISAMIN, n, incx REAL*4 x(lenx)
i = ISAMIN(n, x, incx)
INTEGER*8 i, IDAMIN, n, incx REAL*8 x(lenx)
i = IDAMIN(n, x, incx)
INTEGER*8 i, IIAMIN, n, incx, x(lenx) i = IIAMIN(n, x, incx)
INTEGER*8 i, ICAMIN, n, incx COMPLEX*8 x(lenx)
i = ICAMIN(n, x, incx)
INTEGER*8 i, IZAMIN, n, incx COMPLEX*16 x(lenx)
i = IZAMIN(n, x, incx)
Input n Number of elements of vector x to be used. If n 0, the
subprograms do not reference x.
x Array of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the
element of x of minimum magnitude.
44 HP MLIB User’s Guide
Index of minimum of magnitudes ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN
Fortran Equivalent
INTEGER*4 FUNCTION ISAMIN (N,X,INCX) REAL*4 X(*),TEMP,XMIN ISAMIN = 1 IF ( N .GT. 1 ) THEN XMIN = ABS ( X(1) ) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N TEMP = ABS ( X(IX) ) IF ( TEMP .LT. XMIN ) THEN ISAMIN = I XMIN = TEMP END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISAMIN = 0 END IF RETURN END
Example Locate the smallest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDAMIN,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDAMIN (N,X,INCX)
Chapter 2 Basic Vector Operations 45
ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements

Name ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx

Count selected vector elements
Purpose Given a real, integer, or complex vector x of length n, these subprograms count
the number of elements of the vector that satisfy a specified relationship to a given scalar a.
The last two characters of the subprogram name specify the relation of interest between the elements of the vector and the scalar. For real and integer subprograms, these characters, represented by “xx” in the prototype Fortran statements, and the corresponding function values can be:
xx Function value
EQ GE GT LE
LT
NE
#{i : x #{i : x #{i : x #{i : x #{i : x #{i : x
For complex subprograms, these characters and corresponding function values are:
i
i
i
i
i
i
= a} a} > a} a} < a} a}
The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 i, ISCTxx, n, incx REAL*4 a, x(lenx)
i = ISCTxx(n, x, incx, a)
INTEGER*4 i, IDCTxx, n, incx REAL*8 a, x(lenx)
i = IDCTxx(n, x, incx, a)
INTEGER*4 i, IICTxx, n, incx, a, x(lenx) i = IICTxx(n, x, incx, a)
46 HP MLIB User’s Guide
xx Function value
EQ NE
#{i : x #{i : x
i
i
= a} a}
Count selected vector elements ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx
INTEGER*4 i, ICCTxx, n, incx COMPLEX*8 a, x(lenx)
i = ICCTxx(n, x, incx, a)
INTEGER*4 i, IZCTxx, n, incx COMPLEX*16 a, x(lenx)
i = IZCTxx(n, x, incx, a)
VECLIB8:
INTEGER*8 i, ISCTxx, n, incx REAL*4 a, x(lenx)
i = ISCTxx(n, x, incx, a)
INTEGER*8 i, IDCTxx, n, incx REAL*8 a, x(lenx)
i = IDCTxx(n, x, incx, a)
INTEGER*8 i, IICTxx, n, incx, a, x(lenx) i = IICTxx(n, x, incx, a)
INTEGER*8 i, ICCTxx, n, incx COMPLEX*8 a, x(lenx)
i = ICCTxx(n, x, incx, a)
INTEGER*8 i, IZCTxx, n, incx COMPLEX*16 a, x(lenx)
i = IZCTxx(n, x, incx, a)
Input n Number of elements of vector x to be compared to a. If
n 0, the subprograms do not reference x.
x Array of length lenx = (n1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
a The scalar a.
Chapter 2 Basic Vector Operations 47
ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements
Output i If n ≤ 0, then i = 0. Otherwise, i is the number of
elements of x that satisfy the relationship with a specified by the subprogram name.
Fortran Equivalent
INTEGER*4 FUNCTION ISCTEQ (N,X,INCX,A) REAL*4 A,X(*) ISCTEQ = 0 INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N IF ( X(IX) .EQ. A ) ISCTEQ = ISCTEQ + 1 IX = IX + INCXA 10 CONTINUE RETURN END
Example Count the number of positive elements of a REAL*8 vector x, where x is a
vector 10 elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDCTGT,N,INCX REAL*8 A,X(20) N = 10 INCX = 1 A = 0.0D0 I = IDCTGT (N,X,INCX,A)
48 HP MLIB User’s Guide
Index of maximum element of vector ISMAX/IDMAX/IIMAX

Name ISMAX/IDMAX/IIMAX

Index of maximum element of vector
Purpose Given a real or integer vector x of length n, these subprograms determine the
index of the maximum element of the vector. Specifically, the subprograms determine the smallest index i such that
ximax xj: j 12n,, ,=
=
The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 i, ISMAX, n, incx REAL*4 x(lenx)
i = ISMAX(n, x, incx)
INTEGER*4 i, IDMAX, n, incx REAL*8 x(lenx)
i = IDMAX(n, x, incx)
INTEGER*4 i, IIMAX, n, incx, x(lenx) i = IIMAX(n, x, incx)
VECLIB8:
INTEGER*8 i, ISMAX, n, incx REAL*4 x(lenx)
i = ISMAX(n, x, incx)
INTEGER*8 i, IDMAX, n, incx REAL*8 x(lenx)
i = IDMAX(n, x, incx)
()
INTEGER*8 i, IIMAX, n, incx, x(lenx) i = IIMAX(n, x, incx)
Input n Number of elements of vector x to be used. If n 0, the
subprograms do not reference x.
x Array of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Chapter 2 Basic Vector Operations 49
ISMAX/IDMAX/IIMAX Index of maximum element of vector
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the
maximum element of x.
Fortran Equivalent
INTEGER*4 FUNCTION ISMAX (N,X,INCX) REAL*4 X(*),XMAX ISMAX = 1 IF ( N .GT. 1 ) THEN XMAX = X(1) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N IF ( X(IX) .GT. XMAX ) THEN ISMAX = I XMAX = X(IX) END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISMAX = 0 END IF RETURN END
Example Locate the largest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDMAX,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDMAX (N,X,INCX)
50 HP MLIB User’s Guide
Index of minimum element of vector ISMIN/IDMIN/IIMIN

Name ISMIN/IDMIN/IIMIN

Index of minimum element of vector
Purpose Given a real or integer vector x of length n, these subprograms determine the
index of minimum element of the vector. Specifically, the subprograms determine the smallest index i such that
ximin xj: j 12… n,, ,=()=
The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 i, ISMIN, n, incx REAL*4 x(lenx)
i = ISMIN(n, x, incx)
INTEGER*4 i, IDMIN, n, incx REAL*8 x(lenx)
i = IDMIN(n, x, incx)
INTEGER*4 i, IIMIN, n, incx, x(lenx) i = IIMIN(n, x, incx)
VECLIB8:
INTEGER*8 i, ISMIN, n, incx REAL*4 x(lenx)
i = ISMIN(n, x, incx)
INTEGER*8 i, ISMIN, n, incx REAL*8 x(lenx)
i = ISMIN(n, x, incx)
INTEGER*8 i, IIMIN, n, incx, x(lenx) i = IIMIN(n, x, incx)
Input n Number of elements of vector x to be used. If n 0, the
subprograms do not reference x.
x Array of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Chapter 2 Basic Vector Operations 51
ISMIN/IDMIN/IIMIN Index of minimum element of vector
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the
minimum element of x.
Fortran Equivalent
INTEGER*4 FUNCTION ISMIN (N,X,INCX) REAL*4 X(*),XMIN ISMIN = 1 IF ( N .GT. 1 ) THEN XMIN = X(1) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N IF ( X(IX) .LT. XMIN ) THEN ISMIN = I XMIN = X(IX) END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISMIN = 0 END IF RETURN END
Example Locate the smallest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDMIN,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDMIN (N,X,INCX)
52 HP MLIB User’s Guide
Search vector for element ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx

Name ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx

Search vector for element
Purpose Given a real, integer, or complex vector x of length n, these subprograms search
sequentially through the vector for the first element xi that satisfies a specified relationship to a given scalar a and return the index i of that element.
The last two characters of the subprogram name specify the relationship of interest between the element of the vector and the scalar. For real and integer subprograms, these characters, represented by “xx” in the prototype Fortran statements, and the corresponding function values can be:
xx Function value
EQ GE GT LE
LT
NE
min{i : x min{i : x min{i : x min{i : x min{i : x min{i : x
For complex subprograms, these characters and corresponding function values are:
i
i
i
i
i
i
= a} a} > a} a} < a} a}
The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 i, ISSVxx, n, incx REAL*4 a, x(lenx)
i = ISSVxx(n, x, incx, a)
INTEGER*4 i, IDSVxx, n, incx REAL*8 a, x(lenx)
i = IDSVxx(n, x, incx, a)
xx Function value
EQ NE
min{i : x
min{i : x
i
i
= a}
a}
Chapter 2 Basic Vector Operations 53
ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx Search vector for element
INTEGER*4 i, IISVxx, n, incx, a, x(lenx) i = IISVxx(n, x, incx, a)
INTEGER*4 i, ICSVxx, n, incx COMPLEX*8 a, x(lenx)
i = ICSVxx(n, x, incx, a)
INTEGER*4 i, IZSVxx, n, incx COMPLEX*16 a, x(lenx)
i = IZSVxx(n, x, incx, a)
VECLIB8:
INTEGER*8 i, ISSVxx, n, incx REAL*4 a, x(lenx)
i = ISSVxx(n, x, incx, a)
INTEGER*8 i, IDSVxx, n, incx REAL*8 a, x(lenx)
i = IDSVxx(n, x, incx, a)
INTEGER*8 i, IISVxx, n, incx, a, x(lenx) i = IISVxx(n, x, incx, a)
INTEGER*8 i, ICSVxx, n, incx COMPLEX*8 a, x(lenx)
i = ICSVxx(n, x, incx, a)
INTEGER*8 i, IZSVxx, n, incx COMPLEX*16 a, x(lenx)
i = IZSVxx(n, x, incx, a)
Input n Number of elements of vector x to be compared to a. If
n 0, the subprograms do not reference x.
x Array of length lenx = (n1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
a The scalar a.
54 HP MLIB User’s Guide
Search vector for element ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx
Output i If n ≤ 0 or if no element of x satisfies the relationship
with a specified by the subprogram name, then i = 0. Otherwise, i is the index i of the first element xi of x that satisfies the relationship with a specified by the subprogram name. Recall that xi is stored in
x((i1)×|incx|+1).
Fortran Equivalent
INTEGER*4 FUNCTION IISVEQ (N, X,INCX, A) INTEGER*4 X(*),A IISVEQ = 0 INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N IF ( X(IX) .EQ. A ) THEN IISVEQ = I RETURN END IF IX = IX + INCXA 10 CONTINUE RETURN END
Example Search for the first positive element of a REAL*8 vector x, where x is a vector
10 elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDSVGT,N,INCX REAL*8 A,X(20) N = 10 INCX = 1 A = 0.0D0 I = IDSVGT (N,X,INCX,A)
Chapter 2 Basic Vector Operations 55
SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes

Name SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX

Maximum of magnitudes
Purpose Given a real or integer vector x of length n, SAMAX, DAMAX, or IAMAX
computes the l∞ norm of x, that is, the maximum of the magnitudes of the elements of the vector
sx
Given a complex vector x of length n, SCAMAX or DZAMAX computes
s max Re xi() Im xi()+:i 12… n,, ,=().=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual definition of the maximum of magnitudes of a complex vector is
tx∞max Re xi()2Im xi()
s is computed instead of t because, with its lack of square roots, it is faster to
compute. Because , s is often an acceptable substitute for t. The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 n, incx REAL*4 s, SAMAX, x(lenx)
s = SAMAX(n, x, incx)
INTEGER*4 n, incx REAL*8 s, DAMAX, x(lenx)
s = DAMAX(n, x, incx)
max xi: i 12n,, ,=().==
  
+{}
t s 2t
12
2
: i 12n,, ,=
.==
56 HP MLIB User’s Guide
INTEGER*4 n, incx, s, IAMAX, x(lenx) s = IAMAX(n, x, incx)
INTEGER*4 n, incx REAL*4 s, SCAMAX COMPLEX*8 x(lenx)
s = SCAMAX(n, x, incx)
Maximum of magnitudes SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX
INTEGER*4 n, incx REAL*8 s, DZAMAX COMPLEX*16 x(lenx)
s = DZAMAX(n, x, incx)
VECLIB8:
INTEGER*8 n, incx REAL*4 s, SAMAX, x(lenx)
s = SAMAX(n, x, incx)
INTEGER*8 n, incx REAL*8 s, DAMAX, x(lenx)
s = DAMAX(n, x, incx)
INTEGER*8 n, incx, s, IAMAX, x(lenx) s = IAMAX(n, x, incx)
INTEGER*8 n, incx REAL*4 s, SCAMAX COMPLEX*8 x(lenx)
s = SCAMAX(n, x, incx)
INTEGER*8 n, incx REAL*8 s, DZAMAX COMPLEX*16 x(lenx)
s = DZAMAX(n, x, incx)
Input n Number of elements of vector x to be used. If n 0, the
subprograms do not reference x.
x Array of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
Output s If n ≤ 0, then s = 0. Otherwise, s is the maximum of the
magnitudes of the elements of x.
Chapter 2 Basic Vector Operations 57
SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes
Fortran Equivalent
REAL*4 FUNCTION SAMAX (N,X,INCX) REAL*4 X(*) SAMAX = 0.0 INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N SAMAX = MAX ( SAMAX , ABS ( X(IX) ) ) IX = IX + INCXA 10 CONTINUE RETURN END
Example Compute the maximum of the magnitudes of the elements of a REAL*8 vector
x, where x is a vector 10 elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 N,INCX REAL*8 S,DAMAX,X(20) N = 10 INCX = 1 S = DAMAX (N,X,INCX)
58 HP MLIB User’s Guide
Minimum of magnitudes SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN

Name SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN

Minimum of magnitudes
Purpose Given a real or integer vector x of length n, SAMIN, DAMIN, or IAMIN
computes the minimum of the magnitudes of the elements of the vector
s min xi: i 12… n,, ,=().=
Given a complex vector x of length n, SCAMIN or DZAMIN computes
s min Re xi() Im xi(): i+12… n,, ,=()=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual definition of the minimum of magnitudes of a complex vector is
12
t min Re xi()2Im xi()
+{}
s is computed instead of t because, with its lack of square roots, it is faster to compute. Because , s is often an acceptable substitute for t.
The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
2
: i 12… n,, ,=().=
t s 2t
Usage VECLIB:
INTEGER*4 n, incx REAL*4 s, SAMIN, x(lenx)
s = SAMIN(n, x, incx)
INTEGER*4 n, incx REAL*8 s, DAMIN, x(lenx)
s = DAMIN(n, x, incx)
INTEGER*4 n, incx, s, IAMIN, x(lenx) s = IAMIN(n, x, incx)
INTEGER*4 n, incx REAL*4 s, SCAMIN COMPLEX*8 x(lenx)
s = SCAMIN(n, x, incx)
INTEGER*4 n, incx REAL*8 s, DZAMIN COMPLEX*16 x(lenx)
s = DZAMIN(n, x, incx)
VECLIB8:
Chapter 2 Basic Vector Operations 59
SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN Minimum of magnitudes
INTEGER*8 n, incx REAL*4 s, SAMIN, x(lenx)
s = SAMIN(n, x, incx)
INTEGER*8 n, incx REAL*8 s, DAMIN, x(lenx)
s = DAMIN(n, x, incx)
INTEGER*8 n, incx, s, IAMIN, x(lenx) s = IAMIN(n, x, incx)
INTEGER*8 n, incx REAL*4 s, SCAMIN COMPLEX*8 x(lenx)
s = SCAMIN(n, x, incx)
INTEGER*8 n, incx REAL*8 s, DZAMIN COMPLEX*16 x(lenx)
s = DZAMIN(n, x, incx)
60 HP MLIB User’s Guide
Minimum of magnitudes SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN
Input n Number of elements of vector x to be used. If n ≤ 0, the
subprograms do not reference x.
x Array of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incx Increment for array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
Output s If n ≤ 0, then s = , the largest representable machine
number. Otherwise, s is the minimum of the magnitudes of the elements of x.
Fortran Equivalent
REAL*4 FUNCTION SAMIN (N,X,INCX) REAL*4 X(*) SAMIN = INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N SAMIN = MIN ( SAMIN , ABS ( X(IX) ) ) IX = IX + INCXA 10 CONTINUE RETURN END
Example Compute the minimum of the magnitudes of the elements of a REAL*8 vector
x, where x is a vector 10 elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 N,INCX REAL*8 S,DAMIN,X(20) N = 10 INCX = 1 S = DAMIN (N,X,INCX)
Chapter 2 Basic Vector Operations 61
SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes

Name SASUM/DASUM/IASUM/SCASUM/DZASUM

Sum of magnitudes
Purpose Given a real or integer vector x of length n, SASUM, DASUM, or IASUM
computes the l1 norm of x, that is, the sum of magnitudes of the elements of the vector
n
sx
==
1
Given a complex vector x of length n, SCASUM or DZASUM computes
n
s
=
i 1=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively.
The usual definition of sum of magnitudes of a complex vector is
i 1=
Re x
()
x
i
+
Im x
()
i
i
tX
==
2
s is computed instead of t because, with its lack of square roots, it is faster to compute. Because , s is often an acceptable substitute for t.
The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
Usage VECLIB:
INTEGER*4 n, incx REAL*4 s, SASUM, x(lenx)
s = SASUM(n, x, incx)
INTEGER*4 n, incx REAL*8 s, DASUM, x(lenx)
s = DASUM(n, x, incx)
INTEGER*4 n, incx, s, IASUM, x(lenx) s = IASUM(n, x, incx)
n
 
Re xi()2Im xi()

i 1=
+
t s 2t
12
2
62 HP MLIB User’s Guide
Sum of magnitudes SASUM/DASUM/IASUM/SCASUM/DZASUM
INTEGER*4 n, incx REAL*4 s, SCASUM COMPLEX*8 x(lenx)
s = SCASUM(n, x, incx)
INTEGER*4 n, incx REAL*8 s, DZASUM COMPLEX*16 x(lenx)
s = DZASUM(n, x, incx)
VECLIB8:
INTEGER*8 n, incx REAL*4 s, SASUM, x(lenx)
s = SASUM(n, x, incx)
INTEGER*8 n, incx REAL*8 s, DASUM, x(lenx)
s = DASUM(n, x, incx)
INTEGER*8 n, incx, s, IASUM, x(lenx) s = IASUM(n, x, incx)
INTEGER*8 n, incx REAL*4 s, SCASUM COMPLEX*8 x(lenx)
s = SCASUM(n, x, incx)
INTEGER*8 n, incx REAL*8 s, DZASUM COMPLEX*16 x(lenx)
s = DZASUM(n, x, incx)
Input n Number of elements of vector x to be used in the sum of
magnitudes. If n 0, the subprograms do not reference
x.
x Array of length lenx = (n1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS
Chapter 2 Basic Vector Operations 63
SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes
Indexing Conventions” in the introduction to this chapter.
Output s If n ≤ 0, then s = 0. Otherwise, s is the sum of
magnitudes of the elements of x.
Fortran Equivalent
REAL*4 FUNCTION SASUM (N, X,INCX) REAL*4 X(*) SASUM = 0.0 IF ( N .LE. 0 ) RETURN IX = 1 INCXA = ABS ( INCX ) DO 10 I = 1, N SASUM = SASUM + ABS ( X(IX) ) IX = IX + INCXA 10 CONTINUE RETURN END
Example Compute the sum of magnitudes of the elements of a REAL*8 vector x, where x
is a vector 10 elements long stored in a one-dimensional array X of dimension
20.
INTEGER*4 N,INCX REAL*8 S,DASUM,X(20) N = 10 INCX = 1 S = DASUM (N,X,INCX)
64 HP MLIB User’s Guide
Elementary vector operation SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC

Name SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC

Elementary vector operation
Purpose Given a real or complex scalar a and real or complex vectors x and y of length n,
these subprograms perform the elementary vector operations
yax y+
and
y ax+y
where is the complex conjugate of x. The vectors can be stored in
x
one-dimensional arrays or in either rows or columns of two-dimensional arrays, and the indexing through the arrays can be either forward or backward.
Usage VECLIB:
INTEGER*4 n, incx, incy REAL*4 a, x(lenx), y(leny)
CALL SAXPY(n, a, x, incx, y, incy)
INTEGER*4 n, incx, incy REAL*8 a, x(lenx), y(leny)
CALL DAXPY(n, a, x, incx, y, incy)
INTEGER*4 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)
CALL CAXPY(n, a, x, incx, y, incy)
INTEGER*4 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)
CALL CAXPYC(n, a, x, incx, y, incy)
INTEGER*4 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)
CALL ZAXPY(n, a, x, incx, y, incy)
INTEGER*4 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)
CALL ZAXPYC(n, a, x, incx, y, incy)
VECLIB8:
INTEGER*8 n, incx, incy REAL*4 a, x(lenx), y(leny)
CALL SAXPY(n, a, x, incx, y, incy)
INTEGER*8 n, incx, incy REAL*8 a, x(lenx), y(leny)
CALL DAXPY(n, a, x, incx, y, incy)
Chapter 2 Basic Vector Operations 65
SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC Elementary vector operation
INTEGER*8 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)
CALL CAXPY(n, a, x, incx, y, incy)
INTEGER*8 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)
CALL CAXPYC(n, a, x, incx, y, incy)
INTEGER*8 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)
CALL ZAXPY(n, a, x, incx, y, incy)
INTEGER*8 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)
CALL ZAXPYC(n, a, x, incx, y, incy)
Input n Number of elements of vectors x and y to be used in the
elementary vector operation. If n 0, the subprograms do not reference x or y.
a The scalar a. x Array of length lenx = (n1)×|incx|+1 containing the
n-vector x. x is used in conjugated form by CAXPYC and ZAXPYC and in unconjugated form by the other subprograms.
incx Increment for the array x:
incx 0 x is stored forward in array x; that is,
xi is stored in x((i1)×incx+1).
66 HP MLIB User’s Guide
incx < 0 x is stored backward in array x; that
is, xi is stored in x((in)×incx+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
y Array of length leny = (n−1)×|incy|+1 containing the
n-vector y.
incy Increment for the array y, incy 0:
incy > 0 y is stored forward in array y; that is,
yi is stored in y((i1)×incy+1).
incy < 0 y is stored backward in array y; that
is, yi is stored in y((in)×incy+1).
Elementary vector operation SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC
Use incy = 1 if the vector y is stored contiguously in array y; that is, if yi is stored in y(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
Output y If n ≤ 0 or a = 0, then y is unchanged. Otherwise, ax+y
overwrites the input.
Notes If incx = 0, then xi= x(1) for all i.
The result is unspecified if incy = 0 or if x and y overlap such that any element of x shares a memory location with any element of y.
Fortran Equivalent
SUBROUTINE SAXPY (N, A, X,INCX, Y,INCY) REAL*4 X(*),Y(*) IF ( N .LE. 0 ) RETURN IF ( A .EQ. 0.0 ) RETURN IX = 1 IY = 1 IF ( INCX .LT. 0 ) IX = 1 - (N-1) * INCX IF ( INCY .LT. 0 ) IY = 1 - (N-1) * INCY DO 10 I = 1, N Y(IY) = A * X(IX) + Y(IY) IX = IX + INCX IY = IY + INCY 10 CONTINUE RETURN END
Example 1 Compute the REAL*8 elementary vector operation
y 2x y,+
where x and y are vectors 10 elements long stored in one-dimensional arrays X and Y of dimension 20.
INTEGER*4 N,INCX,INCY REAL*8 A,X(20),Y(20) N = 10 A = 2.0D0 INCX = 1 INCY = 1 CALL DAXPY (N,A,X,INCX,Y,INCY)
Example 2 Subtract 3 times the 4th row of a 10-by-10 matrix from the 5th row. The matrix
is stored in a two-dimensional array B of dimension 20-by-21.
INTEGER*4 N REAL*8 A,B(20,21) N = 10 A = -3.0D0 CALL DAXPY (N,A,B(4,1),20,B(5,1),20)
Chapter 2 Basic Vector Operations 67
SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation

Name SAXPYI/DAXPYI/CAXPYI/ZAXPYI

Sparse elementary vector operation
Purpose Given a real or complex scalar a, a sparse vector x stored in compact form via a
set of indices and a dense vector y stored in full storage form, these subprograms perform the elementary vector operation
yax y.+
More precisely, let x be a sparse n-vector with m n interesting (usually nonzero) elements, and let {k1, k2, ..., km} be the indices of these elements. All uninteresting elements of x are assumed to be zero. Let y be an ordinary n-vector. If x is represented by arrays x and indx such that indx(i)=ki and
x(i)=x
y
k
i
Usage VECLIB:
then these subprograms compute
k
i
axiy
,+ i 12… m.,, ,=
k
i
INTEGER*4 m, indx(m) REAL*4 a, x(m), y(n)
CALL SAXPYI(m, a, x, indx, y)
INTEGER*4 m, indx(m) REAL*8 a, x(m), y(n)
CALL DAXPYI(m, a, x, indx, y)
VECLIB8:
68 HP MLIB User’s Guide
INTEGER*4 m, indx(m) COMPLEX*8 a, x(m), y(n)
CALL CAXPYI(m, a, x, indx, y)
INTEGER*4 m, indx(m) COMPLEX*16 a, x(m), y(n)
CALL ZAXPYI(m, a, x, indx, y)
INTEGER*8 m, indx(m) REAL*4 a, x(m), y(n)
CALL SAXPYI(m, a, x, indx, y)
INTEGER*8 m, indx(m) REAL*8 a, x(m), y(n)
CALL DAXPYI(m, a, x, indx, y)
Sparse elementary vector operation SAXPYI/DAXPYI/CAXPYI/ZAXPYI
INTEGER*8 m, indx(m) COMPLEX*8 a, x(m), y(n)
CALL CAXPYI(m, a, x, indx, y)
INTEGER*8 m, indx(m) COMPLEX*16 a, x(m), y(n)
CALL ZAXPYI(m, a, x, indx, y)
Input m Number of interesting elements of x, m n, where n is
the length of y. If m 0, the subprograms do not reference x, indx, or y.
a The scalar a. x Array of length m containing the interesting elements
of x. x(j)=xi if indx(j)=i.
Chapter 2 Basic Vector Operations 69
SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation
indx Array containing the indices {k
elements of x. The indices must satisfy
} of the interesting
i
1 indx i() n≤≤i 12m,, ,=
and
indx i() indx j() 1 i j m,
where n is the length of y.
y Array containing the elements of y, y(i)=y
.
i
Output y If m ≤ 0 or a = 0, then y is unchanged. Otherwise, ax+y
overwrites the input. Only the elements of y whose indices are included in indx are changed.
Notes The result is unspecified if any element of indx is out of range, if any two
elements of indx have the same value, or if x, indx, and y overlap such that any element of x or any index shares a memory location with any element of y.
Fortran Equivalent
SUBROUTINE SAXPYI (M, A, X,INDX, Y) REAL*4 A,X(*),Y(*) INTEGER*4 INDX(*) IF ( M .LE. 0 ) RETURN IF ( A .EQ. 0.0 ) RETURN DO 10 I = 1, M Y(INDX(I)) = A * X(I) + Y(INDX(I)) 10 CONTINUE RETURN END
Example Compute the REAL*8 elementary vector operation
y 2x y,+
where x is a sparse vector with interesting elements x1, x4, x5, and x9 stored in one-dimensional array X, and y is stored in a one-dimensional array Y of
dimension 20.
INTEGER*4 M,INDX(4) REAL*8 A,X(4),Y(20) DATA INDX / 1, 4, 5, 9 / M = 4 A = 2.0D0 CALL DAXPYI (M,A,X,INDX,Y)
70 HP MLIB User’s Guide
Two sided vector clip SCLIP/DCLIP/ICLIP

Name SCLIP/DCLIP/ICLIP

Two sided vector clip
Purpose Given scalars a and b and a vector x of length n, these subprograms form the
vector y by the clip operation
a if xia
 
x
=
y
i
b if bx
The vectors can be stored in one-dimensional arrays or in either rows or columns of two-dimensional arrays. Indexing through the arrays can be either forward or backward.
Usage VECLIB:
if axib<< i 12n,, ,=
i
i
INTEGER*4 n, incx, incy REAL*4 a, b, x(lenx), y(leny)
CALL SCLIP(n, a, b, x, incx, y, incy)
INTEGER*4 n, incx, incy REAL*8 a, b, x(lenx), y(leny)
CALL DCLIP(n, a, b, x, incx, y, incy)
INTEGER*4 n, incx, incy, a, b, x(lenx), y(leny) CALL ICLIP(n, a, b, x, incx, y, incy)
VECLIB8:
INTEGER*8 n, incx, incy REAL*4 a, b, x(lenx), y(leny)
CALL SCLIP(n, a, b, x, incx, y, incy)
INTEGER*8 n, incx, incy REAL*8 a, b, x(lenx), y(leny)
CALL DCLIP(n, a, b, x, incx, y, incy)
INTEGER*8 n, incx, incy, a, b, x(lenx), y(leny) CALL ICLIP(n, a, b, x, incx, y, incy)
Input n Number of elements of vectors x and y to be used. If
n 0
, the subprograms do not reference x or y.
a The scalar a. b The scalar b.
Chapter 2 Basic Vector Operations 71
SCLIP/DCLIP/ICLIP Two sided vector clip
x Array of length lenx = (n1)×|incx|+1 containing the
n-vector x.
incx Increment for the array x:
incx 0 x is stored forward in array x; that is,
xi is stored in x((i1)×incx+1).
incx < 0 x is stored backward in array x; that
is, xi is stored in x((in)×incx+1).
Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.
72 HP MLIB User’s Guide
Loading...