HP HP-UX Performance Tools User Manual

HP MLIB User’s Guide
Seventh Edition
HP Part No. B6061-96027 and B6061-96028
HP MLIB
December 2004
Printed in USA
Edition: Seventh
Document Numbers: B6061-96027 and B6061-96028
Remarks: Released December 2004 with HP MLIB software version 9.0. User’s Guide split into two volumes with this release.
Edition: Sixth
Document Number: B6061-96023
Remarks: Released September 2003 with HP MLIB software version 8.50.
Edition: Fifth
Document Number: B6061-96020
Remarks: Released September 2002 with HP MLIB software version 8.30.
Edition: Fourth
Document Number: B6061-96017
Remarks: Released June 2002 with HP MLIB software version 8.20.
Edition: Third
Document Number: B6061-96015
Remarks: Released September 2001 with HP MLIB software version 8.10.
Edition: Second
Document Number: B6061-96012
Remarks: Released June 2001 with HP MLIB software version 8.00.
Edition: First
Document Number: B6061-96010
Remarks: Released December 1999 with HP MLIB software version B.07.00. This HP MLIB User’s Guide contains both HP VECLIB and HP LAPACK information. This document supersedes both the HP MLIB VECLIB User’s Guide, fifth edition (B6061-96006) and the HP MLIB LAPACK User’s Guide, sixth edition (B6061-96005).
Notice
Copyright 1979-2004 Hewlett-Packard Development Company. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.
The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance or use of this material.
Table of Contents
Preface
VECLIB
LAPACK
ScaLAPACK
SuperLU
SOLVERS
VMATH
Purpose and audience
Organization
Notational conventions
Documentation resources

Part 1

1 Introduction to VECLIB
Overview
Chapter objectives
Standardization
Accessing VECLIB
Compiling and linking (VECLIB)
Problem with +ppu compatibility and duplicated symbols
Optimization
Parallel processing
Linking for parallel or non parallel processing
Controlling VECLIB parallelism at runtime
Performance benefits
OpenMP-based nested parallelism
Message passing-based nested parallelism
Default CPS library stack is too small for MLIB
Default Pthread library stack is too small for MLIB
Roundoff effects
Data types and precision
VECLIB naming convention
Data type and byte length
Operator arguments
Error handling
Low-level subprograms
High-level subprograms
Troubleshooting
HP MLIB man pages
2 Basic Vector Operations
Overview
Chapter objectives
Associated documentation
What you need to know to use vector subprograms
BLAS storage conventions
BLAS indexing conventions
Operator arguments in the BLAS Standard
Representation of a permutation matrix
Representation of a Householder matrix
Subprograms for basic vector operations
ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes
ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN Index of minimum of magnitudes
ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements
ISMAX/IDMAX/IIMAX Index of maximum element of vector
ISMIN/IDMIN/IIMIN Index of minimum element of vector
ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx Search vector for element
SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes
SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN Minimum of magnitudes
SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes
SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC Elementary vector operation
SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation
SCLIP/DCLIP/ICLIP Two sided vector clip
SCLIPL/DCLIPL/ICLIPL Left sided vector clip
SCLIPR/DCLIPR/ICLIPR Right sided vector clip
SCOPY/DCOPY/ICOPY/CCOPY/CCOPYC/ZCOPY/ZCOPYC Copy vector
SDOT/DDOT/CDOTC/CDOTU/ZDOTC/ZDOTU Dot product
SDOTI/DDOTI/CDOTCI/CDOTUI/ZDOTCI/ZDOTUI Sparse dot product
SFRAC/DFRAC Extract fractional parts
SGTHR/DGTHR/IGTHR/CGTHR/ZGTHR Gather sparse vector
SGTHRZ/DGTHRZ/IGTHRZ/CGTHRZ/ZGTHRZ Gather and zero sparse vector
SLSTxx/DLSTxx/ILSTxx/CLSTxx/ZLSTxx List selected vector elements
SMAX/DMAX/IMAX Maximum of vector
SMIN/DMIN/IMIN Minimum of vector
SNRM2/DNRM2/SCNRM2/DZNRM2 Euclidean norm
SNRSQ/DNRSQ/SCNRSQ/DZNRSQ Euclidean norm squared
SRAMP/DRAMP/IRAMP Generate linear ramp
SROT/DROT/CROT/CSROT/ZROT/ZDROT Apply Givens rotation
SROTG/DROTG/CROTG/ZROTG Construct Givens rotation
SROTI/DROTI Apply sparse Givens rotation
SROTM/DROTM Apply modified Givens rotation
SROTMG/DROTMG Construct modified Givens rotation
SRSCL/DRSCL/CRSCL/CSRSCL/ZRSCL/ZDRSCL Scale vector
SSCAL/DSCAL/CSCAL/CSSCAL/CSCALC/ZSCAL/ZDSCAL/ZSCALC Scale vector
SSCTR/DSCTR/ISCTR/CSCTR/ZSCTR Scatter sparse vector
SSUM/DSUM/ISUM/CSUM/ZSUM Vector sum
SSWAP/DSWAP/ISWAP/CSWAP/ZSWAP Swap two vectors
SWDOT/DWDOT/CWDOTC/CWDOTU/ZWDOTC/ZWDOTU Weighted dot product
SZERO/DZERO/IZERO/CZERO/ZZERO Clear vector
F_SAMAX_VAL/F_DAMAX_VAL/F_CAMAX_VAL/F_ZAMAX_VAL Maximum absolute value and location
F_SAMIN_VAL/F_DAMIN_VAL/F_CAMIN_VAL/F_ZAMIN_VAL Minimum absolute value and location
F_SAPPLY_GROT/F_DAPPLY_GROT/F_CAPPLY_GROT/F_ZAPPLY_GROT Apply plane rotation
F_SAXPBY/F_DAXPBY/F_CAXPBY/F_ZAXPBY Scaled vector accumulation
F_SAXPY_DOT/F_DAXPY_DOT/F_CAXPY_DOT/F_ZAXPY_DOT Combine AXPY and DOT routines
F_SCOPY/F_DCOPY/F_CCOPY/F_ZCOPY Copy vector
F_SDOT/F_DDOT/F_CDOT/F_ZDOT Add scaled dot product
F_SFPINFO/F_DFPINFO Environmental inquiry
F_SGEN_GROT/F_DGEN_GROT/F_CGEN_GROT/F_ZGEN_GROT Generate Givens rotation
F_SGEN_HOUSE/F_DGEN_HOUSE/F_CGEN_HOUSE/F_ZGEN_HOUSE Generate Householder transform
F_SGEN_JROT/F_DGEN_JROT/F_CGEN_JROT/F_ZGEN_JROT Generate Jacobi rotation
F_SMAX_VAL/F_DMAX_VAL Maximum value and location
F_SMIN_VAL/F_DMIN_VAL Minimum value and location
F_SNORM/F_DNORM Norm of a vector
F_SPERMUTE/F_DPERMUTE/F_CPERMUTE/F_ZPERMUTE Permute vector
F_SRSCALE/F_DRSCALE/F_CRSCALE/F_ZRSCALE Reciprocal Scale
F_SSORT/F_DSORT Sort vector entries
F_SSORTV/F_DSORTV Sort vector and return index vector
F_SSUM/F_DSUM/F_CSUM/F_ZSUM Sum of entries of a vector
F_SSUMSQ/F_DSUMSQ/F_CSUMSQ/F_ZSUMSQ Sum of squares
F_SSWAP/F_DSWAP/F_CSWAP/F_ZSWAP Interchange vectors
F_SWAXPBY/F_DWAXPBY/F_CWAXPBY/F_ZWAXPBY Scaled vector addition
3 Basic Matrix Operations
Overview
Chapter objectives
Associated documentation
What you need to know to use these subprograms
Subroutine naming convention
Operator arguments in the BLAS Standard
Subprograms for basic matrix operations
SGBMV/DGBMV/CGBMV/ZGBMV Matrix-vector multiply
SGECPY/DGECPY/CGECPY/ZGECPY Copy general matrix
SGEMM/DGEMM/CGEMM/ZGEMM Matrix-matrix multiply
DGEMMS/ZGEMMS Strassen matrix-matrix multiply
SGEMV/DGEMV/CGEMV/ZGEMV Matrix-vector multiply
SGER/DGER/CGERC/CGERU/ZGERC/ZGERU Rank-1 update
SGETRA/DGETRA/CGETRA/ZGETRA In-place transpose of a general square matrix
SSBMV/DSBMV/CHBMV/ZHBMV Matrix-vector multiply
SSPMV/DSPMV/CHPMV/ZHPMV Matrix-vector multiply
SSPR/DSPR/CHPR/ZHPR Rank-1 update
SSPR2/DSPR2/CHPR2/ZHPR2 Rank-2 update
SSYMM/DSYMM/CHEMM/CSYMM/ZHEMM/ZSYMM Matrix-matrix multiply
SSYMV/DSYMV/CHEMV/ZHEMV Matrix-vector multiply
SSYR/DSYR/CHER/ZHER Rank-1 update
SSYR2/DSYR2/CHER2/ZHER2 Rank-2 update
SSYR2K/DSYR2K/CHER2K/CSYR2K/ZHER2K/ZSYR2K Rank-2k update
SSYRK/DSYRK/CHERK/CSYRK/ZHERK/ZSYRK Rank-k update
STBMV/DTBMV/CTBMV/ZTBMV Matrix-vector multiply
STBSV/DTBSV/CTBSV/ZTBSV Solve triangular band system
STPMV/DTPMV/CTPMV/ZTPMV Matrix-vector multiply
STPSV/DTPSV/CTPSV/ZTPSV Solve triangular system
STRMM/DTRMM/CTRMM/ZTRMM Triangular matrix-matrix multiply
STRMV/DTRMV/CTRMV/ZTRMV Matrix-vector multiply
STRSM/DTRSM/CTRSM/ZTRSM Solve triangular systems
STRSV/DTRSV/CTRSV/ZTRSV Solve triangular system
XERBLA Error handler
F_CHBMV/F_ZHBMV Hermitian banded matrix-vector multiply
F_CHEMV/F_ZHEMV Hermitian matrix-vector multiply
F_CHER/F_ZHER Hermitian rank-1 update
F_CHER2/F_ZHER2 Hermitian rank-2 update
F_CHPMV/F_ZHPMV Hermitian packed matrix-vector multiply
F_CHPR/F_ZHPR Hermitian rank-1 update
F_CHPR2/F_ZHPR2 Hermitian rank-2 update
F_SFPINFO/F_DFPINFO Environmental inquiry
F_SGBMV/F_DGBMV/F_CGBMV/F_ZGBMV General band matrix-vector multiply
F_SGE_COPY/F_DGE_COPY/F_CGE_COPY/F_ZGE_COPY Matrix copy
F_SGE_TRANS/F_DGE_TRANS/F_CGE_TRANS/F_ZGE_TRANS Matrix transposition
F_SGEMM/F_DGEMM/F_CGEMM/F_ZGEMM General matrix-matrix multiply
F_SGEMV/F_DGEMV/F_CGEMV/F_ZGEMV General matrix-vector multiply
F_SGEMVER/F_DGEMVER/F_CGEMVER/F_ZGEMVER Multiple matrix-vector multiply, rank 2 update
F_SGEMVT/F_DGEMVT/F_CGEMVT/F_ZGEMVT Multiple matrix-vector multiply
F_SGER/F_DGER/F_CGER/F_ZGER General rank-1 update
F_SSBMV/F_DSBMV/F_CSBMV/F_ZSBMV Symmetric band matrix-vector multiply
F_SSPMV/F_DSPMV/F_CSPMV/F_ZSPMV Symmetric packed matrix-vector multiply
F_SSPR/F_DSPR/F_CSPR/F_ZSPR Symmetric packed rank-1 update
F_SSPR2/F_DSPR2/F_CSPR2/F_ZSPR2 Symmetric rank-2 update
F_SSYMV/F_DSYMV/F_CSYMV/F_ZSYMV Symmetric matrix-vector multiply
F_SSYR/F_DSYR/F_CSYR/F_ZSYR Symmetric rank-1 update
F_SSYR2/F_DSYR2/F_CSYR2/F_ZSYR2 Symmetric rank-2 update
F_STBMV/F_DTBMV/F_CTBMV/F_ZTBMV Triangular banded matrix-vector multiply
F_STBSV/F_DTBSV/F_CTBSV/F_ZTBSV Triangular banded solve
F_STPMV/F_DTPMV/F_CTPMV/F_ZTPMV Triangular packed matrix-vector multiply
F_STPSV/F_DTPSV/F_CTPSV/F_ZTPSV Triangular packed solve
F_STRMV/F_DTRMV/F_CTRMV/F_ZTRMV Triangular matrix-vector multiply
F_STRMVT/F_DTRMVT/F_CTRMVT/F_ZTRMVT Multiple triangular matrix-vector multiply
F_STRSM/F_DTRSM/F_CTRSM/F_ZTRSM Triangular solve
F_STRSV/F_DTRSV/F_CTRSV/F_ZTRSV Triangular solve
4 Sparse BLAS Operations
Overview
Chapter objectives
Associated documentation
What you need to know to use these subprograms
Subroutine naming convention
Sparse matrix storage formats
Operator arguments in the Sparse BLAS
Common arguments
SM arguments
Order of arguments for args(A)
SBCOMM/DBCOMM/CBCOMM/ZBCOMM Block coordinate matrix-matrix multiply
SBDIMM/DBDIMM/CBDIMM/ZBDIMM Block diagonal matrix-matrix multiply
SBDISM/DBDISM/CBDISM/ZBDISM Block diagonal format triangular solve
SBELMM/DBELMM/CBELMM/ZBELMM Block Ellpack matrix-matrix multiply
SBELSM/DBELSM/CBELSM/ZBELSM Block Ellpack format triangular solve
SBSCMM/DBSCMM/CBSCMM/ZBSCMM Block sparse column matrix-matrix multiply
SBSCSM/DBSCSM/CBSCSM/ZBSCSM Block sparse column format triangular solve
SBSRMM/DBSRMM/CBSRMM/ZBSRMM Block sparse row matrix-matrix multiply
SBSRSM/DBSRSM/CBSRSM/ZBSRSM Block sparse row format triangular solve
SCOOMM/DCOOMM/CCOOMM/ZCOOMM Coordinate matrix-matrix multiply
SCSCMM/DCSCMM/CCSCMM/ZCSCMM Compressed sparse column matrix-matrix multiply
SCSCSM/DCSCSM/CCSCSM/ZCSCSM Compressed sparse column format triangular solve
SCSRMM/DCSRMM/CCSRMM/ZCSRMM Compressed sparse row matrix-matrix multiply
SCSRSM/DCSRSM/CCSRSM/ZCSRSM Compressed sparse row format triangular solve
SDIAMM/DDIAMM/CDIAMM/ZDIAMM Diagonal matrix-matrix multiply
SDIASM/DDIASM/CDIASM/ZDIASM Diagonal format triangular solve
SELLMM/DELLMM/CELLMM/ZELLMM Ellpack matrix-matrix multiply
SELLSM/DELLSM/CELLSM/ZELLSM Ellpack format triangular solve
SJADMM/DJADMM/CJADMM/ZJADMM Jagged diagonal matrix-matrix multiply
SJADSM/DJADSM/CJADSM/ZJADSM Jagged diagonal format triangular solve
SSKYMM/DSKYMM/CSKYMM/ZSKYMM Skyline matrix-matrix multiply
SSKYSM/DSKYSM/CSKYSM/ZSKYSM Skyline format triangular solve
SVBRMM/DVBRMM/CVBRMM/ZVBRMM Variable block row matrix-matrix multiply
SVBRSM/DVBRSM/CVBRSM/ZVBRSM Variable block row format triangular solve
Figures
Tables
Table 1-1 VECLIB and VECLIB8 Libraries
Table 1-2 Compiler Defaults
Table 1-3 VECLIB Naming Convention—Data Type
Table 1-4 BLAS Standard Operator Arguments
Table 2-1 FPINFO return values
Table 3-1 Extended BLAS Naming Convention—Data Type
Table 3-2 Extended BLAS Naming Convention—Matrix Form
Table 3-3 Extended BLAS Naming Convention—Computation
Table 3-4 Extended BLAS Naming Convention—Subprogram Names
Table 4-1 Sparse BLAS Naming Convention—Data Type
Table 4-2 Sparse BLAS Naming Convention—Matrix Form
Table 4-3 4 x 5 Matrix
Table 4-4 COO Format Matrix
Table 4-5 CSC Format Matrix
Table 4-6 MSC Format Matrix
Table 4-7 CSR Format Matrix
Table 4-8 MSR Format Matrix
Table 4-9 5 x 4 Matrix
Table 4-10 DIA Format Matrix
Table 4-11 ELL Format Matrix
Table 4-12 JAD Format Matrix
Table 4-13 JAD Row-Permuted Matrix
Table 4-14 5 x 5 Matrix
Table 4-15 SKY Format Matrix
Table 4-16 4 x 6 Matrix
Table 4-17 BCO Format Matrix
Table 4-18 BSC Format Matrix
Table 4-19 BSR Format Matrix
Table 4-20 6 x 6 Matrix
Table 4-21 VBR Format Matrix

Preface

Hewlett-Packard’s high-performance math libraries (HP MLIB) help you speed development of applications and shorten the execution time of long-running technical applications.
HP MLIB is a collection of subprograms optimized for use on HP servers and workstations, providing mathematical software and computational kernels for engineering and scientific applications. HP MLIB can be used on systems ranging from single-processor workstations to multiprocessor high-end servers. HP MLIB is optimized for HP PA-RISC 2.0, Itanium 2, and Opteron processors. HP MLIB has six components: VECLIB, LAPACK, ScaLAPACK, SuperLU, SOLVERS, and VMATH.

VECLIB

HP VECLIB contains robust callable subprograms. Together with a subset of the BLAS Standard subroutines, HP MLIB supports the legacy BLAS, a collection of routines for the solution of sparse symmetric systems of equations, a collection of commonly used Fast Fourier Transforms (FFTs), and convolutions. Although VECLIB was designed for use with Fortran programs, C programs can call VECLIB subprograms, as described in Appendix A. Refer to Part 1 of this manual for HP VECLIB information.
Throughout this document there are references to legacy BLAS and BLAS Standard routines. Legacy BLAS routines comprise the Basic Linear Algebra Subprograms (BLAS), that is, the Level 1, 2, and 3 BLAS, as well as the Sparse BLAS.
A BLAS standardization effort began with a BLAS Technical (BLAST) Forum meeting in November 1995 at the University of Tennessee. The efforts of the BLAST Forum resulted in a BLAS Standard specification in 1999. BLAS Standard routines refer to routines as defined by this BLAS Standard specification. HP MLIB supports a subset of BLAS Standard routines. Refer to Chapter 2, “Basic Vector Operations,” and Chapter 3, “Basic Matrix Operations,” for details about supported subprograms.
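As a rough illustration of the three legacy BLAS levels mentioned above, the following plain-Python sketch shows the kind of operation each level performs: a Level 1 vector-vector update (AXPY), a Level 2 matrix-vector product (GEMV), and a Level 3 matrix-matrix product (GEMM). This is not HP MLIB code. The helper names axpy, gemv, and gemm are hypothetical and only echo the routine families listed in the table of contents. The real routines take additional arguments (strides, transpose options, scaling factors) and update their output arguments in place, all of which is omitted here.

```python
# Illustrative only: simplified stand-ins for the three BLAS levels.
# These helpers are assumptions for explanation, not the VECLIB interfaces.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y as a new list."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(a, x):
    """Level 2 (matrix-vector): return A*x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemm(a, b):
    """Level 3 (matrix-matrix): return A*B, with both given as lists of rows."""
    cols_b = list(zip(*b))  # columns of B, so each entry is a row-column dot product
    return [[sum(aik * bkj for aik, bkj in zip(row, col)) for col in cols_b]
            for row in a]
```

The levels differ mainly in how much arithmetic is done per data element touched, which is one reason tuned Level 3 routines usually benefit most from an optimized library.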

LAPACK

HP Linear Algebra Package (LAPACK) is a collection of subprograms that provide mathematical software for applications involving linear equations, least squares, eigenvalue problems, and the singular value decomposition. LAPACK is designed to supersede the linear equation and eigenvalue packages, LINPACK and EISPACK. The National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy supported the development of the public-domain version of LAPACK, from which the HP version was derived.
HP LAPACK fully conforms with public-domain version 3.0 of LAPACK in all user-visible usage conventions. Refer to Part 2 of this manual for information specific to HP LAPACK. To supplement the HP-specific information provided in Part 2 of this document, refer to the standard LAPACK Users’ Guide. You can access the latest edition of the LAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/lapack/lug/index.html

ScaLAPACK

ScaLAPACK is a library of high-performance linear algebra routines capable of solving systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. ScaLAPACK can also handle many associated computations such as matrix factorizations or estimating condition numbers.
ScaLAPACK is public-domain software that was developed by Oak Ridge National Laboratory. It is designed for distributed computing and uses the Message Passing Interface (MPI) for parallelism. This implementation provides a version of ScaLAPACK tuned on HP servers and built with HP’s MPI. The ScaLAPACK library routines are written in Fortran 77 and are callable from Fortran 90 and C routines. Unlike other MLIB libraries, there is no version of ScaLAPACK that assumes all integers are 8 bytes in length.
Refer to Part 3 of this manual for information specific to HP ScaLAPACK. To supplement the HP-specific information provided in Part 3 of this document, refer to the standard ScaLAPACK Users’ Guide. You can access the latest edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/scalapack/slug/index.html

SuperLU

This implementation provides the Distributed SuperLU library designed for distributed memory parallel computers. It was based on the public-domain SuperLU_DIST, which was developed at the Lawrence Berkeley National Lab and the University of California at Berkeley.
The library contains a set of subroutines to solve a sparse linear system. The library is implemented in ANSI C, using HP Message Passing Interface (MPI) for communication. The library includes routines to handle both real and complex matrices using double precision. The parallel routine names for the double-precision real version start with the letters “pd” (e.g. pdgstrf). The parallel routine names for the double-precision complex version start with the letters “pz” (e.g. pzgstrf). Unlike other MLIB libraries, there is no version of SuperLU that assumes all integers are 8 bytes in length.
The routines can be called directly from C applications. They may also be called from Fortran applications; however, “bridge” routines (in C) must be supplied. For details about creating bridge routines, please refer to Section 2.9.2 in the SuperLU User’s Guide available at:
http://www.nersc.gov/~xiaoye/SuperLU
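As a rough illustration of what a bridge routine involves, the following C sketch assumes two common but compiler-dependent conventions: Fortran passes every argument by reference, and the Fortran compiler appends an underscore to external names. The names c_scale_vector and f_scale_vector_ are hypothetical stand-ins and are not SuperLU routines; consult the SuperLU User’s Guide for the actual bridge routines.

```c
/* Hypothetical C routine that takes its scalar arguments by value,
 * standing in for a C library entry point. */
void c_scale_vector(int n, double alpha, double *x)
{
    for (int i = 0; i < n; i++) {
        x[i] *= alpha;
    }
}

/* Hypothetical bridge routine. Assumption: the Fortran compiler passes
 * every argument by reference and maps CALL F_SCALE_VECTOR(...) to the
 * external symbol f_scale_vector_ (name mangling varies by compiler). */
void f_scale_vector_(const int *n, const double *alpha, double *x)
{
    /* Dereference the by-reference scalars and forward to the C routine. */
    c_scale_vector(*n, *alpha, x);
}
```

Under these assumptions, a Fortran program would invoke the bridge as CALL F_SCALE_VECTOR(N, ALPHA, X), and the bridge would forward the call to the C routine.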

SOLVERS

Solvers is a collection of direct sparse linear system solvers and graph partitioning routines. Symmetric systems can be solved using SMP parallelism and structurally-symmetric systems can be solved using out-of-core functionality. These routines have been optimized for use on Hewlett-Packard servers. Features are:
• Sparse symmetric and structurally-symmetric linear equation solutions.
• Sparse symmetric ordinary and generalized eigensystem solutions.
• Out-of-core symmetric and structurally-symmetric linear equation and eigensystem solutions.
• Full METIS functionality.
This implementation provides the METIS Version 4.0.1 library. It is based on the public-domain METIS, which was developed at the University of Minnesota, Department of Computer Science, and the Army HPC Research Center. The library contains a set of subroutines for graph partitioning, mesh partitioning, and sparse matrix reordering, as well as auxiliary routines. HP MLIB provides the same METIS functionality as the public-domain METIS; however, the routine names are different. HP MLIB METIS routine names are prefixed with mlib_ to avoid name conflicts with applications and libraries that contain their own local version of METIS.
For more information about METIS, please refer to: http://www-users.cs.umn.edu/~karypis/metis/metis/index.html

VMATH

VMATH is a library of vector math routines corresponding to many of the widely used scalar math routines available with C, C++, and Fortran90.
VMATH is intended for computationally intensive mathematical applications amenable to a vector programming style.
VMATH provides two libraries: VMATH, whose interface uses 4-byte integers; and VMATH8, whose interface uses 8-byte integers and is otherwise equivalent to VMATH. VMATH routines come with both Fortran and C interfaces.
For more detailed information on VMATH as well as subprogram specifications, please refer to the VMATH chapter in Part 5 of this book. The VMATH man pages provide a man page for each subprogram.
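To make the idea of a vector math routine and the two integer interfaces concrete, here is a minimal C sketch. It applies one scalar operation (absolute value) to every element of an array and provides the routine once with a 4-byte integer length and once with an 8-byte integer length. The names sketch_vabs and sketch_vabs8 and their argument lists are assumptions made for illustration; they are not the VMATH or VMATH8 interfaces, which are specified in Part 5.

```c
#include <stdint.h>

/* Hypothetical vector routine with a 4-byte integer interface:
 * y[i] = |x[i]| for i = 0 .. n-1. */
void sketch_vabs(int32_t n, const double *x, double *y)
{
    for (int32_t i = 0; i < n; i++) {
        y[i] = (x[i] < 0.0) ? -x[i] : x[i];
    }
}

/* The same hypothetical routine with an 8-byte integer interface;
 * it is otherwise equivalent to sketch_vabs. */
void sketch_vabs8(int64_t n, const double *x, double *y)
{
    for (int64_t i = 0; i < n; i++) {
        y[i] = (x[i] < 0.0) ? -x[i] : x[i];
    }
}
```

A caller processes a whole array with a single call instead of calling the scalar routine inside its own loop, which is the vector programming style referred to above.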

Purpose and audience

This guide describes the MLIB software library and shows how to use it. This library provides mathematical software and computational kernels for applications.
The HP MLIB User’s Guide addresses experienced programmers who:
• Convert, develop, or optimize programs for use on HP servers and workstations
• Optimize existing software to improve performance and increase productivity

Organization

The HP MLIB User’s Guide describes HP MLIB VECLIB in Part 1, HP MLIB LAPACK in Part 2, HP MLIB ScaLAPACK in Part 3, HP MLIB Distributed SuperLU in Part 4, HP MLIB VMATH in Part 5, and HP MLIB SOLVERS in Part 6.
To learn fundamental information necessary for using the VECLIB library, read Chapter 1 and the introductory sections of the other chapters. These sections of background information will help you efficiently use the library subprograms.
To learn more about the subject of any chapter, refer to the literature cited in the “Associated documentation” section of each chapter.
Part 1 of this document is organized as follows:
• Chapter 1 introduces general concepts about VECLIB
• Chapter 2 describes basic vector operations included in VECLIB
• Chapter 3 describes basic matrix operations included in VECLIB
• Chapter 4 describes sparse BLAS operations
• Chapter 5 describes the discrete Fourier transforms in VECLIB
• Chapter 6 describes subprograms to compute convolutions and correlations of data sets
• Chapter 7 describes miscellaneous subprograms to produce random numbers, sort the elements of a vector in ascending or descending order, measure time, allocate dynamic memory, and report errors
Part 2 of this document is organized as follows:
• Chapter 8 describes information specific to Hewlett-Packard’s implementation of LAPACK
• Chapter 9 describes selected LAPACK auxiliary subprograms
Part 3 of this document is organized as follows:
• Chapter 10 describes ScaLAPACK functionality
Part 4 of this document is organized as follows:
• Chapter 11 describes Distributed SuperLU functionality
Part 5 of this document is organized as follows:
• Chapter 12 describes VMATH functionality
Part 6 of this document is organized as follows:
• Chapter 13 explains sparse symmetric linear equation subprograms
• Chapter 14 describes METIS subprograms
• Chapter 15 describes sparse symmetric eigenvalue subprograms
• Chapter 16 describes BCSLIB-EXT functionality
Supplemental material is provided as follows:
• Appendix A describes how to call VECLIB and LAPACK subprograms from within C programs
• Appendix B describes LINPACK subprograms available in HP MLIB
• Appendix C lists parallelized subprograms in VECLIB and LAPACK
• An index is included at the back of the manual
Supplemental information for Part 2 of this HP MLIB User’s Guide is found in the LAPACK Users’ Guide.
The LAPACK Users’ Guide is a publication from the Society for Industrial and Applied Mathematics that provides an introduction to the design of LAPACK as well as complete specifications for all the driver and computational routines. You can access the latest edition of the LAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/lapack/lug
Supplemental information for Part 3 of this HP MLIB User’s Guide is found in the ScaLAPACK Users’ Guide.
The ScaLAPACK Users’ Guide is a publication from the Society for Industrial and Applied Mathematics that provides an informal introduction to the design of ScaLAPACK as well as a detailed description of its contents and a reference manual. You can access the latest edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following URL:
http://www.netlib.org/scalapack/slug
Supplemental information for Part 4 of this HP MLIB User’s Guide is found in the SuperLU User’s Guide, which is only available online at:
http://www.nersc.gov/~xiaoye/SuperLU

Notational conventions

The following conventions are used in this manual:
Italics
Italics within text indicate mathematical entities used or manipulated by the program: for example, solve the n-by-n system of linear equations Ax = b.
Italics within command lines indicate generic commands, file names, or subprogram names. Substitute actual commands, file names, or subprograms for the italicized words. For example, the command line
f90 prog_name.o
instructs you to type the command f90, followed by the name of a program or subprogram object file.
UPPERCASE BOLDFACE
UPPERCASE BOLDFACE within text indicates Fortran keywords and subprogram names that must be typed just as they appear. For example, CALL DAXPY.
lowercase boldface
lowercase boldface within text and in prototype Fortran statements indicates generic variable or array names. You should substitute actual variable or array names. The italicized mathematical entities and the lowercase boldface variable and array names usually correspond. For example, A is a matrix and a is the Fortran array containing the matrix:
CALL DAXPY (n, a, x, incx, y, incy)
lowercase boldface within command lines indicates ASCII characters that must be typed just as they appear. For example, the command line
f90 prog_name.o
instructs you to type the command f90, followed by the name of a program or subprogram object file.
UPPERCASE MONOSPACE
UPPERCASE MONOSPACE indicates Fortran programs.
Brackets ( [ ] )
Square brackets in command examples designate optional entries.
NOTE
A NOTE highlights important supplemental information.

Documentation resources

The HP MLIB User’s Guide, the LAPACK Users’ Guide, and the ScaLAPACK Users’ Guide are available in hardcopy and online formats.
For the HP MLIB User’s Guide, refer to: http://www.hp.com/go/mlib
For the latest edition of the LAPACK Users’ Guide, refer to: http://www.netlib.org/lapack/lug
For the latest edition of the ScaLAPACK Users’ Guide, refer to: http://www.netlib.org/scalapack/slug
For the latest edition of the SuperLU User’s Guide, refer to: http://www.nersc.gov/~xiaoye/SuperLU
The following documents provide supplemental information:
• LAPACK Users’ Guide. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1995. This guide provides information on the subprograms provided with the LAPACK library.
• ScaLAPACK Users’ Guide. Philadelphia, PA: Society for Industrial and Applied Mathematics. This guide provides information on the subprograms provided with the ScaLAPACK library.
• Parallel Programming Guide for HP-UX Systems. Describes efficient shared-memory parallel programming techniques using HP compilers.
• Fortran/9000 Programmer’s Guide. Describes features and requirements in terms of the tasks a programmer might perform. These tasks include how to compile, link, run, debug, and optimize programs.
• HP-UX Floating-Point Guide. Describes how floating-point arithmetic is implemented on HP 9000 systems and discusses how floating-point behavior affects the programmer.
• HP Fortran 90 Programmer’s Guide. Provides extensive usage information (including how to compile and link), suggestions and tools for migrating to HP Fortran 90, and how to call C and HP-UX routines for HP Fortran 90.
• HP Fortran 90 Programmer’s Reference. Provides complete Fortran 90 language reference information. It also covers compiler options, compiler directives, and library information.
• HP C Programming Guide. Contains detailed discussions of selected C topics.
• HP C/HP-UX Reference Manual. Presents reference information on the C programming language as implemented by HP.
• HP aC++ Online Programmer’s Guide. Presents reference and tutorial information on aC++. (This manual is accessed by specifying aCC with the +help command-line option.)
• HP MPI User’s Guide. Describes how to use HP MPI (Message Passing Interface), a library of C- and Fortran-callable routines used for message-passing programming.
• Software Optimization for High Performance Computing: Creating Faster Applications. Provides state-of-the-art solutions for every key aspect of software performance, both code-based and algorithm-based.
• HP-UX Assembly Language Reference Manual. Describes the HP-UX assembler for the PA-RISC processor.
• HP PA-RISC 2.0 Architecture Reference. Describes the architecture of the PA-RISC 2.0 processor.
• PA-RISC Procedure Calling Conventions Reference. Describes the conventions for creating PA-RISC assembly language procedure calls.
• IA-64 and Elementary Functions. Describes the architecture of the IA-64 processor and how to implement elementary functions.
• METIS documentation is available online at http://www-users.cs.umn.edu/~karypis/metis/metis/index.html.
• OpenMP documentation is available online at http://www.openmp.org.
Part 1
HP VECLIB

1 Introduction to VECLIB

Overview

VECLIB, a component of HP MLIB, is a collection of subprograms optimized for use on Hewlett-Packard servers and workstations, providing mathematical software and computational kernels for engineering and scientific applications. This library contains subprograms for:
• Dense vector operations, including the Level 1 BLAS
• Sparse vector and matrix operations, including the Sparse BLAS
• The Level 3 BLAS routine DGEMM, which is highly tuned
• Matrix operations, including the Level 2 and Level 3 BLAS
• CXML BLAS extensions
Compaq Extended Math Library (CXML) is a collection of routines that perform numerically intensive operations that occur frequently in engineering and scientific computing, such as linear algebra and signal processing.
HP MLIB adds support for the unique CXML extensions to the legacy BLAS and for the array math functions in CXML. However, this support for CXML is not exhaustive. For more information about CXML, see the CXML Reference Manual, part number AA-PV6VE-TE.
• Discrete Fourier transforms
• Convolution and correlation
• Miscellaneous tasks, such as sorting and generating random numbers
VECLIB provides two sets of libraries: VECLIB and VECLIB8. To determine if a subprogram is included in VECLIB8, refer to the VECLIB8 section under each subprogram specification in the following chapters.
Although VECLIB was designed for use with Fortran programs, C programs can call VECLIB subprograms. Refer to Appendix A, “Calling MLIB routines from C,” for details. Examples are in Fortran, unless otherwise indicated.
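For C programmers reading the Fortran examples, it may help to see what a Level 1 BLAS operation computes. The following C sketch mirrors the argument list of DAXPY (n, a, x, incx, y, incy) and performs y := a*x + y. The function ref_daxpy is a hypothetical reference routine written for this illustration; it is not the VECLIB entry point, and it takes its scalars by value rather than following Fortran calling conventions. Appendix A describes how to call the actual library routines from C.

```c
/* Reference sketch of the Level 1 BLAS DAXPY operation: y := a*x + y.
 * incx and incy are the strides between consecutive vector elements.
 * As in the reference BLAS, a negative stride walks the vector
 * backward, starting from its far end. */
void ref_daxpy(int n, double a, const double *x, int incx,
               double *y, int incy)
{
    if (n <= 0 || a == 0.0) {
        return;                      /* nothing to do */
    }

    /* Zero-based starting offsets for each vector. */
    int ix = (incx < 0) ? (1 - n) * incx : 0;
    int iy = (incy < 0) ? (1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        y[iy] += a * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

With unit strides the routine updates contiguous arrays; a stride of 2 would update every other element, which is how a single row or column of a matrix stored in a one-dimensional array can be addressed.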
Except for subprograms described in Appendix B, “LINPACK Subprograms,” LINPACK, EISPACK, and SKYLINE subprograms are not included in the Hewlett-Packard scientific libraries. Refer to the Appendix, “Converting from LINPACK or EISPACK” in the LAPACK Users’ Guide, for assistance converting programs that currently call LINPACK or EISPACK routines to call LAPACK or VECLIB routines instead.
This chapter provides information necessary for efficient use of VECLIB and includes discussions of:
• Standardization
• Accessing VECLIB
• Optimization
• Parallel processing
• Roundoff effects
• Data types and precision
• VECLIB naming convention
• Data type and byte length
• Operator arguments
• Error handling
• HP MLIB man pages
• Troubleshooting

Chapter objectives

After reading this chapter you will:
• Know how to access VECLIB library subprograms
• Understand how VECLIB works in a parallel computing environment
• Understand VECLIB naming conventions
• Understand roundoff effects
• Understand how VECLIB handles errors
• Know how to access the online HP MLIB man pages
• Know what to do if you experience trouble using VECLIB subprograms