HP HP-UX Performance Tools User Manual

Download

HP MLIB User’s Guide

Seventh Edition

HP Part No. B6061-96027 and B6061-96028

HP MLIB

December 2004

Printed in USA

Edition: Seventh

Document Numbers: B6061-96027 and B6061-96028 Remarks: Released Dcember 2004 with HP MLIB software version9.0.

User’s Guide split into two volumes with this release.

Edition: Sixth

Document Number: B6061-96023 Remarks: Released September 2003 with HP MLIB software version 8.50.

Edition: Fifth

Document Number: B6061-96020 Remarks: Released September 2002 with HP MLIB software version 8.30.

Edition: Fourth

Document Number: B6061-96017 Remarks: Released June 2002 with HP MLIB software version 8.20.

Edition: Third

Document Number: B6061-96015 Remarks: Released September 2001 with HP MLIB software version 8.10.

Edition: Second

Document Number: B6061-96012 Remarks: Released June 2001 with HP MLIB software version 8.00.

Edition: First

Document Number: B6061-96010 Remarks: Released December 1999 with HP MLIB software version

B.07.00. This HP MLIB User’s Guide contains both HP VECLIB and HP LAPACK information. This document supercedes both the HP MLIB VECLIB User’s Guide, ﬁfth edition (B6061-96006) and the HP MLIB LAPACK User’s Guide, sixth edition (B6061-96005).

Notice

 Copyright 1979-2004 Hewlett-Packard Development Company. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.

The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material,

including, but not limited to, the implied warranties of merchantability and ﬁtness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance or use of this material.

Table of Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

VECLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

LAPACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi

ScaLAPACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi

SuperLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

SOLVERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

VMATH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii

Purpose and audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx

Notational conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii

Documentation resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii

Part 1

1 Introduction to VECLIB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Accessing VECLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Compiling and linking (VECLIB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Problem with +ppu compatibility and duplicated symbols . . . . . . . . . . . . . . . . . . . . 14

Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Parallel processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17

Linking for parallel or non parallel processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Controlling VECLIB parallelism at runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Performance beneﬁts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

OpenMP-based nested parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Message passing-based nested parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Default CPS library stack is too small for MLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Default Pthread library stack is too small for MLIB . . . . . . . . . . . . . . . . . . . . . . . . . 22

Roundoff effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Data types and precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

VECLIB naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Data type and byte length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Operator arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Low-level subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

High-level subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29

HP MLIB man pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2 Basic Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .32

Associated documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

What you need to know to use vector subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

BLAS storage conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

BLAS indexing conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Operator arguments in the BLAS Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Representation of a permutation matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Representation of a Householder matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Subprograms for basic vector operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes 40

ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN Index of minimum of magnitudes . . 43

ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements. . . . . . . . 46

ISMAX/IDMAX/IIMAX Index of maximum element of vector . . . . . . . . . . . . . . . . . 49

ISMIN/IDMIN/IIMIN Index of minimum element of vector. . . . . . . . . . . . . . . . . . . 51

ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx Search vector for element . . . . . . . . . . . . 53

SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes . . . . . . . . 56

SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN Minimum of magnitudes . . . . . . . . . . 59

vi Table of Contents

SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes . . . . . . . . . . . . 62

SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC Elementary vector operation 65

SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation . . . . . . . 68

SCLIP/DCLIP/ICLIP Two sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

SCLIPL/DCLIPL/ICLIPL Left sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

SCLIPR/DCLIPR/ICLIPR Right sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . 77

SCOPY/DCOPY/ICOPY/CCOPY/CCOPYC/ZCOPY/ZCOPYC Copy vector . . . . . . . 80

SDOT/DDOT/CDOTC/CDOTU/ZDOTC/ZDOTU Dot product . . . . . . . . . . . . . . . . . 84

SDOTI/DDOTI/CDOTCI/CDOTUI/ZDOTCI/ZDOTUI Sparse dot product . . . . . . . 88

SFRAC/DFRAC Extract fractional parts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

SGTHR/DGTHR/IGTHR/CGTHR/ZGTHR Gather sparse vector . . . . . . . . . . . . . . 94

SGTHRZ/DGTHRZ/IGTHRZ/CGTHRZ/ZGTHRZ Gather and zero sparse vector . 96

SLSTxx/DLSTxx/ILSTxx/CLSTxx/ZLSTxx List selected vector elements . . . . . . . 99

SMAX/DMAX/IMAX Maximum of vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

SMIN/DMIN/IMIN Minimum of vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

SNRM2/DNRM2/SCNRM2/DZNRM2 Euclidean norm . . . . . . . . . . . . . . . . . . . . . 107

SNRSQ/DNRSQ/SCNRSQ/DZNRSQ Euclidean norm squared . . . . . . . . . . . . . . . 110

SRAMP/DRAMP/IRAMP Generate linear ramp. . . . . . . . . . . . . . . . . . . . . . . . . . . 112

SROT/DROT/CROT/CSROT/ZROT/ZDROT Apply Givens rotation . . . . . . . . . . . 114

SROTG/DROTG/CROTG/ZROTG Construct Givens rotation . . . . . . . . . . . . . . . . 118

SROTI/DROTI Apply sparse Givens rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

SROTM/DROTM Apply modified Givens rotation . . . . . . . . . . . . . . . . . . . . . . . . . 123

SROTMG/DROTMG Construct modified Givens rotation . . . . . . . . . . . . . . . . . . . 127

SRSCL/DRSCL/CRSCL/CSRSCL/ZRSCL/ZDRSCL Scale vector. . . . . . . . . . . . . . 130

SSCAL/DSCAL/CSCAL/CSSCAL/CSCALC/ZSCAL/ZDSCAL/ZSCALC Scale vector . .

133

SSCTR/DSCTR/ISCTR/CSCTR/ZSCTR Scatter sparse vector. . . . . . . . . . . . . . . . 136

SSUM/DSUM/ISUM/CSUM/ZSUM Vector sum . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

SSWAP/DSWAP/ISWAP/CSWAP/ZSWAP Swap two vectors . . . . . . . . . . . . . . . . . 141

SWDOT/DWDOT/CWDOTC/CWDOTU/ZWDOTC/ZWDOTU Weighted dot product .

145

SZERO/DZERO/IZERO/CZERO/ZZERO Clear vector . . . . . . . . . . . . . . . . . . . . . . 150

F_SAMAX_VAL/F_DAMAX_VAL/F_CAMAX_VAL/F_ZAMAX_VAL Maximum

absolute value and location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

F_SAMIN_VAL/F_DAMIN_VAL/F_CAMIN_VAL/F_ZAMIN_VAL Minimum absolute

value and location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

F_SAPPLY_GROT/F_DAPPLY_GROT/F_CAPPLY_GROT/F_ZAPPLY_GROT Apply

Table of Contents v i i

plane rotation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

F_SAXPBY/F_DAXPBY/F_CAXPBY/F_ZAXPBY Scaled vector accumulation . . . 161 F_SAXPY_DOT/F_DAXPY_DOT/F_CAXPY_DOT/F_ZAXPY_DOT Combine AXPY

and DOT routines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

F_SCOPY/F_DCOPY/F_CCOPY/F_ZCOPY Copy vector. . . . . . . . . . . . . . . . . . . . . 167

F_SDOT/F_DDOT/F_CDOT/F_ZDOT Add scaled dot product . . . . . . . . . . . . . . . . 169

F_SFPINFO/F_DFPINFO Environmental inquiry . . . . . . . . . . . . . . . . . . . . . . . . . 172

F_SGEN_GROT/F_DGEN_GROT/F_CGEN_GROT/F_ZGEN_GROT Generate

Givens rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

F_SGEN_HOUSE/F_DGEN_HOUSE/F_CGEN_HOUSE/F_ZGEN_HOUSE Generate

Householder transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

F_SGEN_JROT/F_DGEN_JROT/F_CGEN_JROT/F_ZGEN_JROT Generate Jacobi

rotation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

F_SMAX_VAL/F_DMAX_VAL Maximum value and location. . . . . . . . . . . . . . . . . 181

F_SMIN_VAL/F_DMIN_VAL Minimum value and location. . . . . . . . . . . . . . . . . . 183

F_SNORM/F_DNORM Norm of a vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

F_SPERMUTE/F_DPERMUTE/F_CPERMUTE/F_ZPERMUTE Permute vector 187

F_SRSCALE/F_DRSCALE/F_CRSCALE/F_ZRSCALE Reciprocal Scale . . . . . . . 190

F_SSORT/F_DSORT Sort vector entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

F_SSORTV/F_DSORTV Sort vector and return index vector. . . . . . . . . . . . . . . . . 193

F_SSUM/F_DSUM/F_CSUM/F_ZSUM Sum of entries of a vector. . . . . . . . . . . . . 195

F_SSUMSQ/F_DSUMSQ/F_CSUMSQ/F_ZSUMSQ Sum of squares . . . . . . . . . . . 197

F_SSWAP/F_DSWAP/F_CSWAP/F_ZSWAP Interchange vectors . . . . . . . . . . . . . 200

F_SWAXPBY/F_DWAXPBY/F_CWAXPBY/F_ZWAXPBY Scaled vector addition. 202

3 Basic Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

Associated documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

What you need to know to use these subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

Subroutine naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

Operator arguments in the BLAS Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

Subprograms for basic matrix operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

SGBMV/DGBMV/CGBMV/ZGBMV Matrix-vector multiply. . . . . . . . . . . . . . . . . . 212

SGECPY/DGECPY/CGECPY/ZGECPY Copy general matrix . . . . . . . . . . . . . . . . 219

SGEMM/DGEMM/CGEMM/ZGEMM Matrix-matrix multiply . . . . . . . . . . . . . . . 222

viii Table of Contents

DGEMMS/ZGEMMS Strassen matrix-matrix multiply. . . . . . . . . . . . . . . . . . . . . 227

SGEMV/DGEMV/CGEMV/ZGEMV Matrix-vector multiply . . . . . . . . . . . . . . . . . 232

SGER/DGER/CGERC/CGERU/ZGERC/ZGERU Rank-1 update . . . . . . . . . . . . . . 237

SGETRA/DGETRA/CGETRA/ZGETRA In-place transpose of a general square matrix

241

SSBMV/DSBMV/CHBMV/ZHBMV Matrix-vector multiply. . . . . . . . . . . . . . . . . . 244

SSPMV/DSPMV/CHPMV/ZHPMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 249

SSPR/DSPR/CHPR/ZHPR Rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

SSPR2/DSPR2/CHPR2/ZHPR2 Rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

SSYMM/DSYMM/CHEMM/CSYMM/ZHEMM/ZSYMM Matrix-matrix multiply 265

SSYMV/DSYMV/CHEMV/ZHEMV Matrix-vector multiply. . . . . . . . . . . . . . . . . . 270

SSYR/DSYR/CHER/ZHER Rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

SSYR2/DSYR2/CHER2/ZHER2 Rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

SSYR2K/DSYR2K/CHER2K/CSYR2K/ZHER2K/ZSYR2K Rank-2k update . . . . . 284

SSYRK/DSYRK/CHERK/CSYRK/ZHERK/ZSYRK Rank-k update . . . . . . . . . . . . 289

STBMV/DTBMV/CTBMV/ZTBMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 294

STBSV/DTBSV/CTBSV/ZTBSV Solve triangular band system . . . . . . . . . . . . . . . 301

STPMV/DTPMV/CTPMV/ZTPMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 308

STPSV/DTPSV/CTPSV/ZTPSV Solve triangular system . . . . . . . . . . . . . . . . . . . . 313

STRMM/DTRMM/CTRMM/ZTRMM Triangular matrix-matrix multiply . . . . . . 318

STRMV/DTRMV/CTRMV/ZTRMV Matrix-vector multiply . . . . . . . . . . . . . . . . . . 323

STRSM/DTRSM/CTRSM/ZTRSM Solve triangular systems . . . . . . . . . . . . . . . . . 327

STRSV/DTRSV/CTRSV/ZTRSV Solve triangular system . . . . . . . . . . . . . . . . . . . 332

XERBLA Error handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

F_CHBMV/F_ZHBMV Hermitian banded matrix-vector multiply . . . . . . . . . . . . 340

F_CHEMV/F_ZHEMV Hermitian matrix-vector multiply. . . . . . . . . . . . . . . . . . . 342

F_CHER/F_ZHER Hermitian rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344

F_CHER2/F_ZHER2 Hermitian rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

F_CHPMV/F_ZHPMV Hermitian packed matrix-vector multiply. . . . . . . . . . . . . 348

F_CHPR/F_ZHPR Hermitian rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

F_CHPR2/F_ZHPR2 Hermitian rank-2 update . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

F_SFPINFO/F_DFPINFO Environmental inquiry. . . . . . . . . . . . . . . . . . . . . . . . . 354

F_SGBMV/F_DGBMV/F_CGBMV/F_ZGBMV General band matrix-vector multiply .

355

F_SGE_COPY/F_DGE_COPY/F_CGE_COPY/F_ZGE_COPY Matrix copy. . . . . . 358

F_SGE_TRANS/F_DGE_TRANS/F_CGE_TRANS/F_ZGE_TRANS Matrix

transposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360

Table of Contents i x

F_SGEMM/F_DGEMM/F_CGEMM/F_ZGEMM General matrix-matrix multiply 362 F_SGEMV/F_DGEMV/F_CGEMV/F_ZGEMV General matrix-vector multiply . . 365 F_SGEMVER/F_DGEMVER/F_CGEMVER/F_ZGEMVER Multiple matrix-vector

multiply, rank 2 update. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

F_SGEMVT/F_DGEMVT/F_CGEMVT/F_ZGEMVT Multiple matrix-vector multiply

372

F_SGER/F_DGER/F_CGER/F_ZGER General rank-1 update . . . . . . . . . . . . . . . . 375

F_SSBMV/F_DSBMV/F_CSBMV/F_ZSBMV Symmetric band matrix-vector multiply

378 F_SSPMV/F_DSPMV/F_CSPMV/F_ZSPMV Symmetric packed matrix-vector

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

F_SSPR/F_DSPR/F_CSPR/F_ZSPR Symmetric packed rank-1 update. . . . . . . . . 384

F_SSPR2/F_DSPR2/F_CSPR2/F_ZSPR2 Symmetric rank-2 update . . . . . . . . . . . 386

F_SSYMV/F_DSYMV/F_CSYMV/F_ZSYMV Symmetric matrix-vector multiply . 389

F_SSYR/F_DSYR/F_CSYR/F_ZSYR Symmetric rank-1 update . . . . . . . . . . . . . . . 392

F_SSYR2/F_DSYR2/F_CSYR2/F_ZSYR2 Symmetric rank-2 update . . . . . . . . . . . 394

F_STBMV/F_DTBMV/F_CTBMV/F_ZTBMV Triangular banded matrix-vector

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

F_STBSV/F_DTBSV/F_CTBSV/F_ZTBSV Triangular banded solve . . . . . . . . . . . 400

F_STPMV/F_DTPMV/F_CTPMV/F_ZTPMV Triangular packed matrix-vector

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403

F_STPSV/F_DTPSV/F_CTPSV/F_ZTPSV Triangular packed solve. . . . . . . . . . . . 406

F_STRMV/F_DTRMV/F_CTRMV/F_ZTRMV Triangular matrix-vector multiply 408 F_STRMVT/F_DTRMVT/F_CTRMVT/F_ZTRMVT Multiple triangular matrix-vector

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411

F_STRSM/F_DTRSM/F_CTRSM/F_ZTRSM Triangular solve . . . . . . . . . . . . . . . . 414

F_STRSV/F_DTRSV/F_CTRSV/F_ZTRSV Triangular solve . . . . . . . . . . . . . . . . . 417

4 Sparse BLAS Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421

Chapter objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421

Associated documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422

What you need to know to use these subprograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

Subroutine naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

Sparse matrix storage formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424

Operator arguments in the Sparse BLAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437

x Table of Contents

Common arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438

SM arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440

Order of arguments for args(A) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440

SBCOMM/DBCOMM/CBCOMM/ZBCOMM Block coordinate matrix-matrix multiply

441 SBDIMM/DBDIMM/CBDIMM/ZBDIMM Block diagonal matrix-matrix multiply 445 SBDISM/DBDISM/CBDISM/ZBDISM Block diagonal format triangular solve . . 449 SBELMM/DBELMM/CBELMM/ZBELMM Block Ellpack matrix-matrix multiply . . .

453 SBELSM/DBELSM/CBELSM/ZBELSM Block Ellpack format triangular solve . 457 SBSCMM/DBSCMM/CBSCMM/ZBSCMM Block sparse column matrix-matrix

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461

SBSCSM/DBSCSM/CBSCSM/ZBSCSM Block sparse column format triangular solve

465 SBSRMM/DBSRMM/CBSRMM/ZBSRMM Block sparse row matrix-matrix multiply

469 SBSRSM/DBSRSM/CBSRSM/ZBSRSM Block sparse row format triangular solve . . .

473 SCOOMM/DCOOMM/CCOOMM/ZCOOMM Coordinate matrix-matrix multiply 477 SCSCMM/DCSCMM/CCSCMM/ZCSCMM Compressed sparse column matrix-matrix

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481

SCSCSM/DCSCSM/CCSCSM/ZCSCSM Compressed sparse column format triangular

solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485

SCSRMM/DCSRMM/CCSRMM/ZCSRMM Compressed sparse row matrix-matrix

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

SCSRSM/DCSRSM/CCSRSM/ZCSRSM Compressed sparse row format triangular

solve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493

SDIAMM/DDIAMM/CDIAMM/ZDIAMM Diagonal matrix-matrix multiply . . . . 497

SDIASM/DDIASM/CDIASM/ZDIASM Diagonal format triangular solve. . . . . . . 501

SELLMM/DELLMM/CELLMM/ZELLMM Ellpack matrix-matrix multiply . . . . 505

SELLSM/DELLSM/CELLSM/ZELLSM Ellpack format triangular solve. . . . . . . 509

SJADMM/DJADMM/CJADMM/ZJADMM Jagged diagonal matrix-matrix multiply .

513 SJADSM/DJADSM/CJADSM/ZJADSM Jagged diagonal format triangular solve 517

SSKYMM/DSKYMM/CSKYMM/ZSKYMM Skyline matrix-matrix multiply . . . . 521

SSKYSM/DSKYSM/CSKYSM/ZSKYSM Skyline format triangular solve . . . . . . 525

SVBRMM/DVBRMM/CVBRMM/ZVBRMM Variable block row matrix-matrix

multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529

Table of Contents x i

SVBRSM/DVBRSM/CVBRSM/ZVBRSM Variable block row format triangular solve

533

xii Table of Contents

Figures

List of Figures xiii

xiv List of Figures

Tables

Table 1-1 VECLIB and VECLIB8 Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Table 1-2 Compiler Defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Table 1-3 VECLIB Naming Convention—Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Table 1-4 BLAS Standard Operator Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Table 2-1 FPINFO return values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Table 3-1 Extended BLAS Naming Convention—Data Type . . . . . . . . . . . . . . . . . . . . . 207

Table 3-2 Extended BLAS Naming Convention—Matrix Form . . . . . . . . . . . . . . . . . . . 208

Table 3-3 Extended BLAS Naming Convention—Computation . . . . . . . . . . . . . . . . . . . 208

Table 3-4 Extended BLAS Naming Convention—Subprogram Names . . . . . . . . . . . . . 209

Table 4-1 Sparse BLAS Naming Convention—Data Type . . . . . . . . . . . . . . . . . . . . . . . . 423

Table 4-2 Sparse BLAS Naming Convention—Matrix Form . . . . . . . . . . . . . . . . . . . . . . 424

Table 4-3 4 x 5 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425

Table 4-4 COO Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425

Table 4-5 CSC Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426

Table 4-6 MSC Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426

Table 4-7 CSR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427

Table 4-8 MSR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428

Table 4-9 5 x 4 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

Table 4-10 DIA Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

Table 4-11 ELL Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430

Table 4-12 JAD Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

Table 4-13 JAD Row-Permuted Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

Table 4-14 5 x 5 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

Table 4-15 SKY Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432

Table 4-16 4 x 6 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432

Table 4-17 BCO Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433

Table 4-18 BSC Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434

List of Tables xiii

Table 4-19 BSR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435

Table 4-20 6 x 6 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437

Table 4-21 VBR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437

xiv List of Tables

VECLIB

Preface

Hewlett-Packard’s high-performance math libraries (HP MLIB) help you speed development of applications and shorten execution time of long-running technical applications.

HP MLIB is a collection of subprograms optimized for use on HP servers and workstations, providing mathematical software and computational kernels for engineering and scientiﬁc applications. HP MLIB can be used on systems ranging from single-processor workstations to multiprocessor high-end servers. HP MLIB is optimized for HP PA-RISC 2.0, Itanium 2, and Opteron processors. HP MLIB has six components; VECLIB, LAPACK, ScaLAPACK, SuperLU, SOLVERS, and VMATH.

HP VECLIB contains robust callable subprograms. Together with a subset of the BLAS Standard subroutines, HP MLIB supports the legacy BLAS, a collection of routines for the solution of sparse symmetric systems of equations, a collection of commonly used Fast Fourier Transforms (FFTs), and convolutions. Although VECLIB was designed for use with Fortran programs, C programs can call VECLIB subprograms, as described in Appendix A. Refer to Part 1 of this manual for HP VECLIB information.

Throughout this document there are references to legacy BLAS and BLAS Standard routines. Legacy BLAS routines include Basic Linear Algebra Subprograms (BLAS), that is the level 1, 2, and 3 BLAS, as well as the Sparse BLAS.

A BLAS standardization effort began with a BLAS Technical (BLAST) Forum meeting in November 1995 at the University of Tennessee. The efforts of the BLAST Forum resulted in a BLAS Standard speciﬁcation in 1999. BLAS Standard routines refer to routines as deﬁned by this BLAS Standard speciﬁcation. HP MLIB supports a subset of BLAS Standard routines. Refer to Chapter 2, “Basic Vector Operations,” and Chapter 3, “Basic Matrix Operations,” for details about supported subprograms.

Preface xv

LAPACK

HP Linear Algebra Package (LAPACK) is a collection of subprograms that provide mathematical software for applications involving linear equations, least squares, eigenvalue problems, and the singular value decomposition. LAPACK is designed to supersede the linear equation and eigenvalue packages, LINPACK and EISPACK. The National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy supported the development of the public-domain version of LAPACK, from which the HP version was derived.

HP LAPACK fully conforms with public domain version 3.0 of LAPACK in all user-visible usage conventions. Refer to Part 2 of this manual for information speciﬁc to HP LAPACK. To supplement the HP speciﬁc information provided in Part 2 of this document, refer to the standard LAPACK Users’ Guide. You can access the latest edition of the LAPACK Users’ Guide at the Netlib repository at the following URL:

http://www.netlib.org/lapack/lug/index..html

ScaLAPACK

ScaLAPACK is a library of high-performance linear algebra routines capable of solving systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. ScaLAPACK can also handle many associated computations such as matrix factorizations or estimating condition numbers.

ScaLAPACK is a public domain software that was developed by Oak Ridge National Laboratory. It is designed for distributed computing and uses the Message Passing Interface (MPI) for parallelism. This implementation provides a version of ScaLAPACK tuned on HP servers and built with HP’s MPI. The ScaLAPACK library routines are written in Fortran 77 and are callable from Fortran 90 and C routines. Unlike other MLIB libraries, there is not a version of ScaLAPACK that assumes all integers are 8 bytes in length.

xvi HP MLIB VECLIB User’s Guide

Refer to Part 3 of this manual for information speciﬁc to HP ScaLAPACK. To supplement the HP speciﬁc information provided in Part 3 of this document, refer to the standard ScaLAPACK Users’ Guide. You can access the latest edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following URL:

http://www.netlib.org/scalapack/slug/index..html

SuperLU

This implementation provides the Distributed SuperLU library designed for distributed memory parallel computers. It was based on the public-domain SuperLU_DIST, which was developed at the Lawrence Berkeley National Lab and the University of California at Berkeley.

The library contains a set of subroutines to solve a sparse linear system. The library is implemented in ANSI C, using HP Message Passing Interface (MPI) for communication. The library includes routines to handle both real and complex matrices using double precision. The parallel routine names for the double-precision real version start with the letters “pd” (e.g. pdgstrf). The parallel routine names for the double-precision complex version start with letters “pz” (e.g. pzgstrf). Unlike other MLIB libraries, there is not a version of SuperLU that assumes all integers are 8 bytes in length.

The routines can be called directly from C applications. They may also be called from Fortran applications, however, “bridge” routines (in C) must be supplied. For details about creating bridge routines, please refer to Section 2.9.2 in the SuperLU User’s Guide available at:

http://www.nersc.gov/~xiaoye/SuperLU

SuperLU

SOLVERS

Solvers is a collection of direct sparse linear system solvers and graph partitioning routines. Symmetric systems can be solved using SMP parallelism and structurally-symmetric systems can be solved using out-of-core functionality. These routines have been optimized for use on Hewlett-Packard servers. Features are:

Preface xvii

VMATH

• Sparse symmetric and structurally-symmetric linear equation solutions.

• Sparse symmetric ordinary and generalized eigensystem solutions.

• Out-of-core symmetric and structurally-symmetric linear equation and eigensystems solutions.

• Full METIS functionality This implementation provides the METIS Version 4.0.1 library. It is based

on the public-domain METIS, which was developed at the University of Minnesota, Department of Computer Science, and the Army HPC Research Center. The library contains a set of subroutines for graph partitioning, mesh partitioning, and sparse matrix reordering, as well as auxiliary routines. HP MLIB contains the full METIS functionality as that in the public domain METIS, however, the routine names are different. HP MLIB METIS routine names have been prepended with mlib_ to avoid name conﬂict on applications and libraries that contain their own local version of METIS.

For more information about METIS, please refer to: http://www-users.cs.umn.edu/~karpis/metis/metis/index.html

VMATH

VMATH is a library of vector math routines corresponding to many of the widely used scalar math routines available with C, C++, and Fortran90.

VMATH is intended for computationally intensive mathematical applications amenable to a vector programming style.

VMATH provides two libraries: VMATH, whose interface uses 4-byte integers; and VMATH8, whose interface uses 8-byte integers and is otherwise equivalent to VMATH. VMATH routines come with both Fortran and C interfaces.

For more detailed information on VMATH as well as subprogram speciﬁcations, please refer to the VMATH chapter in Part 5 of this book. The VMATH man pages provide a man page for each subprogram.

xviii HP MLIB VECLIB User’s Guide

Purpose and audience

This guide describes the MLIB software library and shows how to use it. This library provides mathematical software and computational kernels for applications.

The HP MLIB User’s Guide addresses experienced programmers who:

• Convert, develop, or optimize programs for use on HP servers and workstations

• Optimize existing software to improve performance and increase productivity

Purpose and audience

Preface xix

Organization

The HP MLIB User’s Guide describes HP MLIB VECLIB in Part 1, HP MLIB LAPACK in Part 2, HP MLIB ScaLAPACK in Part 3, and HP MLIB Distributed SuperLU in Part 4.

To learn fundamental information necessary for using the VECLIB library, read Chapter 1 and the introductory sections of the other chapters. These sections of background information will help you efﬁciently use the library subprograms.

To learn more about the subject of any chapter, refer to the literature cited in the “Associated documentation” section of each chapter.

Part 1 of this document is organized as follows:

• Chapter 1 introduces general concepts about VECLIB

• Chapter 2 describes basic vector operations included in VECLIB

• Chapter 3 describes basic matrix operations included in VECLIB

• Chapter 4 describes sparse BLAS operations

• Chapter 5 describes the discrete Fourier transforms in VECLIB

• Chapter 6 describes subprograms to compute convolutions and correlations of data sets

• Chapter 7 describes miscellaneous subprograms to produce random numbers, sort the elements of a vector in ascending or descending order, measure time, allocate dynamic memory, and report errors

Part 2 of this document is organized as follows:

• Chapter 8 describes information speciﬁc to Hewlett-Packard’s implementation of LAPACK

• Chapter 9 describes selected LAPACK auxiliary subprograms

Part 3 of this document is organized as follows:

• Chapter 10 describes ScaLAPACK functionality

Part 4 of this document is organized as follows:

• Chapter 11 describes Distributed SuperLU functionality

Part 5 of this document is organized as follows:

• Chapter 12 describes VMATH functionality

xx HP MLIB VECLIB User’s Guide

Organization

Part 6 of this document is organized as follows:

• Chapter 13 explains sparse symmetric linear equation subprograms

• Chapter 14 describes METIS subprograms

• Chapter 15 describes sparse symmetric eigenvalue subprograms

• Chapter 16 describes BCSLIB-EXT functionality Supplemental material is provided as follows:

• Appendix A describes how to call VECLIB and LAPACK subprograms from within C programs

• Appendix B describes LINPACK subprograms available in HP MLIB

• Appendix C lists parallelized subprograms in VECLIB and LAPACK

• An index is included at the back of the manual

Supplemental information for Part 2 of this HP MLIB User’s Guide, is found in the LAPACK Users’ Guide.

The LAPACK Users’ Guide is a publication from the Society for Industrial and Applied Mathematics that provides an introduction to the design of LAPACK as well as complete speciﬁcations for all the driver and computational routines. You can access the latest edition of the LAPACK Users’ Guide at the Netlib repository at the following URL:

http://www.netlib.org/lapack/lug Supplemental information for Part 3 of this HP MLIB User’s Guide, is found in

the ScaLAPACK User’s Guide. The ScaLAPACK Users’ Guide is a publication from the Society for Industrial

and Applied Mathematics that provides an informal introduction to the design of ScaLAPACK as well as a detailed description of its contents and a reference manual. You can access the latest edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following URL:

http://www.netlib.org/scalapack/slug Supplemental information for Part 4 of this HP MLIB User’s Guide, is found in

the SuperLU User’s Guide, which is only available online at: http://www.nersc.gov/~xiaoye/SuperLU

Preface xxi

Notational conventions

The following conventions are used in this manual:

Italics

Italics within text indicate mathematical entities used

or manipulated by the program: for example, solve the

n-by-n system of linear equations Ax = b. Italics within command lines indicate generic

commands, ﬁle names, or subprogram names. Substitute actual commands, ﬁle names, or subprograms for the italicized words. For example, the command line

f90 prog_name.o

instructs you to type the command f90, followed by the name of a program or subprogram object ﬁle.

UPPERCASE BOLDFACE

Fortran statements indicates Fortran keywords and subprogram names that must be typed just as they appear. For example, CALL DAXPY.

within text and in prototype

lowercase boldface

xxii HP MLIB VECLIB User’s Guide

lowercase boldface

generic variable or array names. You should substitute actual variable or array names. The italicized mathematical entities and the lowercase boldface variable and array names usually correspond. For example, A is a matrix and a is the Fortran array containing the matrix:

CALL DAXPY (n, a, x, incx, y, incy) lowercase boldface

ASCII characters that must be typed just as they appear. For example, the command line

f90 prog_name.o

instructs you to type the command f90, followed by the name of a program or subprogram object ﬁle.

within text indicates Fortran

within command lines indicates

UPPERCASE MONOSPACE

UPPERCASE MONOSPACE indicates Fortran programs.

Brackets ( [ ] )

Square brackets in command examples designate optional entries.

NOTE

A NOTE highlights important supplemental information.

Documentation resources

The HP MLIB User’s Guide, the LAPACK Users’ Guide, and the ScaLAPACK Users’ Guide are available in hardcopy and online formats.

For the HP MLIB User’s Guide, refer to: http:// www.hp.com/go/mlib

Documentation resources

For the latest edition of the LAPACK Users’ Guide, refer to: http://www.netlib.org/lapack/lug

For the latest edition of the ScaLAPACK User’s Guide, refer to: http://www.netlib.org/scalapack/slug

For the latest edition of the SuperLU User’s Guide, refer to: http://www.nersc.gov/~xiaoye/SuperLU

The following documents provide supplemental information:

• LAPACK Users’ Guide Philadelphia, PA: Society for Industrial and Applied Mathematics, 1995. This guide provides information on the subprograms provided with the LAPACK library.

• ScaLAPACK Users’ Guide Philadelphia, PA: Society for Industrial and Applied Mathematics. This guide provides information on the subprograms provided with the ScaLAPACK library.

• Parallel Programming Guide for HP-UX Systems. Describes efﬁcient shared-memory parallel programming techniques using HP compilers.

• Fortran/9000 Programmer’s Guide. Describes features and requirements in terms of the tasks a programmer might perform. These tasks include how to compile, link, run, debug, and optimize programs.

Preface xxiii

Documentation resources

• HP-UX Floating-Point Guide. Describes how ﬂoating-point arithmetic is implemented on HP 9000 systems and discusses how ﬂoating-point behavior affects the programmer.

• HP Fortran 90 Programmer’s Guide. Provides extensive usage information (including how to compile and link), suggestions and tools for migrating to HP Fortran 90, and how to call C and HP-UX routines for HP Fortran 90.

• HP Fortran 90 Programmer’s Reference. Provides complete Fortran 90 language reference information. It also covers compiler options, compiler directives, and library information.

• HP C Programming Guide. Contains detailed discussions of selected C topics.

• HP C/HP-UX Reference Manual. Presents reference information on the C programming language as implemented by HP.

• HP aC++ Online Programmer’s Guide. Presents reference and tutorial information on aC++. (This manual is accessed by specifying aCC with the +help command-line option.)

• HP MPI User’s Guide. Describes how to use HP MPI (Message Passing Interface), a library of C- and Fortran-callable routines used for message-passing programming.

• Software Optimization for High Performance Computing: Creating Faster Applications: Provides state-of-the-art solutions for every key aspect of

software performance; both code-based and algorithm-based.

• HP-UX Assembly Language Reference Manual. Describes the HP-UX assembler for the PA-RISC processor.

• HP PA-RISC 2.0 Architecture Reference. Describes the architecture of the PA-RISC 2.0 processor.

• PA-RISC Procedure Calling Conventions Reference. Describes the conventions for creating PA-RISC assembly language procedure calls.

• IA-64 and Elementary Functions. Describes the architecture of the IA-64 processor and how to implement elementary functions.

• METIS documentation is available online at http://www-users.cs.umn.edu/~karypis/metis/metis/index.html.

• OpenMP documentation is available online at http://www.openmp.org.

xxiv HP MLIB VECLIB User’s Guide

Part 1

HP VECLIB

1 Introduction to VECLIB

Overview

VECLIB, a component of HP MLIB, is a collection of subprograms optimized for use on Hewlett-Packard servers and workstations, providing mathematical software and computational kernels for engineering and scientiﬁc applications. This library contains subprograms for:

• Dense vector operations, including the Level 1 BLAS

• Sparse vector and matrix operations, including the Sparse BLAS

• BLAS 3 routine DGEMM is highly tuned

• Matrix operations, including the Level 2 and Level 3 BLAS

• CXML Blas extenstions Compaq Extended Math Library (CXML) is a collection of routines that

performs numerically intensive operations that occur frequently in engineering and scientiﬁc computing, such as linear algebra and signal processing.

HP MLIB adds support for the unique CXML extensions to the legacy BLAS and for the array math functions in CXML. However, the additional support of XCML is not exhaustive. For more information about CXML see the CXML Reference Manual part number AA-PV6VE-TE.

• Discrete Fourier transforms

• Convolution and correlation

• Miscellaneous tasks, such as sorting and generating random numbers

Overview

VECLIB provides two sets of libraries: VECLIB and VECLIB8. To determine if a subprogram is included in VECLIB8, refer to the VECLIB8 section under each subprogram speciﬁcation in the following chapters.

Although VECLIB was designed for use with Fortran programs, C programs can call VECLIB subprograms. Refer to Appendix A, “Calling MLIB routines from C,” for details. Examples are in Fortran, unless otherwise indicated.

Except for subprograms described in Appendix B, “LINPACK Subprograms,” LINPACK, EISPACK, and SKYLINE subprograms are not included in the

Chapter 1 Introduction to VECLIB 1

Chapter objectives

Hewlett-Packard scientiﬁc libraries. Refer to the Appendix, “Converting from LINPACK or EISPACK” in the LAPACK Users’ Guide, for assistance converting programs that currently call LINPACK or EISPACK routines to call LAPACK or VECLIB routines instead.

This chapter provides information necessary for efﬁcient use of VECLIB and includes discussions of:

• Standardization

• Accessing VECLIB

• Optimization

• Parallel processing

• Roundoff effects

• Data types and precision

• VECLIB naming convention

• Data type and byte length

• Operator arguments

• Error handling

• HP MLIB man pages

• Troubleshooting

Chapter objectives

After reading this chapter you will:

• Know how to access VECLIB library subprograms

• Understand how VECLIB works in a parallel computing environment

• Understand VECLIB naming conventions

• Understand roundoff effects

• Understand how VECLIB handles errors

• Know how to access the online HP MLIB man pages

• Know what to do if you experience trouble using VECLIB subprograms

2 HP MLIB User’s Guide

Standardization

VECLIB conforms to a variety of existing standards. For example, it includes Basic Linear Algebra Subprograms (BLAS), levels 1, 2, and 3, and Sparse BLAS.

These products are available with standardized user interfaces on computers ranging from microcomputers to supercomputers. Because VECLIB conforms to these standards, it is a software bridge from other computers to Hewlett-Packard servers and workstations. However, even though the user interface is standardized, internal workings of HP VECLIB subprograms have been specialized for supported computers.

HP MLIB supports a subset of the BLAS Standard routines. BLAS standardization efforts, supported by software and hardware vendors and university groups, began with a BLAS Technical (BLAST) Forum meeting in November 1995 at the University of Tennessee. The efforts of the BLAST Forum resulted in a BLAS Standard speciﬁcation in 1999. Refer to Chapter 2, “Basic Vector Operations,” and Chapter 3, “Basic Matrix Operations,” for details of supported subprograms.

Standardization

Accessing VECLIB

The VECLIB and VECLIB8 libraries are available as archive and shared libraries. They consist of compiled subprograms ready for you to incorporate into your programs with the linker. Include the appropriate declarations and CALL statements in your Fortran source program and specify that VECLIB or VECLIB8 be used as an object library at link time. Refer to the LAPACK chapter for details about accessing the LAPACK library, The ScaLAPACK chapter for details about accessing the ScaLAPACK library, and the Distributed SuperLU chapter for details about accessing the Distributed SuperLU library.

MLIB libraries are installed in the /opt/mlib/ directory. The entire path depends on your system type, as follows:

Chapter 1 Introduction to VECLIB 3

Accessing VECLIB

Table 1-1 VECLIB and VECLIB8 Libraries

Processor

Type

PA 2.0

Itanium 2

Version

HP-UX

11i or later

HP-UX

11i V1.6 or

later

Red Hat

Linux

Enterprise 3

Intel V8.0

Compiler

Red Hat

Linux

Enterprise 3

Intel IA-32

V8.0

Compiler

Address

Width

32-bit /opt/mlib/lib/pa2.0

64-bit /opt/mlib/lib/pa20_64

32-bit /opt/mlib/lib/hpux32

64-bit /opt/mlib/lib/hpux64

64-bit /opt/mlib/intel_8.0/hpmpi_2.1/lib/64

32-bit /opt/mlib/intel_8.0/hpmpi_2.1/lib/32

Installation Directory Libraries

libveclib.a libveclib.sl

libveclib.a libveclib.sl libveclib8.a libveclib8.sl

libveclib.a libveclib.so

libveclib.a libveclib.so libveclib8.a libveclib8.so

Itanium 2

4 HP MLIB User’s Guide

HP XC6000

Intel V8.0

Compiler

64-bit /opt/mlib/intel_8.0/hpmpi_2.1/lib/64

libveclib.a libveclib.so libveclib8.a libveclib8.so

Accessing VECLIB

Processor

Type

Itanium 2

Only 64-bit addressing is supported in the VECLIB8 library. The ﬁle name of the archive VECLIB and VECLIB8 libraries are libveclib.a

and libveclib8.a respectively. The VECLIB shared library for PA is libveclib.sl and is libveclib.so for all other operating systems.

Performance of your applications is better when you use archive libraries. However, if you need to keep executable ﬁles to a minimum size, you can use shared libraries. See “Linking with libisamstub.a” on page 8 if your application calls Fortran77 routines from C language code, and you are linking with the VECLIB archive library.

The HP VECLIB library contains parallelized subprograms. Refer to Appendix C, “Parallelized Subprograms,” for details. If your program is run on an HP-UX system with multiple processors, see “Parallel processing” on page 17 for additional information about linking your program.

Version

HP XC6000

Intel V7.1

Compiler

HP XC4000

PGI V5.1 Compiler

Opteron PGI V5.1 Compiler

Address

Width

64-bit /opt/mlib/intel_7.1/hpmpi_2.1/lib/64

64-bit /opt/mlib/intel_5.1/hpmpi_2.1/lib/64

Installation Directory Libraries

libveclib.a libveclib.so libveclib8.a libveclib8.so

Compiling and linking (VECLIB)

There are several ways to link your program with the version of VECLIB tuned for the machine on which the program runs. By default, the −lveclib option on the f90, cc, or c89 command line that links your program selects the library that corresponds to 32-bit addressing on the machine on which you link.

cc is the HP C compiler and c89 is the HP POSIX-conforming C compiler. The

remainder of this book refers to the cc and f90 compilers. cc and f90 examples also apply to c89.

Chapter 1 Introduction to VECLIB 5

Accessing VECLIB

When you use the –aarchive_shared ﬂag on your compiler command line for HP-UX, it ensures that the compiler links the archive library. If the archive library is not available, then it links the shared library. If you omit

–aarchive_shared and –ashared_archive, the linker defaults to linking the

shared library. Link with –Bstatic on Linux systems. If your program uses subprograms from VECLIB, LAPACK, ScaLAPACK,

Distributed SuperLU, SOLVERS and VMATH, specify −llapack, −lveclib,

−lscalapack, −lsuperlu_dist, −lsolvers, and −lvmath or −llapack8, −lveclib8,

−lscalapack8, −lsuperlu_dist8, −lsolvers8 and −lvmath8 on the compiler

command line.

NOTE Do not mix subprograms from the two types of 64-bit address libraries

(those with 32-bit integers and those with 64-bit integers) in the same program.

Each of the VECLIB and LAPACK library ﬁles is complete in itself, meaning that you do not need to link one library because you have called subprograms from another. This is because various subprograms are included in both libraries. For example, VECLIB subroutine SGEMV is called by several LAPACK subprograms, and therefore, is included in the LAPACK library. Thus, in general, you have to link only the library or libraries you need.

For PA-Based Systems

1. To link a program that uses VECLIB for use on the same machine, use one of the following commands:

f90 [options] ﬁle ... –Wl,–aarchive_shared −lveclib

[options] ﬁle ... –Wl,–aarchive_shared −lveclib −lcl −lm

aCC

2. Specify the entire path of the library ﬁle on the compiler command line that links your program. For example, to link your program with VECLIB for use with 32- or 64-bit addressing on a PA-based system, use one of the following:

f90 [options] ﬁle ... /opt/mlib/lib/[pa2.0|pa20_64]/libveclib.a

[options] ﬁle ... /opt/mlib/lib/[pa2.0|pa20_64]/libveclib.a −lcl −lm

aCC

Replace libveclib.a with libveclib.sl on your compiler command line if you want to link the shared library on a PA-based system.

3. Use the −lveclib option on the compiler command line that links your program, preceded by:

−Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64]

6 HP MLIB User’s Guide

Accessing VECLIB

For example, the command lines in Method 2 for PA could be written:

f90 [options] ﬁle ... −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64]

−lveclib

cc [options] ﬁle ... −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64]

−lveclib −lcl −lm [options] ﬁle ... −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64]

aCC

−lveclib −lcl −lm

4. Set the LDOPTS environment variable to include:

–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64]

For example:

setenv LDOPTS “–aarchive_shared,–L/opt/mlib/lib/pa2.0”

Then use the −lveclib option on the compiler command line that links your program:

f90 [options] ﬁle ... −lveclib

[options] ﬁle ... −lveclib −lcl −lm

aCC

NOTE An LDOPTS speciﬁcation takes precedence over using -Wl on the

compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that speciﬁcation with a -Wl option on your compiler command line.

5. To link with the VECLIB8 libraries on a PA-based system, use the +DA2.0W option to specify memory addresses are 64-bit.

Compile Fortran applications with the +i8, +autodbl, or +autodbl4 option where:

• +i8 promotes 4-byte integer of logical constants, intrinsics, and user

variables (declared as integer or logical) to 8-byte quantities.

• +autodbl promotes all integer, logical, and real items to 8 bytes, and all

double-precision and complex items to 16 bytes.

• +autodbl4 promotes all integer logical, and real items to 8 bytes, and

complex items to 16 bytes. The +autodbl4 option does not promote the size of double-precision and double-complex items.

f90 +DA2.0W +i8 [options] ﬁle ... −Wl,–aarchive_shared −lveclib8 cc +DA2.0W aCC +DA2.0W

[options] ﬁle ... −Wl,–aarchive_shared −lveclib8 −lcl −lm

Chapter 1 Introduction to VECLIB 7

Accessing VECLIB

Linking with libisamstub.a

C language codes that call Fortran77 routines from the BLAS Standard, the sparse linear equation system, or the sparse eigenvalue system, must explicitly link the ISAM (Indexed Sequential Access Method) stubs library into the program. For example,

cc [options] ﬁle ... –Wl,–aarchive_shared,–L/opt/fortran/lib/libisamstub.a

−lveclib

−lcl −lm

This only applies if you are linking with the VECLIB archive library. This option is only valid for 32-bit PA systems.

For Itanium-Based HP-UX Systems

1. To link a program that uses VECLIB for use on the same machine, use one of the following commands:

f90 [options] ﬁle ... –Wl,–aarchive_shared −lveclib

[options] ﬁle ... –Wl,–aarchive_shared −lveclib −lcl −lm

aCC

2. Specify the entire path of the library ﬁle on the compiler command line that links your program. To link your program with VECLIB for use with 32- or 64-bit addressing on an HP-UX system, use one of the following:

f90 [options] ﬁle ... /opt/mlib/lib/[hpux32|hpux64]/libveclib.a

[options] ﬁle ... /opt/mlib/lib/[hpux32|hpux64]/libveclib.a −lcl −lm

aCC

Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an Itanium-based system.

3. Use the −lveclib option on the compiler command line that links your program, preceded by:

−Wl,–aarchive_shared,−L/opt/mlib/lib/[hpux32|hpux64]

For example, the command lines in Method 2 for HP-UX could be written:

f90 [+DD32|+DD64] [options] ﬁle...−Wl,–aarchive_shared,−L/opt/mlib/lib/ [hpux32|hpux64]

cc [+DD32|+DD64] [hpux32|hpux64]

aCC [+DD32|+DD64] [hpux32|hpux64]

4. Set the LDOPTS environment variable to include:

−lveclib [options] ﬁle ... −Wl,–aarchive_shared,−L/opt/mlib/lib/

−lveclib −lcl −lm [options] ﬁle ... −Wl,–aarchive_shared,−L/opt/mlib/lib/

−lveclib −lcl −lm

8 HP MLIB User’s Guide

Accessing VECLIB

–aarchive_shared,−L/opt/mlib/lib/[hpux32|hpux64]

For example:

setenv LDOPTS “–aarchive_shared,–L/opt/mlib/lib/hpux32”

Then use the −lveclib option on the compiler command line that links your program:

f90 [options] ﬁle ... −lveclib cc [options] ﬁle ... −lveclib −lcl −lm

[options] ﬁle ... −lveclib −lcl −lm

aCC

NOTE An LDOPTS speciﬁcation takes precedence over using -Wl on the

compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that speciﬁcation with a -Wl option on your compiler command line.

5. To link with the VECLIB8 libraries on an HP-UX system, use the +DD64 option to specify memory addresses are 64-bit.

Compile Fortran applications with the +i8, +autodbl, or +autodbl4 option where:

• +i8 promotes 4-byte integer of logical constants, intrinsics, and user

variables (declared as integer or logical) to 8-byte quantities.

• +autodbl promotes all integer, logical, and real items to 8 bytes, and all

double-precision and complex items to 16 bytes.

• +autodbl4 promotes all integer logical, and real items to 8 bytes, and

complex items to 16 bytes. The +autodbl4 option does not promote the size of double-precision and double-complex items.

f90 +DD64 +i8 [options] ﬁle ... −Wl,–aarchive_shared −lveclib8 cc +DD64 aCC +DD64

[options] ﬁle ... −Wl,–aarchive_shared −lveclib8 −lcl −lm

For Itanium-Based Red Hat Linux Systems

1. To link a program that uses VECLIB for use on the same machine, use one of the following commands:

ifort [options] ﬁle ... –Wl,–Bstatic −lveclib −openmp

[options] ﬁle ... –Wl,–Bstatic −lveclib −openmp

icc

Link with -Bdynamic if you want the archive library on a linux system.

Chapter 1 Introduction to VECLIB 9

Accessing VECLIB

NOTE When you use the Intel V8.0 C compiler to link the SOLVERS library,

you may require one or more of -lifcore, -lifport, or -ldl.

ifort [options] ﬁle ... /opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/libveclib.a

−openmp icc

[options] ﬁle ... /opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/libveclib.a

−openmp

Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on a Linux system.

When you use the C compiler for linking, you may require one or more of

-1CEPCF90, -1F90, -1IEPCF90, -1PEPCF90, or -1POSF90.

3. Use the −lveclib option on the compiler command line that links your program, preceded by:

−Wl,–Bstatic,−L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]

For example, the command lines in Method 2 for Linux could be written:

ifort [options] ﬁle...−Wl,–Bstatic,−L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]

−lveclib −openmp

icc

[options] ﬁle ... −Wl,–Bstatic,−L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]

−lveclib −openmp

When you use the C compiler for linking, you may require one or more of

-1CEPCF90, -1F90, -1IEPCF90, -1PEPCF90, or -1POSF90.

4. Set the LDOPTS environment variable to include:

–Bstatic,−L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]

For example:

setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/linux”

Then use the −lveclib option on the compiler command line that links your program:

ifort [options] ﬁle ... −lveclib −openmp

[options] ﬁle ... −lveclib −openmp

icc

NOTE An LDOPTS speciﬁcation takes precedence over using -Wl on the

compiler command line. That is, if you use the LDOPTS environment

10 HP MLIB User’s Guide

variable to specify a library path, you cannot override that speciﬁcation with a -Wl option on your compiler command line.

5. Use the following commands to link your programs for use with the VECLIB8 library on a Red Hat Linux system.

ifort −i8 [options] ﬁle ... -L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/lveclib8

−openmp

icc

[options] ﬁle ... −L/opt/mlib/intel_8.0/hpmpi_2.1/lib/[32|64]/lveclib8

−openmp

When you use the C compiler for linking, you may require one or more of

-1CEPCF90, -1F90, -1IEPCF90, -1PEPCF90, or -1POSF90.

For XC6000 Systems with the Intel V8.0 Compiler

1. Use one of the following compile line commands to link VECLIB:

ifort [options] ﬁle ... –L/opt/mlib/intel_8.0/hpmpi_2.1/lib/64 −lveclib −openmp

[options] ﬁle ... –L/opt/mlib/intel_8.0/hpmpi_2.1/lib/64 −lveclib −openmp

icc

2. Specify the entire path of the library ﬁle on the compiler command line that links your program. For example, to link your program with VECLIB for use with 64-bit addressing on an XC6000 system with the Intel V8.0 compiler, use one of the following:

ifort [options] ﬁle ... /opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64/libveclib.a

−openmp

Accessing VECLIB

icc

[options] ﬁle ... /opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64/libveclib.a

−openmp

Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an XC6000 system.

3. Use the −lveclib option on the compiler command line that links your program, preceded by:

−Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64

For example, the command lines in Method 2 for XC6000 could be written:

ifort [options] ﬁle...−Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64

−lveclib −openmp

icc

[options] ﬁle ... −Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64

−lveclib −openmp

4. Set the LDOPTS environment variable to include:

–Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64

Chapter 1 Introduction to VECLIB 11

Accessing VECLIB

For example:

setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64”

Then use the −lveclib option on the compiler command line that links your program:

ifort [options] ﬁle ... −lveclib −openmp

[options] ﬁle ... −lveclib −openmp

icc

NOTE An LDOPTS speciﬁcation takes precedence over using -Wl on the

compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that speciﬁcation with a -Wl option on your compiler command line.

For XC6000 Systems with the Intel V7.1 Compiler

1. Use one of the following compile line commands to link VECLIB:

efc [options] ﬁle ... –L/opt/mlib/intel_7.1/hpmpi_2.1/lib/64 −lveclib −openmp

[options] ﬁle ... –L/opt/mlib/intel_7.1/hpmpi_2.1/lib/64 −lveclib −openmp

icc

efc [options] ﬁle ... /opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64/libveclib.a

−openmp icc

[options] ﬁle ... /opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64/libveclib.a

−openmp

Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an XC6000 system.

3. Use the −lveclib option on the compiler command line that links your program, preceded by:

−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64

For example, the command lines in Method 2 for XC6000 could be written:

efc [options] ﬁle...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64

−lveclib −openmp

icc

−lveclib −openmp

4. Set the LDOPTS environment variable to include:

12 HP MLIB User’s Guide

[options] ﬁle ...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64

Accessing VECLIB

–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64

For example:

setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64”

Then use the −lveclib option on the compiler command line that links your program:

efc [options] ﬁle ... −lveclib −openmp icc [options] ﬁle ... −lveclib −openmp

NOTE An LDOPTS speciﬁcation takes precedence over using -Wl on the

compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that speciﬁcation with a -Wl option on your compiler command line.

For XC4000 Systems with the PGI V5.1 Compiler

1. Use one of the following compile line commands to link VECLIB:

pgf90 [options] ﬁle ... –L/opt/mlib/pgi_5.1/hpmpi_2.1/lib/64 −lveclib −mp

[options] ﬁle ... –L/opt/mlib/pgi_5.1/hpmpi_2.1/lib/64 −lveclib −mp

pgcc

−lpgf90 −lpgf90_rpml −lpgf902 −lpgf90rtl −lpgftnrtl

2. Specify the entire path of the library ﬁle on the compiler command line that links your program. For example, to link your program with VECLIB for use with 64-bit addressing on an XC4000 system with the PGI V5.1 compiler, use one of the following:

pgf90 [options] ﬁle ... /opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64/libveclib.a −mp

[options] ﬁle ... /opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64/libveclib.a −mp

pgcc

−lpgf90 −lpgf90_rpml −lpgf902 −lpgf90rtl −lpgftnrtl

Replace libveclib.a with libveclib.so on your compiler command line if you want to link the shared library on an XC4000 system.

3. Use the −lveclib option on the compiler command line that links your program, preceded by:

−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64

For example, the command lines in Method 2 for XC4000 could be written:

pgf90 [options] ﬁle...−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64

−lveclib −mp

pgcc

[options] ﬁle ...−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64

−lveclib −mp

−lpgf90 −lpgf90_rpml −lpgf902 −lpgf90rtl −lpgftnrtl

Chapter 1 Introduction to VECLIB 13

Accessing VECLIB

4. Set the LDOPTS environment variable to include:

–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64

For example:

setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/pgi5.1/hpmpi_2.1/lib/64”

Then use the −lveclib option on the compiler command line that links your program:

pgf90 [options] ﬁle ... −lveclib −mp

[options] ﬁle ... −lveclib −mp −lpgf90 −lpgf90_rpml −lpgf902 −lpgf90rtl

pgcc

−lpgftnrtl

NOTE An LDOPTS speciﬁcation takes precedence over using -Wl on the

compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that speciﬁcation with a -Wl option on your compiler command line.

Problem with +ppu compatibility and duplicated symbols

All MLIB subprograms documented in the HP MLIB User’s Guide have two entry points: one is compatible with the Fortran compiler option +noppu (no postpend underbar) and the second is compatible with the Fortran compiler option +ppu (postpend underbar). For example, the MLIB BLAS error handler XERBLA has entry points xerbla and xerbla_. Table 1-1 shows the Fortran 90 compiler defaults.

Table 1-2 Compiler Defaults

Different compiler defaults can cause duplicate symbol deﬁnitions at link time as the following examples illustrate.

PA-RISC processors

Suppose you write your own Fortran version of the XERBLA subroutine. You compile on PA-RISC with +DA2.0W and the compiler turns on +ppu (See Table 1-1), so your xerbla.o object ﬁle has an entry point xerbla_. However, the PA-RISC MLIB libraries were compiled with +noppu and the entry points with underbars were added as synonyms to the entry points without underbars. Hence, the various MLIB subprograms that call XERBLA reference them as xerbla (without the postpended underbar). Therefore, your xerbla_ does not satisfy their CALL XERBLA, so the linker accesses xerbla.o from the library.

14 HP MLIB User’s Guide

32-bit Addressing 64-bit Addressing

PA-RISC +noppu +ppu

Itanium +ppu +ppu

Accessing VECLIB

This ﬁle includes both entry points xerbla and xerbla_, and xerbla_ conﬂicts with your XERBLA subroutine.

Use one of the following workarounds on PA-RISC if you are compiling your own Fortran version of XERBLA:

• Compile your XERBLA with +noppu. This implies that you also compile any of your program that calls XERBLA with +noppu, and that you apply this rule recursively.

• Use f90 ALIAS directives:

% !$HP$ ALIAS xerbla='xerbla'

This prevents the compiler from postpending the underbar onto the entry point and external references for XERBLA.

This problem is not conﬁned to the XERBLA subroutine. It occurs on PA-RISC when the following conditions are met:

• Any subprogram with the same name as a user-visible MLIB subprogram is compiled in your program with +ppu

• Another MLIB subprogram used by your program also calls that subprogram

Furthermore, the reverse can occur in the PA-RISC 32-bit libraries if you compile with the +ppu compiler option.

Itanium processors

If you compile on Itanium using the compiler option +noppu your xerbla.o object has an entry point xerbla (without the postpended underbar). However the Itanium MLIB libraries were compiled with +ppu option (following the default behavior described in Table 1-1) and the entry points without underbars were added as synonyms to the entry points with underbars. Hence the various MLIB Fortran source subprograms that call XERBLA reference them as xerbla_. Therefore, your xerbla (without the postpended underbar) does not satisfy the Fortran calls to XERBLA, so the linker accesses xerbla.o from the library. This ﬁle contains both entry points xerbla and xerbla_, and xerbla conﬂicts with your XERBLA subroutine.

Moreover, the various MLIB C source subprograms that call XERBLA still reference them as xerbla (without the postpended underbar). It means that both xerbla and xerbla_ entry points must be provided in your xerbla.o.

Use one of the following workarounds on Itanium if you are compiling your own Fortran version of XERBLA:

• Compile your Fortran source for XERBLA with +noppu and provide both xerbla and xerbla_ entry points:

Chapter 1 Introduction to VECLIB 15

Accessing VECLIB

SUBROUTINE XERBLA(SRNAME,INFO) ENTRY XERBLA_SRNAME,INFO)

...

• Use f90 ALIAS directives and provide both xerbla and xerbla_ entry points:

!$HP$ ALIAS xerbla='xerbla' !$HP$ ALIAS xerbla_='xerbla_' SUBROUTINE XERBLA(SRNAME,INFO) ENTRY XERBLA_(SRNAME,INFO)

...

The ALIAS directives prevent the compiler from postpending the underbar onto the entry points and external references to XERBLA.

Use the following workaround on Itanium if you are compiling your own C version of XERBLA:

• Undeﬁne any previous aliasing from include ﬁles and provide both xerbla and xerbla_ entry points explicitly:

#undef xerbla #undef xerbla_

void xerbla (char*name, int*iarg, size_t len_name){ ... }

void xerbla_ (char*name, int*iarg, size_t len_name){ xerbla (name, iarg, len_name); }

This problem is not conﬁned to the XERBLA subroutine. It occurs on Itanium when the following conditions are met:

• Any subprogram with the same name as a user-visible MLIB

subprogram is compiled in your program with

• +noppu

• Another MLIB subprogram used by your program also calls that

subprogram

PA-RISC and Itanium processors

Also note that all the workarounds listed for Itanium are more generic and can also be used for PA-RISC. Therefore, if you are coding your own version of a MLIB routine called "mlib_routine" on PA-RISC or Itanium, a Fortran version might be implemented as:

!$HP$ ALIAS mlib_routine='mlib_routine' !$HP$ ALIAS mlib_routine_='mlib_routine_'

16 HP MLIB User’s Guide

SUBROUTINE mlib_routine(...) ENTRY mlib_routine_(...) ...

And a C version might be:

#undef mlib_routine #undif mlib_routine_

void mlib_routine (...){ ... }

void mlib_routine_(...){ mlib_routine(...); }

Optimization

The key computational kernels in VECLIB have been optimized to take full advantage of both PA-RISC and Itanium tightly integrated architectures. Optimizations include:

Optimization

• Instruction scheduling to maximize the number of machine instructions executed per clock cycle

• Algorithm restructuring to increase the number of computations per memory access

• Cache management to decrease the number of data cache misses

Refer to “Accessing VECLIB” on page 3 for instructions on linking the desired library with your program.

Parallel processing

Parallel processing is available on multi-processor HP platforms running the HP-UX 11i or greater operating system and Linux. These systems can divide a single computational process into small streams of execution, called threads. The result is that you can have more than one processor executing on behalf of the same process. Also, refer to Appendix C, “Parallelized Subprograms,” for a list of HP VECLIB parallelized subprograms.

Chapter 1 Introduction to VECLIB 17

Parallel processing

You can enable or disable parallel processing at link time or at runtime. A program does not use parallelism in VECLIB unless parallel processing is enabled at both link time and at runtime.

Linking for parallel or non parallel processing

To enable parallel processing at link time, your link step must produce a multithreaded executable. On HP-UX systems, use the + compiler options to get a multithreaded executable when you link with the HP Fortran compiler; use +O3 and +Oopenmp when you link with the HP C compiler; and use +Oopenmp when you link with the HP aC++ compiler:

f90 [options including +O3 +Oparallel] ﬁle ... –Wl,–aarchive_shared −lveclib

[options including +O3 +Oopenmp] ﬁle ... –Wl,–aarchive_shared −lveclib −lcl

−lm [options including +Oopenmp] ﬁle ... –Wl,–aarchive_shared −lveclib −lcl

aCC

−lm

To disable VECLIB’s automatic parallelism at link time, omit the +Oparallel and +Oopenmp options:

f90 [options] ﬁle ... –Wl,–aarchive_shared −lveclib

[options] ﬁle ... –Wl,–aarchive_shared −lveclib −lcl −lm

aCC

VECLIB for Linux is always multithreaded enabled.

O3 and +Oparallel

Controlling VECLIB parallelism at runtime

When you enable parallelism at link time, three methods are available at runtime to specify the extent of parallel processing in MLIB.

• Use MLIB_NUMBER_OF_THREADS, a shell environment variable that

allows you to enable parallelism within MLIB subprograms and to specify the maximum number of threads that can be used in parallel regions.

Not setting MLIB_NUMBER_OF_THREADS has the same result as setting it to 1; that is, parallel processing is disabled within MLIB subroutines. Setting MLIB_NUMBER_OF_THREADS to the number of CPUs in the system, or greater, allows parallelized MLIB subprograms to use as many CPUs as are available to the process.

The following command lines show the C shell syntax and Korn shell syntax to use when setting the variable to eight processors:

For C shell:

setenv MLIB_NUMBER_OF_THREADS 8

18 HP MLIB User’s Guide

Parallel processing

For Korn shell:

export MLIB_NUMBER_OF_THREADS=8

MLIB_NUMBER_OF_THREADS is examined on the ﬁrst call to a parallelized VECLIB subprogram to establish the default parallel action within VECLIB.

Use the subroutine MLIB_SETNUMTHREADS to restore VECLIB parallel processing to its run-time default that was speciﬁed by MLIB_NUMBER_OF_THREADS. Refer to the mlib_setnumthreads(3m) man page for usage information.

• Use the subroutine MLIB_SETNUMTHREADS. You can call this subroutine at any time to set the maximum number of

parallel threads used in subsequent VECLIB or LAPACK calls. The speciﬁed value overrides the absence of the MLIB_NUMBER_OF_THREADS environment variable or any value assigned to it.

• To modify the effect of the MLIB_NUMBER_OF_THREADS and MLIB_SETNUMTHREADS, you can also use MP_NUMBER_OF_THREADS at runtime. When MP_NUMBER_OF_THREADS is set to a value smaller than MLIB_NUMBER_OF_THREADS and MLIB_SETNUMTHREADS, the value of MP_NUMBER_OF_THREADS takes precedence and limits the amount of parallelism to MP_NUMBER_OF_THREADS.

These controls set the maximum amount of SMP parallelism that your program can use, and the VECLIB-speciﬁc mechanisms offer ﬁner control within that maximum. Refer to “Performance beneﬁts” below for more information.

Performance beneﬁts

If VECLIB parallelism is enabled, each parallelized VECLIB subprogram determines at runtime if multiple processors are available. If multiple processors are available, VECLIB detects whether the program is already using multiple threads. If the application calling VECLIB is not running multiple threads, VECLIB will attempt to use all available threads (limited by MLIB_NUMBER_THREADS). If the application calling VECLIB is already running multiple threads, VECLIB will attempt to use the remaining threads without over-subscribing. This is a form of nested parallelism.

If you are using an HP server with multiple processors, you can realize the performance beneﬁts of parallel processing in three ways:

• Call any parallelized VECLIB subprogram. Let it use parallelism internally if it determines that it is appropriate to do so based on such factors as problem size, system conﬁguration, and user environment.

Chapter 1 Introduction to VECLIB 19

Parallel processing

• Call VECLIB subprograms in a parallelized loop or region. VECLIB

• Use the Message Passing Interface (MPI) explicit parallel model. Refer to

VECLIB subprograms are reentrant, meaning that they may be called several times in parallel to do independent computations without one call interfering with another. You can use this feature to call VECLIB subprograms in a parallelized loop or region.

The compiler does not automatically parallelize loops containing a function reference or subroutine call. You can force it to parallelize such a loop by deﬁning OpenMP parallel regions.

For example, the following Fortran code makes parallel calls to subprogram DAXPY:

supports nested parallelism where the outer parallelism is implemented through OpenMP while the inner parallelism is implemented with VECLIB SMP parallelism. To use this mechanism, you must be familiar with the techniques of parallel processing. Refer to the Parallel Programming Guide for HP-UX Systems for details.

the HP MPI User’s Guide or the MPI(1) man page for details.

NTHREADS = 4 C$OMP PARALLEL DO NUM_THREADS(NTHREADS) DO J=1, N CALL DAXPY (N-I,A(I,J),A(I+1,I),1,A(I+1,J),1) ENDO C$OMP END PARALLEL DO

While optimizing a parallel program, you may want to make parallel calls to a VECLIB subprogram to execute independent operations where the call statements are not in a loop. OpenMP supports the PARALLEL and END PARALLEL directions that deﬁne a block of code that is to be executed by multiple threads in parallel.

OpenMP-based nested parallelism

Nested parallelism can be achieved when calling VECLIB parallelized subprograms from an OpenMP parallel region. (See “Parallelized subprograms in VECLIB” on page 1104.) Consider the following code running on an HP platform with at least four processors:

... call omp_set_nested (.true.) c$omp parallel NUM_THREADS(2) myid = omp_get_thread_num

if (myid.eq.0) then call dgemm(‘n’, ‘n’, m, m, m, alpha, a, lda, b, ldb, beta, c,ldc)

else

call dgemm(‘n’, ‘n’, m, m, m, alpha, d, ldd, e, lde, beta,

f,ldf)

20 HP MLIB User’s Guide

Parallel processing

endif c$omp end parallel call omp_set_nested(.false.) ...

Using MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one OpenMP thread for

C αAB βC+=

and another for

F αDE βF+=

Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel.

If a parallel VECLIB subprogram is called from a parallelized loop or region, VECLIB will automatically avoid over-subscription of the CPUs. The number of threads spawned by each call to a parallelized VECLIB subroutine on a nested parallel region is limited by:

• MLIB_NUMBER_OF_THREADS

• The number of threads still available in the system

• will never be larger than four. Speciﬁcally:

MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)

Message passing-based nested parallelism

Nested parallelism can be achieved when calling VECLIB parallelized subprograms from an MPI process. (See “Parallelized subprograms in VECLIB” on page 1104.) Consider the following code:

... call mpi_init (ierr) call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)

if (myid.eq.0) then call dgemm(‘n’, ‘n’, m, m, m, alpha, a, lda, b, ldb, beta, c,ldc)

else

call dgemm(‘n’, ‘n’, m, m, m, alpha, d, ldd, e, lde, beta,

f,ldf)

endif

...

Chapter 1 Introduction to VECLIB 21

Parallel processing

Assume the application started on two MPI processes. Using MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one MPI process for

C αAB βC+=

and another for

F αDE βF+=

Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel.

Default CPS library stack is too small for MLIB

In libcps, the HP Compiler Parallel Support library, a CPS thread has a default stack size of 8M bytes. For performance reasons, several subprograms in HP MLIB use the stack for temporary arrays that exceed the default value. Using the default CPS stack size, these routines overwrite neighboring stacks, resulting in errors that are difﬁcult to diagnose.

The solution is to change the CPS thread stacksize attribute to a value that is large enough to accommodate all the MLIB subprograms the thread may encounter. Currently, 8 MB*(the number of threads) should be sufﬁcient for all MLIB subprograms.

The environment variable CPS_STACK_SIZE expects values in K bytes. Setting the stack size as follows would be sufﬁcient for programs that execute on two threads:

For C shell:

% setenv CPS_STACK_SIZE 16384

For Korn shell:

% export CPS_STACK_SIZE=16384

Default Pthread library stack is too small for MLIB

The stack allocated for each new thread created using direct pthread calls to “pthread_create” might not be large enough for HP MLIB. Several subprograms in HP MLIB use the stack for storing temporary work arrays and improve performance. If the stack size is not large enough, these routines overwrite neighboring stacks, resulting in errors that are difﬁcult to diagnose.

22 HP MLIB User’s Guide

Currently, 8 MB*(the number of threads) should be sufﬁcient for all MLIB subprograms. If your application launchs threads directly from pthread library calls, set the minimum allocated stack size to match HP MLIB needs on each new thread. Setting the stack size on HP-UX as follows would be sufﬁcient for programs that execute on two threads:

int stacksize = 8388608;

(...)

pthread_attr_setstacksize(&thread_attr, stacksize);

pthread_create(&thread_id, &thread_attr, thread_func, &thread_parm);

(...)

Roundoff effects

VECLIB subprograms may use a different arithmetic order of evaluation than other implementations. Different roundoff characteristics may result. Accuracy of results is usually similar to other implementations, so using VECLIB should not affect the accumulation of roundoff errors in a complete application. If it does, examine the mathematical analysis of the problem to determine if it is ill-conditioned. Ill-conditioned means that the small roundoff errors that are inadvertently introduced into any computation are magniﬁed out of proportion to the desired result. Similarly, if results with and without VECLIB differ materially, both sets of answers are probably inaccurate and you should investigate further. If the program correctly applies stable computational algorithms, the problem itself is probably ill-posed.

Roundoff effects

Data types and precision

In general, VECLIB provides the same range of functionality for both real and complex data. For most computations, there are matching subprograms for real and complex data, but there are a few exceptions.

For example, corresponding to the subprograms for real dot products, there are subprograms for complex dot products in both the conjugated and unconjugated forms because both types of complex dot products occur. However, there is no complex analogue of the subprograms for solving a real symmetric sparse linear system.

Chapter 1 Introduction to VECLIB 23

VECLIB naming convention

Matching subprograms for real and complex data have been coded to maintain a close correspondence between the two. However, in some areas, the correspondence is necessarily weaker, and this has not been possible.

Subprograms in VECLIB are provided in both 32-bit and 64-bit addressing versions, except in the VECLIB8 library where only 64-bit addressing is supported.

VECLIB naming convention

The name of each VECLIB subprogram is a coded speciﬁcation of its function within the limits of standard Fortran 77 6-character names. Usually, the ﬁrst character of a subprogram name, denoted by T, shows the predominant data type according to Table 1-3:

Table 1-3 VECLIB Naming Convention—Data Type

T Data Type VECLIB VECLIB8

S Single Precision REAL*4 REAL*4 D Double Precision REAL*8 REAL*8

I Integer INTEGER*4 INTEGER*8 C Complex COMPLEX*8 COMPLEX*8 Z Double Complex COMPLEX*16 COMPLEX*16

Basic Linear Algebra Subprograms, that is, level 1, 2, and 3 BLAS, as well as the Sparse BLAS use this naming convention. For example, subprograms that compute the sum of vector elements are named according to data type: SSUM, DSUM, ISUM, CSUM, and ZSUM. Some function subprograms use two of these letters, the ﬁrst describing the data type of the function and the second indicating the type of data on which it operates.

Fortran 77 BLAS Standard subprograms use a similar convention, with F_ prepended to each routine name.

For example, the legacy BLAS single-precision, triangular-solve routine is named STRSM and its BLAS Standard counterpart is named F_STRSM. BLAS Standard subprograms that compute the sum of vector elements are named F_SSUM, F_DSUM, F_CSUM, and F_ZSUM. Refer to “Legacy BLAS routines” on page 211 and “BLAS Standard routines” on page 339 for more information.

C BLAS Standard subprograms begin with C_ prepended to each routine name.

24 HP MLIB User’s Guide

Data type and byte length

There is a relationship between the data type of a subprogram, designated by the ﬁrst character of its name (refer to T in Table 1-3), and the byte lengths of its arguments. This relationship is shown in Table :

MLIB Argument Lengths

Data type and byte length

S4 8 4 8 D4 8 8 16 C4 8 4 8

Z4 8 8 16

VECLIB INTEGER LOGICAL

Operator arguments

Some BLAS routines take input-only arguments called operators that allow for the speciﬁcation of multiple related operations to be performed by a single function. Operator arguments used by the BLAS Standard routines are NORM,

SORT, SIDE, UPLO, TRANS, CONJ, DIAG

deﬁned as follows:

norm Used by routines computing the norm of a vector or

sort Used by sorting routines.There are two valid values

side Used by functions computing the product of two

VECLIB8 INTEGER LOGICAL

, and JROT. Their meanings are

matrix. There are seven valid values that specify the norm to be computed, namely one-norm, real one-norm, inﬁnity-norm and real inﬁnity-norm for vectors and matrices, two-norm for vectors, and Frobenius-norm, max-norm, and real max-norm for matrices.

that specify whether the data should be speciﬁed in increasing or decreasing order.

matrices A and B. There are two valid values that specify whether A*B or B*A should be computed.

REAL COMPLEX

Chapter 1 Introduction to VECLIB 25

Operator arguments

uplo Refers to triangular matrices. There are two valid

values to specify whether a matrix is upper or lower triangular.

trans Used by routines applying a matrix, say A, to another

vector or another matrix. There are three valid values to specify whether the matrix (A), its transpose (AT), or

its conjugate transpose (A*) should be applied. op(A)

refers to A, A

, or A* depending on the input value of the trans operator argument. Some BLAS routines have more than one

trans operator. For example, a

general matrix multiply operation can be speciﬁed as

CopA()op B()←

where A, B, and C are general matrices. A trans argument is needed for each of the input matrices A and B. These arguments are denoted transA and

transB.

conj Used by complex routines operating with x or

diag Refers to triangular matrices. Two values are valid to

specify whether the triangular matrix has unit-diagonal or not.

jrot Used by the routine to generate Jacobi rotations. There

are three valid values to specify whether the rotation is an inner rotation, an outer rotation, or a sorted rotation.

For BLAS Standard routines, specify an operator argument with a named constant value. Table 1-4 on page 27 lists the operator arguments and their associated named constants.

The actual numeric value assigned to the named constant is deﬁned in the appropriate language include ﬁle. For example, the f77blas.h include ﬁle deﬁnes assigned values for Fortran 77.

Table 1-4 lists the operator arguments and their associated named constants.

26 HP MLIB User’s Guide

Table 1-4 BLAS Standard Operator Arguments

Operator arguments

Operator

Argument

norm

sort blas_increasing_order sort in increasing order

side blas_left_side operate on the left hand side

uplo blas_upper reference upper triangle only

transx blas_trans

conj blas_conj

diag blas_non_unit_diag non-unit triangular

jrot blas_jrot_inner

Named Constant Meaning

blas_one_norm 1-norm

blas_real_one_norm real 1-norm

blas_two_norm 2-norm

blas_frobenius_norm Frobenius-norm

blas_real_inf_norm real inﬁnity-norm

blas_max_norm max-norm

blas_real_max_norm real-norm

blas_decreasing_order sort in decreasing order

blas_right_side operate on the right hand side

blas_conj_trans

blas_unit_diag unit triangular

blas_jrot_outer

blas_jrot_sorted

blas_inf_norm inﬁnity-norm

blas_lower reference lower triangle only

operate with x

blas_no_trans operate with x

-------

∗

-------

≥

operate with

blas_no_conj operate with x

inner rotation

0 c

outer rotation

sorted rotation

≤≤

abs a() abs b()≥

Chapter 1 Introduction to VECLIB 27

Error handling

VECLIB subprograms are divided into two classes according to the way they detect and report usage errors:

• Low-level subprograms

• High-level subprograms

Low-level subprograms

Low-level subprograms are only minimally capable of detecting or handling errors. These subprograms attempt to do what is reasonable when a usage error occurs, but they do not warn you that something is wrong. For example, SDOT, which computes the dot product of two dense real vectors of length N, gives the mathematically proper result, zero, when N ≤ 0. This is considered mathematically correct because, by deﬁnition, an empty sum is zero. Because SDOT conforms to the published BLAS argument list and usage, however, SDOT does not notify you that you may have made a mistake. Issuing a warning would add severe usage conﬂicts with codes that already use the BLAS.

Because these low-level subprograms do what is reasonable, you can simplify a program and perhaps speed it up by using VECLIB subprograms without checking for special cases. For example, relying on SDOT having a zero result for N ≤ 0 may eliminate an IF statement that would take the same branch almost every time through a loop.

High-level subprograms

High-level subprograms detect and report errors. These subprograms usually do more work than the low-level subprograms, so the relative expense of checking and reporting errors may be small. Some possible errors are:

• Argument value errors, such as negative matrix order

• Computational errors, such as a singular matrix

• Computational problems, such as ill-conditioning When a high-level subprogram detects an error, it responds with a success/error

code, usually with the output argument ier. The convention used for ier is:

• ier = 0: Successful exit

• ier < 0: Invalid value of an argument—computation not completed

28 HP MLIB User’s Guide

Troubleshooting

• ier > 0: Failure during the computation Some VECLIB subprograms do not have a success/error code in their argument

lists, but instead call another VECLIB subprogram to process the error condition. MLIB provides the following error handlers:

• XERBLA

• XERVEC

• F_BLASERROR Refer to the documentation for individual VECLIB subprogram to determine if

one of these error handlers is used. For example, all BLAS Standard subprograms (those subprograms whose names begin with F_) use F_BLASERROR.

The standard versions of XERBLA, XERVEC, and F_BLASERROR write an error message onto the standard error ﬁle. Execution is then terminated with a nonzero exit status. You can supply a version of the error handler that alters this action. Refer to “XERBLA” on page 337, “XERVEC” on page 623, and “F_BLASERROR” on page 605 for more information about these routines.

Routine-speciﬁc error conditions are listed in the respective subprogram documentation.

Troubleshooting

The following are suggestions to help you ﬁnd the cause of a problem:

1. Verify that the subprogram usage in the program matches the subprogram speciﬁcations in the documentation. Pay attention to the number of arguments in the CALL statement and to the declarations of arrays and integer constants or variables that describe them. Write out all the arguments immediately before and after the CALL statement.

2. Make sure there really is a problem. For example, if you get an apparently incorrect answer, check if the answer satisﬁes the problem as deﬁned in the program. For problems with more than one answer, VECLIB may produce a different answer or give the answers in a different order than expected. If the problem is ill-conditioned, VECLIB may not be able to compute a reliable answer. Error messages often suggest the cause of the problem.

3. Isolate the problem. If possible, write a small test program that encounters the same difﬁculty. For example, write the data causing the problem from

Chapter 1 Introduction to VECLIB 29

HP MLIB man pages

the original program into the small one. In this way, you eliminate extraneous code from suspicion. If the problem area is large, try to pare it to a manageable size. For example, if a 50-by-50 linear system fails, try to produce a 2-by-2 system that fails in the same way.

HP MLIB man pages

The HP MLIB man pages contain online documentation that includes information from the HP MLIB User’s Guide.

The HP MLIB man pages are installed in the directory /opt/mlib/share/man. Set this path in your MANPATH environment variable to access man pages for VECLIB and LAPACK.

HP VECLIB man pages provide:

• An introduction to VECLIB (veclib (3m))

• An introduction to each group of subprograms, for example, blas2(3m)

• A man page for each subprogram The HP MLIB User’s Guide has more detailed information than the man pages

because of the limited number of fonts supported and the difﬁculty of presenting mathematical equations in the man(1) system.

For further explanation and a table of contents of reference entries for VECLIB, refer to the veclib(3) man page.

30 HP MLIB User’s Guide

2 Basic Vector Operations

Overview

This chapter explains how to use the VECLIB vector subprograms that serve as building blocks for many user programs. It describes subprograms for performing dense and sparse vector operations. This set of VECLIB subprograms includes:

• Basic Linear Algebra Subprograms

• Sparse BLAS

• Hewlett-Packard extensions to the BLAS

• BLAS Standard

The term BLAS, as used in this section, refers to all of the above listed BLAS and the Hewlett-Packard extensions to the BLAS.

BLAS standardization efforts, supported by software and hardware vendors and university groups, began with a BLAS Technical (BLAST) Forum meeting in November 1995 at the University of Tennessee. The efforts of the BLAST Forum resulted in a BLAS Standard speciﬁcation in 1999.

HP MLIB 8.5 supports a subset of the BLAS Standard routines. This chapter describes dense and banded BLAS Standard functionality in “BLAS Standard routines” on page 152.

Overview

This section discusses commonly used or computationally expensive operations of linear algebra. Even though you can code most of these operations in fewer than 10 lines of Fortran or C, using VECLIB subprograms can improve program performance as well as program modularity and readability. However, in some situations, you can achieve better computational performance by entering Fortran or C code than by calling one of these subprograms.

Chapter 2 Basic Vector Operations 31

Chapter objectives

After reading this chapter you will:

• Understand BLAS storage conventions

• Know how to specify array sections

• Know how to handle backward storage

• Know how to use increment (also called stride) arguments

• Understand the vector subprograms included with HP VECLIB, both Legacy BLAS and BLAS Standard subprograms

Associated documentation

The following documents provide supplemental material for this chapter: Dodson, D.S., R.G. Grimes, and J.G. Lewis. “Sparse Extensions to the Fortran

Basic Linear Algebra Subprograms.” ACM Transactions on Mathematical Software. June, 1991. Vol. 17, No. 2.

Dodson, D.S., R.G. Grimes, and J.G. Lewis. “Algorithm 692: Model Implementation and Test Package for the Sparse Basic Linear Algebra Subprograms.” ACM Transactions on Mathematical Software. June, 1991. Vol. 17, No. 2.

Lawson, C., R. Hanson, D. Kincaid, and F. Krogh. “Basic Linear Algebra Subprograms for Fortran Usage.” ACM Transactions on Mathematical Software. September, 1979. Vol. 5, No. 3.

32 HP MLIB User’s Guide

What you need to know to use vector subprograms

The following sections describe overall considerations for using vector subprograms:

• BLAS storage conventions

❏ Fortran storage of arrays ❏ Fortran array argument association

• BLAS indexing conventions

❏ Forward storage ❏ Backward storage ❏ Increment arguments

• Operator arguments in the BLAS Standard

• Representation of a permutation matrix

• Representation of a Householder matrix

BLAS storage conventions

The Basic Linear Algebra Subprograms (BLAS) were developed to enhance the portability of published linear algebra codes. In particular LAPACK, the high-level public-domain linear equation package, uses the BLAS. When your routines use the VECLIB BLAS, you increase the efﬁciency of those routines on your Hewlett-Packard servers.

You need not limit your use of the VECLIB BLAS to linear algebra codes. Because these subprograms are portable, modular, and efﬁcient, you can incorporate them into your programs. To realize the full power of the BLAS, you should understand storage and indexing conventions.

Chapter 2 Basic Vector Operations 33

What you need to know to use vector subprograms

Fortran storage of arrays

Two-dimensional arrays in Fortran are stored by columns. Consider the following speciﬁcations:

DIMENSION A(N1,N2),B(N3) EQUIVALENCE (A,B)

where N3 = N1 X N2. Then A(I,J) is associated with the same memory location as B(K) where

K = I + (J−1) × N1

Successive elements of a column of A are adjacent in memory, while successive elements of a row of A are stored with a difference of N1 storage units between them. Remember that the size of a storage unit depends on the data type.

Fortran array argument association

When a Fortran subprogram is called with an array element as an argument, the value is not passed. Instead, the subprogram receives the address in memory of the element. Consider the following code segment:

REAL A(10,10) J = 3 L = 10 CALL SUBR (A(1,J),L) . . . SUBROUTINE SUBR (X,N) REAL X(N) . . .

SUBR is given the address of the ﬁrst element of the third column of A. Because it treats that argument as a one-dimensional array, successive elements X(1), X(2),..., occupy the same memory locations as the successive elements of the

third column of A, that is, A(1,3), A(2,3),... Hence, the entire third column of A is available to the subprogram.

34 HP MLIB User’s Guide

BLAS indexing conventions

This section describes handling stride arguments and forward and backward storage.

A vector in the BLAS is deﬁned by three quantities:

• Vector length

• Array or starting element within an array

• Increment, sometimes called stride—deﬁnes the number of storage units between successive vector elements

Forward storage

Suppose that X is a real array. Let N be the vector length and let INCX be the increment. Suppose that a vector x with components xi, i= 1, 2, ..., N, is stored

in X. If INCX ≥ 0, then xi is stored in X(1 + (I−1) × INCX). This is forward storage starting from X(1) with stride

equal to INCX, ending with X(1 + (N−1) × INCX). Thus, if N = 4 and INCX = 2, the vector components x1, x2, x3, and x4 are stored in the array

elements X(1), X(3), X(5), and X(7), respectively.

What you need to know to use vector subprograms

Backward storage

Some BLAS subprograms permit the backward storage of vectors, which is speciﬁed by using a negative INCX. If INCX < 0, then xi is stored in X(1 + (N−I) × |INCX|) or equivalently in X(1 − (N−I) × INCX). This is backward storage starting from X(1 − (N−1) × INCX) with stride equal to INCX, ending with X(1). Thus, if N = 4 and INCX = −2, the vector components

x1, x2, x3, and x4 are stored in the array elements

X(7), X(5), X(3), and X(1), respectively.

INCX = 0 is only permitted by COPY routines in the legacy BLAS subprograms.

When INCX = 0 is allowed, it means that x is a vector of length N, whose components all equal the value of X(1).

The notation (N,X,INCX) describes a BLAS vector. For example, if X is an array of dimension N, then (N,X,1) represents forward storage and (N,X,−1) represents backward storage. If A is an M-by-N array, then (M,A(1,J),1) represents column J and (N,A(I,1),M) represents row I. Finally, if an M-by-N matrix is embedded in the upper left-hand corner of an array B of size LDB by NMAX, then column J is (M,B(1,J),1) and row is I(N,B(I,1),LDB).

Chapter 2 Basic Vector Operations 35

What you need to know to use vector subprograms

Increment arguments

The following examples illustrate how to use increment arguments to perform different operations with the same subprogram. These examples use the function F_SDOT with the following usage:

SUBROUTINE F_SDOT (CONJ, N, ALPHA, X, INCX, BETA, Y, INCY, R) INTEGER CONJ, INCX, INCY, N REAL*4 ALPHA, BETA, R, X(*), Y(*)

This usage adds the scaled dot product of the vectors X( * ) and Y( * ) to a scaled scalar R.

Example 1

Compute the dot product

R = X(1)*Y(1) + X(2)*Y(2) + X(3)*Y(3) + X(4)*Y(4):

REAL*4 ALPHA,BETA,R,X(*),Y(*) SUBROUTINE F_SDOT (CONJ, 4, 1.0, X, 1, 0.0, Y, 1)

Example 2

Compute the dot product

T = X(1)*Y(4) + X(2)*Y(3) + X(3)*Y(2) + X(4)*Y(1):

REAL*4 ALPHA,BETA,R,X(*),Y(*) SUBROUTINE F_SDOT (CONJ, 4, 1.0, X, 1, 0.0, Y, -1)

Example 3

Compute the dot product

Y(2) = A(2,1)*X(1) + A(2,2)*X(2) + A(2,3)*X(3)

This is the dot product of the second row of an M-by-3 matrix A, stored in a 10-by-3 array, with a 3-vector X:

PARAMETER (LDA = 10) REAL*4 SDOT,A(LDA,3),X(3),Y(LDA) N = 3

Y(2) = SDOT (N, A(2,1),LDA, X,1)

Operator arguments in the BLAS Standard

Some routines in the BLAS Standard take input-only arguments called operators. Operators allow for the speciﬁcation of multiple related operations to be performed by a single function. The BLAS Standard speciﬁes the type and the valid values these arguments should have according to the speciﬁc programming language.

Operator arguments used by the BLAS Standard routines are NORM, SORT,

SIDE, UPLO, TRANS, CONJ, DIAG,

on page 25 for explanations of the valid operator values.

36 HP MLIB User’s Guide

and JROT. Refer to “Operator arguments”

In BLAS Standard routines, you specify an operator argument with a named constant value. The actual numeric value assigned to the named constant is deﬁned in the appropriate language’s include ﬁle. Operator arguments are represented in the Fortran 77 interface as INTEGERs. This speciﬁcation is different from the legacy BLAS, where operator arguments are deﬁned as CHARACTER*1.

Refer to individual routines in “BLAS Standard routines” on page 152 for the named constants you can use to specify operator arguments for basic vector subprograms.

Representation of a permutation matrix

This section explains how the BLAS Standard represents a permutation matrix.

An n-by-n permutation matrix P is represented as a product of at most n interchange permutations. An interchange permutation E is a permutation obtained by swapping two rows of the identity matrix. An efﬁcient way to represent a general permutation matrix P is with an integer vector p of length n. In other words, P = En...E1 and each Ei is the identity with rows i and p interchanged:

What you need to know to use vector subprograms

For i = n to 1 and incp < 0,

For i = 1 to n and incp > 0,

xi() xpi()()↔

Chapter 2 Basic Vector Operations 37

What you need to know to use vector subprograms

Representation of a Householder matrix

This section explains how the BLAS Standard represents a Householder matrix.

An elementary reﬂector (or elementary Householder matrix) H of order n is a unitary matrix of the form

HIτυυ

where τ is a scalar, and υ is an n-vector, with

υ is often referred to as the Householder vector. Often, υ can have leading or

trailing zero elements, but for the purposes of this discussion, assume that H has no special structure.

This representation sets υ1 = 1, meaning that υ1 need not be stored. In real arithmetic, except that τ = 0 implies H = I. This representation agrees with that used in LAPACK.

In complex arithmetic, τ may be complex, and satisﬁes

1 τ 2,≤≤

1 Re τ() 2 and τ 1–1≤≤≤

–=

2Re τ()=

Thus a complex H is not Hermitian, as with other representations, but it is unitary. The latter is the important property. The advantage of allowing τ to be complex is that, given an arbitrary complex vector x, H can be computed so that

with real β. This is useful, for example, when reducing a complex Hermitian matrix to a real symmetric tridiagonal matrix, or a complex rectangular matrix to real bidiagonal form.

38 HP MLIB User’s Guide

H∗x β 10… 0,, ,()

Subprograms for basic vector operations

The following sections in this chapter describe the vector subprograms included with VECLIB:

• Legacy BLAS routines

• BLAS Standard routines

Note that the speciﬁcation for operator arguments is different in legacy BLAS routines than in BLAS Standard routines. Operator arguments are represented in the BLAS Standard Fortran 77 interface as INTEGERs; in the legacy BLAS they are deﬁned as CHARACTER*1.

In BLAS Standard routines, you specify an operator argument with a named constant value. Refer to the individual routines in “BLAS Standard routines” on page 152 for the named constants you can use to specify operator arguments. The actual numeric value assigned to the named constant is deﬁned in the f77blas.h include ﬁle.

Chapter 2 Basic Vector Operations 39

ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes



Legacy BLAS routines

Name ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX

Index of maximum of magnitudes

Purpose Given a real or integer vector x of length n, ISAMAX, IDAMAX, or IIAMAX

determines the index of the element of the vector of maximum magnitude. Speciﬁcally, the subprograms determine the smallest index i such that



max

Given a complex vector x of length n, ICAMAX or IZAMAX determines the smallest index i:

Re xi() Im xi()+ max Re xj() Im xj()+:j 12… n,, ,=()=

where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual deﬁnition of complex magnitude is

: j 12… n,, ,=



Re xi()2Im xi()

 

This deﬁnition is not used because of computational speed. If the index i is used for pivot selection in matrix factorization, no signiﬁcant difference in numerical stability should result.

The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 i, ISAMAX, n, incx REAL*4 x(lenx)

i = ISAMAX(n, x, incx)

INTEGER*4 i, IDAMAX, n, incx REAL*8 x(lenx)

i = IDAMAX(n, x, incx)

INTEGER*4 i, IIAMAX, n, incx, x(lenx) i = IIAMAX(n, x, incx)

40 HP MLIB User’s Guide

12⁄

Index of maximum of magnitudes ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX

INTEGER*4 i, ICAMAX, n, incx COMPLEX*8 x(lenx)

i = ICAMAX(n, x, incx)

INTEGER*4 i, IZAMAX, n, incx COMPLEX*16 x(lenx)

i = IZAMAX(n, x, incx)

VECLIB8:

INTEGER*8 i, ISAMAX, n, incx REAL*4 x(lenx)

i = ISAMAX(n, x, incx)

INTEGER*8 i, IDAMAX, n, incx REAL*8 x(lenx)

i = IDAMAX(n, x, incx)

INTEGER*8 i, IIAMAX, n, incx, x(lenx) i = IIAMAX(n, x, incx)

INTEGER*8 i, ICAMAX, n, incx COMPLEX*8 x(lenx)

i = ICAMAX(n, x, incx)

INTEGER*8 i, IZAMAX, n, incx COMPLEX*16 x(lenx)

i = IZAMAX(n, x, incx)

Input n Number of elements of vector x to be used. If n ≤ 0, the

subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1) ×|incx|+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the

element of x of maximum magnitude.

Chapter 2 Basic Vector Operations 41

ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes

Fortran Equivalent

INTEGER*4 FUNCTION ISAMAX (N,X,INCX) REAL*4 X(*),TEMP,XMAX ISAMAX = 1 IF ( N .GT. 1 ) THEN XMAX = ABS ( X(1) ) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N TEMP = ABS ( X(IX) ) IF ( TEMP .GT. XMAX ) THEN ISAMAX = I XMAX = TEMP END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISAMAX = 0 END IF RETURN END

Example Locate the largest element of a REAL*8 vector x, where x is a vector 10

elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 I,IDAMAX,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDAMAX (N,X,INCX)

42 HP MLIB User’s Guide

Index of minimum of magnitudes ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN

Name ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN

Index of minimum of magnitudes

Purpose Given a real or integer vector x of length n, ISAMIN, IDAMIN, or IIAMIN

determines the index of element of the vector of minimum magnitude. Speciﬁcally, the subprograms determine the smallest index i such that

min xj: j 12… n,, ,=()=

Given a complex vector x of length n, ICAMIN or IZAMIN determines the smallest index i:

 

|Re(xi)|+|Im(xi)| min

where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual deﬁnition of complex magnitude is



)|+|Im(xj)| : j 1, 2, ..., n=

|Re(x



Re x

()2Im xi()





This deﬁnition is not used because of computational speed. The vector can be stored in a one-dimensional array or in either a row or a

column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 i, ISAMIN, n, incx REAL*4 x(lenx)

i = ISAMIN(n, x, incx)

INTEGER*4 i, IDAMIN, n, incx REAL*8 x(lenx)

i = IDAMIN(n, x, incx)

INTEGER*4 i, IIAMIN, n, incx, x(lenx) i = IIAMIN(n, x, incx)

INTEGER*4 i, ICAMIN, n, incx COMPLEX*8 x(lenx)

i = ICAMIN(n, x, incx)

INTEGER*4 i, IZAMIN, n, incx COMPLEX*16 x(lenx)

i = IZAMIN(n, x, incx)

12⁄

Chapter 2 Basic Vector Operations 43

ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN Index of minimum of magnitudes

VECLIB8:

INTEGER*8 i, ISAMIN, n, incx REAL*4 x(lenx)

i = ISAMIN(n, x, incx)

INTEGER*8 i, IDAMIN, n, incx REAL*8 x(lenx)

i = IDAMIN(n, x, incx)

INTEGER*8 i, IIAMIN, n, incx, x(lenx) i = IIAMIN(n, x, incx)

INTEGER*8 i, ICAMIN, n, incx COMPLEX*8 x(lenx)

i = ICAMIN(n, x, incx)

INTEGER*8 i, IZAMIN, n, incx COMPLEX*16 x(lenx)

i = IZAMIN(n, x, incx)

Input n Number of elements of vector x to be used. If n ≤ 0, the

subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the

element of x of minimum magnitude.

44 HP MLIB User’s Guide

Index of minimum of magnitudes ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN

Fortran Equivalent

INTEGER*4 FUNCTION ISAMIN (N,X,INCX) REAL*4 X(*),TEMP,XMIN ISAMIN = 1 IF ( N .GT. 1 ) THEN XMIN = ABS ( X(1) ) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N TEMP = ABS ( X(IX) ) IF ( TEMP .LT. XMIN ) THEN ISAMIN = I XMIN = TEMP END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISAMIN = 0 END IF RETURN END

Example Locate the smallest element of a REAL*8 vector x, where x is a vector 10

elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 I,IDAMIN,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDAMIN (N,X,INCX)

Chapter 2 Basic Vector Operations 45

ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements

Name ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx

Count selected vector elements

Purpose Given a real, integer, or complex vector x of length n, these subprograms count

the number of elements of the vector that satisfy a speciﬁed relationship to a given scalar a.

The last two characters of the subprogram name specify the relation of interest between the elements of the vector and the scalar. For real and integer subprograms, these characters, represented by “xx” in the prototype Fortran statements, and the corresponding function values can be:

xx Function value

EQ GE GT LE

#{i : x #{i : x #{i : x #{i : x #{i : x #{i : x

For complex subprograms, these characters and corresponding function values are:

= a} ≥ a} > a} ≤ a} < a} ≠ a}

The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 i, ISCTxx, n, incx REAL*4 a, x(lenx)

i = ISCTxx(n, x, incx, a)

INTEGER*4 i, IDCTxx, n, incx REAL*8 a, x(lenx)

i = IDCTxx(n, x, incx, a)

INTEGER*4 i, IICTxx, n, incx, a, x(lenx) i = IICTxx(n, x, incx, a)

46 HP MLIB User’s Guide

xx Function value

EQ NE

#{i : x #{i : x

= a} ≠ a}

Count selected vector elements ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx

INTEGER*4 i, ICCTxx, n, incx COMPLEX*8 a, x(lenx)

i = ICCTxx(n, x, incx, a)

INTEGER*4 i, IZCTxx, n, incx COMPLEX*16 a, x(lenx)

i = IZCTxx(n, x, incx, a)

VECLIB8:

INTEGER*8 i, ISCTxx, n, incx REAL*4 a, x(lenx)

i = ISCTxx(n, x, incx, a)

INTEGER*8 i, IDCTxx, n, incx REAL*8 a, x(lenx)

i = IDCTxx(n, x, incx, a)

INTEGER*8 i, IICTxx, n, incx, a, x(lenx) i = IICTxx(n, x, incx, a)

INTEGER*8 i, ICCTxx, n, incx COMPLEX*8 a, x(lenx)

i = ICCTxx(n, x, incx, a)

INTEGER*8 i, IZCTxx, n, incx COMPLEX*16 a, x(lenx)

i = IZCTxx(n, x, incx, a)

Input n Number of elements of vector x to be compared to a. If

n ≤ 0, the subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

a The scalar a.

Chapter 2 Basic Vector Operations 47

ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements

Output i If n ≤ 0, then i = 0. Otherwise, i is the number of

elements of x that satisfy the relationship with a speciﬁed by the subprogram name.

Fortran Equivalent

INTEGER*4 FUNCTION ISCTEQ (N,X,INCX,A) REAL*4 A,X(*) ISCTEQ = 0 INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N IF ( X(IX) .EQ. A ) ISCTEQ = ISCTEQ + 1 IX = IX + INCXA 10 CONTINUE RETURN END

Example Count the number of positive elements of a REAL*8 vector x, where x is a

vector 10 elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 I,IDCTGT,N,INCX REAL*8 A,X(20) N = 10 INCX = 1 A = 0.0D0 I = IDCTGT (N,X,INCX,A)

48 HP MLIB User’s Guide

Index of maximum element of vector ISMAX/IDMAX/IIMAX

Name ISMAX/IDMAX/IIMAX

Index of maximum element of vector

Purpose Given a real or integer vector x of length n, these subprograms determine the

index of the maximum element of the vector. Speciﬁcally, the subprograms determine the smallest index i such that

ximax xj: j 12… n,, ,=

The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 i, ISMAX, n, incx REAL*4 x(lenx)

i = ISMAX(n, x, incx)

INTEGER*4 i, IDMAX, n, incx REAL*8 x(lenx)

i = IDMAX(n, x, incx)

INTEGER*4 i, IIMAX, n, incx, x(lenx) i = IIMAX(n, x, incx)

VECLIB8:

INTEGER*8 i, ISMAX, n, incx REAL*4 x(lenx)

i = ISMAX(n, x, incx)

INTEGER*8 i, IDMAX, n, incx REAL*8 x(lenx)

i = IDMAX(n, x, incx)

()

INTEGER*8 i, IIMAX, n, incx, x(lenx) i = IIMAX(n, x, incx)

Input n Number of elements of vector x to be used. If n ≤ 0, the

subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Chapter 2 Basic Vector Operations 49

ISMAX/IDMAX/IIMAX Index of maximum element of vector

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the

maximum element of x.

Fortran Equivalent

INTEGER*4 FUNCTION ISMAX (N,X,INCX) REAL*4 X(*),XMAX ISMAX = 1 IF ( N .GT. 1 ) THEN XMAX = X(1) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N IF ( X(IX) .GT. XMAX ) THEN ISMAX = I XMAX = X(IX) END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISMAX = 0 END IF RETURN END

Example Locate the largest element of a REAL*8 vector x, where x is a vector 10

elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 I,IDMAX,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDMAX (N,X,INCX)

50 HP MLIB User’s Guide

Index of minimum element of vector ISMIN/IDMIN/IIMIN

Name ISMIN/IDMIN/IIMIN

Index of minimum element of vector

Purpose Given a real or integer vector x of length n, these subprograms determine the

index of minimum element of the vector. Speciﬁcally, the subprograms determine the smallest index i such that

ximin xj: j 12… n,, ,=()=

The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 i, ISMIN, n, incx REAL*4 x(lenx)

i = ISMIN(n, x, incx)

INTEGER*4 i, IDMIN, n, incx REAL*8 x(lenx)

i = IDMIN(n, x, incx)

INTEGER*4 i, IIMIN, n, incx, x(lenx) i = IIMIN(n, x, incx)

VECLIB8:

INTEGER*8 i, ISMIN, n, incx REAL*4 x(lenx)

i = ISMIN(n, x, incx)

INTEGER*8 i, ISMIN, n, incx REAL*8 x(lenx)

i = ISMIN(n, x, incx)

INTEGER*8 i, IIMIN, n, incx, x(lenx) i = IIMIN(n, x, incx)

Input n Number of elements of vector x to be used. If n ≤ 0, the

subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Chapter 2 Basic Vector Operations 51

ISMIN/IDMIN/IIMIN Index of minimum element of vector

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

Output i If n ≤ 0, then i = 0. Otherwise, i is the index of the

minimum element of x.

Fortran Equivalent

INTEGER*4 FUNCTION ISMIN (N,X,INCX) REAL*4 X(*),XMIN ISMIN = 1 IF ( N .GT. 1 ) THEN XMIN = X(1) INCXA = ABS ( INCX ) IX = 1 + INCXA DO 10 I = 2, N IF ( X(IX) .LT. XMIN ) THEN ISMIN = I XMIN = X(IX) END IF IX = IX + INCXA 10 CONTINUE ELSE IF ( N .LT. 1 ) THEN ISMIN = 0 END IF RETURN END

Example Locate the smallest element of a REAL*8 vector x, where x is a vector 10

elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 I,IDMIN,N,INCX REAL*8 X(20) N = 10 INCX = 1 I = IDMIN (N,X,INCX)

52 HP MLIB User’s Guide

Search vector for element ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx

Name ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx

Search vector for element

Purpose Given a real, integer, or complex vector x of length n, these subprograms search

sequentially through the vector for the ﬁrst element xi that satisﬁes a speciﬁed relationship to a given scalar a and return the index i of that element.

The last two characters of the subprogram name specify the relationship of interest between the element of the vector and the scalar. For real and integer subprograms, these characters, represented by “xx” in the prototype Fortran statements, and the corresponding function values can be:

xx Function value

EQ GE GT LE

min{i : x min{i : x min{i : x min{i : x min{i : x min{i : x

For complex subprograms, these characters and corresponding function values are:

= a} ≥ a} > a} ≤ a} < a} ≠ a}

The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 i, ISSVxx, n, incx REAL*4 a, x(lenx)

i = ISSVxx(n, x, incx, a)

INTEGER*4 i, IDSVxx, n, incx REAL*8 a, x(lenx)

i = IDSVxx(n, x, incx, a)

xx Function value

EQ NE

min{i : x

= a}

≠ a}

Chapter 2 Basic Vector Operations 53

ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx Search vector for element

INTEGER*4 i, IISVxx, n, incx, a, x(lenx) i = IISVxx(n, x, incx, a)

INTEGER*4 i, ICSVxx, n, incx COMPLEX*8 a, x(lenx)

i = ICSVxx(n, x, incx, a)

INTEGER*4 i, IZSVxx, n, incx COMPLEX*16 a, x(lenx)

i = IZSVxx(n, x, incx, a)

VECLIB8:

INTEGER*8 i, ISSVxx, n, incx REAL*4 a, x(lenx)

i = ISSVxx(n, x, incx, a)

INTEGER*8 i, IDSVxx, n, incx REAL*8 a, x(lenx)

i = IDSVxx(n, x, incx, a)

INTEGER*8 i, IISVxx, n, incx, a, x(lenx) i = IISVxx(n, x, incx, a)

INTEGER*8 i, ICSVxx, n, incx COMPLEX*8 a, x(lenx)

i = ICSVxx(n, x, incx, a)

INTEGER*8 i, IZSVxx, n, incx COMPLEX*16 a, x(lenx)

i = IZSVxx(n, x, incx, a)

Input n Number of elements of vector x to be compared to a. If

n ≤ 0, the subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

a The scalar a.

54 HP MLIB User’s Guide

Search vector for element ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx

Output i If n ≤ 0 or if no element of x satisﬁes the relationship

with a speciﬁed by the subprogram name, then i = 0. Otherwise, i is the index i of the ﬁrst element xi of x that satisﬁes the relationship with a speciﬁed by the subprogram name. Recall that xi is stored in

x((i−1)×|incx|+1).

Fortran Equivalent

INTEGER*4 FUNCTION IISVEQ (N, X,INCX, A) INTEGER*4 X(*),A IISVEQ = 0 INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N IF ( X(IX) .EQ. A ) THEN IISVEQ = I RETURN END IF IX = IX + INCXA 10 CONTINUE RETURN END

Example Search for the ﬁrst positive element of a REAL*8 vector x, where x is a vector

10 elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 I,IDSVGT,N,INCX REAL*8 A,X(20) N = 10 INCX = 1 A = 0.0D0 I = IDSVGT (N,X,INCX,A)

Chapter 2 Basic Vector Operations 55

SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes

Name SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX

Maximum of magnitudes

Purpose Given a real or integer vector x of length n, SAMAX, DAMAX, or IAMAX

computes the l∞ norm of x, that is, the maximum of the magnitudes of the elements of the vector

∞

Given a complex vector x of length n, SCAMAX or DZAMAX computes

s max Re xi() Im xi()+:i 12… n,, ,=().=

where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual deﬁnition of the maximum of magnitudes of a complex vector is

tx∞max Re xi()2Im xi()

s is computed instead of t because, with its lack of square roots, it is faster to

compute. Because , s is often an acceptable substitute for t. The vector can be stored in a one-dimensional array or in either a row or a

column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 n, incx REAL*4 s, SAMAX, x(lenx)

s = SAMAX(n, x, incx)

INTEGER*4 n, incx REAL*8 s, DAMAX, x(lenx)

s = DAMAX(n, x, incx)

max xi: i 12… n,, ,=().==

  

+{}

t ≤ s ≤ 2t

12⁄

: i 12… n,, ,=

.==

56 HP MLIB User’s Guide

INTEGER*4 n, incx, s, IAMAX, x(lenx) s = IAMAX(n, x, incx)

INTEGER*4 n, incx REAL*4 s, SCAMAX COMPLEX*8 x(lenx)

s = SCAMAX(n, x, incx)

Maximum of magnitudes SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX

INTEGER*4 n, incx REAL*8 s, DZAMAX COMPLEX*16 x(lenx)

s = DZAMAX(n, x, incx)

VECLIB8:

INTEGER*8 n, incx REAL*4 s, SAMAX, x(lenx)

s = SAMAX(n, x, incx)

INTEGER*8 n, incx REAL*8 s, DAMAX, x(lenx)

s = DAMAX(n, x, incx)

INTEGER*8 n, incx, s, IAMAX, x(lenx) s = IAMAX(n, x, incx)

INTEGER*8 n, incx REAL*4 s, SCAMAX COMPLEX*8 x(lenx)

s = SCAMAX(n, x, incx)

INTEGER*8 n, incx REAL*8 s, DZAMAX COMPLEX*16 x(lenx)

s = DZAMAX(n, x, incx)

Input n Number of elements of vector x to be used. If n ≤ 0, the

subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

Output s If n ≤ 0, then s = 0. Otherwise, s is the maximum of the

magnitudes of the elements of x.

Chapter 2 Basic Vector Operations 57

SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes

Fortran Equivalent

REAL*4 FUNCTION SAMAX (N,X,INCX) REAL*4 X(*) SAMAX = 0.0 INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N SAMAX = MAX ( SAMAX , ABS ( X(IX) ) ) IX = IX + INCXA 10 CONTINUE RETURN END

Example Compute the maximum of the magnitudes of the elements of a REAL*8 vector

x, where x is a vector 10 elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 N,INCX REAL*8 S,DAMAX,X(20) N = 10 INCX = 1 S = DAMAX (N,X,INCX)

58 HP MLIB User’s Guide

Minimum of magnitudes SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN

Name SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN

Minimum of magnitudes

Purpose Given a real or integer vector x of length n, SAMIN, DAMIN, or IAMIN

computes the minimum of the magnitudes of the elements of the vector

s min xi: i 12… n,, ,=().=

Given a complex vector x of length n, SCAMIN or DZAMIN computes

s min Re xi() Im xi(): i+12… n,, ,=()=

where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively. The usual deﬁnition of the minimum of magnitudes of a complex vector is

12⁄

t min Re xi()2Im xi()

+{}

s is computed instead of t because, with its lack of square roots, it is faster to compute. Because , s is often an acceptable substitute for t.

The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.

: i 12… n,, ,=().=

t ≤ s ≤ 2t

Usage VECLIB:

INTEGER*4 n, incx REAL*4 s, SAMIN, x(lenx)

s = SAMIN(n, x, incx)

INTEGER*4 n, incx REAL*8 s, DAMIN, x(lenx)

s = DAMIN(n, x, incx)

INTEGER*4 n, incx, s, IAMIN, x(lenx) s = IAMIN(n, x, incx)

INTEGER*4 n, incx REAL*4 s, SCAMIN COMPLEX*8 x(lenx)

s = SCAMIN(n, x, incx)

INTEGER*4 n, incx REAL*8 s, DZAMIN COMPLEX*16 x(lenx)

s = DZAMIN(n, x, incx)

VECLIB8:

Chapter 2 Basic Vector Operations 59

SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN Minimum of magnitudes

INTEGER*8 n, incx REAL*4 s, SAMIN, x(lenx)

s = SAMIN(n, x, incx)

INTEGER*8 n, incx REAL*8 s, DAMIN, x(lenx)

s = DAMIN(n, x, incx)

INTEGER*8 n, incx, s, IAMIN, x(lenx) s = IAMIN(n, x, incx)

INTEGER*8 n, incx REAL*4 s, SCAMIN COMPLEX*8 x(lenx)

s = SCAMIN(n, x, incx)

INTEGER*8 n, incx REAL*8 s, DZAMIN COMPLEX*16 x(lenx)

s = DZAMIN(n, x, incx)

60 HP MLIB User’s Guide

Minimum of magnitudes SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN

Input n Number of elements of vector x to be used. If n ≤ 0, the

subprograms do not reference x.

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

Output s If n ≤ 0, then s = ∞, the largest representable machine

number. Otherwise, s is the minimum of the magnitudes of the elements of x.

Fortran Equivalent

REAL*4 FUNCTION SAMIN (N,X,INCX) REAL*4 X(*) SAMIN = ∞ INCXA = ABS ( INCX ) IX = 1 DO 10 I = 1, N SAMIN = MIN ( SAMIN , ABS ( X(IX) ) ) IX = IX + INCXA 10 CONTINUE RETURN END

Example Compute the minimum of the magnitudes of the elements of a REAL*8 vector

x, where x is a vector 10 elements long stored in a one-dimensional array X of dimension 20.

INTEGER*4 N,INCX REAL*8 S,DAMIN,X(20) N = 10 INCX = 1 S = DAMIN (N,X,INCX)

Chapter 2 Basic Vector Operations 61

SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes

Name SASUM/DASUM/IASUM/SCASUM/DZASUM

Sum of magnitudes

Purpose Given a real or integer vector x of length n, SASUM, DASUM, or IASUM

computes the l1 norm of x, that is, the sum of magnitudes of the elements of the vector

Given a complex vector x of length n, SCASUM or DZASUM computes

∑

i 1=

where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively.

The usual deﬁnition of sum of magnitudes of a complex vector is

∑

i 1=

Re x

()

Im x

()

s is computed instead of t because, with its lack of square roots, it is faster to compute. Because , s is often an acceptable substitute for t.

The vector can be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.

Usage VECLIB:

INTEGER*4 n, incx REAL*4 s, SASUM, x(lenx)

s = SASUM(n, x, incx)

INTEGER*4 n, incx REAL*8 s, DASUM, x(lenx)

s = DASUM(n, x, incx)

INTEGER*4 n, incx, s, IASUM, x(lenx) s = IASUM(n, x, incx)

 

Re xi()2Im xi()

∑



i 1=

t ≤ s ≤ 2t

12⁄

62 HP MLIB User’s Guide

Sum of magnitudes SASUM/DASUM/IASUM/SCASUM/DZASUM

INTEGER*4 n, incx REAL*4 s, SCASUM COMPLEX*8 x(lenx)

s = SCASUM(n, x, incx)

INTEGER*4 n, incx REAL*8 s, DZASUM COMPLEX*16 x(lenx)

s = DZASUM(n, x, incx)

VECLIB8:

INTEGER*8 n, incx REAL*4 s, SASUM, x(lenx)

s = SASUM(n, x, incx)

INTEGER*8 n, incx REAL*8 s, DASUM, x(lenx)

s = DASUM(n, x, incx)

INTEGER*8 n, incx, s, IASUM, x(lenx) s = IASUM(n, x, incx)

INTEGER*8 n, incx REAL*4 s, SCASUM COMPLEX*8 x(lenx)

s = SCASUM(n, x, incx)

INTEGER*8 n, incx REAL*8 s, DZASUM COMPLEX*16 x(lenx)

s = DZASUM(n, x, incx)

Input n Number of elements of vector x to be used in the sum of

magnitudes. If n ≤ 0, the subprograms do not reference

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x. x is stored forward in array x

with increment |incx|; that is, xi is stored in

x((i−1)×|incx|+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS

Chapter 2 Basic Vector Operations 63

SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes

Indexing Conventions” in the introduction to this chapter.

Output s If n ≤ 0, then s = 0. Otherwise, s is the sum of

magnitudes of the elements of x.

Fortran Equivalent

REAL*4 FUNCTION SASUM (N, X,INCX) REAL*4 X(*) SASUM = 0.0 IF ( N .LE. 0 ) RETURN IX = 1 INCXA = ABS ( INCX ) DO 10 I = 1, N SASUM = SASUM + ABS ( X(IX) ) IX = IX + INCXA 10 CONTINUE RETURN END

Example Compute the sum of magnitudes of the elements of a REAL*8 vector x, where x

is a vector 10 elements long stored in a one-dimensional array X of dimension

20.

INTEGER*4 N,INCX REAL*8 S,DASUM,X(20) N = 10 INCX = 1 S = DASUM (N,X,INCX)

64 HP MLIB User’s Guide

Elementary vector operation SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC

Name SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC

Elementary vector operation

Purpose Given a real or complex scalar a and real or complex vectors x and y of length n,

these subprograms perform the elementary vector operations

yax← y+

and

y ← ax+y

where is the complex conjugate of x. The vectors can be stored in

one-dimensional arrays or in either rows or columns of two-dimensional arrays, and the indexing through the arrays can be either forward or backward.

Usage VECLIB:

INTEGER*4 n, incx, incy REAL*4 a, x(lenx), y(leny)

CALL SAXPY(n, a, x, incx, y, incy)

INTEGER*4 n, incx, incy REAL*8 a, x(lenx), y(leny)

CALL DAXPY(n, a, x, incx, y, incy)

INTEGER*4 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)

CALL CAXPY(n, a, x, incx, y, incy)

INTEGER*4 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)

CALL CAXPYC(n, a, x, incx, y, incy)

INTEGER*4 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)

CALL ZAXPY(n, a, x, incx, y, incy)

INTEGER*4 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)

CALL ZAXPYC(n, a, x, incx, y, incy)

VECLIB8:

INTEGER*8 n, incx, incy REAL*4 a, x(lenx), y(leny)

CALL SAXPY(n, a, x, incx, y, incy)

INTEGER*8 n, incx, incy REAL*8 a, x(lenx), y(leny)

CALL DAXPY(n, a, x, incx, y, incy)

Chapter 2 Basic Vector Operations 65

SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC Elementary vector operation

INTEGER*8 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)

CALL CAXPY(n, a, x, incx, y, incy)

INTEGER*8 n, incx, incy COMPLEX*8 a, x(lenx), y(leny)

CALL CAXPYC(n, a, x, incx, y, incy)

INTEGER*8 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)

CALL ZAXPY(n, a, x, incx, y, incy)

INTEGER*8 n, incx, incy COMPLEX*16 a, x(lenx), y(leny)

CALL ZAXPYC(n, a, x, incx, y, incy)

Input n Number of elements of vectors x and y to be used in the

elementary vector operation. If n ≤ 0, the subprograms do not reference x or y.

a The scalar a. x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x. x is used in conjugated form by CAXPYC and ZAXPYC and in unconjugated form by the other subprograms.

incx Increment for the array x:

incx ≥ 0 x is stored forward in array x; that is,

xi is stored in x((i−1)×incx+1).

66 HP MLIB User’s Guide

incx < 0 x is stored backward in array x; that

is, xi is stored in x((i−n)×incx+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

y Array of length leny = (n−1)×|incy|+1 containing the

n-vector y.

incy Increment for the array y, incy ≠ 0:

incy > 0 y is stored forward in array y; that is,

yi is stored in y((i−1)×incy+1).

incy < 0 y is stored backward in array y; that

is, yi is stored in y((i−n)×incy+1).

Elementary vector operation SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC

Use incy = 1 if the vector y is stored contiguously in array y; that is, if yi is stored in y(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

Output y If n ≤ 0 or a = 0, then y is unchanged. Otherwise, ax+y

overwrites the input.

Notes If incx = 0, then xi= x(1) for all i.

The result is unspeciﬁed if incy = 0 or if x and y overlap such that any element of x shares a memory location with any element of y.

Fortran Equivalent

SUBROUTINE SAXPY (N, A, X,INCX, Y,INCY) REAL*4 X(*),Y(*) IF ( N .LE. 0 ) RETURN IF ( A .EQ. 0.0 ) RETURN IX = 1 IY = 1 IF ( INCX .LT. 0 ) IX = 1 - (N-1) * INCX IF ( INCY .LT. 0 ) IY = 1 - (N-1) * INCY DO 10 I = 1, N Y(IY) = A * X(IX) + Y(IY) IX = IX + INCX IY = IY + INCY 10 CONTINUE RETURN END

Example 1 Compute the REAL*8 elementary vector operation

y 2x← y,+

where x and y are vectors 10 elements long stored in one-dimensional arrays X and Y of dimension 20.

INTEGER*4 N,INCX,INCY REAL*8 A,X(20),Y(20) N = 10 A = 2.0D0 INCX = 1 INCY = 1 CALL DAXPY (N,A,X,INCX,Y,INCY)

Example 2 Subtract 3 times the 4th row of a 10-by-10 matrix from the 5th row. The matrix

is stored in a two-dimensional array B of dimension 20-by-21.

INTEGER*4 N REAL*8 A,B(20,21) N = 10 A = -3.0D0 CALL DAXPY (N,A,B(4,1),20,B(5,1),20)

Chapter 2 Basic Vector Operations 67

SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation

Name SAXPYI/DAXPYI/CAXPYI/ZAXPYI

Sparse elementary vector operation

Purpose Given a real or complex scalar a, a sparse vector x stored in compact form via a

set of indices and a dense vector y stored in full storage form, these subprograms perform the elementary vector operation

yax← y.+

More precisely, let x be a sparse n-vector with m ≤ n interesting (usually nonzero) elements, and let {k1, k2, ..., km} be the indices of these elements. All uninteresting elements of x are assumed to be zero. Let y be an ordinary n-vector. If x is represented by arrays x and indx such that indx(i)=ki and

x(i)=x

Usage VECLIB:

then these subprograms compute

axiy

,+← i 12… m.,, ,=

INTEGER*4 m, indx(m) REAL*4 a, x(m), y(n)

CALL SAXPYI(m, a, x, indx, y)

INTEGER*4 m, indx(m) REAL*8 a, x(m), y(n)

CALL DAXPYI(m, a, x, indx, y)

VECLIB8:

68 HP MLIB User’s Guide

INTEGER*4 m, indx(m) COMPLEX*8 a, x(m), y(n)

CALL CAXPYI(m, a, x, indx, y)

INTEGER*4 m, indx(m) COMPLEX*16 a, x(m), y(n)

CALL ZAXPYI(m, a, x, indx, y)

INTEGER*8 m, indx(m) REAL*4 a, x(m), y(n)

CALL SAXPYI(m, a, x, indx, y)

INTEGER*8 m, indx(m) REAL*8 a, x(m), y(n)

CALL DAXPYI(m, a, x, indx, y)

Sparse elementary vector operation SAXPYI/DAXPYI/CAXPYI/ZAXPYI

INTEGER*8 m, indx(m) COMPLEX*8 a, x(m), y(n)

CALL CAXPYI(m, a, x, indx, y)

INTEGER*8 m, indx(m) COMPLEX*16 a, x(m), y(n)

CALL ZAXPYI(m, a, x, indx, y)

Input m Number of interesting elements of x, m ≤ n, where n is

the length of y. If m ≤ 0, the subprograms do not reference x, indx, or y.

a The scalar a. x Array of length m containing the interesting elements

of x. x(j)=xi if indx(j)=i.

Chapter 2 Basic Vector Operations 69

SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation

indx Array containing the indices {k

elements of x. The indices must satisfy

} of the interesting

1 indx i() n≤≤i 12… m,, ,=

and

indx i() indx j()≠ 1 i≤ j m,≤≠

where n is the length of y.

y Array containing the elements of y, y(i)=y

Output y If m ≤ 0 or a = 0, then y is unchanged. Otherwise, ax+y

overwrites the input. Only the elements of y whose indices are included in indx are changed.

Notes The result is unspeciﬁed if any element of indx is out of range, if any two

elements of indx have the same value, or if x, indx, and y overlap such that any element of x or any index shares a memory location with any element of y.

Fortran Equivalent

SUBROUTINE SAXPYI (M, A, X,INDX, Y) REAL*4 A,X(*),Y(*) INTEGER*4 INDX(*) IF ( M .LE. 0 ) RETURN IF ( A .EQ. 0.0 ) RETURN DO 10 I = 1, M Y(INDX(I)) = A * X(I) + Y(INDX(I)) 10 CONTINUE RETURN END

Example Compute the REAL*8 elementary vector operation

y 2x← y,+

where x is a sparse vector with interesting elements x1, x4, x5, and x9 stored in one-dimensional array X, and y is stored in a one-dimensional array Y of

dimension 20.

INTEGER*4 M,INDX(4) REAL*8 A,X(4),Y(20) DATA INDX / 1, 4, 5, 9 / M = 4 A = 2.0D0 CALL DAXPYI (M,A,X,INDX,Y)

70 HP MLIB User’s Guide

Two sided vector clip SCLIP/DCLIP/ICLIP

Name SCLIP/DCLIP/ICLIP

Two sided vector clip

Purpose Given scalars a and b and a vector x of length n, these subprograms form the

vector y by the clip operation

a if xia≤

 





b if bx



The vectors can be stored in one-dimensional arrays or in either rows or columns of two-dimensional arrays. Indexing through the arrays can be either forward or backward.

Usage VECLIB:

if axib<< i 12… n,, ,=

≤

INTEGER*4 n, incx, incy REAL*4 a, b, x(lenx), y(leny)

CALL SCLIP(n, a, b, x, incx, y, incy)

INTEGER*4 n, incx, incy REAL*8 a, b, x(lenx), y(leny)

CALL DCLIP(n, a, b, x, incx, y, incy)

INTEGER*4 n, incx, incy, a, b, x(lenx), y(leny) CALL ICLIP(n, a, b, x, incx, y, incy)

VECLIB8:

INTEGER*8 n, incx, incy REAL*4 a, b, x(lenx), y(leny)

CALL SCLIP(n, a, b, x, incx, y, incy)

INTEGER*8 n, incx, incy REAL*8 a, b, x(lenx), y(leny)

CALL DCLIP(n, a, b, x, incx, y, incy)

INTEGER*8 n, incx, incy, a, b, x(lenx), y(leny) CALL ICLIP(n, a, b, x, incx, y, incy)

Input n Number of elements of vectors x and y to be used. If

n 0≤

, the subprograms do not reference x or y.

a The scalar a. b The scalar b.

Chapter 2 Basic Vector Operations 71

SCLIP/DCLIP/ICLIP Two sided vector clip

x Array of length lenx = (n−1)×|incx|+1 containing the

n-vector x.

incx Increment for the array x:

incx ≥ 0 x is stored forward in array x; that is,

xi is stored in x((i−1)×incx+1).

incx < 0 x is stored backward in array x; that

is, xi is stored in x((i−n)×incx+1).

Use incx = 1 if the vector x is stored contiguously in array x; that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to this chapter.

72 HP MLIB User’s Guide

HP HP-UX Performance Tools User Manual

Specifications and Main Features

Frequently Asked Questions

User Manual

Part 1

SDOT/DDOT/CDOTC/CDOTU/ZDOTC/ZDOTU Dot product . . . . . . . . . . . . . . . . . 84

SDOTI/DDOTI/CDOTCI/CDOTUI/ZDOTCI/ZDOTUI Sparse dot product . . . . . . . 88

SROT/DROT/CROT/CSROT/ZROT/ZDROT Apply Givens rotation . . . . . . . . . . . 114

SWDOT/DWDOT/CWDOTC/CWDOTU/ZWDOTC/ZWDOTU Weighted dot product .

VECLIB

Preface

LAPACK

ScaLAPACK

SuperLU

SOLVERS

VMATH

Purpose and audience

Organization

Notational conventions

Documentation resources

1 Introduction to VECLIB

Overview

Chapter objectives

Standardization

Accessing VECLIB

Table 1-1 VECLIB and VECLIB8 Libraries

Compiling and linking (VECLIB)

Problem with +ppu compatibility and duplicated symbols

Table 1-2 Compiler Defaults

Optimization

Parallel processing

Linking for parallel or non parallel processing

Controlling VECLIB parallelism at runtime

OpenMP-based nested parallelism

Message passing-based nested parallelism

Default CPS library stack is too small for MLIB

Roundoff effects

Data types and precision

VECLIB naming convention

Table 1-3 VECLIB Naming Convention—Data Type

Data type and byte length

MLIB Argument Lengths

Operator arguments

Table 1-4 BLAS Standard Operator Arguments

Error handling

Low-level subprograms

High-level subprograms

Troubleshooting

HP MLIB man pages

2 Basic Vector Operations

Overview

Chapter objectives

Associated documentation

What you need to know to use vector subprograms

BLAS storage conventions

BLAS indexing conventions

Representation of a permutation matrix

Representation of a Householder matrix

Subprograms for basic vector operations

Name ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX

Name ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN

Name ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx

Name ISMAX/IDMAX/IIMAX

Name ISMIN/IDMIN/IIMIN

Name ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx

Name SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX

Name SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN

Name SASUM/DASUM/IASUM/SCASUM/DZASUM

Name SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC

Name SAXPYI/DAXPYI/CAXPYI/ZAXPYI

Name SCLIP/DCLIP/ICLIP