Document Numbers: B6061-96027 and B6061-96028
Remarks: Released Dcember 2004 with HP MLIB software version9.0.
User’s Guide split into two volumes with this release.
Edition: Sixth
Document Number: B6061-96023
Remarks: Released September 2003 with HP MLIB software version 8.50.
Edition: Fifth
Document Number: B6061-96020
Remarks: Released September 2002 with HP MLIB software version 8.30.
Edition: Fourth
Document Number: B6061-96017
Remarks: Released June 2002 with HP MLIB software version 8.20.
Edition: Third
Document Number: B6061-96015
Remarks: Released September 2001 with HP MLIB software version 8.10.
Edition: Second
Document Number: B6061-96012
Remarks: Released June 2001 with HP MLIB software version 8.00.
Edition: First
Document Number: B6061-96010
Remarks: Released December 1999 with HP MLIB software version
B.07.00. This HP MLIB User’s Guide contains both HP VECLIB and
HP LAPACK information. This document supercedes both the HP MLIB
VECLIB User’s Guide, fifth edition (B6061-96006) and the HP MLIB
LAPACK User’s Guide, sixth edition (B6061-96005).
Notice
Copyright 1979-2004 Hewlett-Packard Development Company. All Rights
Reserved. Reproduction, adaptation, or translation without prior written
permission is prohibited, except as allowed under the copyright laws.
The information contained in this document is subject to change without notice.
Hewlett-Packard makes no warranty of any kind with regard to this material,
including, but not limited to, the implied warranties of merchantability and
fitness for a particular purpose. Hewlett-Packard shall not be liable for errors
contained herein or for incidental or consequential damages in connection with
the furnishing, performance or use of this material.
Hewlett-Packard’s high-performance math libraries (HP MLIB) help you speed
development of applications and shorten execution time of long-running
technical applications.
HP MLIB is a collection of subprograms optimized for use on HP servers and
workstations, providing mathematical software and computational kernels for
engineering and scientific applications. HP MLIB can be used on systems
ranging from single-processor workstations to multiprocessor high-end servers.
HP MLIB is optimized for HP PA-RISC 2.0, Itanium 2, and Opteron
processors. HP MLIB has six components; VECLIB, LAPACK, ScaLAPACK,
SuperLU, SOLVERS, and VMATH.
HP VECLIB contains robust callable subprograms. Together with a subset of
the BLAS Standard subroutines, HP MLIB supports the legacy BLAS, a
collection of routines for the solution of sparse symmetric systems of equations,
a collection of commonly used Fast Fourier Transforms (FFTs), and
convolutions. Although VECLIB was designed for use with Fortran programs,
C programs can call VECLIB subprograms, as described in Appendix A. Refer
to Part 1 of this manual for HP VECLIB information.
Throughout this document there are references to legacy BLAS and BLAS
Standard routines. Legacy BLAS routines include Basic Linear Algebra
Subprograms (BLAS), that is the level 1, 2, and 3 BLAS, as well as the Sparse
BLAS.
A BLAS standardization effort began with a BLAS Technical (BLAST) Forum
meeting in November 1995 at the University of Tennessee. The efforts of the
BLAST Forum resulted in a BLAS Standard specification in 1999. BLAS
Standard routines refer to routines as defined by this BLAS Standard
specification. HP MLIB supports a subset of BLAS Standard routines. Refer to
Chapter 2, “Basic Vector Operations,” and Chapter 3, “Basic Matrix
Operations,” for details about supported subprograms.
Prefacexv
LAPACK
LAPACK
HP Linear Algebra Package (LAPACK) is a collection of subprograms that
provide mathematical software for applications involving linear equations,
least squares, eigenvalue problems, and the singular value decomposition.
LAPACK is designed to supersede the linear equation and eigenvalue packages,
LINPACK and EISPACK. The National Science Foundation, the Defense
Advanced Research Projects Agency, and the Department of Energy supported
the development of the public-domain version of LAPACK, from which the HP
version was derived.
HP LAPACK fully conforms with public domain version 3.0 of LAPACK in all
user-visible usage conventions. Refer to Part 2 of this manual for information
specific to HP LAPACK. To supplement the HP specific information provided in
Part 2 of this document, refer to the standard LAPACK Users’ Guide. You can
access the latest edition of the LAPACK Users’ Guide at the Netlib repository at
the following URL:
http://www.netlib.org/lapack/lug/index..html
ScaLAPACK
ScaLAPACK is a library of high-performance linear algebra routines capable of
solving systems of linear equations, linear least squares problems, eigenvalue
problems, and singular value problems. ScaLAPACK can also handle many
associated computations such as matrix factorizations or estimating condition
numbers.
ScaLAPACK is a public domain software that was developed by Oak Ridge
National Laboratory. It is designed for distributed computing and uses the
Message Passing Interface (MPI) for parallelism. This implementation provides
a version of ScaLAPACK tuned on HP servers and built with HP’s MPI. The
ScaLAPACK library routines are written in Fortran 77 and are callable from
Fortran 90 and C routines. Unlike other MLIB libraries, there is not a version
of ScaLAPACK that assumes all integers are 8 bytes in length.
xvi HP MLIB VECLIB User’s Guide
Refer to Part 3 of this manual for information specific to HP ScaLAPACK. To
supplement the HP specific information provided in Part 3 of this document,
refer to the standard ScaLAPACK Users’ Guide. You can access the latest
edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following
URL:
http://www.netlib.org/scalapack/slug/index..html
SuperLU
This implementation provides the Distributed SuperLU library designed for
distributed memory parallel computers. It was based on the public-domain
SuperLU_DIST, which was developed at the Lawrence Berkeley National Lab
and the University of California at Berkeley.
The library contains a set of subroutines to solve a sparse linear system. The
library is implemented in ANSI C, using HP Message Passing Interface (MPI)
for communication. The library includes routines to handle both real and
complex matrices using double precision. The parallel routine names for the
double-precision real version start with the letters “pd” (e.g. pdgstrf). The
parallel routine names for the double-precision complex version start with
letters “pz” (e.g. pzgstrf). Unlike other MLIB libraries, there is not a version of
SuperLU that assumes all integers are 8 bytes in length.
The routines can be called directly from C applications. They may also be called
from Fortran applications, however, “bridge” routines (in C) must be supplied.
For details about creating bridge routines, please refer to Section 2.9.2 in the
SuperLU User’s Guide available at:
http://www.nersc.gov/~xiaoye/SuperLU
SuperLU
SOLVERS
Solvers is a collection of direct sparse linear system solvers and graph
partitioning routines. Symmetric systems can be solved using SMP parallelism
and structurally-symmetric systems can be solved using out-of-core
functionality. These routines have been optimized for use on Hewlett-Packard
servers. Features are:
Prefacexvii
VMATH
• Sparse symmetric and structurally-symmetric linear equation solutions.
• Sparse symmetric ordinary and generalized eigensystem solutions.
• Out-of-core symmetric and structurally-symmetric linear equation and
eigensystems solutions.
• Full METIS functionality
This implementation provides the METIS Version 4.0.1 library. It is based
on the public-domain METIS, which was developed at the University of
Minnesota, Department of Computer Science, and the Army HPC Research
Center. The library contains a set of subroutines for graph partitioning,
mesh partitioning, and sparse matrix reordering, as well as auxiliary
routines. HP MLIB contains the full METIS functionality as that in the
public domain METIS, however, the routine names are different. HP MLIB
METIS routine names have been prepended with mlib_ to avoid name
conflict on applications and libraries that contain their own local version of
METIS.
For more information about METIS, please refer to:
http://www-users.cs.umn.edu/~karpis/metis/metis/index.html
VMATH
VMATH is a library of vector math routines corresponding to many of the
widely used scalar math routines available with C, C++, and Fortran90.
VMATH is intended for computationally intensive mathematical applications
amenable to a vector programming style.
VMATH provides two libraries: VMATH, whose interface uses 4-byte integers;
and VMATH8, whose interface uses 8-byte integers and is otherwise equivalent
to VMATH. VMATH routines come with both Fortran and C interfaces.
For more detailed information on VMATH as well as subprogram specifications,
please refer to the VMATH chapter in Part 5 of this book. The VMATH man
pages provide a man page for each subprogram.
xviii HP MLIB VECLIB User’s Guide
Purpose and audience
This guide describes the MLIB software library and shows how to use it. This
library provides mathematical software and computational kernels for
applications.
The HP MLIB User’s Guide addresses experienced programmers who:
• Convert, develop, or optimize programs for use on HP servers and
workstations
• Optimize existing software to improve performance and increase
productivity
Purpose and audience
Prefacexix
Organization
Organization
The HP MLIB User’s Guide describes HP MLIB VECLIB in Part 1, HP MLIB
LAPACK in Part 2, HP MLIB ScaLAPACK in Part 3, and HP MLIB Distributed
SuperLU in Part 4.
To learn fundamental information necessary for using the VECLIB library,
read Chapter 1 and the introductory sections of the other chapters. These
sections of background information will help you efficiently use the library
subprograms.
To learn more about the subject of any chapter, refer to the literature cited in
the “Associated documentation” section of each chapter.
Part 1 of this document is organized as follows:
• Chapter 1 introduces general concepts about VECLIB
• Chapter 2 describes basic vector operations included in VECLIB
• Chapter 3 describes basic matrix operations included in VECLIB
• Chapter 4 describes sparse BLAS operations
• Chapter 5 describes the discrete Fourier transforms in VECLIB
• Chapter 6 describes subprograms to compute convolutions and correlations
of data sets
• Chapter 7 describes miscellaneous subprograms to produce random
numbers, sort the elements of a vector in ascending or descending order,
measure time, allocate dynamic memory, and report errors
Part 2 of this document is organized as follows:
• Chapter 8 describes information specific to Hewlett-Packard’s
implementation of LAPACK
• Chapter 16 describes BCSLIB-EXT functionality
Supplemental material is provided as follows:
• Appendix A describes how to call VECLIB and LAPACK subprograms from
within C programs
• Appendix B describes LINPACK subprograms available in HP MLIB
• Appendix C lists parallelized subprograms in VECLIB and LAPACK
• An index is included at the back of the manual
Supplemental information for Part 2 of this HP MLIB User’s Guide, is found in
the LAPACK Users’ Guide.
The LAPACK Users’ Guide is a publication from the Society for Industrial and
Applied Mathematics that provides an introduction to the design of LAPACK as
well as complete specifications for all the driver and computational routines.
You can access the latest edition of the LAPACK Users’ Guide at the Netlib
repository at the following URL:
http://www.netlib.org/lapack/lug
Supplemental information for Part 3 of this HP MLIB User’s Guide, is found in
the ScaLAPACK User’s Guide.
The ScaLAPACK Users’ Guide is a publication from the Society for Industrial
and Applied Mathematics that provides an informal introduction to the design
of ScaLAPACK as well as a detailed description of its contents and a reference
manual. You can access the latest edition of the ScaLAPACK Users’ Guide at
the Netlib repository at the following URL:
http://www.netlib.org/scalapack/slug
Supplemental information for Part 4 of this HP MLIB User’s Guide, is found in
the SuperLU User’s Guide, which is only available online at:
http://www.nersc.gov/~xiaoye/SuperLU
Prefacexxi
Notational conventions
Notational conventions
The following conventions are used in this manual:
Italics
Italics within text indicate mathematical entities used
or manipulated by the program: for example, solve the
n-by-n system of linear equations Ax = b.
Italics within command lines indicate generic
commands, file names, or subprogram names.
Substitute actual commands, file names, or
subprograms for the italicized words. For example, the
command line
f90 prog_name.o
instructs you to type the command f90, followed by the
name of a program or subprogram object file.
UPPERCASE BOLDFACE
UPPERCASE BOLDFACE
Fortran statements indicates Fortran keywords and
subprogram names that must be typed just as they
appear. For example, CALL DAXPY.
within text and in prototype
lowercase boldface
xxii HP MLIB VECLIB User’s Guide
lowercase boldface
generic variable or array names. You should substitute
actual variable or array names. The italicized
mathematical entities and the lowercase boldface
variable and array names usually correspond. For
example, A is a matrix and a is the Fortran array
containing the matrix:
CALL DAXPY (n, a, x, incx, y, incy)
lowercase boldface
ASCII characters that must be typed just as they
appear. For example, the command line
f90 prog_name.o
instructs you to type the command f90, followed by the
name of a program or subprogram object file.
within text indicates Fortran
within command lines indicates
UPPERCASE MONOSPACE
UPPERCASE MONOSPACE indicates Fortran programs.
Brackets ( [ ] )
Square brackets in command examples designate
optional entries.
NOTE
A NOTE highlights important supplemental
information.
Documentation resources
The HP MLIB User’s Guide, the LAPACK Users’ Guide, and the ScaLAPACK
Users’ Guide are available in hardcopy and online formats.
For the HP MLIB User’s Guide, refer to:
http:// www.hp.com/go/mlib
Documentation resources
For the latest edition of the LAPACK Users’ Guide, refer to:
http://www.netlib.org/lapack/lug
For the latest edition of the ScaLAPACK User’s Guide, refer to:
http://www.netlib.org/scalapack/slug
For the latest edition of the SuperLU User’s Guide, refer to:
http://www.nersc.gov/~xiaoye/SuperLU
The following documents provide supplemental information:
• LAPACK Users’ Guide Philadelphia, PA: Society for Industrial and Applied
Mathematics, 1995. This guide provides information on the subprograms
provided with the LAPACK library.
• ScaLAPACK Users’ Guide Philadelphia, PA: Society for Industrial and
Applied Mathematics. This guide provides information on the subprograms
provided with the ScaLAPACK library.
• Parallel Programming Guide for HP-UX Systems. Describes efficient
shared-memory parallel programming techniques using HP compilers.
• Fortran/9000 Programmer’s Guide. Describes features and requirements in
terms of the tasks a programmer might perform. These tasks include how to
compile, link, run, debug, and optimize programs.
Prefacexxiii
Documentation resources
• HP-UX Floating-Point Guide. Describes how floating-point arithmetic is
implemented on HP 9000 systems and discusses how floating-point behavior
affects the programmer.
• HP Fortran 90 Programmer’s Guide. Provides extensive usage information
(including how to compile and link), suggestions and tools for migrating to
HP Fortran 90, and how to call C and HP-UX routines for HP Fortran 90.
• HP Fortran 90 Programmer’s Reference. Provides complete Fortran 90
language reference information. It also covers compiler options, compiler
directives, and library information.
• HP C Programming Guide. Contains detailed discussions of selected C
topics.
• HP C/HP-UX Reference Manual. Presents reference information on the C
programming language as implemented by HP.
• HP aC++ Online Programmer’s Guide. Presents reference and tutorial
information on aC++. (This manual is accessed by specifying aCC with the
+help command-line option.)
• HP MPI User’s Guide. Describes how to use HP MPI (Message Passing
Interface), a library of C- and Fortran-callable routines used for
message-passing programming.
• Software Optimization for High Performance Computing: Creating Faster
Applications: Provides state-of-the-art solutions for every key aspect of
software performance; both code-based and algorithm-based.
• HP-UX Assembly Language Reference Manual. Describes the HP-UX
assembler for the PA-RISC processor.
• HP PA-RISC 2.0 Architecture Reference. Describes the architecture of the
PA-RISC 2.0 processor.
• PA-RISC Procedure Calling Conventions Reference. Describes the
conventions for creating PA-RISC assembly language procedure calls.
• IA-64 and Elementary Functions. Describes the architecture of the IA-64
processor and how to implement elementary functions.
• METIS documentation is available online at
http://www-users.cs.umn.edu/~karypis/metis/metis/index.html.
• OpenMP documentation is available online at http://www.openmp.org.
xxiv HP MLIB VECLIB User’s Guide
Part 1
HP VECLIB
1Introduction to VECLIB
Overview
VECLIB, a component of HP MLIB, is a collection of subprograms optimized for
use on Hewlett-Packard servers and workstations, providing mathematical
software and computational kernels for engineering and scientific applications.
This library contains subprograms for:
• Dense vector operations, including the Level 1 BLAS
• Sparse vector and matrix operations, including the Sparse BLAS
• BLAS 3 routine DGEMM is highly tuned
• Matrix operations, including the Level 2 and Level 3 BLAS
• CXML Blas extenstions
Compaq Extended Math Library (CXML) is a collection of routines that
performs numerically intensive operations that occur frequently in
engineering and scientific computing, such as linear algebra and signal
processing.
HP MLIB adds support for the unique CXML extensions to the legacy BLAS
and for the array math functions in CXML. However, the additional support
of XCML is not exhaustive. For more information about CXML see the
CXML Reference Manual part number AA-PV6VE-TE.
• Discrete Fourier transforms
• Convolution and correlation
• Miscellaneous tasks, such as sorting and generating random numbers
Overview
VECLIB provides two sets of libraries: VECLIB and VECLIB8. To determine if
a subprogram is included in VECLIB8, refer to the VECLIB8 section under
each subprogram specification in the following chapters.
Although VECLIB was designed for use with Fortran programs, C programs
can call VECLIB subprograms. Refer to Appendix A, “Calling MLIB routines
from C,” for details. Examples are in Fortran, unless otherwise indicated.
Except for subprograms described in Appendix B, “LINPACK Subprograms,”
LINPACK, EISPACK, and SKYLINE subprograms are not included in the
Chapter 1 Introduction to VECLIB 1
Chapter objectives
Hewlett-Packard scientific libraries. Refer to the Appendix, “Converting from
LINPACK or EISPACK” in the LAPACK Users’ Guide, for assistance converting
programs that currently call LINPACK or EISPACK routines to call LAPACK
or VECLIB routines instead.
This chapter provides information necessary for efficient use of VECLIB and
includes discussions of:
• Standardization
• Accessing VECLIB
• Optimization
• Parallel processing
• Roundoff effects
• Data types and precision
• VECLIB naming convention
• Data type and byte length
• Operator arguments
• Error handling
• HP MLIB man pages
• Troubleshooting
Chapter objectives
After reading this chapter you will:
• Know how to access VECLIB library subprograms
• Understand how VECLIB works in a parallel computing environment
• Understand VECLIB naming conventions
• Understand roundoff effects
• Understand how VECLIB handles errors
• Know how to access the online HP MLIB man pages
• Know what to do if you experience trouble using VECLIB subprograms
2 HP MLIB User’s Guide
Standardization
VECLIB conforms to a variety of existing standards. For example, it includes
Basic Linear Algebra Subprograms (BLAS), levels 1, 2, and 3, and Sparse
BLAS.
These products are available with standardized user interfaces on computers
ranging from microcomputers to supercomputers. Because VECLIB conforms to
these standards, it is a software bridge from other computers to
Hewlett-Packard servers and workstations. However, even though the user
interface is standardized, internal workings of HP VECLIB subprograms have
been specialized for supported computers.
HP MLIB supports a subset of the BLAS Standard routines. BLAS
standardization efforts, supported by software and hardware vendors and
university groups, began with a BLAS Technical (BLAST) Forum meeting in
November 1995 at the University of Tennessee. The efforts of the BLAST
Forum resulted in a BLAS Standard specification in 1999. Refer to Chapter 2,
“Basic Vector Operations,” and Chapter 3, “Basic Matrix Operations,” for
details of supported subprograms.
Standardization
Accessing VECLIB
The VECLIB and VECLIB8 libraries are available as archive and shared
libraries. They consist of compiled subprograms ready for you to incorporate
into your programs with the linker. Include the appropriate declarations and
CALL statements in your Fortran source program and specify that VECLIB or
VECLIB8 be used as an object library at link time. Refer to the LAPACK
chapter for details about accessing the LAPACK library, The ScaLAPACK
chapter for details about accessing the ScaLAPACK library, and the Distributed
SuperLU chapter for details about accessing the Distributed SuperLU library.
MLIB libraries are installed in the /opt/mlib/ directory. The entire path depends
on your system type, as follows:
Only 64-bit addressing is supported in the VECLIB8 library.
The file name of the archive VECLIB and VECLIB8 libraries are libveclib.a
and libveclib8.a respectively. The VECLIB shared library for PA is libveclib.sl
and is libveclib.so for all other operating systems.
Performance of your applications is better when you use archive libraries.
However, if you need to keep executable files to a minimum size, you can use
shared libraries. See “Linking with libisamstub.a” on page 8 if your application
calls Fortran77 routines from C language code, and you are linking with the
VECLIB archive library.
The HP VECLIB library contains parallelized subprograms. Refer to Appendix
C, “Parallelized Subprograms,” for details. If your program is run on an HP-UX
system with multiple processors, see “Parallel processing” on page 17 for
additional information about linking your program.
There are several ways to link your program with the version of VECLIB tuned
for the machine on which the program runs. By default, the −lveclib option on
the f90, cc, or c89 command line that links your program selects the library that
corresponds to 32-bit addressing on the machine on which you link.
cc is the HP C compiler and c89 is the HP POSIX-conforming C compiler. The
remainder of this book refers to the cc and f90 compilers. cc and f90 examples
also apply to c89.
Chapter 1 Introduction to VECLIB 5
Accessing VECLIB
When you use the –aarchive_shared flag on your compiler command line for
HP-UX, it ensures that the compiler links the archive library. If the archive
library is not available, then it links the shared library. If you omit
–aarchive_shared and –ashared_archive, the linker defaults to linking the
shared library. Link with –Bstatic on Linux systems.
If your program uses subprograms from VECLIB, LAPACK, ScaLAPACK,
Distributed SuperLU, SOLVERS and VMATH, specify −llapack, −lveclib,
−lscalapack, −lsuperlu_dist, −lsolvers, and −lvmath or −llapack8, −lveclib8,
−lscalapack8, −lsuperlu_dist8, −lsolvers8 and −lvmath8 on the compiler
command line.
NOTEDo not mix subprograms from the two types of 64-bit address libraries
(those with 32-bit integers and those with 64-bit integers) in the same
program.
Each of the VECLIB and LAPACK library files is complete in itself, meaning
that you do not need to link one library because you have called subprograms
from another. This is because various subprograms are included in both
libraries. For example, VECLIB subroutine SGEMV is called by several
LAPACK subprograms, and therefore, is included in the LAPACK library. Thus,
in general, you have to link only the library or libraries you need.
For PA-Based Systems
1. To link a program that uses VECLIB for use on the same machine, use one
of the following commands:
2. Specify the entire path of the library file on the compiler command line that
links your program. For example, to link your program with VECLIB for use
with 32- or 64-bit addressing on a PA-based system, use one of the following:
Then use the −lveclib option on the compiler command line that links your
program:
f90 [options] file ... −lveclib
[options] file ... −lveclib −lcl −lm
cc
[options] file ... −lveclib −lcl −lm
aCC
NOTEAn LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment
variable to specify a library path, you cannot override that specification
with a -Wl option on your compiler command line.
5. To link with the VECLIB8 libraries on a PA-based system, use the +DA2.0W
option to specify memory addresses are 64-bit.
Compile Fortran applications with the +i8, +autodbl, or +autodbl4 option
where:
• +i8 promotes 4-byte integer of logical constants, intrinsics, and user
variables (declared as integer or logical) to 8-byte quantities.
• +autodbl promotes all integer, logical, and real items to 8 bytes, and all
double-precision and complex items to 16 bytes.
• +autodbl4 promotes all integer logical, and real items to 8 bytes, and
complex items to 16 bytes. The +autodbl4 option does not promote the
size of double-precision and double-complex items.
C language codes that call Fortran77 routines from the BLAS Standard, the
sparse linear equation system, or the sparse eigenvalue system, must explicitly
link the ISAM (Indexed Sequential Access Method) stubs library into the
program. For example,
cc [options] file ... –Wl,–aarchive_shared,–L/opt/fortran/lib/libisamstub.a
−lveclib
−lcl −lm
This only applies if you are linking with the VECLIB archive library.
This option is only valid for 32-bit PA systems.
For Itanium-Based HP-UX Systems
1. To link a program that uses VECLIB for use on the same machine, use one
of the following commands:
2. Specify the entire path of the library file on the compiler command line that
links your program. To link your program with VECLIB for use with 32- or
64-bit addressing on an HP-UX system, use one of the following:
NOTEAn LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment
variable to specify a library path, you cannot override that specification
with a -Wl option on your compiler command line.
5. To link with the VECLIB8 libraries on an HP-UX system, use the +DD64
option to specify memory addresses are 64-bit.
Compile Fortran applications with the +i8, +autodbl, or +autodbl4 option
where:
• +i8 promotes 4-byte integer of logical constants, intrinsics, and user
variables (declared as integer or logical) to 8-byte quantities.
• +autodbl promotes all integer, logical, and real items to 8 bytes, and all
double-precision and complex items to 16 bytes.
• +autodbl4 promotes all integer logical, and real items to 8 bytes, and
complex items to 16 bytes. The +autodbl4 option does not promote the
size of double-precision and double-complex items.
Link with -Bdynamic if you want the archive library on a linux system.
Chapter 1 Introduction to VECLIB 9
Accessing VECLIB
NOTEWhen you use the Intel V8.0 C compiler to link the SOLVERS library,
you may require one or more of -lifcore, -lifport, or -ldl.
2. Specify the entire path of the library file on the compiler command line that
links your program. For example, to link your program with VECLIB for use
with 32- or 64-bit addressing on a Linux system, use one of the following:
2. Specify the entire path of the library file on the compiler command line that
links your program. For example, to link your program with VECLIB for use
with 64-bit addressing on an XC6000 system with the Intel V8.0 compiler,
use one of the following:
Then use the −lveclib option on the compiler command line that links your
program:
ifort [options] file ... −lveclib −openmp
[options] file ... −lveclib −openmp
icc
NOTEAn LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment
variable to specify a library path, you cannot override that specification
with a -Wl option on your compiler command line.
For XC6000 Systems with the Intel V7.1 Compiler
1. Use one of the following compile line commands to link VECLIB:
2. Specify the entire path of the library file on the compiler command line that
links your program. For example, to link your program with VECLIB for use
with 64-bit addressing on an XC6000 system with the Intel V7.1 compiler,
use one of the following:
NOTEAn LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment
variable to specify a library path, you cannot override that specification
with a -Wl option on your compiler command line.
For XC4000 Systems with the PGI V5.1 Compiler
1. Use one of the following compile line commands to link VECLIB:
2. Specify the entire path of the library file on the compiler command line that
links your program. For example, to link your program with VECLIB for use
with 64-bit addressing on an XC4000 system with the PGI V5.1 compiler,
use one of the following:
NOTEAn LDOPTS specification takes precedence over using -Wl on the
compiler command line. That is, if you use the LDOPTS environment
variable to specify a library path, you cannot override that specification
with a -Wl option on your compiler command line.
Problem with +ppu compatibility and duplicated symbols
All MLIB subprograms documented in the HP MLIB User’s Guide have two
entry points: one is compatible with the Fortran compiler option +noppu (no
postpend underbar) and the second is compatible with the Fortran compiler
option +ppu (postpend underbar). For example, the MLIB BLAS error handler
XERBLA has entry points xerbla and xerbla_. Table 1-1 shows the Fortran 90
compiler defaults.
Table 1-2Compiler Defaults
Different compiler defaults can cause duplicate symbol definitions at link time
as the following examples illustrate.
PA-RISC processors
Suppose you write your own Fortran version of the XERBLA subroutine. You
compile on PA-RISC with +DA2.0W and the compiler turns on +ppu (See Table
1-1), so your xerbla.o object file has an entry point xerbla_. However, the
PA-RISC MLIB libraries were compiled with +noppu and the entry points with
underbars were added as synonyms to the entry points without underbars.
Hence, the various MLIB subprograms that call XERBLA reference them as
xerbla (without the postpended underbar). Therefore, your xerbla_ does not
satisfy their CALL XERBLA, so the linker accesses xerbla.o from the library.
14 HP MLIB User’s Guide
32-bit Addressing64-bit Addressing
PA-RISC+noppu+ppu
Itanium+ppu+ppu
Accessing VECLIB
This file includes both entry points xerbla and xerbla_, and xerbla_ conflicts
with your XERBLA subroutine.
Use one of the following workarounds on PA-RISC if you are compiling your
own Fortran version of XERBLA:
• Compile your XERBLA with +noppu. This implies that you also compile any
of your program that calls XERBLA with +noppu, and that you apply this
rule recursively.
• Use f90 ALIAS directives:
% !$HP$ ALIAS xerbla='xerbla'
This prevents the compiler from postpending the underbar onto the entry
point and external references for XERBLA.
This problem is not confined to the XERBLA subroutine. It occurs on PA-RISC
when the following conditions are met:
• Any subprogram with the same name as a user-visible MLIB subprogram is
compiled in your program with +ppu
• Another MLIB subprogram used by your program also calls that
subprogram
Furthermore, the reverse can occur in the PA-RISC 32-bit libraries if you
compile with the +ppu compiler option.
Itanium processors
If you compile on Itanium using the compiler option +noppu your xerbla.o
object has an entry point xerbla (without the postpended underbar). However
the Itanium MLIB libraries were compiled with +ppu option (following the
default behavior described in Table 1-1) and the entry points without underbars
were added as synonyms to the entry points with underbars. Hence the various
MLIB Fortran source subprograms that call XERBLA reference them as
xerbla_. Therefore, your xerbla (without the postpended underbar) does not
satisfy the Fortran calls to XERBLA, so the linker accesses xerbla.o from the
library. This file contains both entry points xerbla and xerbla_, and xerbla
conflicts with your XERBLA subroutine.
Moreover, the various MLIB C source subprograms that call XERBLA still
reference them as xerbla (without the postpended underbar). It means that
both xerbla and xerbla_ entry points must be provided in your xerbla.o.
Use one of the following workarounds on Itanium if you are compiling your own
Fortran version of XERBLA:
• Compile your Fortran source for XERBLA with +noppu and provide both
xerbla and xerbla_ entry points:
This problem is not confined to the XERBLA subroutine. It occurs on
Itanium when the following conditions are met:
• Any subprogram with the same name as a user-visible MLIB
subprogram is compiled in your program with
• +noppu
• Another MLIB subprogram used by your program also calls that
subprogram
PA-RISC and Itanium processors
Also note that all the workarounds listed for Itanium are more generic and can
also be used for PA-RISC. Therefore, if you are coding your own version of a
MLIB routine called "mlib_routine" on PA-RISC or Itanium, a Fortran version
might be implemented as:
!$HP$ ALIAS mlib_routine='mlib_routine'
!$HP$ ALIAS mlib_routine_='mlib_routine_'
The key computational kernels in VECLIB have been optimized to take full
advantage of both PA-RISC and Itanium tightly integrated architectures.
Optimizations include:
Optimization
• Instruction scheduling to maximize the number of machine instructions
executed per clock cycle
• Algorithm restructuring to increase the number of computations per
memory access
• Cache management to decrease the number of data cache misses
Refer to “Accessing VECLIB” on page 3 for instructions on linking the desired
library with your program.
Parallel processing
Parallel processing is available on multi-processor HP platforms running the
HP-UX 11i or greater operating system and Linux. These systems can divide a
single computational process into small streams of execution, called threads.
The result is that you can have more than one processor executing on behalf of
the same process. Also, refer to Appendix C, “Parallelized Subprograms,” for a
list of HP VECLIB parallelized subprograms.
Chapter 1 Introduction to VECLIB 17
Parallel processing
You can enable or disable parallel processing at link time or at runtime. A
program does not use parallelism in VECLIB unless parallel processing is
enabled at both link time and at runtime.
Linking for parallel or non parallel processing
To enable parallel processing at link time, your link step must produce a
multithreaded executable. On HP-UX systems, use the +
compiler options to get a multithreaded executable when you link with the HP
Fortran compiler; use +O3 and +Oopenmp when you link with the HP C
compiler; and use +Oopenmp when you link with the HP aC++ compiler:
f90 [options including +O3 +Oparallel] file ... –Wl,–aarchive_shared −lveclib
[options including +O3 +Oopenmp] file ... –Wl,–aarchive_shared −lveclib −lcl
cc
−lm [options including +Oopenmp] file ... –Wl,–aarchive_shared −lveclib −lcl
aCC
−lm
To disable VECLIB’s automatic parallelism at link time, omit the +Oparallel
and +Oopenmp options:
When you enable parallelism at link time, three methods are available at
runtime to specify the extent of parallel processing in MLIB.
• Use MLIB_NUMBER_OF_THREADS, a shell environment variable that
allows you to enable parallelism within MLIB subprograms and to specify
the maximum number of threads that can be used in parallel regions.
Not setting MLIB_NUMBER_OF_THREADS has the same result as setting
it to 1; that is, parallel processing is disabled within MLIB subroutines.
Setting MLIB_NUMBER_OF_THREADS to the number of CPUs in the
system, or greater, allows parallelized MLIB subprograms to use as many
CPUs as are available to the process.
The following command lines show the C shell syntax and Korn shell syntax
to use when setting the variable to eight processors:
For C shell:
setenv MLIB_NUMBER_OF_THREADS 8
18 HP MLIB User’s Guide
Parallel processing
For Korn shell:
export MLIB_NUMBER_OF_THREADS=8
MLIB_NUMBER_OF_THREADS is examined on the first call to a
parallelized VECLIB subprogram to establish the default parallel action
within VECLIB.
Use the subroutine MLIB_SETNUMTHREADS to restore VECLIB parallel
processing to its run-time default that was specified by
MLIB_NUMBER_OF_THREADS. Refer to the mlib_setnumthreads(3m)
man page for usage information.
• Use the subroutine MLIB_SETNUMTHREADS.
You can call this subroutine at any time to set the maximum number of
parallel threads used in subsequent VECLIB or LAPACK calls. The
specified value overrides the absence of the
MLIB_NUMBER_OF_THREADS environment variable or any value
assigned to it.
• To modify the effect of the MLIB_NUMBER_OF_THREADS and
MLIB_SETNUMTHREADS, you can also use
MP_NUMBER_OF_THREADS at runtime. When
MP_NUMBER_OF_THREADS is set to a value smaller than
MLIB_NUMBER_OF_THREADS and MLIB_SETNUMTHREADS, the
value of MP_NUMBER_OF_THREADS takes precedence and limits the
amount of parallelism to MP_NUMBER_OF_THREADS.
These controls set the maximum amount of SMP parallelism that your program
can use, and the VECLIB-specific mechanisms offer finer control within that
maximum. Refer to “Performance benefits” below for more information.
Performance benefits
If VECLIB parallelism is enabled, each parallelized VECLIB subprogram
determines at runtime if multiple processors are available. If multiple
processors are available, VECLIB detects whether the program is already using
multiple threads. If the application calling VECLIB is not running multiple
threads, VECLIB will attempt to use all available threads (limited by
MLIB_NUMBER_THREADS). If the application calling VECLIB is already
running multiple threads, VECLIB will attempt to use the remaining threads
without over-subscribing. This is a form of nested parallelism.
If you are using an HP server with multiple processors, you can realize the
performance benefits of parallel processing in three ways:
• Call any parallelized VECLIB subprogram. Let it use parallelism internally
if it determines that it is appropriate to do so based on such factors as
problem size, system configuration, and user environment.
Chapter 1 Introduction to VECLIB 19
Parallel processing
• Call VECLIB subprograms in a parallelized loop or region. VECLIB
• Use the Message Passing Interface (MPI) explicit parallel model. Refer to
VECLIB subprograms are reentrant, meaning that they may be called several
times in parallel to do independent computations without one call interfering
with another. You can use this feature to call VECLIB subprograms in a
parallelized loop or region.
The compiler does not automatically parallelize loops containing a function
reference or subroutine call. You can force it to parallelize such a loop by
defining OpenMP parallel regions.
For example, the following Fortran code makes parallel calls to subprogram
DAXPY:
supports nested parallelism where the outer parallelism is implemented
through OpenMP while the inner parallelism is implemented with VECLIB
SMP parallelism. To use this mechanism, you must be familiar with the
techniques of parallel processing. Refer to the Parallel Programming Guidefor HP-UX Systems for details.
the HP MPI User’s Guide or the MPI(1) man page for details.
NTHREADS = 4
C$OMP PARALLEL DO NUM_THREADS(NTHREADS)
DO J=1, N
CALL DAXPY (N-I,A(I,J),A(I+1,I),1,A(I+1,J),1)
ENDO
C$OMP END PARALLEL DO
While optimizing a parallel program, you may want to make parallel calls to a
VECLIB subprogram to execute independent operations where the call
statements are not in a loop. OpenMP supports the PARALLEL and ENDPARALLEL directions that define a block of code that is to be executed by
multiple threads in parallel.
OpenMP-based nested parallelism
Nested parallelism can be achieved when calling VECLIB parallelized
subprograms from an OpenMP parallel region. (See “Parallelized subprograms
in VECLIB” on page 1104.) Consider the following code running on an HP
platform with at least four processors:
if (myid.eq.0) then
call dgemm(‘n’, ‘n’, m, m, m, alpha, a, lda, b, ldb, beta,
c,ldc)
else
call dgemm(‘n’, ‘n’, m, m, m, alpha, d, ldd, e, lde, beta,
f,ldf)
20 HP MLIB User’s Guide
Parallel processing
endif
c$omp end parallel
call omp_set_nested(.false.)
...
Using MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way
parallel: one OpenMP thread for
CαAB βC+=
and another for
FαDE βF+=
Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism
and run the code four-way parallel.
If a parallel VECLIB subprogram is called from a parallelized loop or region,
VECLIB will automatically avoid over-subscription of the CPUs. The number of
threads spawned by each call to a parallelized VECLIB subroutine on a nested
parallel region is limited by:
• MLIB_NUMBER_OF_THREADS
• The number of threads still available in the system
• will never be larger than four. Specifically:
MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
Message passing-based nested parallelism
Nested parallelism can be achieved when calling VECLIB parallelized
subprograms from an MPI process. (See “Parallelized subprograms in VECLIB”
on page 1104.) Consider the following code:
if (myid.eq.0) then
call dgemm(‘n’, ‘n’, m, m, m, alpha, a, lda, b, ldb, beta,
c,ldc)
else
call dgemm(‘n’, ‘n’, m, m, m, alpha, d, ldd, e, lde, beta,
f,ldf)
endif
...
Chapter 1 Introduction to VECLIB 21
Parallel processing
Assume the application started on two MPI processes. Using
MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel:
one MPI process for
CαAB βC+=
and another for
FαDE βF+=
Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism
and run the code four-way parallel.
Default CPS library stack is too small for MLIB
In libcps, the HP Compiler Parallel Support library, a CPS thread has a
default stack size of 8M bytes. For performance reasons, several subprograms
in HP MLIB use the stack for temporary arrays that exceed the default value.
Using the default CPS stack size, these routines overwrite neighboring stacks,
resulting in errors that are difficult to diagnose.
The solution is to change the CPS thread stacksize attribute to a value that is
large enough to accommodate all the MLIB subprograms the thread may
encounter. Currently, 8 MB*(the number of threads) should be sufficient for all
MLIB subprograms.
The environment variable CPS_STACK_SIZE expects values in K bytes.
Setting the stack size as follows would be sufficient for programs that execute
on two threads:
For C shell:
% setenv CPS_STACK_SIZE 16384
For Korn shell:
% export CPS_STACK_SIZE=16384
Default Pthread library stack is too small for MLIB
The stack allocated for each new thread created using direct pthread calls to
“pthread_create” might not be large enough for HP MLIB. Several subprograms
in HP MLIB use the stack for storing temporary work arrays and improve
performance. If the stack size is not large enough, these routines overwrite
neighboring stacks, resulting in errors that are difficult to diagnose.
22 HP MLIB User’s Guide
Currently, 8 MB*(the number of threads) should be sufficient for all MLIB
subprograms. If your application launchs threads directly from pthread library
calls, set the minimum allocated stack size to match HP MLIB needs on each
new thread. Setting the stack size on HP-UX as follows would be sufficient for
programs that execute on two threads:
VECLIB subprograms may use a different arithmetic order of evaluation than
other implementations. Different roundoff characteristics may result. Accuracy
of results is usually similar to other implementations, so using VECLIB should
not affect the accumulation of roundoff errors in a complete application. If it
does, examine the mathematical analysis of the problem to determine if it is
ill-conditioned. Ill-conditioned means that the small roundoff errors that are
inadvertently introduced into any computation are magnified out of proportion
to the desired result. Similarly, if results with and without VECLIB differ
materially, both sets of answers are probably inaccurate and you should
investigate further. If the program correctly applies stable computational
algorithms, the problem itself is probably ill-posed.
Roundoff effects
Data types and precision
In general, VECLIB provides the same range of functionality for both real and
complex data. For most computations, there are matching subprograms for real
and complex data, but there are a few exceptions.
For example, corresponding to the subprograms for real dot products, there are
subprograms for complex dot products in both the conjugated and unconjugated
forms because both types of complex dot products occur. However, there is no
complex analogue of the subprograms for solving a real symmetric sparse
linear system.
Chapter 1 Introduction to VECLIB 23
VECLIB naming convention
Matching subprograms for real and complex data have been coded to maintain
a close correspondence between the two. However, in some areas, the
correspondence is necessarily weaker, and this has not been possible.
Subprograms in VECLIB are provided in both 32-bit and 64-bit addressing
versions, except in the VECLIB8 library where only 64-bit addressing is
supported.
VECLIB naming convention
The name of each VECLIB subprogram is a coded specification of its function
within the limits of standard Fortran 77 6-character names. Usually, the first
character of a subprogram name, denoted by T, shows the predominant data
type according to Table 1-3:
Basic Linear Algebra Subprograms, that is, level 1, 2, and 3 BLAS, as well as
the Sparse BLAS use this naming convention. For example, subprograms that
compute the sum of vector elements are named according to data type: SSUM,
DSUM, ISUM, CSUM, and ZSUM. Some function subprograms use two of these
letters, the first describing the data type of the function and the second
indicating the type of data on which it operates.
Fortran 77 BLAS Standard subprograms use a similar convention, with F_
prepended to each routine name.
For example, the legacy BLAS single-precision, triangular-solve routine is
named STRSM and its BLAS Standard counterpart is named F_STRSM. BLAS
Standard subprograms that compute the sum of vector elements are named
F_SSUM, F_DSUM, F_CSUM, and F_ZSUM. Refer to “Legacy BLAS routines”
on page 211 and “BLAS Standard routines” on page 339 for more information.
C BLAS Standard subprograms begin with C_ prepended to each routine name.
24 HP MLIB User’s Guide
Data type and byte length
There is a relationship between the data type of a subprogram, designated by
the first character of its name (refer to T in Table 1-3), and the byte lengths of
its arguments. This relationship is shown in Table :
MLIB Argument Lengths
MLIB Argument Lengths
Data type and byte length
T
S4848
D48816
C4848
Z48816
VECLIB
INTEGER
LOGICAL
Operator arguments
Some BLAS routines take input-only arguments called operators that allow for
the specification of multiple related operations to be performed by a single
function. Operator arguments used by the BLAS Standard routines are NORM,
SORT, SIDE, UPLO, TRANS, CONJ, DIAG
defined as follows:
normUsed by routines computing the norm of a vector or
sortUsed by sorting routines.There are two valid values
sideUsed by functions computing the product of two
VECLIB8
INTEGER
LOGICAL
, and JROT. Their meanings are
matrix. There are seven valid values that specify the
norm to be computed, namely one-norm, real one-norm,
infinity-norm and real infinity-norm for vectors and
matrices, two-norm for vectors, and Frobenius-norm,
max-norm, and real max-norm for matrices.
that specify whether the data should be specified in
increasing or decreasing order.
matrices A and B. There are two valid values that
specify whether A*B or B*A should be computed.
REALCOMPLEX
Chapter 1 Introduction to VECLIB 25
Operator arguments
uploRefers to triangular matrices. There are two valid
values to specify whether a matrix is upper or lower
triangular.
transUsed by routines applying a matrix, say A, to another
vector or another matrix. There are three valid values
to specify whether the matrix (A), its transpose (AT), or
its conjugate transpose (A*) should be applied. op(A)
T
refers to A, A
, or A* depending on the input value of
the trans operator argument. Some BLAS routines
have more than one
trans operator. For example, a
general matrix multiply operation can be specified as
CopA()op B()←
where A, B, and C are general matrices. A trans
argument is needed for each of the input matrices A
and B. These arguments are denoted transA and
transB.
conjUsed by complex routines operating with x or
x.
diagRefers to triangular matrices. Two values are valid to
specify whether the triangular matrix has
unit-diagonal or not.
jrotUsed by the routine to generate Jacobi rotations. There
are three valid values to specify whether the rotation is
an inner rotation, an outer rotation, or a sorted
rotation.
For BLAS Standard routines, specify an operator argument with a named
constant value. Table 1-4 on page 27 lists the operator arguments and their
associated named constants.
The actual numeric value assigned to the named constant is defined in the
appropriate language include file. For example, the f77blas.h include file
defines assigned values for Fortran 77.
Table 1-4 lists the operator arguments and their associated named constants.
26 HP MLIB User’s Guide
Table 1-4BLAS Standard Operator Arguments
Operator arguments
Operator
Argument
norm
sortblas_increasing_ordersort in increasing order
sideblas_left_sideoperate on the left hand side
uploblas_upperreference upper triangle only
transxblas_trans
conjblas_conj
diagblas_non_unit_diagnon-unit triangular
jrotblas_jrot_inner
Named ConstantMeaning
blas_one_norm1-norm
blas_real_one_normreal 1-norm
blas_two_norm2-norm
blas_frobenius_normFrobenius-norm
blas_real_inf_normreal infinity-norm
blas_max_normmax-norm
blas_real_max_normreal-norm
blas_decreasing_ordersort in decreasing order
blas_right_sideoperate on the right hand side
blas_conj_trans
blas_unit_diagunit triangular
blas_jrot_outer
blas_jrot_sorted
blas_inf_norminfinity-norm
blas_lowerreference lower triangle only
operate with x
blas_no_transoperate with x
1
-------
∗
x
x
1
-------
c
≥
2
2
operate with
operate with
blas_no_conjoperate with x
inner rotation
0 c
outer rotation
sorted rotation
≤≤
abs a() abs b()≥
T
Chapter 1 Introduction to VECLIB 27
Error handling
Error handling
VECLIB subprograms are divided into two classes according to the way they
detect and report usage errors:
• Low-level subprograms
• High-level subprograms
Low-level subprograms
Low-level subprograms are only minimally capable of detecting or handling
errors. These subprograms attempt to do what is reasonable when a usage
error occurs, but they do not warn you that something is wrong. For example,
SDOT, which computes the dot product of two dense real vectors of length N,
gives the mathematically proper result, zero, when N ≤ 0. This is considered
mathematically correct because, by definition, an empty sum is zero. Because
SDOT conforms to the published BLAS argument list and usage, however,
SDOT does not notify you that you may have made a mistake. Issuing a
warning would add severe usage conflicts with codes that already use the
BLAS.
Because these low-level subprograms do what is reasonable, you can simplify a
program and perhaps speed it up by using VECLIB subprograms without
checking for special cases. For example, relying on SDOT having a zero result
for N ≤ 0 may eliminate an IF statement that would take the same branch
almost every time through a loop.
High-level subprograms
High-level subprograms detect and report errors. These subprograms usually
do more work than the low-level subprograms, so the relative expense of
checking and reporting errors may be small. Some possible errors are:
• Argument value errors, such as negative matrix order
• Computational errors, such as a singular matrix
• Computational problems, such as ill-conditioning
When a high-level subprogram detects an error, it responds with a success/error
code, usually with the output argument ier.
The convention used for ier is:
• ier = 0: Successful exit
• ier < 0: Invalid value of an argument—computation not completed
28 HP MLIB User’s Guide
Troubleshooting
• ier > 0: Failure during the computation
Some VECLIB subprograms do not have a success/error code in their argument
lists, but instead call another VECLIB subprogram to process the error
condition. MLIB provides the following error handlers:
• XERBLA
• XERVEC
• F_BLASERROR
Refer to the documentation for individual VECLIB subprogram to determine if
one of these error handlers is used. For example, all BLAS Standard
subprograms (those subprograms whose names begin with F_) use
F_BLASERROR.
The standard versions of XERBLA, XERVEC, and F_BLASERROR write an
error message onto the standard error file. Execution is then terminated with a
nonzero exit status. You can supply a version of the error handler that alters
this action. Refer to “XERBLA” on page 337, “XERVEC” on page 623, and
“F_BLASERROR” on page 605 for more information about these routines.
Routine-specific error conditions are listed in the respective subprogram
documentation.
Troubleshooting
The following are suggestions to help you find the cause of a problem:
1. Verify that the subprogram usage in the program matches the subprogram
specifications in the documentation. Pay attention to the number of
arguments in the CALL statement and to the declarations of arrays and
integer constants or variables that describe them. Write out all the
arguments immediately before and after the CALL statement.
2. Make sure there really is a problem. For example, if you get an apparently
incorrect answer, check if the answer satisfies the problem as defined in the
program. For problems with more than one answer, VECLIB may produce a
different answer or give the answers in a different order than expected. If
the problem is ill-conditioned, VECLIB may not be able to compute a
reliable answer. Error messages often suggest the cause of the problem.
3. Isolate the problem. If possible, write a small test program that encounters
the same difficulty. For example, write the data causing the problem from
Chapter 1 Introduction to VECLIB 29
HP MLIB man pages
the original program into the small one. In this way, you eliminate
extraneous code from suspicion. If the problem area is large, try to pare it to
a manageable size. For example, if a 50-by-50 linear system fails, try to
produce a 2-by-2 system that fails in the same way.
HP MLIB man pages
The HP MLIB man pages contain online documentation that includes
information from the HP MLIB User’s Guide.
The HP MLIB man pages are installed in the directory /opt/mlib/share/man.
Set this path in your MANPATH environment variable to access man pages for
VECLIB and LAPACK.
HP VECLIB man pages provide:
• An introduction to VECLIB (veclib (3m))
• An introduction to each group of subprograms, for example, blas2(3m)
• A man page for each subprogram
The HP MLIB User’s Guide has more detailed information than the man pages
because of the limited number of fonts supported and the difficulty of
presenting mathematical equations in the man(1) system.
For further explanation and a table of contents of reference entries for VECLIB,
refer to the veclib(3) man page.
30 HP MLIB User’s Guide
2Basic Vector Operations
Overview
This chapter explains how to use the VECLIB vector subprograms that serve as
building blocks for many user programs. It describes subprograms for
performing dense and sparse vector operations. This set of VECLIB
subprograms includes:
• Basic Linear Algebra Subprograms
• Sparse BLAS
• Hewlett-Packard extensions to the BLAS
• BLAS Standard
The term BLAS, as used in this section, refers to all of the above listed BLAS
and the Hewlett-Packard extensions to the BLAS.
BLAS standardization efforts, supported by software and hardware vendors
and university groups, began with a BLAS Technical (BLAST) Forum meeting
in November 1995 at the University of Tennessee. The efforts of the BLAST
Forum resulted in a BLAS Standard specification in 1999.
HP MLIB 8.5 supports a subset of the BLAS Standard routines. This chapter
describes dense and banded BLAS Standard functionality in “BLAS Standard
routines” on page 152.
Overview
This section discusses commonly used or computationally expensive operations
of linear algebra. Even though you can code most of these operations in fewer
than 10 lines of Fortran or C, using VECLIB subprograms can improve
program performance as well as program modularity and readability. However,
in some situations, you can achieve better computational performance by
entering Fortran or C code than by calling one of these subprograms.
Chapter 2 Basic Vector Operations 31
Chapter objectives
Chapter objectives
After reading this chapter you will:
• Understand BLAS storage conventions
• Know how to specify array sections
• Know how to handle backward storage
• Know how to use increment (also called stride) arguments
• Understand the vector subprograms included with HP VECLIB, both
Legacy BLAS and BLAS Standard subprograms
Associated documentation
The following documents provide supplemental material for this chapter:
Dodson, D.S., R.G. Grimes, and J.G. Lewis. “Sparse Extensions to the Fortran
Basic Linear Algebra Subprograms.” ACM Transactions on MathematicalSoftware. June, 1991. Vol. 17, No. 2.
Dodson, D.S., R.G. Grimes, and J.G. Lewis. “Algorithm 692: Model
Implementation and Test Package for the Sparse Basic Linear Algebra
Subprograms.” ACM Transactions on Mathematical Software. June, 1991. Vol.
17, No. 2.
Lawson, C., R. Hanson, D. Kincaid, and F. Krogh. “Basic Linear Algebra
Subprograms for Fortran Usage.” ACM Transactions on MathematicalSoftware. September, 1979. Vol. 5, No. 3.
32 HP MLIB User’s Guide
What you need to know to use vector subprograms
What you need to know to use vector subprograms
The following sections describe overall considerations for using vector
subprograms:
• BLAS storage conventions
❏ Fortran storage of arrays
❏ Fortran array argument association
The Basic Linear Algebra Subprograms (BLAS) were developed to enhance the
portability of published linear algebra codes. In particular LAPACK, the
high-level public-domain linear equation package, uses the BLAS. When your
routines use the VECLIB BLAS, you increase the efficiency of those routines on
your Hewlett-Packard servers.
You need not limit your use of the VECLIB BLAS to linear algebra codes.
Because these subprograms are portable, modular, and efficient, you can
incorporate them into your programs. To realize the full power of the BLAS, you
should understand storage and indexing conventions.
Chapter 2 Basic Vector Operations 33
What you need to know to use vector subprograms
Fortran storage of arrays
Two-dimensional arrays in Fortran are stored by columns. Consider the
following specifications:
DIMENSION A(N1,N2),B(N3)
EQUIVALENCE (A,B)
where N3 = N1 X N2. Then A(I,J) is associated with the same memory
location as B(K) where
K = I + (J−1) × N1
Successive elements of a column of A are adjacent in memory, while successive
elements of a row of A are stored with a difference of N1 storage units between
them. Remember that the size of a storage unit depends on the data type.
Fortran array argument association
When a Fortran subprogram is called with an array element as an argument,
the value is not passed. Instead, the subprogram receives the address in
memory of the element. Consider the following code segment:
REAL A(10,10)
J = 3
L = 10
CALL SUBR (A(1,J),L)
.
.
.
SUBROUTINE SUBR (X,N)
REAL X(N)
.
.
.
SUBR is given the address of the first element of the third column of A. Becauseit treats that argument as a one-dimensional array, successive elements X(1),
X(2),..., occupy the same memory locations as the successive elements of the
third column of A, that is, A(1,3), A(2,3),... Hence, the entire third column
of A is available to the subprogram.
34 HP MLIB User’s Guide
BLAS indexing conventions
This section describes handling stride arguments and forward and backward
storage.
A vector in the BLAS is defined by three quantities:
• Vector length
• Array or starting element within an array
• Increment, sometimes called stride—defines the number of storage units
between successive vector elements
Forward storage
Suppose that X is a real array. Let N be the vector length and let INCX be the
increment. Suppose that a vector x with components xi, i= 1, 2, ..., N, is stored
in X. If INCX ≥ 0, then xi is stored in
X(1 + (I−1) × INCX). This is forward storage starting from X(1) with stride
equal to INCX, ending with X(1 + (N−1) × INCX). Thus, if N = 4 and
INCX = 2, the vector components x1, x2, x3, and x4 are stored in the array
elements X(1), X(3), X(5), and X(7), respectively.
What you need to know to use vector subprograms
Backward storage
Some BLAS subprograms permit the backward storage of vectors, which is
specified by using a negative INCX. If INCX < 0, then xi is stored in X(1 +(N−I) × |INCX|) or equivalently in X(1 − (N−I) × INCX). This is
backward storage starting from X(1 − (N−1) × INCX) with stride equal to
INCX, ending with X(1). Thus, if N = 4 and INCX = −2, the vector components
x1, x2, x3, and x4 are stored in the array elements
X(7), X(5), X(3), and X(1), respectively.
INCX = 0 is only permitted by COPY routines in the legacy BLAS subprograms.
When INCX = 0 is allowed, it means that x is a vector of length N, whose
components all equal the value of X(1).
The notation (N,X,INCX) describes a BLAS vector. For example, if X is an
array of dimension N, then (N,X,1) represents forward storage and (N,X,−1)
represents backward storage. If A is an M-by-N array, then (M,A(1,J),1)
represents column J and (N,A(I,1),M) represents row I. Finally, if an M-by-N
matrix is embedded in the upper left-hand corner of an array B of size LDB by
NMAX, then column J is (M,B(1,J),1) and row is I(N,B(I,1),LDB).
Chapter 2 Basic Vector Operations 35
What you need to know to use vector subprograms
Increment arguments
The following examples illustrate how to use increment arguments to perform
different operations with the same subprogram. These examples use the
function F_SDOT with the following usage:
SUBROUTINE F_SDOT (CONJ, N, ALPHA, X, INCX, BETA, Y, INCY, R)
INTEGER CONJ, INCX, INCY, N
REAL*4 ALPHA, BETA, R, X(*), Y(*)
This usage adds the scaled dot product of the vectors X( * ) and Y( * ) to a scaled
scalar R.
Example 1
Compute the dot product
R = X(1)*Y(1) + X(2)*Y(2) + X(3)*Y(3) + X(4)*Y(4):
This is the dot product of the second row of an M-by-3 matrix A, stored in a
10-by-3 array, with a 3-vector X:
PARAMETER (LDA = 10)
REAL*4 SDOT,A(LDA,3),X(3),Y(LDA)
N = 3
Y(2) = SDOT (N, A(2,1),LDA, X,1)
Operator arguments in the BLAS Standard
Some routines in the BLAS Standard take input-only arguments called
operators. Operators allow for the specification of multiple related operations to
be performed by a single function. The BLAS Standard specifies the type and
the valid values these arguments should have according to the specific
programming language.
Operator arguments used by the BLAS Standard routines are NORM, SORT,
SIDE, UPLO, TRANS, CONJ, DIAG,
on page 25 for explanations of the valid operator values.
36 HP MLIB User’s Guide
and JROT. Refer to “Operator arguments”
In BLAS Standard routines, you specify an operator argument with a named
constant value. The actual numeric value assigned to the named constant is
defined in the appropriate language’s include file. Operator arguments are
represented in the Fortran 77 interface as INTEGERs. This specification is
different from the legacy BLAS, where operator arguments are defined as
CHARACTER*1.
Refer to individual routines in “BLAS Standard routines” on page 152 for the
named constants you can use to specify operator arguments for basic vector
subprograms.
Representation of a permutation matrix
This section explains how the BLAS Standard represents a permutation
matrix.
An n-by-n permutation matrix P is represented as a product of at most n
interchange permutations. An interchange permutation E is a permutation
obtained by swapping two rows of the identity matrix. An efficient way to
represent a general permutation matrix P is with an integer vector p of length
n. In other words, P = En...E1 and each Ei is the identity with rows i and p
interchanged:
What you need to know to use vector subprograms
i
For i = n to 1 and incp < 0,
For i = 1 to n and incp > 0,
xi() xpi()()↔
xi() xpi()()↔
Chapter 2 Basic Vector Operations 37
What you need to know to use vector subprograms
Representation of a Householder matrix
This section explains how the BLAS Standard represents a Householder
matrix.
An elementary reflector (or elementary Householder matrix) H of order n is a
unitary matrix of the form
HIτυυ
where τ is a scalar, and υ is an n-vector, with
2
τ
υ is often referred to as the Householder vector. Often, υ can have leading or
trailing zero elements, but for the purposes of this discussion, assume that H
has no special structure.
This representation sets υ1 = 1, meaning that υ1 need not be stored. In real
arithmetic,except that τ = 0 implies H = I. This representation agrees
with that used in LAPACK.
In complex arithmetic, τ may be complex, and satisfies
1 τ 2,≤≤
1 Re τ() 2 and τ 1–1≤≤≤
H
–=
2
υ
2Re τ()=
2
Thus a complex H is not Hermitian, as with other representations, but it is
unitary. The latter is the important property. The advantage of allowing τ to be
complex is that, given an arbitrary complex vector x, H can be computed so that
with real β. This is useful, for example, when reducing a complex Hermitian
matrix to a real symmetric tridiagonal matrix, or a complex rectangular matrix
to real bidiagonal form.
38 HP MLIB User’s Guide
H∗xβ 10… 0,, ,()
=
T
Subprograms for basic vector operations
Subprograms for basic vector operations
The following sections in this chapter describe the vector subprograms included
with VECLIB:
• Legacy BLAS routines
• BLAS Standard routines
Note that the specification for operator arguments is different in legacy BLAS
routines than in BLAS Standard routines. Operator arguments are represented
in the BLAS Standard Fortran 77 interface as INTEGERs; in the legacy BLAS
they are defined as CHARACTER*1.
In BLAS Standard routines, you specify an operator argument with a named
constant value. Refer to the individual routines in “BLAS Standard routines”
on page 152 for the named constants you can use to specify operator
arguments. The actual numeric value assigned to the named constant is
defined in the f77blas.h include file.
Chapter 2 Basic Vector Operations 39
ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAXIndex of maximum of magnitudes
Legacy BLAS routines
NameISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX
Index of maximum of magnitudes
PurposeGiven a real or integer vector x of length n, ISAMAX, IDAMAX, or IIAMAX
determines the index of the element of the vector of maximum magnitude.
Specifically, the subprograms determine the smallest index i such that
x
x
=
max
i
Given a complex vector x of length n, ICAMAX or IZAMAX determines the
smallest index i:
Re xi() Im xi()+max Re xj() Im xj()+:j12… n,, ,=()=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively.
The usual definition of complex magnitude is
j
: j12… n,, ,=
Re xi()2Im xi()
This definition is not used because of computational speed. If the index i is used
for pivot selection in matrix factorization, no significant difference in numerical
stability should result.
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4i, ISAMAX, n, incx
REAL*4x(lenx)
i = ISAMAX(n, x, incx)
INTEGER*4i, IDAMAX, n, incx
REAL*8x(lenx)
i = IDAMAX(n, x, incx)
INTEGER*4i, IIAMAX, n, incx, x(lenx)
i = IIAMAX(n, x, incx)
40 HP MLIB User’s Guide
12⁄
+
2
Index of maximum of magnitudesISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX
INTEGER*4i, ICAMAX, n, incx
COMPLEX*8x(lenx)
i = ICAMAX(n, x, incx)
INTEGER*4i, IZAMAX, n, incx
COMPLEX*16x(lenx)
i = IZAMAX(n, x, incx)
VECLIB8:
INTEGER*8i, ISAMAX, n, incx
REAL*4x(lenx)
i = ISAMAX(n, x, incx)
INTEGER*8i, IDAMAX, n, incx
REAL*8x(lenx)
i = IDAMAX(n, x, incx)
INTEGER*8i, IIAMAX, n, incx, x(lenx)
i = IIAMAX(n, x, incx)
INTEGER*8i, ICAMAX, n, incx
COMPLEX*8x(lenx)
i = ICAMAX(n, x, incx)
INTEGER*8i, IZAMAX, n, incx
COMPLEX*16x(lenx)
i = IZAMAX(n, x, incx)
InputnNumber of elements of vector x to be used. If n ≤ 0, the
subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1) ×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
OutputiIf n ≤ 0, then i = 0. Otherwise, i is the index of the
element of x of maximum magnitude.
Chapter 2 Basic Vector Operations 41
ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAXIndex of maximum of magnitudes
Fortran
Equivalent
INTEGER*4 FUNCTION ISAMAX (N,X,INCX)
REAL*4 X(*),TEMP,XMAX
ISAMAX = 1
IF ( N .GT. 1 ) THEN
XMAX = ABS ( X(1) )
INCXA = ABS ( INCX )
IX = 1 + INCXA
DO 10 I = 2, N
TEMP = ABS ( X(IX) )
IF ( TEMP .GT. XMAX ) THEN
ISAMAX = I
XMAX = TEMP
END IF
IX = IX + INCXA
10 CONTINUE
ELSE IF ( N .LT. 1 ) THEN
ISAMAX = 0
END IF
RETURN
END
ExampleLocate the largest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDAMAX,N,INCX
REAL*8 X(20)
N = 10
INCX = 1
I = IDAMAX (N,X,INCX)
42 HP MLIB User’s Guide
Index of minimum of magnitudesISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN
NameISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN
Index of minimum of magnitudes
PurposeGiven a real or integer vector x of length n, ISAMIN, IDAMIN, or IIAMIN
determines the index of element of the vector of minimum magnitude.
Specifically, the subprograms determine the smallest index i such that
x
min xj: j12… n,, ,=()=
i
Given a complex vector x of length n, ICAMIN or IZAMIN determines the
smallest index i:
|Re(xi)|+|Im(xi)|min
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively.
The usual definition of complex magnitude is
=
)|+|Im(xj)| : j1, 2, ..., n=
|Re(x
j
Re x
()2Im xi()
i
This definition is not used because of computational speed.
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4i, ISAMIN, n, incx
REAL*4x(lenx)
i = ISAMIN(n, x, incx)
INTEGER*4i, IDAMIN, n, incx
REAL*8x(lenx)
i = IDAMIN(n, x, incx)
INTEGER*4i, IIAMIN, n, incx, x(lenx)
i = IIAMIN(n, x, incx)
INTEGER*4i, ICAMIN, n, incx
COMPLEX*8x(lenx)
i = ICAMIN(n, x, incx)
INTEGER*4i, IZAMIN, n, incx
COMPLEX*16x(lenx)
i = IZAMIN(n, x, incx)
12⁄
+
2
Chapter 2 Basic Vector Operations 43
ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMINIndex of minimum of magnitudes
VECLIB8:
INTEGER*8i, ISAMIN, n, incx
REAL*4x(lenx)
i = ISAMIN(n, x, incx)
INTEGER*8i, IDAMIN, n, incx
REAL*8x(lenx)
i = IDAMIN(n, x, incx)
INTEGER*8i, IIAMIN, n, incx, x(lenx)
i = IIAMIN(n, x, incx)
INTEGER*8i, ICAMIN, n, incx
COMPLEX*8x(lenx)
i = ICAMIN(n, x, incx)
INTEGER*8i, IZAMIN, n, incx
COMPLEX*16x(lenx)
i = IZAMIN(n, x, incx)
InputnNumber of elements of vector x to be used. If n ≤ 0, the
subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
OutputiIf n ≤ 0, then i = 0. Otherwise, i is the index of the
element of x of minimum magnitude.
44 HP MLIB User’s Guide
Index of minimum of magnitudesISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN
Fortran
Equivalent
INTEGER*4 FUNCTION ISAMIN (N,X,INCX)
REAL*4 X(*),TEMP,XMIN
ISAMIN = 1
IF ( N .GT. 1 ) THEN
XMIN = ABS ( X(1) )
INCXA = ABS ( INCX )
IX = 1 + INCXA
DO 10 I = 2, N
TEMP = ABS ( X(IX) )
IF ( TEMP .LT. XMIN ) THEN
ISAMIN = I
XMIN = TEMP
END IF
IX = IX + INCXA
10 CONTINUE
ELSE IF ( N .LT. 1 ) THEN
ISAMIN = 0
END IF
RETURN
END
ExampleLocate the smallest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDAMIN,N,INCX
REAL*8 X(20)
N = 10
INCX = 1
I = IDAMIN (N,X,INCX)
Chapter 2 Basic Vector Operations 45
ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxxCount selected vector elements
NameISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx
Count selected vector elements
PurposeGiven a real, integer, or complex vector x of length n, these subprograms count
the number of elements of the vector that satisfy a specified relationship to a
given scalar a.
The last two characters of the subprogram name specify the relation of interest
between the elements of the vector and the scalar. For real and integer
subprograms, these characters, represented by “xx” in the prototype Fortran
statements, and the corresponding function values can be:
xxFunction value
EQ
GE
GT
LE
LT
NE
#{i : x
#{i : x
#{i : x
#{i : x
#{i : x
#{i : x
For complex subprograms, these characters and corresponding function values
are:
i
i
i
i
i
i
= a}≥ a}> a}≤ a}< a}≠ a}
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4i, ISCTxx, n, incx
REAL*4a, x(lenx)
i = ISCTxx(n, x, incx, a)
INTEGER*4i, IDCTxx, n, incx
REAL*8a, x(lenx)
i = IDCTxx(n, x, incx, a)
INTEGER*4i, IICTxx, n, incx, a, x(lenx)
i = IICTxx(n, x, incx, a)
INTEGER*8i, IICTxx, n, incx, a, x(lenx)
i = IICTxx(n, x, incx, a)
INTEGER*8i, ICCTxx, n, incx
COMPLEX*8a, x(lenx)
i = ICCTxx(n, x, incx, a)
INTEGER*8i, IZCTxx, n, incx
COMPLEX*16a, x(lenx)
i = IZCTxx(n, x, incx, a)
InputnNumber of elements of vector x to be compared to a. If
n ≤ 0, the subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
aThe scalar a.
Chapter 2 Basic Vector Operations 47
ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxxCount selected vector elements
OutputiIf n ≤ 0, then i = 0. Otherwise, i is the number of
elements of x that satisfy the relationship with a
specified by the subprogram name.
Fortran
Equivalent
INTEGER*4 FUNCTION ISCTEQ (N,X,INCX,A)
REAL*4 A,X(*)
ISCTEQ = 0
INCXA = ABS ( INCX )
IX = 1
DO 10 I = 1, N
IF ( X(IX) .EQ. A ) ISCTEQ = ISCTEQ + 1
IX = IX + INCXA
10 CONTINUE
RETURN
END
ExampleCount the number of positive elements of a REAL*8 vector x, where x is a
vector 10 elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDCTGT,N,INCX
REAL*8 A,X(20)
N = 10
INCX = 1
A = 0.0D0
I = IDCTGT (N,X,INCX,A)
48 HP MLIB User’s Guide
Index of maximum element of vectorISMAX/IDMAX/IIMAX
NameISMAX/IDMAX/IIMAX
Index of maximum element of vector
PurposeGiven a real or integer vector x of length n, these subprograms determine the
index of the maximum element of the vector. Specifically, the subprograms
determine the smallest index i such that
ximax xj: j12… n,, ,=
=
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4i, ISMAX, n, incx
REAL*4x(lenx)
i = ISMAX(n, x, incx)
INTEGER*4i, IDMAX, n, incx
REAL*8x(lenx)
i = IDMAX(n, x, incx)
INTEGER*4i, IIMAX, n, incx, x(lenx)
i = IIMAX(n, x, incx)
VECLIB8:
INTEGER*8i, ISMAX, n, incx
REAL*4x(lenx)
i = ISMAX(n, x, incx)
INTEGER*8i, IDMAX, n, incx
REAL*8x(lenx)
i = IDMAX(n, x, incx)
()
INTEGER*8i, IIMAX, n, incx, x(lenx)
i = IIMAX(n, x, incx)
InputnNumber of elements of vector x to be used. If n ≤ 0, the
subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Chapter 2 Basic Vector Operations 49
ISMAX/IDMAX/IIMAXIndex of maximum element of vector
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
OutputiIf n ≤ 0, then i = 0. Otherwise, i is the index of the
maximum element of x.
Fortran
Equivalent
INTEGER*4 FUNCTION ISMAX (N,X,INCX)
REAL*4 X(*),XMAX
ISMAX = 1
IF ( N .GT. 1 ) THEN
XMAX = X(1)
INCXA = ABS ( INCX )
IX = 1 + INCXA
DO 10 I = 2, N
IF ( X(IX) .GT. XMAX ) THEN
ISMAX = I
XMAX = X(IX)
END IF
IX = IX + INCXA
10 CONTINUE
ELSE IF ( N .LT. 1 ) THEN
ISMAX = 0
END IF
RETURN
END
ExampleLocate the largest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDMAX,N,INCX
REAL*8 X(20)
N = 10
INCX = 1
I = IDMAX (N,X,INCX)
50 HP MLIB User’s Guide
Index of minimum element of vectorISMIN/IDMIN/IIMIN
NameISMIN/IDMIN/IIMIN
Index of minimum element of vector
PurposeGiven a real or integer vector x of length n, these subprograms determine the
index of minimum element of the vector. Specifically, the subprograms
determine the smallest index i such that
ximin xj: j12… n,, ,=()=
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4i, ISMIN, n, incx
REAL*4x(lenx)
i = ISMIN(n, x, incx)
INTEGER*4i, IDMIN, n, incx
REAL*8x(lenx)
i = IDMIN(n, x, incx)
INTEGER*4i, IIMIN, n, incx, x(lenx)
i = IIMIN(n, x, incx)
VECLIB8:
INTEGER*8i, ISMIN, n, incx
REAL*4x(lenx)
i = ISMIN(n, x, incx)
INTEGER*8i, ISMIN, n, incx
REAL*8x(lenx)
i = ISMIN(n, x, incx)
INTEGER*8i, IIMIN, n, incx, x(lenx)
i = IIMIN(n, x, incx)
InputnNumber of elements of vector x to be used. If n ≤ 0, the
subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Chapter 2 Basic Vector Operations 51
ISMIN/IDMIN/IIMINIndex of minimum element of vector
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
OutputiIf n ≤ 0, then i = 0. Otherwise, i is the index of the
minimum element of x.
Fortran
Equivalent
INTEGER*4 FUNCTION ISMIN (N,X,INCX)
REAL*4 X(*),XMIN
ISMIN = 1
IF ( N .GT. 1 ) THEN
XMIN = X(1)
INCXA = ABS ( INCX )
IX = 1 + INCXA
DO 10 I = 2, N
IF ( X(IX) .LT. XMIN ) THEN
ISMIN = I
XMIN = X(IX)
END IF
IX = IX + INCXA
10 CONTINUE
ELSE IF ( N .LT. 1 ) THEN
ISMIN = 0
END IF
RETURN
END
ExampleLocate the smallest element of a REAL*8 vector x, where x is a vector 10
elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDMIN,N,INCX
REAL*8 X(20)
N = 10
INCX = 1
I = IDMIN (N,X,INCX)
52 HP MLIB User’s Guide
Search vector for elementISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx
NameISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx
Search vector for element
PurposeGiven a real, integer, or complex vector x of length n, these subprograms search
sequentially through the vector for the first element xi that satisfies a specified
relationship to a given scalar a and return the index i of that element.
The last two characters of the subprogram name specify the relationship of
interest between the element of the vector and the scalar. For real and integer
subprograms, these characters, represented by “xx” in the prototype Fortran
statements, and the corresponding function values can be:
xxFunction value
EQ
GE
GT
LE
LT
NE
min{i : x
min{i : x
min{i : x
min{i : x
min{i : x
min{i : x
For complex subprograms, these characters and corresponding function values
are:
i
i
i
i
i
i
= a}≥ a}> a}≤ a}< a}≠ a}
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4i, ISSVxx, n, incx
REAL*4a, x(lenx)
i = ISSVxx(n, x, incx, a)
INTEGER*4i, IDSVxx, n, incx
REAL*8a, x(lenx)
i = IDSVxx(n, x, incx, a)
xxFunction value
EQ
NE
min{i : x
min{i : x
i
i
= a}
≠ a}
Chapter 2 Basic Vector Operations 53
ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxxSearch vector for element
INTEGER*4i, IISVxx, n, incx, a, x(lenx)
i = IISVxx(n, x, incx, a)
INTEGER*4i, ICSVxx, n, incx
COMPLEX*8a, x(lenx)
i = ICSVxx(n, x, incx, a)
INTEGER*4i, IZSVxx, n, incx
COMPLEX*16a, x(lenx)
i = IZSVxx(n, x, incx, a)
VECLIB8:
INTEGER*8i, ISSVxx, n, incx
REAL*4a, x(lenx)
i = ISSVxx(n, x, incx, a)
INTEGER*8i, IDSVxx, n, incx
REAL*8a, x(lenx)
i = IDSVxx(n, x, incx, a)
INTEGER*8i, IISVxx, n, incx, a, x(lenx)
i = IISVxx(n, x, incx, a)
INTEGER*8i, ICSVxx, n, incx
COMPLEX*8a, x(lenx)
i = ICSVxx(n, x, incx, a)
INTEGER*8i, IZSVxx, n, incx
COMPLEX*16a, x(lenx)
i = IZSVxx(n, x, incx, a)
InputnNumber of elements of vector x to be compared to a. If
n ≤ 0, the subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
aThe scalar a.
54 HP MLIB User’s Guide
Search vector for elementISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx
OutputiIf n ≤ 0 or if no element of x satisfies the relationship
with a specified by the subprogram name, then i = 0.
Otherwise, i is the index i of the first element xi of x
that satisfies the relationship with a specified by the
subprogram name. Recall that xi is stored in
x((i−1)×|incx|+1).
Fortran
Equivalent
INTEGER*4 FUNCTION IISVEQ (N, X,INCX, A)
INTEGER*4 X(*),A
IISVEQ = 0
INCXA = ABS ( INCX )
IX = 1
DO 10 I = 1, N
IF ( X(IX) .EQ. A ) THEN
IISVEQ = I
RETURN
END IF
IX = IX + INCXA
10 CONTINUE
RETURN
END
ExampleSearch for the first positive element of a REAL*8 vector x, where x is a vector
10 elements long stored in a one-dimensional array X of dimension 20.
INTEGER*4 I,IDSVGT,N,INCX
REAL*8 A,X(20)
N = 10
INCX = 1
A = 0.0D0
I = IDSVGT (N,X,INCX,A)
Chapter 2 Basic Vector Operations 55
SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAXMaximum of magnitudes
NameSAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX
Maximum of magnitudes
PurposeGiven a real or integer vector x of length n, SAMAX, DAMAX, or IAMAX
computes the l∞ norm of x, that is, the maximum of the magnitudes of the
elements of the vector
sx
∞
Given a complex vector x of length n, SCAMAX or DZAMAX computes
smax Re xi() Im xi()+:i12… n,, ,=().=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively.
The usual definition of the maximum of magnitudes of a complex vector is
tx∞maxRe xi()2Im xi()
s is computed instead of t because, with its lack of square roots, it is faster to
compute. Because, s is often an acceptable substitute for t.
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4n, incx
REAL*4s, SAMAX, x(lenx)
s = SAMAX(n, x, incx)
INTEGER*4n, incx
REAL*8s, DAMAX, x(lenx)
s = DAMAX(n, x, incx)
max xi: i12… n,, ,=().==
+{}
t ≤ s ≤2t
12⁄
2
: i12… n,, ,=
.==
56 HP MLIB User’s Guide
INTEGER*4n, incx, s, IAMAX, x(lenx)
s = IAMAX(n, x, incx)
INTEGER*4n, incx
REAL*4s, SCAMAX
COMPLEX*8x(lenx)
s = SCAMAX(n, x, incx)
Maximum of magnitudesSAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX
InputnNumber of elements of vector x to be used. If n ≤ 0, the
subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
OutputsIf n ≤ 0, then s = 0. Otherwise, s is the maximum of the
magnitudes of the elements of x.
Chapter 2 Basic Vector Operations 57
SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAXMaximum of magnitudes
Fortran
Equivalent
REAL*4 FUNCTION SAMAX (N,X,INCX)
REAL*4 X(*)
SAMAX = 0.0
INCXA = ABS ( INCX )
IX = 1
DO 10 I = 1, N
SAMAX = MAX ( SAMAX , ABS ( X(IX) ) )
IX = IX + INCXA
10 CONTINUE
RETURN
END
ExampleCompute the maximum of the magnitudes of the elements of a REAL*8 vector
x, where x is a vector 10 elements long stored in a one-dimensional array X of
dimension 20.
INTEGER*4 N,INCX
REAL*8 S,DAMAX,X(20)
N = 10
INCX = 1
S = DAMAX (N,X,INCX)
58 HP MLIB User’s Guide
Minimum of magnitudesSAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN
NameSAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN
Minimum of magnitudes
PurposeGiven a real or integer vector x of length n, SAMIN, DAMIN, or IAMIN
computes the minimum of the magnitudes of the elements of the vector
smin xi: i12… n,, ,=().=
Given a complex vector x of length n, SCAMIN or DZAMIN computes
smin Re xi() Im xi(): i+12… n,, ,=()=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively.
The usual definition of the minimum of magnitudes of a complex vector is
12⁄
tmin Re xi()2Im xi()
+{}
s is computed instead of t because, with its lack of square roots, it is faster to
compute. Because, s is often an acceptable substitute for t.
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
2
: i12… n,, ,=().=
t ≤ s ≤2t
UsageVECLIB:
INTEGER*4n, incx
REAL*4s, SAMIN, x(lenx)
s = SAMIN(n, x, incx)
INTEGER*4n, incx
REAL*8s, DAMIN, x(lenx)
s = DAMIN(n, x, incx)
INTEGER*4n, incx, s, IAMIN, x(lenx)
s = IAMIN(n, x, incx)
Minimum of magnitudesSAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN
InputnNumber of elements of vector x to be used. If n ≤ 0, the
subprograms do not reference x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
OutputsIf n ≤ 0, then s = ∞, the largest representable machine
number. Otherwise, s is the minimum of the
magnitudes of the elements of x.
Fortran
Equivalent
REAL*4 FUNCTION SAMIN (N,X,INCX)
REAL*4 X(*)
SAMIN = ∞
INCXA = ABS ( INCX )
IX = 1
DO 10 I = 1, N
SAMIN = MIN ( SAMIN , ABS ( X(IX) ) )
IX = IX + INCXA
10 CONTINUE
RETURN
END
ExampleCompute the minimum of the magnitudes of the elements of a REAL*8 vector
x, where x is a vector 10 elements long stored in a one-dimensional array X of
dimension 20.
INTEGER*4 N,INCX
REAL*8 S,DAMIN,X(20)
N = 10
INCX = 1
S = DAMIN (N,X,INCX)
Chapter 2 Basic Vector Operations 61
SASUM/DASUM/IASUM/SCASUM/DZASUMSum of magnitudes
NameSASUM/DASUM/IASUM/SCASUM/DZASUM
Sum of magnitudes
PurposeGiven a real or integer vector x of length n, SASUM, DASUM, or IASUM
computes the l1 norm of x, that is, the sum of magnitudes of the elements of the
vector
n
sx
==
1
Given a complex vector x of length n, SCASUM or DZASUM computes
n
s
=
∑
i 1=
where Re(xi) and Im(xi) are the real and imaginary parts of xi, respectively.
The usual definition of sum of magnitudes of a complex vector is
∑
i 1=
Re x
()
x
i
+
Im x
()
i
i
tX
==
2
s is computed instead of t because, with its lack of square roots, it is faster to
compute. Because, s is often an acceptable substitute for t.
The vector can be stored in a one-dimensional array or in either a row or a
column of a two-dimensional array.
UsageVECLIB:
INTEGER*4n, incx
REAL*4s, SASUM, x(lenx)
s = SASUM(n, x, incx)
INTEGER*4n, incx
REAL*8s, DASUM, x(lenx)
s = DASUM(n, x, incx)
INTEGER*4n, incx, s, IASUM, x(lenx)
s = IASUM(n, x, incx)
InputnNumber of elements of vector x to be used in the sum of
magnitudes. If n ≤ 0, the subprograms do not reference
x.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x. x is stored forward in array x
with increment |incx|; that is, xi is stored in
x((i−1)×|incx|+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Chapter 2 Basic Vector Operations 63
SASUM/DASUM/IASUM/SCASUM/DZASUMSum of magnitudes
Indexing Conventions” in the introduction to this
chapter.
OutputsIf n ≤ 0, then s = 0. Otherwise, s is the sum of
magnitudes of the elements of x.
Fortran
Equivalent
REAL*4 FUNCTION SASUM (N, X,INCX)
REAL*4 X(*)
SASUM = 0.0
IF ( N .LE. 0 ) RETURN
IX = 1
INCXA = ABS ( INCX )
DO 10 I = 1, N
SASUM = SASUM + ABS ( X(IX) )
IX = IX + INCXA
10 CONTINUE
RETURN
END
ExampleCompute the sum of magnitudes of the elements of a REAL*8 vector x, where x
is a vector 10 elements long stored in a one-dimensional array X of dimension
20.
INTEGER*4 N,INCX
REAL*8 S,DASUM,X(20)
N = 10
INCX = 1
S = DASUM (N,X,INCX)
InputnNumber of elements of vectors x and y to be used in the
elementary vector operation. If n ≤ 0, the subprograms
do not reference x or y.
aThe scalar a.
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x. x is used in conjugated form by CAXPYC
and ZAXPYC and in unconjugated form by the other
subprograms.
incxIncrement for the array x:
incx ≥ 0x is stored forward in array x; that is,
xi is stored in x((i−1)×incx+1).
66 HP MLIB User’s Guide
incx < 0x is stored backward in array x; that
is, xi is stored in x((i−n)×incx+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
yArray of length leny = (n−1)×|incy|+1 containing the
Use incy = 1 if the vector y is stored contiguously in
array y; that is, if yi is stored in y(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
OutputyIf n ≤ 0 or a = 0, then y is unchanged. Otherwise, ax+y
overwrites the input.
NotesIf incx = 0, then xi= x(1) for all i.
The result is unspecified if incy = 0 or if x and y overlap such that any element
of x shares a memory location with any element of y.
Fortran
Equivalent
SUBROUTINE SAXPY (N, A, X,INCX, Y,INCY)
REAL*4 X(*),Y(*)
IF ( N .LE. 0 ) RETURN
IF ( A .EQ. 0.0 ) RETURN
IX = 1
IY = 1
IF ( INCX .LT. 0 ) IX = 1 - (N-1) * INCX
IF ( INCY .LT. 0 ) IY = 1 - (N-1) * INCY
DO 10 I = 1, N
Y(IY) = A * X(IX) + Y(IY)
IX = IX + INCX
IY = IY + INCY
10 CONTINUE
RETURN
END
Example 1Compute the REAL*8 elementary vector operation
y2x←y,+
where x and y are vectors 10 elements long stored in one-dimensional arrays X
and Y of dimension 20.
INTEGER*4 N,INCX,INCY
REAL*8 A,X(20),Y(20)
N = 10
A = 2.0D0
INCX = 1
INCY = 1
CALL DAXPY (N,A,X,INCX,Y,INCY)
Example 2Subtract 3 times the 4th row of a 10-by-10 matrix from the 5th row. The matrix
is stored in a two-dimensional array B of dimension 20-by-21.
INTEGER*4 N
REAL*8 A,B(20,21)
N = 10
A = -3.0D0
CALL DAXPY (N,A,B(4,1),20,B(5,1),20)
PurposeGiven a real or complex scalar a, a sparse vector x stored in compact form via a
set of indices and a dense vector y stored in full storage form, these
subprograms perform the elementary vector operation
yax←y.+
More precisely, let x be a sparse n-vector with m ≤ n interesting (usually
nonzero) elements, and let {k1, k2, ..., km} be the indices of these elements. All
uninteresting elements of x are assumed to be zero. Let y be an ordinary
n-vector. If x is represented by arrays x and indx such that indx(i)=ki and
OutputyIf m ≤ 0 or a = 0, then y is unchanged. Otherwise, ax+y
overwrites the input. Only the elements of y whose
indices are included in indx are changed.
NotesThe result is unspecified if any element of indx is out of range, if any two
elements of indx have the same value, or if x, indx, and y overlap such that any
element of x or any index shares a memory location with any element of y.
Fortran
Equivalent
SUBROUTINE SAXPYI (M, A, X,INDX, Y)
REAL*4 A,X(*),Y(*)
INTEGER*4 INDX(*)
IF ( M .LE. 0 ) RETURN
IF ( A .EQ. 0.0 ) RETURN
DO 10 I = 1, M
Y(INDX(I)) = A * X(I) + Y(INDX(I))
10 CONTINUE
RETURN
END
ExampleCompute the REAL*8 elementary vector operation
y2x←y,+
where x is a sparse vector with interesting elements x1, x4, x5, and x9 stored in
one-dimensional array X, and y is stored in a one-dimensional array Y of
dimension 20.
INTEGER*4 M,INDX(4)
REAL*8 A,X(4),Y(20)
DATA INDX / 1, 4, 5, 9 /
M = 4
A = 2.0D0
CALL DAXPYI (M,A,X,INDX,Y)
70 HP MLIB User’s Guide
Two sided vector clipSCLIP/DCLIP/ICLIP
NameSCLIP/DCLIP/ICLIP
Two sided vector clip
PurposeGiven scalars a and b and a vector x of length n, these subprograms form the
vector y by the clip operation
aifxia≤
x
=
y
i
bifbx
The vectors can be stored in one-dimensional arrays or in either rows or
columns of two-dimensional arrays. Indexing through the arrays can be either
forward or backward.
INTEGER*8n, incx, incy, a, b, x(lenx), y(leny)
CALL ICLIP(n, a, b, x, incx, y, incy)
InputnNumber of elements of vectors x and y to be used. If
n 0≤
, the subprograms do not reference x or y.
aThe scalar a.
bThe scalar b.
Chapter 2 Basic Vector Operations 71
SCLIP/DCLIP/ICLIPTwo sided vector clip
xArray of length lenx = (n−1)×|incx|+1 containing the
n-vector x.
incxIncrement for the array x:
incx ≥ 0x is stored forward in array x; that is,
xi is stored in x((i−1)×incx+1).
incx < 0x is stored backward in array x; that
is, xi is stored in x((i−n)×incx+1).
Use incx = 1 if the vector x is stored contiguously in
array x; that is, if xi is stored in x(i). Refer to “BLAS
Indexing Conventions” in the introduction to this
chapter.
72 HP MLIB User’s Guide
Loading...
+ hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.