License | Authors | Download |
---|---|---|
Simplified BSD | F. Corsetti, A. Lazzaro (DBCSR support), I. Lebedeva (application in linear and cubic-scaling solvers), Y. Pouillon (packaging) | Gitlab |

MatrixSwitch is a module which acts as an intermediary interface layer between high-level routines for physics-related algorithms and low-level routines dealing with matrix storage and manipulation. This allows the high-level routines to be written in a way which is physically transparent, and enables them to switch seamlessly between different software implementations of the matrix operations.

Many computational physics algorithms (e.g., iterative Kohn-Sham eigensolvers) are based on sequences of matrix operations. These are typically described using standard mathematical notation, which does not depend on the specifics of the computational implementation, i.e., how the matrices are stored and manipulated in the code. Many different storage formats exist, depending also on the architecture (serial/parallel) and the type of matrix (dense/sparse), as well as many libraries that can perform matrix operations for particular storage formats. Libraries can be more or less transparent in the way the matrices are handled: some hide the details of the storage scheme in a derived type, while others require auxiliary data to be carried around by the user. Generally, the matrix operations themselves are contained within subroutines that are simple to call. However, the interface is specific to each library.

The aim of MatrixSwitch is to provide a simple, unified interface to
allow users to code physics-related algorithms with a minimal amount of
knowledge of the underlying implementation of the matrix algebra, and,
crucially, to be able to *switch* between different implementations
without modifying their code. Therefore, if a new matrix algebra library
is released which is particularly suited to a new architecture, this
simply has to be interfaced within MatrixSwitch to start being used.

The emphasis for this project is on implementing physically relevant operations in as simple a way as possible. Therefore, the focus will be on the core set of functionalities typically needed for physics (particularly electronic structure), and on streamlining the interface to make programs easy to read, understand, and code in terms of the mathematical formulation of the algorithm.

The basic routines can be installed with only a Fortran compiler. This
will allow you to use the `s?den` format and `ref` operations.

Optional requirements are:

- BLAS + LAPACK for `lap` operations with the `s?den` format
- MPI + BLAS + LAPACK + ScaLAPACK for the `p?dbc` format
- MPI + BLAS + LAPACK + DBCSR for the `pdcsr` format

- Enter the `src` directory.
- Copy `make.inc.example` to `make.inc` and modify it to suit your needs. Available options for `FPPFLAGS` are:
  - `-DHAVE_MPI`: enable MPI parallel routines
  - `-DHAVE_LAPACK`: enable LAPACK routines
  - `-DHAVE_SCALAPACK`: enable ScaLAPACK routines (requires `-DHAVE_MPI`)
  - `-DHAVE_PSPBLAS`: enable PSPBLAS routines
  - `-DHAVE_DBCSR`: enable DBCSR routines (requires `-DHAVE_MPI`)
  - `-DCONV`: enable automatic conversion of scalar types (real/complex) to agree with matrix definitions (real/complex). Note that conversions from complex to real will simply discard the imaginary part.
- Type `make`.

The `examples` directory contains a number of small programs that make
use of MatrixSwitch. These can be useful both for testing the
installation and for learning how to use the library. To compile them:

- Enter the `examples` directory.
- Copy `make.inc.example` to `make.inc` and modify it to suit your needs. Be aware that `make.inc` in the `src` directory will also be used.
- Type `make`.

Each example contains a header explaining what the program does and providing sample output to compare against.

`MatrixSwitch` is a module that you can `use` in Fortran routines. Note
that both the `.a` and `.mod` files need to be available. An example
compilation command for a code using MatrixSwitch is:

`gfortran MyCode.f90 /path/to/MatrixSwitch-x.y.z/src/MatrixSwitch.a -I/path/to/MatrixSwitch-x.y.z/src/ -llapack -lblas`

The best way of learning how to use MatrixSwitch is by example; see the
programs in the `examples` directory. A typical code follows these steps:

- Set up the matrices:

  Matrices first need to be declared with the MatrixSwitch public type `matrix`. There are then two routes to initialising a matrix. The easiest is to do so from scratch, by calling `m_allocate`. However, if the matrix data already exists (e.g., if it comes from a different section of the code) and is in the correct format, it can simply be registered into the TYPE(MATRIX) variable by calling the appropriate subroutine; for example, two-dimensional arrays can be registered as `s?den` matrices by calling `m_register_sden`. In this case, the data is not copied; rather, elements of the TYPE(MATRIX) variable are set to point to the existing array(s). Note that some storage formats may require additional setup operations (detailed below).
- Fill the matrices:

  Matrix element values can be set by calling `m_set` and `m_set_element`.
- Perform some matrix operations:

  See the list of available matrix operations.
- Destroy the matrices:

  Matrices can be deallocated by calling `m_deallocate`.
- Read and write matrices:

  Matrices can be written to a file by calling `m_write` and read from a file by calling `m_read` (at the moment available only for `pddbc` and `pdcsr` matrices).
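As a minimal sketch of the first four steps (reading and writing is left out), the program below allocates three `sdden` matrices, fills them with `m_set`, multiplies them and takes the trace. It relies only on the routines and argument orders documented in this file. The program name and the chosen values are illustrative, the character `'a'` is simply an arbitrary value other than `l`/`u` (so the complete matrix is set), and the scalars are passed as double precision on the assumption that the library was built without `-DCONV`.

```fortran
program ms_workflow_sketch
  use MatrixSwitch
  implicit none

  integer, parameter :: n = 3
  type(matrix) :: A, B, C
  double precision :: tr

  ! 1. Set up: three real, serial, dense matrices (elements start at zero).
  call m_allocate(A, n, n, 'sdden')
  call m_allocate(B, n, n, 'sdden')
  call m_allocate(C, n, n, 'sdden')

  ! 2. Fill: m_set takes the off-diagonal value first, then the diagonal value.
  call m_set(A, 'a', 0.0d0, 2.0d0)   ! A = 2 * identity
  call m_set(B, 'a', 1.0d0, 1.0d0)   ! B = all ones

  ! 3. Operate: C <- 1.0 * A * B + 0.0 * C, then tr(C).
  !    No label is passed, so the default implementation is used.
  call mm_multiply(A, 'n', B, 'n', C, 1.0d0, 0.0d0)
  call m_trace(C, tr)

  ! With A = 2*I and B all ones, every element of C is 2, so tr(C) = 6.
  print *, 'tr(C) = ', tr

  ! 4. Destroy.
  call m_deallocate(C)
  call m_deallocate(B)
  call m_deallocate(A)
end program ms_workflow_sketch
```

A program like this can be built along the lines of the example compilation command given earlier.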

The storage formats that can currently be used with MatrixSwitch are
listed below. A `?` in a format name stands for either `d` (real matrix)
or `z` (complex matrix).

`s?den`: simple dense (serial distribution)

This is the most basic type of storage: a two-dimensional array storing
the matrix elements on a single core. It can be used to perform
operations with `ref` or `lap`.

Requirements:

- External libraries: none
- Usage: no special routines need to be called to use this format

Storage details within `type matrix`:

- `dval`/`zval`, dimension (`dim1`,`dim2`): stores the matrix elements (real/complex matrix)

`p?dbc`: dense block cyclic (parallel distribution)

This format follows the standard used by ScaLAPACK for parallel
distribution of a dense matrix (see this
page
for some introduction). This makes it extremely easy to use
MatrixSwitch in a small portion of a larger code which already uses
ScaLAPACK, as it allows for matrices to be passed in and out of the
MatrixSwitch section (see `ms_lap_icontxt`, `ms_scalapack_setup`,
`m_register_pdbc`).

This format can be used to perform operations with `lap`.

Requirements:

- External libraries: MPI + BLAS + LAPACK + ScaLAPACK
- Usage: `ms_scalapack_setup` needs to be called at the start of the code

Storage details within `type matrix`:

- `iaux1`, dimension (`9`): stores the BLACS array descriptor
- `iaux2`, dimension (`2`): stores the size of the local portion of the matrix
- `dval`/`zval`, dimension (`iaux2(1)`,`iaux2(2)`): stores the local matrix elements (real/complex matrix)

`s?coo`: sparse coordinate list (serial distribution)

Documentation coming soon.

`p?coo`: sparse coordinate list (parallel distribution)

Documentation coming soon.

`s?csc`: compressed sparse column (serial distribution)

Documentation coming soon.

`p?csc`: compressed sparse column (parallel distribution)

Documentation coming soon.

`s?csr`: compressed sparse row (serial distribution)

Documentation coming soon.

`pdcsr`: compressed sparse row (parallel distribution)

This format follows the distributed block-compressed sparse row format
as implemented in the DBCSR library.
The distribution of the blocks over the processors follows a
block-cyclic distribution à la ScaLAPACK (see this
page
for some introduction). A 2D grid (MPI Cartesian grid) is automatically
created by DBCSR (by means of the `mpi_dims_create` and `mpi_cart_create`
functions). Note that blocks are monolithic, i.e. it is impossible to
read/write single elements inside a block.

Requirements:

- External libraries: MPI + BLAS + LAPACK + DBCSR. Download and install DBCSR somewhere (use `make install PREFIX=`)
- `ms_dbcsr_setup(global MPI communicator)` needs to be called at the start of the code
- Define the number of blocks per row and column
- Define two integer arrays for the definition of the block sizes per row and column
- `ms_dbcsr_finalize` needs to be called at the end of the code

`pdrow`: compressed sparse row for individual matrix elements (parallel distribution)

This format is only used to register a matrix in the compressed sparse
row format dealing with individual matrix elements and
with rows distributed on a 1D process grid. No algebraic
operations can be performed for a matrix of this type. It can only be
converted to/from a `pdcsr` or `pddbc` format using the subroutine
`m_copy`.

Storage details within `type matrix`:

- `csr_nrows`: number of local rows
- `csr_nze`: number of nonempty local matrix elements
- `iaux1`, dimension (`csr_nrows`): column indices corresponding to the start of local rows
- `iaux2`, dimension (`csr_nze`): column indices of nonempty local matrix elements
- `iaux3`, dimension (`csr_nrows`): numbers of nonempty matrix elements in each row
- `iaux4`, dimension (`csr_nze`): conversion of column indices such that they are in increasing order within each row
- `csr_dval`, dimension (`csr_nze`): values of nonempty local matrix elements

A general overview of the different computational implementations of the MatrixSwitch matrix operations is given below. These implementations need not be tied to specific storage formats, and vice versa. See the next section for a more detailed description of which storage formats can be used with which implementations for a particular operation.

`ref`: reference

The reference implementation is coded within MatrixSwitch. It can be
used with `s?den` matrices. It is not fast, but is useful for checking
results and does not require any external libraries.

Requirements:

- External libraries: none

`lap`: LAPACK/ScaLAPACK

This implementation makes use of BLAS + LAPACK to operate on `s?den`
matrices, and additionally ScaLAPACK to operate on `p?dbc` matrices. It
should be considerably faster than `ref`, but the performance will
depend on the external libraries provided by the user.

Requirements:

- External libraries:
  - Serial: BLAS + LAPACK
  - Parallel: MPI + BLAS + LAPACK + ScaLAPACK

`psp`: pspBLAS

Documentation coming soon.

This section contains a comprehensive list of the allowed combinations
of storage formats and implementations of the matrix operations. There
is a separate table for each matrix operation
subroutine. The table lists
the input and output matrices required by the subroutine. Each row gives
a possible combination of storage formats that can be used when calling
it. The last column then lists the possible implementations of the
operation for the particular combination of storage formats; usually
only one implementation is available, but sometimes more than one is.
The three-character code for the implementation should be passed to the
subroutine in the `label` variable; if `label` is absent, the default
implementation for the storage formats provided will be called.
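To illustrate how the `label` argument selects an implementation, the sketch below computes the same product twice, once with `ref` and once with `lap`, and measures the difference between the two results. It assumes the library was built with `-DHAVE_LAPACK` (otherwise `lap` is presumably unavailable) and without `-DCONV`; the program name, matrix size and fill values are illustrative only.

```fortran
program ms_label_sketch
  use MatrixSwitch
  implicit none

  integer, parameter :: n = 4
  integer :: i, j
  type(matrix) :: A, B, C_ref, C_lap
  double precision :: diff2

  call m_allocate(A, n, n, 'sdden')
  call m_allocate(B, n, n, 'sdden')
  call m_allocate(C_ref, n, n, 'sdden')
  call m_allocate(C_lap, n, n, 'sdden')

  ! Fill A and B element by element: element <- alpha + beta * element,
  ! so beta = 0 simply overwrites the current value.
  do j = 1, n
    do i = 1, n
      call m_set_element(A, i, j, dble(i + j), 0.0d0)
      call m_set_element(B, i, j, dble(i - j), 0.0d0)
    end do
  end do

  ! The same operation, A * B^T, with two implementations.
  ! Only the label differs between the calls.
  call mm_multiply(A, 'n', B, 't', C_ref, 1.0d0, 0.0d0, 'ref')
  call mm_multiply(A, 'n', B, 't', C_lap, 1.0d0, 0.0d0, 'lap')

  ! C_lap <- 1.0 * C_ref + (-1.0) * C_lap, i.e. the difference of the results.
  call m_add(C_ref, 'n', C_lap, 1.0d0, -1.0d0)

  ! tr(D^H D) is the squared Frobenius norm of the difference D.
  call mm_trace(C_lap, C_lap, diff2)
  print *, 'squared norm of (ref - lap) = ', diff2

  call m_deallocate(C_lap)
  call m_deallocate(C_ref)
  call m_deallocate(B)
  call m_deallocate(A)
end program ms_label_sketch
```

The printed value should be zero or of the order of rounding error if both implementations agree.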

`mm_multiply`

`A` | `B` | `C` | `label` |
---|---|---|---|
`s?den` | `s?den` | `s?den` | `ref` (default) or `lap` |
`p?dbc` | `p?dbc` | `p?dbc` | `lap` (default) |
`pdcsr` | `pdcsr` | `pdcsr` | (ignored) |

`m_add`

`A` | `C` | `label` |
---|---|---|
`s?den` | `s?den` | `ref` (default) or `lap` (redirects to `ref`) |
`p?dbc` | `p?dbc` | `lap` (default) |
`pdcsr` | `pdcsr` | (ignored) |

`m_trace`

`A` | `label` |
---|---|
`s?den` | `ref` (default) or `lap` (redirects to `ref`) |
`p?dbc` | `lap` (default) |
`pdcsr` | (ignored) |

`mm_trace`

`A` | `B` | `label` |
---|---|---|
`sdden` | `sdden` | `ref` (default) or `lap` |
`szden` | `szden` | `ref` (default) or `lap` (redirects to `ref`) |
`pddbc` | `pddbc` | `lap` (default) |
`pzdbc` | `pzdbc` | `ref` (default) [1] or `lap` (redirects to `ref`) |
`pdcsr` | `pdcsr` | (ignored) |

[1] Note that identical parallel distributions for `A` and `B` are
required.

`m_scale`

`C` | `label` |
---|---|
`s?den` | `ref` (default) [1] or `lap` (redirects to `ref`) |
`p?dbc` | `lap` (default - redirects to [1]) |
`pdcsr` | (ignored) |

`m_set`

`C` | `label` |
---|---|
`s?den` | `ref` (default) or `lap` (redirects to `ref`) |
`p?dbc` | `lap` (default) |
`pdcsr` | (ignored) |

`m_set_element`

`C` | `label` |
---|---|
`s?den` | `ref` (default) or `lap` (redirects to `ref`) |
`p?dbc` | `lap` (default) |
`pdcsr` | (ignored) |

`m_get_element`

`C` | `label` |
---|---|
`s?den` | `ref` (default) or `lap` (redirects to `ref`) |
`p?dbc` | `lap` (default) |
`pdcsr` | (ignored) |

- Sparse matrix formats: distributed compressed column, block sparse
- Hermitian matrices

Note that some entries are specifically of use for a particular storage format or implementation. This is marked in square brackets at the beginning of the description.

`ms_lap_icontxt`

INTEGER

[`p?dbc`] BLACS context handle used by
MatrixSwitch. This is made public to allow allocated and registered
`p?dbc` matrices to be placed in the same context. This can be done in
two ways:

- If BLACS has already been initialised, the existing context handle can be passed to MatrixSwitch via `ms_scalapack_setup`, which will then set `ms_lap_icontxt` to the same value. Note that in this case the other variables passed to `ms_scalapack_setup` need to be consistent with the process grid enclosed in the existing context.
- If BLACS is first initialised through MatrixSwitch with `ms_scalapack_setup`, `ms_lap_icontxt` can then be used as the context handle for BLACS operations outside of MatrixSwitch.

`type matrix`

This is the derived type that encapsulates all matrix storage possibilities and hides the details from the user. Typically, the elements below will never need to be accessed directly.

- `str_type` CHARACTER*3

  Label identifying the storage format.
- `is_initialized` LOGICAL

  `T`: Matrix has been initialized (with `m_allocate` or one of the `m_register` routines).

  `F`: Matrix has not been initialized.
- `is_serial` LOGICAL

  `T`: Matrix is serial distributed.

  `F`: Matrix is parallel distributed.
- `is_real` LOGICAL

  `T`: Matrix is real (DOUBLE PRECISION default).

  `F`: Matrix is complex (COMPLEX*16 default).
- `is_square` LOGICAL

  `T`: Matrix is square.

  `F`: Matrix is non-square.
- `is_sparse` LOGICAL

  `T`: Matrix is sparse.

  `F`: Matrix is dense.
- `iaux1_is_allocated` LOGICAL

  `T`: `iaux1` is directly allocated.

  `F`: `iaux1` is a pointer.
- `iaux2_is_allocated` LOGICAL

  `T`: `iaux2` is directly allocated.

  `F`: `iaux2` is a pointer.
- `iaux3_is_allocated` LOGICAL

  `T`: `iaux3` is directly allocated.

  `F`: `iaux3` is a pointer.
- `iaux4_is_allocated` LOGICAL

  `T`: `iaux4` is directly allocated.

  `F`: `iaux4` is a pointer.
- `dval_is_allocated` LOGICAL

  `T`: `dval` is directly allocated.

  `F`: `dval` is a pointer.
- `csr_dval_is_allocated` LOGICAL

  `T`: `csr_dval` is directly allocated.

  `F`: `csr_dval` is a pointer.
- `zval_is_allocated` LOGICAL

  `T`: `zval` is directly allocated.

  `F`: `zval` is a pointer.
- `use2D` LOGICAL

  `T`: 2D process grid is used.

  `F`: 1D process grid is used.
- `dim1` INTEGER

  Row dimension size of the matrix.
- `dim2` INTEGER

  Column dimension size of the matrix.
- `csr_nrows` INTEGER

  The number of local rows of the csr matrix dealing with individual matrix elements (see `pdrow` format). The default value is 0.
- `csr_nze` INTEGER

  The number of nonempty local elements of the csr matrix dealing with individual matrix elements (see `pdrow` format). The default value is 0.
- `blk_size1` INTEGER

  The block size for rows. The default value is 0.
- `blk_size2` INTEGER

  The block size for columns. The default value is 0.
- `iaux1` INTEGER pointer, dimension (`:`)

  Auxiliary information for certain storage formats.
- `iaux2` INTEGER pointer, dimension (`:`)

  Auxiliary information for certain storage formats.
- `iaux3` INTEGER pointer, dimension (`:`)

  Auxiliary information for certain storage formats.
- `iaux4` INTEGER pointer, dimension (`:`)

  Auxiliary information for certain storage formats.
- `dval` DOUBLE PRECISION pointer, dimension (`:`,`:`)

  Matrix elements for a real matrix.
- `csr_dval` DOUBLE PRECISION pointer, dimension (`:`)

  Values of nonempty matrix elements for a csr matrix dealing with individual matrix elements (see `pdrow` format).
- `zval` COMPLEX*16 pointer, dimension (`:`,`:`)

  Matrix elements for a complex matrix.
- `spm` TYPE(PSP_MATRIX_SPM)

  pspBLAS matrix type.
- `dbcsr_dist` TYPE(DBCSR_DISTRIBUTION_TYPE)

  DBCSR distribution.
- `dbcsr_mat` TYPE(DBCSR_TYPE)

  DBCSR matrix.

`subroutine m_allocate( m_name, dim1, dim2, label, use2D, blocksize1, blocksize2, row_sizes, col_sizes )`

Initializes a TYPE(MATRIX) variable by saving some basic information about the matrix, and allocating the necessary arrays for the requested storage format. Matrix elements are set to zero (or empty for sparse matrices). If the block sizes are not provided, the default values are used.

- `m_name` (input/output) TYPE(MATRIX)

  The matrix to be allocated.
- `dim1` (input) INTEGER

  Row dimension size of the matrix.
- `dim2` (input) INTEGER

  Column dimension size of the matrix.
- `label` (input, optional) CHARACTER*5

  Storage format to use. See the list of available formats. Default is `sdden`.
- `use2D` (input, optional) LOGICAL

  Specifies whether to use a 2D (or 1D) process grid.
- `blocksize1` (input, optional) INTEGER

  The block size for rows (if equal for all the blocks).
- `blocksize2` (input, optional) INTEGER

  The block size for columns (if equal for all the blocks).
- `row_sizes` (input, optional) INTEGER, dimension (:)

  Row block sizes.
- `col_sizes` (input, optional) INTEGER, dimension (:)

  Column block sizes.

`subroutine m_deallocate( m_name )`

Deallocates any allocated arrays in a TYPE(MATRIX) variable. For a registered matrix, the pointers are nullified.

- `m_name` (input/output) TYPE(MATRIX)

  The matrix to be deallocated.

`subroutine m_register_sden( m_name, A )`

[`s?den`] Registers pre-existing
matrix data into a TYPE(MATRIX) variable with `s?den` format.

- `m_name` (input/output) TYPE(MATRIX)

  The matrix to be allocated.
- `A` (input) DOUBLE PRECISION/COMPLEX*16 array, dimension (`:`,`:`)

  The values of the matrix elements, stored as a two-dimensional array.
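The following sketch shows the no-copy behaviour of registration described earlier: an existing two-dimensional array is registered as an `sdden` matrix, scaled through MatrixSwitch, and then inspected directly. The array name `raw` and the program name are illustrative. Declaring the array with the `target` attribute is an assumption made here so that the pointer association inside the TYPE(MATRIX) variable remains well defined in standard Fortran; the scalar is double precision on the assumption of a build without `-DCONV`.

```fortran
program ms_register_sketch
  use MatrixSwitch
  implicit none

  ! Pre-existing data owned by the calling code (column-major storage).
  double precision, target :: raw(2,2)
  type(matrix) :: M

  raw = reshape([1.0d0, 2.0d0, 3.0d0, 4.0d0], [2, 2])

  ! Register the array: M points to raw rather than copying it.
  call m_register_sden(M, raw)

  ! M <- 10 * M acts on the registered data.
  call m_scale(M, 10.0d0)

  ! If the data is shared as documented, raw now holds 10, 20, 30, 40.
  print *, raw

  ! For a registered matrix this only nullifies the pointers;
  ! raw itself stays valid and is still owned by the caller.
  call m_deallocate(M)
end program ms_register_sketch
```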


`subroutine ms_scalapack_setup( mpi_comm, nprow, order, bs_def, bs_list, icontxt, icontxt_1D )`

[`p?dbc`] Sets up everything needed to
use `p?dbc` matrices with ScaLAPACK. Has to be called once at the start
of the code.

- `mpi_comm` (input) INTEGER

  The MPI communicator to use.
- `nprow` (input) INTEGER

  The row dimension of the process grid (has to be a divisor of the size of the group defined by `mpi_comm`).
- `order` (input) CHARACTER*1

  Ordering of the process grid:

  `c`/`C`: column-major ordering

  `r`/`R`/other: row-major ordering
- `bs_def` (input) INTEGER

  The default block size to use when allocating `p?dbc` matrices.
- `bs_list` (input, optional) INTEGER array, dimension (`:`)

  List of exceptions to `bs_def` to use for specific matrix dimension sizes. Has to be formatted as (`dim_1`,`bs_1`,`dim_2`,`bs_2`,etc.), where `dim_x` is the matrix dimension size, and `bs_x` is the corresponding block size to use for it.
- `icontxt` (input, optional) INTEGER

  BLACS context handle, if already initialized (see `ms_lap_icontxt`).
- `icontxt_1D` (input, optional) INTEGER

  BLACS context handle for a 1D process grid, if already initialized (see `ms_lap_icontxt_1D`).
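A hedged sketch of how this setup call might be used in a small MPI program is given below. It assumes that the library was built with MPI and ScaLAPACK support, that the MPI installation provides the `mpi` Fortran module, and that the scalar types follow a build without `-DCONV`. The program name, the process grid shape and the block sizes are illustrative choices, and any BLACS teardown a production code may need is omitted.

```fortran
program ms_pdbc_sketch
  use mpi
  use MatrixSwitch
  implicit none

  integer :: ierr, rank
  type(matrix) :: A
  double precision :: tr

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)

  ! Process grid with a single row (1 divides any group size) and
  ! column-major ordering. The default block size is 4, with one exception
  ! given in bs_list as (dim_1, bs_1): dimensions of size 10 use block size 2.
  call ms_scalapack_setup(mpi_comm_world, 1, 'c', 4, bs_list=[10, 2])

  ! Distributed real dense matrix, set to the identity.
  call m_allocate(A, 10, 10, 'pddbc')
  call m_set(A, 'a', 0.0d0, 1.0d0)

  ! The trace of the 10 x 10 identity is 10.
  call m_trace(A, tr)
  if (rank == 0) print *, 'tr(A) = ', tr

  call m_deallocate(A)
  call mpi_finalize(ierr)
end program ms_pdbc_sketch
```

The calls after the setup are the same as for a serial `sdden` matrix; only the storage format label passed to `m_allocate` changes.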

`subroutine m_register_pdbc( m_name, A, desc )`

[`p?dbc`] Registers pre-existing
matrix data into a TYPE(MATRIX) variable with `p?dbc` format.

- `m_name` (input/output) TYPE(MATRIX)

  The matrix to be allocated.
- `A` (input) DOUBLE PRECISION/COMPLEX*16 array, dimension (`:`,`:`)

  The values of the local matrix elements, stored as a two-dimensional array.
- `desc` (input) INTEGER array, dimension (`9`)

  BLACS array descriptor.

`subroutine ms_dbcsr_setup( mpi_comm, bs_def, use2D )`

[`pdcsr`] Sets up everything needed to
use `pdcsr` matrices with DBCSR. Has to be called once at the start of
the code.

- `mpi_comm` (input) INTEGER

  MPI communicator to use.
- `bs_def` (input) INTEGER

  The default block size to use when allocating `pdcsr` matrices.
- `use2D` (input, optional) LOGICAL

  Specifies whether to use a 2D (or a 1D) process grid by default.

`subroutine ms_dbcsr_finalize( )`

[`pdcsr`] Finalizes the use of
the DBCSR library. Has to be called once at the end of the code.

`subroutine m_register_pdrow( m_name, dim1, dim2, nrows_loc, id_rows, id_cols, nze_row, val, ind_ordered, order, blk_size )`

[`pdrow`] Registers pre-existing
csr matrix data for individual matrix elements into a TYPE(MATRIX) variable
with `pdrow` format. Passes the pointers to the arrays of the csr matrix
and the information on the dimensions, block size, etc. to MatrixSwitch.
The array describing the change of indices that is required to arrange the column
indices of each row in increasing order is also prepared.

- `m_name` (input/output) TYPE(MATRIX)

  The matrix to be allocated.
- `dim1` (input) INTEGER

  The total number of rows in the matrix.
- `dim2` (input) INTEGER

  The total number of columns in the matrix.
- `nrows_loc` (input) INTEGER

  The number of local rows.
- `id_rows` (input) INTEGER, dimension (`:`)

  The 1D array of indices corresponding to the start of each local row (in `id_cols` and `val`).
- `id_cols` (input) INTEGER, dimension (`:`)

  The 1D array of column indices for local rows.
- `nze_row` (input) INTEGER, dimension (`:`)

  The 1D array with the number of nonzero (nonempty) elements for each local row.
- `val` (input) DOUBLE PRECISION, dimension (`:`)

  The 1D array of values of the matrix elements for local rows.
- `ind_ordered` (input/output, optional) INTEGER, dimension (`:`)

  The 1D array of indices ordered in such a way that column indices are in increasing order for each row.
- `order` (input, optional) LOGICAL

  Specifies whether to order the array of indices.
- `blk_size` (input, optional) INTEGER

  The block size for rows.

`subroutine mm_multiply( A, opA, B, opB, C, alpha, beta, label, keep_sparsity )`

Performs the operation:

$\mathbf{C} \leftarrow \alpha \tilde{\mathbf{A}} \tilde{\mathbf{B}} + \beta \mathbf{C}$, where $\tilde{\mathbf{M}} = \begin{cases} \mathbf{M} \\ \mathbf{M}^\mathrm{T} \\ \mathbf{M}^\mathrm{H} \end{cases}$

- `A` (input) TYPE(MATRIX)

  Matrix $\mathbf{A}$. Note that the definition of the matrix (real/complex) needs to be the same as for the other matrices.
- `opA` (input) CHARACTER*1

  Form of $\tilde{\mathbf{A}}$:

  `n`/`N`: $\mathbf{A}$

  `t`/`T`: $\mathbf{A}^\mathrm{T}$

  `c`/`C`: $\mathbf{A}^\mathrm{H}$ (equivalent to $\mathbf{A}^\mathrm{T}$ for a real matrix)
- `B` (input) TYPE(MATRIX)

  Matrix $\mathbf{B}$. Note that the definition of the matrix (real/complex) needs to be the same as for the other matrices.
- `opB` (input) CHARACTER*1

  Form of $\tilde{\mathbf{B}}$:

  `n`/`N`: $\mathbf{B}$

  `t`/`T`: $\mathbf{B}^\mathrm{T}$

  `c`/`C`: $\mathbf{B}^\mathrm{H}$ (equivalent to $\mathbf{B}^\mathrm{T}$ for a real matrix)
- `C` (input/output) TYPE(MATRIX)

  Matrix $\mathbf{C}$. Note that the definition of the matrix (real/complex) needs to be the same as for the other matrices.
- `alpha` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\alpha$. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrices (real/complex); otherwise, it only has to match the type of `beta`, and will be automatically converted to match the matrices.
- `beta` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\beta$. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrices (real/complex); otherwise, it only has to match the type of `alpha`, and will be automatically converted to match the matrices.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.
- `keep_sparsity` (input, optional) LOGICAL

  Specifies whether to maintain the sparsity of matrix $\mathbf{C}$.

`subroutine m_add ( A, opA, C, alpha, beta, label )`

Performs the operation:

$\mathbf{C} \leftarrow \alpha \tilde{\mathbf{A}} + \beta \mathbf{C}$, where $\tilde{\mathbf{M}} = \begin{cases} \mathbf{M} \\ \mathbf{M}^\mathrm{T} \\ \mathbf{M}^\mathrm{H} \end{cases}$

- `A` (input) TYPE(MATRIX)

  Matrix $\mathbf{A}$. Note that the definition of the matrix (real/complex) needs to be the same as for the other matrix.
- `opA` (input) CHARACTER*1

  Form of $\tilde{\mathbf{A}}$:

  `n`/`N`: $\mathbf{A}$

  `t`/`T`: $\mathbf{A}^\mathrm{T}$

  `c`/`C`: $\mathbf{A}^\mathrm{H}$ (equivalent to $\mathbf{A}^\mathrm{T}$ for a real matrix)
- `C` (input/output) TYPE(MATRIX)

  Matrix $\mathbf{C}$. Note that the definition of the matrix (real/complex) needs to be the same as for the other matrix.
- `alpha` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\alpha$. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrices (real/complex); otherwise, it only has to match the type of `beta`, and will be automatically converted to match the matrices.
- `beta` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\beta$. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrices (real/complex); otherwise, it only has to match the type of `alpha`, and will be automatically converted to match the matrices.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.

`subroutine m_trace( A, alpha, label )`

Performs the operation:

$\alpha \leftarrow \operatorname{tr} \left ( \mathbf{A} \right )$

- `A` (input) TYPE(MATRIX)

  Matrix $\mathbf{A}$.
- `alpha` (output) DOUBLE PRECISION/COMPLEX*16

  Scalar $\alpha$. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrix (real/complex); otherwise, it will be automatically converted to match it.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.

`subroutine mm_trace( A, B, alpha, label )`

Performs the operation:

$\alpha \leftarrow \operatorname{tr} \left ( \mathbf{A}^\mathrm{H} \mathbf{B} \right ) \equiv \operatorname{tr} \left ( \mathbf{B} \mathbf{A}^\mathrm{H} \right )$

- `A` (input) TYPE(MATRIX)

  Matrix $\mathbf{A}$. Note that the definition of the matrix (real/complex) needs to be the same as for the other matrix.
- `B` (input) TYPE(MATRIX)

  Matrix $\mathbf{B}$. Note that the definition of the matrix (real/complex) needs to be the same as for the other matrix.
- `alpha` (output) DOUBLE PRECISION/COMPLEX*16

  Scalar $\alpha$. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrices (real/complex); otherwise, it will be automatically converted to match them.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.

`subroutine m_scale ( C, beta, label )`

Performs the operation:

$\mathbf{C} \leftarrow \beta \mathbf{C}$

- `C` (input/output) TYPE(MATRIX)

  Matrix $\mathbf{C}$.
- `beta` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\beta$. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrix (real/complex); otherwise, it will be automatically converted to match it.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.

`subroutine m_set( C, seC, alpha, beta, label )`

Performs the operation: $\left [ \mathbf{C} \right ]_{i,j} \leftarrow \begin{cases} \alpha, & i \ne j \\ \beta, & i = j \end{cases}$ for either all matrix elements, or only the lower/upper triangle (generalised to elements below/above the diagonal for rectangular matrices)

- `C` (input/output) TYPE(MATRIX)

  Matrix $\mathbf{C}$ to be set.
- `seC` (input) CHARACTER*1

  Form of the operation:

  `l`/`L`: lower triangle (only for dense matrices)

  `u`/`U`: upper triangle (only for dense matrices)

  other: complete matrix
- `alpha` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\alpha$, the value of nondiagonal elements. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrix (real/complex); otherwise, it only has to match the type of `beta`, and will be automatically converted to match the matrix.
- `beta` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\beta$, the value of diagonal elements. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrix (real/complex); otherwise, it only has to match the type of `alpha`, and will be automatically converted to match the matrix.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.

`subroutine m_set_element( C, i, j, alpha, beta, label )`

Performs the operation:

$\left [ \mathbf{C} \right ]_{i,j} \leftarrow \alpha + \beta \left [ \mathbf{C} \right ]_{i,j}$

- `C` (input/output) TYPE(MATRIX)

  Matrix $\mathbf{C}$ in which the element (block for `pdcsr` matrices) is set.
- `i` (input) INTEGER

  The row index of the element for dense matrices or of the block for `pdcsr` matrices.
- `j` (input) INTEGER

  The column index of the element for dense matrices or of the block for `pdcsr` matrices.
- `alpha` (input) DOUBLE PRECISION/COMPLEX*16 ([`pdcsr`] DOUBLE PRECISION, dimension (`:, :`))

  Scalar $\alpha$, the value of the element, for dense matrices or a 2D block for `pdcsr` matrices. For dense matrices, if the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrix (real/complex); otherwise, it only has to match the type of `beta`, and will be automatically converted to match the matrix.
- `beta` (input) DOUBLE PRECISION/COMPLEX*16

  Scalar $\beta$ for the operation described above. If the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrix (real/complex); otherwise, it only has to match the type of `alpha`, and will be automatically converted to match the matrix.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.

`subroutine m_get_element( C, i, j, alpha, found, label )`

Performs the operation:

$\alpha \leftarrow \left [ \mathbf{C} \right ]_{i,j}$

- `C` (input) TYPE(MATRIX)

  Matrix $\mathbf{C}$ considered.
- `i` (input) INTEGER

  The row index of the element for dense matrices or of the block for `pdcsr` matrices.
- `j` (input) INTEGER

  The column index of the element for dense matrices or of the block for `pdcsr` matrices.
- `alpha` (output) DOUBLE PRECISION/COMPLEX*16 ([`pdcsr`] DOUBLE PRECISION, dimension (`:, :`), pointer)

  Scalar $\alpha$, the value of the element, for dense matrices or a 2D block for `pdcsr` matrices. For dense matrices, if the library is compiled without the `-DCONV` flag, the type has to match the definition of the matrix (real/complex); otherwise, it will be automatically converted to match it. For `pdcsr` matrices, if the block doesn't exist in the matrix, $\alpha$ is not changed.
- `found` (output, optional) LOGICAL

  Returns `.True.` if the element or block was found and the values are retrieved, otherwise it is `.False.`.
- `label` (input, optional) CHARACTER*3

  Implementation of the operation to use. See the list of available implementations.
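The update rule of `m_set_element`, $\left [ \mathbf{C} \right ]_{i,j} \leftarrow \alpha + \beta \left [ \mathbf{C} \right ]_{i,j}$, means that the same routine can either overwrite an element ($\beta = 0$) or accumulate into it ($\beta \ne 0$). The sketch below illustrates both cases for a dense `sdden` matrix and reads the result back with `m_get_element`. The program name and values are illustrative, the optional `found` and `label` arguments are left out, and double precision scalars assume a build without `-DCONV`.

```fortran
program ms_element_sketch
  use MatrixSwitch
  implicit none

  type(matrix) :: C
  double precision :: x

  ! Elements start at zero after allocation.
  call m_allocate(C, 2, 2, 'sdden')

  ! Overwrite: C(1,2) <- 5 + 0 * C(1,2) = 5
  call m_set_element(C, 1, 2, 5.0d0, 0.0d0)

  ! Accumulate: C(1,2) <- 1 + 2 * C(1,2) = 1 + 2 * 5 = 11
  call m_set_element(C, 1, 2, 1.0d0, 2.0d0)

  ! Read the element back; the documented formula gives 11.
  call m_get_element(C, 1, 2, x)
  print *, 'C(1,2) = ', x

  call m_deallocate(C)
end program ms_element_sketch
```

For `pdcsr` matrices the same calls act on whole blocks rather than single elements, as noted in the argument descriptions.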

`subroutine m_reserve_blocks( C, rows, cols )`

[`pdcsr`] Reserves nonempty blocks of a `pdcsr` matrix using arrays of
their row and column indices. This routine must be called before setting
the blocks one by one in order to achieve linear scaling.

`C`

(input/output) TYPE(MATRIX)

`pdcsr` matrix $\mathbf{C}$.

`rows`

(input) INTEGER, dimension (`:`)

The array of row indices of nonempty blocks.

`cols`

(input) INTEGER, dimension (`:`)

The array of column indices of nonempty blocks.
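The shape of the `rows` and `cols` arguments can be illustrated with a short Python sketch. It is not part of MatrixSwitch: the helper name `block_index_arrays`, the 1-based block indices, and the row-by-row ordering are assumptions made purely for illustration. The point is that the two arrays are parallel, with one entry per nonempty block.

```python
def block_index_arrays(nonempty_blocks):
    """Return parallel (rows, cols) lists for (row, col) block coordinates.

    Hypothetical helper: duplicates are dropped and the blocks are
    sorted row by row so that the two lists stay aligned.
    """
    unique_blocks = sorted(set(nonempty_blocks))
    rows = [r for r, _ in unique_blocks]
    cols = [c for _, c in unique_blocks]
    return rows, cols


# Block pattern of a small block-tridiagonal matrix
# (1-based block indices are an assumption here).
pattern = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
rows, cols = block_index_arrays(pattern)
```

Arrays built this way would then be passed together, so that entry `k` of `rows` and entry `k` of `cols` identify the same block.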

`subroutine m_occupation( C, occ )`

[`pdcsr`] Computes the occupation of a `pdcsr` matrix, i.e. the fraction
of nonempty blocks.

`C`

(input/output) TYPE(MATRIX)

`pdcsr` matrix $\mathbf{C}$.

`occ`

(output) DOUBLE PRECISION

The occupation computed.
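As a rough illustration of the quantity returned in `occ`, the following Python sketch computes the fraction of nonempty blocks for a given block pattern. The function name `occupation` and its arguments are hypothetical, and the sketch assumes the fraction is taken over the full grid of blocks; it does not reproduce how the library gathers this information across processes.

```python
def occupation(nonempty_blocks, n_block_rows, n_block_cols):
    """Fraction of nonempty blocks in an n_block_rows x n_block_cols block grid."""
    total = n_block_rows * n_block_cols
    if total == 0:
        return 0.0
    # Count each (row, col) block coordinate once.
    return len(set(nonempty_blocks)) / total


# Seven nonempty blocks in a 3 x 3 block grid (block-tridiagonal pattern).
pattern = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
occ = occupation(pattern, 3, 3)
```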

`subroutine m_copy( m_name, A, label, threshold, threshold_is_soft, m_sp )`

Copies the data from matrix `A` to `m_name`.
If `m_name` is not initialized and the new storage format is
the same or not provided, an exact copy of matrix `A` is created.
If matrix `m_name` is initialized, only the values of
the matrix elements from `A` are copied. If `m_name` is
an allocated `pdcsr` matrix, its sparsity pattern is maintained.
If the formats of `A` and `m_name` are different,
the format conversion is performed. For the conversion between the
`pdcsr` and `pdrow` formats, an intermediate matrix with
the same distribution of rows on the 1D process grid as the `pdrow`
matrix can be provided. Its sparsity is maintained during
the conversion. Optional thresholding variables are used to increase
the matrix sparsity.

`m_name`

(input/output) TYPE(MATRIX)

The matrix to copy to.

`A`

(input/output) TYPE(MATRIX)

The matrix to be copied.

`label`

(input, optional) CHARACTER*5

Storage format to use for the matrix `m_name`. See the list of available formats. The default is that of the matrix `A`.

`threshold`

(input, optional) DOUBLE PRECISION

Tolerance for zeroing elements. Elements with an absolute value below this threshold are omitted for sparse storage formats, and set to zero for dense storage formats. For blocks, the threshold applies to the Frobenius norm of the blocks.

`threshold_is_soft`

(input, optional) LOGICAL

Specifies whether the thresholding is soft. If `.True.`, the values above the threshold are shifted down to remove the jump discontinuity (not implemented for sparse matrices). If `.False.` (the default), the values are not shifted.

`m_sp`

(input/output, optional) TYPE(MATRIX)

The intermediate matrix, distributed on the 1D process grid, used for conversion between the `pdrow` and `pdcsr` formats.
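The difference between hard and soft thresholding described for `threshold` and `threshold_is_soft` can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, and the soft shift is assumed to move each surviving value toward zero by exactly `threshold` (the usual soft-thresholding form), which is one way to remove the jump discontinuity mentioned above.

```python
import math


def threshold_element(x, threshold, soft=False):
    """Apply hard or soft thresholding to a single real value."""
    if abs(x) < threshold:
        # Below the tolerance: zeroed (dense) or omitted (sparse).
        return 0.0
    if soft:
        # Assumed form of the shift: reduce the magnitude by `threshold`.
        return x - math.copysign(threshold, x)
    return x


def block_survives(block, threshold):
    """For blocks, compare the Frobenius norm of the block with the threshold."""
    frobenius = math.sqrt(sum(v * v for row in block for v in row))
    return frobenius >= threshold


values = [0.05, -0.3, 0.8]
hard = [threshold_element(v, 0.1) for v in values]
soft = [threshold_element(v, 0.1, soft=True) for v in values]
```

With hard thresholding, a value just above the tolerance is kept unchanged, so the result jumps from zero to roughly `threshold` in magnitude. With the assumed soft form, the same value is mapped close to zero, so the result varies continuously with the input.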

`subroutine m_convert( m_name, label, threshold, threshold_is_soft )`

This routine facilitates an in-place conversion between storage formats.
Internally it uses the `m_copy` subroutine to produce a temporary matrix
with the new format, then overwrites the original matrix with this
information and finally deletes the temporary matrix.

`m_name`

(input/output) TYPE(MATRIX)

The matrix to be converted.

`label`

(input, optional) CHARACTER*5

The new storage format to use. See the list of available formats.

`threshold`

(input, optional) DOUBLE PRECISION

Tolerance for zeroing elements. Elements with an absolute value below this threshold are omitted for sparse storage formats, and set to zero for dense storage formats. For blocks, the threshold applies to the Frobenius norm of the blocks.

`threshold_is_soft`

(input, optional) LOGICAL

Specifies whether the thresholding is soft. If `.True.`, the values above the threshold are shifted down to remove the jump discontinuity (not implemented for sparse matrices). If `.False.` (the default), the values are not shifted.

`subroutine m_write( m_name, filepath, use_dbcsrlib, nze )`

[`pddbc`, `pdcsr`] Writes a matrix to a file.

`m_name`

(input/output) TYPE(MATRIX)

The matrix to be written.

`filepath`

(input) CHARACTER*

The path to the file.

`use_dbcsrlib`

(input, optional) LOGICAL

Specifies whether to use the DBCSR library for writing (experimental).

`nze`

(input, optional) INTEGER

The number of nonempty (nonzero) elements.

`subroutine m_read( m_name, filepath, file_exist, keep_sparsity, use_dbcsrlib, nze )`

[`pddbc`, `pdcsr`] Reads a matrix from a file. The new block sizes and
process grids can be different.

`m_name`

(input/output) TYPE(MATRIX)

The matrix to be read.

`filepath`

(input) CHARACTER*

The path to the file.

`file_exist`

(output) LOGICAL

Specifies whether the file is found.

`keep_sparsity`

(input, optional) LOGICAL

Whether to keep the sparsity of the input matrix.

`use_dbcsrlib`

(input, optional) LOGICAL

Specifies whether to use the DBCSR library for reading (experimental).

`nze`

(input, optional) INTEGER

The expected number of nonempty (nonzero) elements.