LibOMM
Source authors:
Fabiano Corsetti, Synopsys QuantumWise, Denmark
License: Simplified BSD (see README.md in the source code)
Download: ECAM Gitlab
Functionalities:
Software:
libOMM solves the Kohn-Sham equation as a generalized eigenvalue problem for a fixed Hamiltonian. It implements the orbital minimization method (OMM), which works within a density matrix formalism. The basic strategy of the OMM is to find the set of Wannier functions (WFs) describing the occupied subspace by direct unconstrained minimization of an appropriately-constructed functional. The density matrix can then be calculated from the WFs. The solver is usually employed within an outer self-consistency (SCF) cycle; therefore, the WFs resulting from one SCF iteration can be saved and then reused as the initial guess for the next iteration.
Features:
- OMM functional of Mauri, Galli, and Car^{[1]}^{[2]}, and Ordejón et al.^{[3]}^{[4]}
- Real (Γ-point) and complex (arbitrary k-point) Hamiltonians
- Efficient reuse of information within the SCF loop and for multiple spins/k points^{[5]}
- Simple (orthogonal basis) and generalized (non-orthogonal basis) eigenvalue problems
- Cholesky factorization of the generalized problem
- Preconditioner for localized orbital bases^{[6]}
Installation
Prerequisites
libOMM makes use of MatrixSwitch for handling the matrix operations. This is included in the libOMM distribution (alternatively, an external version of MatrixSwitch can be used). The basic routines can be installed with only a Fortran compiler.
Optional requirements are:
- BLAS + LAPACK for `lap` operations with the `s?den` format
- MPI + BLAS + LAPACK + ScaLAPACK for the `p?dbc` format
Instructions
1. Enter the `src` directory.
2. Copy `make.inc.example` to `make.inc` and modify it to suit your needs. `MSLIBPATH` should point to the MatrixSwitch directory (the default in `make.inc.example` is for the version included in the distribution). MatrixSwitch should be compiled with the `-DCONV` flag. Available options for `FPPFLAGS` are:
   - `-DMPI`: enable MPI parallel routines
   - `-DLAP`: enable LAPACK routines (currently necessary for preconditioning/Cholesky factorization)
   - `-DSLAP`: enable ScaLAPACK routines (requires `-DMPI`)
   - `-DNORAND`: use a fixed seed for the random number generator; enable for testing purposes
   - `-DCBIND`: use ISO_C_BINDING for LOGICAL inputs in the wrapper interfaces; enable for linking to C
3. Type `make`.
Tests
The `examples` directory contains a number of small programs that make use of libOMM with MatrixSwitch. These can be useful both for testing the installation and for learning how to use the library. To compile them:
1. Enter the `examples` directory.
2. Copy `make.inc.example` to `make.inc` and modify it to suit your needs. Be aware that `make.inc` in the `src` directory will also be used.
3. Type `make`.
Each example contains a header explaining what the program does and providing sample output to compare against.
Publications
The algorithms and implementation of the library are described here^{[5]}.
Algorithm
The orbital minimization method (OMM) is a class of algorithms for solving the Kohn-Sham (KS) eigenproblem within a density matrix formalism. The basic strategy is to find the set of Wannier functions (WFs) describing the occupied subspace of a system by direct unconstrained minimization of an appropriately-constructed energy functional. The OMM was originally proposed in the context of linear-scaling DFT, by Mauri, Galli, and Car^{[1]}^{[2]}, and Ordejón et al.^{[3]}^{[4]}.
In this original formulation, the energy functional is given by
E[C] = Tr[(2I - S_W) H_W],

where C is the matrix of coefficients of the WFs in the original basis, and H_W = C H C^† and S_W = C S C^† are the Hamiltonian and overlap matrices in the basis of WFs. It can be shown that, by minimizing E[C] with respect to C, we obtain the density matrix D and the energy E = E_KS, where D describes the occupied subspace, and E_KS is the ground-state KS (band) energy.
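This minimization can be checked numerically in a small dense test case. The sketch below (Python with NumPy, for illustration only; not libOMM code) minimizes E[C] = Tr[(2I - S_W) H_W] by plain gradient descent for an orthogonal basis (S = I), and compares the minimum against the sum of the occupied eigenvalues; the gradient expression follows from differentiating the trace directly.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 2  # basis size, number of occupied states

# Random symmetric Hamiltonian, shifted so that the n occupied eigenvalues
# are negative and the rest positive (the OMM minimum then exists).
A = rng.standard_normal((m, m))
H = 0.5 * (A + A.T)
ev = np.linalg.eigvalsh(H)
H -= 0.5 * (ev[n - 1] + ev[n]) * np.eye(m)
e_occ = np.linalg.eigvalsh(H)[:n].sum()  # target band energy

def energy(C):
    """OMM functional E[C] = Tr[(2I - S_W) H_W], here with S = I."""
    Hw = C @ H @ C.T  # Hamiltonian in the WF basis
    Sw = C @ C.T      # overlap in the WF basis
    return np.trace((2.0 * np.eye(n) - Sw) @ Hw)

def gradient(C):
    """dE/dC = 2 (2 C H - H_W C S - S_W C H), with S = I."""
    Hw, Sw = C @ H @ C.T, C @ C.T
    return 2.0 * (2.0 * C @ H - Hw @ C - Sw @ C @ H)

C = 0.1 * rng.standard_normal((n, m))  # unconstrained initial guess
E = energy(C)
for _ in range(5000):
    G = gradient(C)
    step = 0.2
    for _ in range(50):  # simple backtracking line search
        if energy(C - step * G) < E:
            break
        step *= 0.5
    C = C - step * G
    E_new = energy(C)
    if abs(E_new - E) < 1e-13:
        E = E_new
        break
    E = E_new

print(E, e_occ)  # minimized functional vs. sum of occupied eigenvalues
```

Note that no orthonormality constraint is imposed on C at any point; the (2I - S_W) factor drives S_W towards the identity at the minimum.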
Although this functional has found applications as a cubic-scaling solver^{[7]}^{[5]} (available in the libOMM library), the original linear-scaling implementations encountered serious problems with local minima, arising from the enforcement of spatial localization for the WFs. In order to solve this, more general/advanced versions of the OMM have been proposed, such as that of Kim, Mauri, and Galli^{[8]}. A comprehensive review of these can be found here^{[9]}.
Documentation
The best way of learning how to use libOMM is by example; see the programs in the `examples` directory. The basic idea is very simple: you call `omm`, providing as input the Hamiltonian and overlap matrices and the number of occupied bands, and receiving as output the ground-state density matrix and Kohn-Sham (band) energy. However, there are a number of important subtleties, due to the choice of different flavours of the OMM functional (Cholesky factorization, preconditioning, etc.), and the fact that the subroutine is typically called many times, looping over MD and SCF iterations, as well as spins and k points.
Matrices
The first thing to notice is that all matrices in libOMM are handled by MatrixSwitch. This allows for easy usage of different storage formats and implementations of the matrix operations. The matrices therefore have to be passed to `omm` as `type(matrix)`. This is quite easy! Also note that if you are already using one of the allowed formats (e.g., the ScaLAPACK format), you can simply register your matrices with MatrixSwitch without needing to copy any data. You will need to pass to libOMM the 5-character code denoting the storage format and the 3-character code denoting the implementation of the matrix operations to use.
There are no optional arguments in the call to `omm`. Therefore, matrices which are not needed should simply be passed without having been allocated by MatrixSwitch (the `m_allocate` routine). Other variables which are not needed will be ignored.
If you are calling libOMM from C, you cannot make use of `type(matrix)`. In this case you can instead call one of the available wrappers.
Reuse of data
The most important instance of data reuse is in the WFs describing the occupied subspace (stored in the coefficients matrix C). The number of conjugate gradient (CG) steps needed by libOMM will fall drastically over the initial few SCF iterations if the final C from one iteration is used as the starting guess for the next one. Since this matrix is passed in and out of `omm`, the user should typically take care not to modify it between calls. Sometimes, however, it might be advantageous to do so, e.g., if the user can provide an extrapolation scheme for C, or following an explicit diagonalization call (in this case, the matrix can be populated by the occupied eigenstates). If there is more than one spin/k point within an SCF iteration, a separate C matrix for each point should be stored.
libOMM internally allocates and uses a number of matrices. Some of these are saved between calls in order to maximize data reuse and, hence, to minimize the computational effort. The data is specific to each spin/k point, and its reuse depends on passing the same C from the previous SCF iteration at the same spin/k point. For this reason, it is important to specify the total number of spin+k points and the current spin+k point identifier (`np` and `ip`, respectively) in each call to `omm`.
Due to this internal data reuse, it is also necessary to inform libOMM when a new overlap matrix S is being passed in (i.e., on the first SCF iteration of each new MD step). This is specified by `new_S`. It is important to note that each spin+k point is treated independently, so, e.g., if the calculation is spin polarized and uses the same S for the up- and down-spin Hamiltonians, it is necessary to set `new_S` to `T` for both calls to `omm` at the start of a new MD step.
Finally, you should always remember to free all allocated memory at the end of your program. In order to deallocate libOMM's internal matrices, call `omm` with `dealloc` set to `T`. This will deallocate the data for all spin/k points; therefore, only the very last call to `omm` should do so.
The energyweighted density matrix
If the energy-weighted density matrix is needed (typically only at the end of the SCF cycle), it can be calculated in a separate call to `omm` following a minimization call, by setting `calc_ED` to `T` (the default value should be `F`). The unmodified C matrix from the previous call has to be passed in, and data stored internally will also be reused.
Note that the output density matrix from the previous call is not needed to calculate the energy-weighted density matrix; the latter will be returned by `omm` as `D_min`.
OMM flavours
The choice of flavour of the OMM functional can have a huge impact on performance. The basic flavour should be avoided when S is present. Cholesky factorization is usually the most efficient, although preconditioning will be better if kinetic energy ill-conditioning is significant.
The choice is controlled by `flavour`. The options are:
Basic
This is selected by setting `flavour` to `0`. The basic algorithm does not provide any tensorial correction for non-orthogonal bases, which can lead to a very ill-conditioned problem (especially for large system/basis sizes) requiring many CG steps.
Cholesky factorization
This is selected by setting `flavour` to either `1` or `2`. Cholesky factorization reduces the generalized eigenvalue problem to standard form; the basic algorithm is then performed with no S. For a localized basis, this is usually the most important form of ill-conditioning.
The user can provide either the overlap matrix S (case `1`) or its factorization into the upper triangular matrix U, with S = U^† U (case `2`). In the former case, the factorization will be performed by libOMM, and S will be overwritten by U. In both cases, H will be overwritten by U^{-†} H U^{-1}.
A small note for case `1`: if the calculation is spin polarized and uses the same S for the up- and down-spin Hamiltonians, the call to `omm` for the first spin point will replace S with U; the call for the second spin point should then set `flavour` to `2`. (These considerations are only important when `new_S` is set to `T`; otherwise, S is not referenced at all.)
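The Cholesky reduction of a generalized eigenproblem to standard form can be sketched numerically. The following fragment (Python with NumPy, for illustration only; not libOMM code) factorizes a model overlap matrix as S = U^T U with U upper triangular, forms the transformed Hamiltonian U^{-T} H U^{-1}, and verifies that its eigenpairs, mapped back through U^{-1}, solve the original generalized problem:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5

A = rng.standard_normal((m, m))
H = 0.5 * (A + A.T)            # symmetric Hamiltonian
B = rng.standard_normal((m, m))
S = B @ B.T + m * np.eye(m)    # symmetric positive-definite overlap

# Upper-triangular Cholesky factor: S = U^T U
# (np.linalg.cholesky returns the lower factor L, so U = L^T)
U = np.linalg.cholesky(S).T

# Transform to standard form: Ht = U^-T H U^-1
Uinv = np.linalg.inv(U)
Ht = Uinv.T @ H @ Uinv

lam, Y = np.linalg.eigh(Ht)    # standard eigenproblem
X = Uinv @ Y                   # map eigenvectors back: x = U^-1 y

# Each pair (lam[i], X[:, i]) solves the generalized problem H x = lam S x
resid = np.linalg.norm(H @ X - (S @ X) * lam)
print(resid)
```

In libOMM this transformation is applied once per new S (controlled by `new_S`), after which the minimization proceeds as in the orthogonal case.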
Preconditioning
This is selected by setting `flavour` to `3`. The preconditioner is suitable for a non-orthogonal localized orbital basis, and is defined as^{[6]}:

P = (S + T/τ)^{-1},

where T is the kinetic energy matrix, and τ sets the scale for the kinetic energy preconditioning. If T is not given, the preconditioner becomes a pure tensorial correction S^{-1}, completely equivalent to Cholesky factorization. The choice of τ can affect the speed of convergence; reducing it will result in more aggressive kinetic energy preconditioning^{[5]}.
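A minimal numerical sketch of this preconditioner (assuming the form P = (S + T/τ)^{-1} given above; Python with NumPy, for illustration only) shows the limiting behaviour: for large τ the kinetic term drops out and P reduces to the pure tensorial correction S^{-1}:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
B = rng.standard_normal((m, m))
S = B @ B.T + m * np.eye(m)    # overlap matrix (positive definite)
K = rng.standard_normal((m, m))
T = K @ K.T + np.eye(m)        # model kinetic energy matrix (positive definite)

def precond(tau):
    """Preconditioner P = (S + T/tau)^-1; tau sets the kinetic energy scale."""
    return np.linalg.inv(S + T / tau)

# Large tau: the kinetic term becomes negligible and P approaches the pure
# tensorial correction S^-1 (equivalent to Cholesky factorization).
diff = np.linalg.norm(precond(1e12) - np.linalg.inv(S))
print(diff)
```

Conversely, a smaller τ gives the kinetic term more weight, i.e., more aggressive kinetic energy preconditioning.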
Future developments
- Sparse/dense operations for the cubic-scaling solver with localized orbitals
- Plane-wave preconditioner
- Linear-scaling solver (all-sparse operations, Kim-Mauri-Galli functional^{[8]}, fixing of the total electron number)
Programming interface
The interface makes use of the public `type(matrix)` defined by MatrixSwitch.
Public subroutines
subroutine omm( m, n, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation, mpi_rank )

- `m` (input) INTEGER: The size of the basis.
- `n` (input) INTEGER: The number of occupied eigenstates.
- `H` (input/output) TYPE(MATRIX): The input Hamiltonian matrix H, size (`m`,`m`). This must be provided already shifted such that the occupied part of the eigenspectrum is negative. If Cholesky factorization is performed, on output it will be overwritten by U^{-†} H U^{-1}.
- `S` (input/output) TYPE(MATRIX): The input overlap matrix S, size (`m`,`m`). This matrix is optional; if it is not needed, it should be passed uninitialized. If Cholesky factorization is performed, on output it will be overwritten by the factorized upper triangular matrix U.
- `new_S` (input) LOGICAL: `T`: S is new (i.e., it has changed from the previous call to `omm`; note that this only refers to calls with the same value of `ip`). `F`: S has not changed from the previous call to `omm`.
- `e_min` (output) DOUBLE PRECISION: The total energy from the OMM functional at the end of the minimization (spin degeneracy factor not included).
- `D_min` (input/output) TYPE(MATRIX): The output density matrix at the end of the minimization, size (`m`,`m`).
- `calc_ED` (input) LOGICAL: `T`: The energy-weighted density matrix is calculated (must follow a minimization call) and written to `D_min`, and all other output remains unchanged; no minimization is performed. `F`: The minimization is performed as usual.
- `eta` (input) DOUBLE PRECISION: The eigenspectrum shift parameter η^{[5]}.
- `C_min` (input/output) TYPE(MATRIX): The coefficients matrix C for the set of WFs, size (`n`,`m`). This is used as the initial guess for the minimization, and on output will be overwritten by the output coefficients at the end of the minimization. The initial guess does not need to be provided by the user; it will be generated automatically if `init_C` is set to `F`.
- `init_C` (input) LOGICAL: `T`: The contents of `C_min` have been set (or modified since the last call to `omm`) by the user. `F`: The contents of `C_min` have not been modified by the user.
- `T` (input/output) TYPE(MATRIX): The input kinetic energy matrix, size (`m`,`m`), used for calculating the preconditioner. This matrix is optional; if it is not needed, it should be passed uninitialized.
- `scale_T` (input) DOUBLE PRECISION: Scale of the kinetic energy preconditioning τ; a value around 10 Ry is recommended^{[5]} (note that the units should match those of the operator matrices provided). Not referenced if `T` is uninitialized.
- `flavour` (input) INTEGER: Flavour of the OMM functional. `0`: basic, can be used with or without S. `1`: Cholesky factorization, S provided in `S`. `2`: Cholesky factorization, U provided in `S` (i.e., factorization of S performed by the user). `3`: preconditioning, requires S and, optionally, T.
- `np` (input) INTEGER: Total number of spin+k points for a single SCF iteration (i.e., the number of spins times the number of k points).
- `ip` (input) INTEGER: Spin+k point identifier (must be between `1` and `np`).
- `cg_tol` (input) DOUBLE PRECISION: Tolerance for the minimization. The stopping criterion is for the relative energy difference between subsequent line searches to be smaller than or equal to this value. If negative, the default value of `1.0d-9` is used.
- `long_out` (input) LOGICAL: `T`: Verbose output. `F`: Standard output.
- `dealloc` (input) LOGICAL: `T`: Deallocate all internal matrices; typically should only be selected for the very last call to `omm`. `F`: Internal matrices are saved and the data is reused for the next call to `omm`.
- `m_storage` (input) CHARACTER*5: MatrixSwitch storage format to use for internal matrices. See the list of available formats.
- `m_operation` (input) CHARACTER*3: Implementation of the MatrixSwitch operations to use. See the list of available implementations.
Wrappers/C interface
The purpose of the wrappers is to be able to call `omm` without using the MatrixSwitch Fortran module and the `type(matrix)` defined within it. This makes it straightforward to call the wrappers from C. Note that libOMM should be compiled with the `-DCBIND` flag when calling from C; this will ensure that LOGICAL inputs are compatible with `_Bool` types in C.
Each wrapper provides an interface for a particular combination of a matrix storage format and an implementation of the matrix operations. These are specified by the 5- and 3-character codes in the name of the wrapper.
subroutine omm_sdden_ref( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense real matrix storage (serial distribution) and MatrixSwitch reference matrix operations. The interface is identical to `omm` with the following exceptions (note that `m_storage` and `m_operation` are no longer present):
- `H_vals` (input/output) DOUBLE PRECISION array, dimension (`m`,`m`): The Hamiltonian matrix H stored as a two-dimensional array.
- `S_present` (input) LOGICAL: `T`: S is present in the calculation. `F`: S is not present in the calculation. Note that `S_vals` must nevertheless be provided, although it will not be referenced.
- `S_vals` (input/output) DOUBLE PRECISION array, dimension (`m`,`m`): The overlap matrix S stored as a two-dimensional array.
- `D_min_vals` (input/output) DOUBLE PRECISION array, dimension (`m`,`m`): The density matrix stored as a two-dimensional array.
- `C_min_vals` (input/output) DOUBLE PRECISION array, dimension (`n`,`m`): The coefficients matrix C stored as a two-dimensional array.
- `T_present` (input) LOGICAL: `T`: T is present in the calculation. `F`: T is not present in the calculation. Note that `T_vals` must nevertheless be provided, although it will not be referenced.
- `T_vals` (input/output) DOUBLE PRECISION array, dimension (`m`,`m`): The kinetic energy matrix T stored as a two-dimensional array.
subroutine omm_sdden_lap( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense real matrix storage (serial distribution) and LAPACK matrix operations. The interface is identical to `omm_sdden_ref`.
subroutine omm_szden_ref( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense complex matrix storage (serial distribution) and MatrixSwitch reference matrix operations. The interface is identical to `omm_sdden_ref`, except for the fact that `H_vals`, `S_vals`, `D_min_vals`, `C_min_vals`, and `T_vals` are COMPLEX*16 arrays.
subroutine omm_szden_lap( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense complex matrix storage (serial distribution) and LAPACK matrix operations. The interface is identical to `omm_sdden_ref`, except for the fact that `H_vals`, `S_vals`, `D_min_vals`, `C_min_vals`, and `T_vals` are COMPLEX*16 arrays.
subroutine omm_pddbc_lap( m, n, H_dim, H_vals, desc_H, S_present, S_dim, S_vals, desc_S, new_S, e_min, D_min_dim, D_min_vals, desc_D_min, calc_ED, eta, C_min_dim, C_min_vals, desc_C_min, init_C, T_present, T_dim, T_vals, desc_T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank, mpi_size, nprow, order, bs_def, icontxt )
This wrapper uses a dense block cyclic real matrix storage (parallel distribution) and ScaLAPACK matrix operations. The interface is identical to `omm` with the following exceptions (note that `m_storage` and `m_operation` are no longer present):
- `H_dim` (input) INTEGER array, dimension (`2`): Dimensions of `H_vals`.
- `H_vals` (input/output) DOUBLE PRECISION array, dimension (`H_dim(1)`,`H_dim(2)`): Local elements of H stored as a two-dimensional array. The size of the global array is (`m`,`m`).
- `desc_H` (input) INTEGER array, dimension (`9`): BLACS array descriptor for `H_vals`.
- `S_present` (input) LOGICAL: `T`: S is present in the calculation. `F`: S is not present in the calculation. Note that `S_dim`, `S_vals`, and `desc_S` must nevertheless be provided, although they will not be referenced (in this case, it is suggested to use a dummy array of size (`1`,`1`)).
- `S_dim` (input) INTEGER array, dimension (`2`): Dimensions of `S_vals`.
- `S_vals` (input/output) DOUBLE PRECISION array, dimension (`S_dim(1)`,`S_dim(2)`): Local elements of S stored as a two-dimensional array. The size of the global array is (`m`,`m`).
- `desc_S` (input) INTEGER array, dimension (`9`): BLACS array descriptor for `S_vals`.
- `D_min_dim` (input) INTEGER array, dimension (`2`): Dimensions of `D_min_vals`.
- `D_min_vals` (input/output) DOUBLE PRECISION array, dimension (`D_min_dim(1)`,`D_min_dim(2)`): Local elements of the density matrix stored as a two-dimensional array. The size of the global array is (`m`,`m`).
- `desc_D_min` (input) INTEGER array, dimension (`9`): BLACS array descriptor for `D_min_vals`.
- `C_min_dim` (input) INTEGER array, dimension (`2`): Dimensions of `C_min_vals`.
- `C_min_vals` (input/output) DOUBLE PRECISION array, dimension (`C_min_dim(1)`,`C_min_dim(2)`): Local elements of C stored as a two-dimensional array. The size of the global array is (`n`,`m`).
- `desc_C_min` (input) INTEGER array, dimension (`9`): BLACS array descriptor for `C_min_vals`.
- `T_present` (input) LOGICAL: `T`: T is present in the calculation. `F`: T is not present in the calculation. Note that `T_dim`, `T_vals`, and `desc_T` must nevertheless be provided, although they will not be referenced (in this case, it is suggested to use a dummy array of size (`1`,`1`)).
- `T_dim` (input) INTEGER array, dimension (`2`): Dimensions of `T_vals`.
- `T_vals` (input/output) DOUBLE PRECISION array, dimension (`T_dim(1)`,`T_dim(2)`): Local elements of T stored as a two-dimensional array. The size of the global array is (`m`,`m`).
- `desc_T` (input) INTEGER array, dimension (`9`): BLACS array descriptor for `T_vals`.
- `mpi_size` (input) INTEGER: The total number of MPI processes for the BLACS process grid.
- `nprow` (input) INTEGER: Row dimension of the process grid (has to be a divisor of `mpi_size`).
- `order` (input) CHARACTER*1: Ordering of the process grid. `c`/`C`: column-major ordering. `r`/`R`/other: row-major ordering.
- `bs_def` (input) INTEGER: Block size to use when allocating `p?dbc` matrices.
- `icontxt` (input) INTEGER: BLACS context handle.
subroutine omm_pzdbc_lap( m, n, H_dim, H_vals, desc_H, S_present, S_dim, S_vals, desc_S, new_S, e_min, D_min_dim, D_min_vals, desc_D_min, calc_ED, eta, C_min_dim, C_min_vals, desc_C_min, init_C, T_present, T_dim, T_vals, desc_T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank, mpi_size, nprow, order, bs_def, icontxt )
This wrapper uses a dense block cyclic complex matrix storage (parallel distribution) and ScaLAPACK matrix operations. The interface is identical to `omm_pddbc_lap`, except for the fact that `H_vals`, `S_vals`, `D_min_vals`, `C_min_vals`, and `T_vals` are COMPLEX*16 arrays.
References
1. F. Mauri, G. Galli, and R. Car, Orbital formulation for electronic-structure calculations with linear system-size scaling, Phys. Rev. B 47, 9973 (1993). DOI: 10.1103/PhysRevB.47.9973
2. F. Mauri and G. Galli, Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling, Phys. Rev. B 50, 4316 (1994). DOI: 10.1103/PhysRevB.50.4316
3. P. Ordejón, D. A. Drabold, M. P. Grumbach, and R. M. Martin, Unconstrained minimization approach for electronic computations that scales linearly with system size, Phys. Rev. B 48, 14646 (1993). DOI: 10.1103/PhysRevB.48.14646
4. P. Ordejón, D. A. Drabold, R. M. Martin, and M. P. Grumbach, Linear system-size scaling methods for electronic-structure calculations, Phys. Rev. B 51, 1456 (1995). DOI: 10.1103/PhysRevB.51.1456
5. F. Corsetti, The orbital minimization method for electronic structure calculations with finite-range atomic basis sets, Comput. Phys. Commun. 185, 873 (2014). DOI: 10.1016/j.cpc.2013.12.008
6. C. Gan, P. Haynes, and M. Payne, Preconditioned conjugate gradient method for the sparse generalized eigenvalue problem in electronic structure calculations, Comput. Phys. Commun. 134, 33 (2001). DOI: 10.1016/S0010-4655(00)00188-0
7. B. G. Pfrommer, J. Demmel, and H. Simon, Unconstrained energy functionals for electronic structure calculations, J. Comput. Phys. 150, 287 (1999). DOI: 10.1006/jcph.1998.6181
8. J. Kim, F. Mauri, and G. Galli, Total-energy global optimizations using nonorthogonal localized orbitals, Phys. Rev. B 52, 1640 (1995). DOI: 10.1103/PhysRevB.52.1640
9. D. Bowler and T. Miyazaki, O(N) methods in electronic structure calculations, Rep. Prog. Phys. 75, 036503 (2012). DOI: 10.1088/0034-4885/75/3/036503