License | Authors | Download |
---|---|---|
Simplified BSD | Fabiano Corsetti, Synopsys QuantumWise, Denmark | Gitlab |
libOMM solves the Kohn-Sham equation as a generalized eigenvalue problem for a fixed Hamiltonian. It implements the orbital minimization method (OMM), which works within a density matrix formalism. The basic strategy of the OMM is to find the set of Wannier functions (WFs) describing the occupied subspace by direct unconstrained minimization of an appropriately-constructed functional. The density matrix can then be calculated from the WFs. The solver is usually employed within an outer self-consistency (SCF) cycle. Therefore, the WFs resulting from one SCF iteration can be saved and then re-used as the initial guess for the next iteration.
Features:
libOMM makes use of MatrixSwitch for handling the matrix operations. This is included in the libOMM distribution (alternatively, an external version of MatrixSwitch can be used). The basic routines can be installed with only a Fortran compiler.
Optional requirements are:
lap
operations with the s?den
formatp?dbc
formatsrc
directory.make.inc.example
to make.inc
and modify it to suit your
needs. MSLIBPATH
should point to the MatrixSwitch directory
(default in make.inc.example
is for the version included in the
distribution). MatrixSwitch should be compiled with the -DCONV
flag. Available options for FPPFLAGS
are:
-DMPI
: enable MPI parallel routines-DLAP
: enable LAPACK routines (currently necessary for
preconditioning/Cholesky factorization)-DSLAP
: enable ScaLAPACK routines (requires -DMPI
)-DNORAND
: fixed seed for the random number generator. Enable
for testing purposes.-DCBIND
: use ISO_C_BINDING for LOGICAL inputs in the wrapper
interfaces. Enable for linking to C.make
.The examples
directory contains a number of small programs that make
use of libOMM with MatrixSwitch. These can be useful both for testing
the installation and for learning how to use the library. To compile
them:
examples
directory.make.inc.example
to make.inc
and modify it to suit your
needs. Be aware that make.inc
in the src
directory will also be
used.make
.Each example contains a header explaining what the program does and providing sample output to compare against.
The algorithms and implementation of the library are described here^{5}.
The orbital minimization method (OMM) is a class of algorithms for solving the Kohn-Sham (KS) eigenproblem within a density matrix formalism. The basic strategy is to find the set of $n$ Wannier functions (WFs) describing the occupied subspace of a system by direct unconstrained minimization of an appropriately-constructed energy functional. The OMM was originally proposed in the context of linear-scaling DFT, by Mauri, Galli, and Car^{1}^{2}, and Ordejón et al.^{3}^{4}.
In this original formulation, the energy functional is given by
$\tilde{E} \left [ \mathbf{C} \right ] = 2\mathrm{Tr} \left { \left [ \mathbf{I}n + \left ( \mathbf{I}n-\mathbf{S}\mathrm{W} \right ) \right ] \mathbf{H}\mathrm{W} \right }$,
where $\mathbf{C}$ is the matrix of coefficients of the $n$ WFs in the original basis, and $\mathbf{H}\mathrm{W}$ and $\mathbf{S}\mathrm{W}$ are the Hamiltonian and overlap matrices in the basis of WFs. It can be shown that, by minimizing $\tilde{E}$ with respect to $\mathbf{C}$, we obtain $\mathbf{S}_\mathrm{W}=\mathbf{I}_n$ and $\tilde{E} \left [ \mathbf{C}_0 \right ] = E_0$, where $\mathbf{C}_0$ describes the occupied subspace, and $E_0$ is the ground-state KS (band) energy.
Although this functional has found applications as a cubic-scaling solver^{7}^{5} (available in the libOMM library), the original linear-scaling implementations found serious problems with local minima, arising from the enforcement of spatial localization for the WFs. In order to solve this, more generalized/advanced versions of the OMM have been proposed, such as that of Kim, Mauri, and Galli^{8}. A comprehensive review of these can be found here^{9}.
The best way of learning how to use libOMM is by example. See the
examples in the examples
directory for this. The basic idea is very
simple: you call omm
, providing as input the Hamiltonian and overlap
matrices and the number of occupied bands, and receiving as output the
ground-state density matrix and Kohn-Sham (band) energy. However, there
are a number of important subtleties, due to the choice of different
flavours of the OMM functional (Cholesky factorization, preconditioning,
etc.), and the fact that the subroutine is typically called many times,
looping over MD and SCF iterations, as well as spins and k points.
The first thing to notice is that all matrices in libOMM are handled by
MatrixSwitch. This allows for easy usage of
different storage formats and implementations of the matrix operations.
The matrices have to therefore be passed to omm
as type matrix
. This
is quite easy! Also note that if you are already using one of the
allowed formats (e.g., the ScaLAPACK format), you can simply
register
your matrices with MatrixSwitch without needing to copy any data. You
will need to pass to libOMM the 5-character code denoting the storage
format and the 3-character
code denoting the implementation of the matrix
operations
to use.
There are no optional arguments in the call to omm
. Therefore,
matrices which are not needed should simpy be passed without having been
allocated by MatrixSwitch (m_allocate
routine). Other variables which
are not needed will be ignored.
If you are calling libOMM from C you cannot make use of type matrix
.
In this case you can instead call one of the available
wrappers.
The most important instance of data reuse is in the WFs describing the
occupied subspace (stored in the coefficients matrix $\mathbf{C}$). The
number of conjugate gradient (CG) steps needed by libOMM will fall
drastically in the initial few SCF iterations if the final $\mathbf{C}$
from one iteration is used as the starting guess for the next one. Since
this matrix is passed in and out of omm
, the user should typically
take care not to modify it between calls. Sometimes, however, it might
be advantageous to do so, e.g., if the user can provide an extrapolation
scheme for $\mathbf{C}$, or following an explicit diagonalization call
(in this case, the matrix can be populated by the occupied eigenstates).
If there is more than one spin/k point within an SCF iteration, a
separate $\mathbf{C}$ matrix for each point should be stored.
libOMM internally allocates and uses a number of matrices. Some of these
are saved between calls in order to maximize data reuse, and, hence, to
minimize the computational effort. The data is specific to each spin/k
point, and its reuse depends on passing the same $\mathbf{C}$ from the
previous SCF iteration at the same spin/k point. For this reason, it
is important to specify the total number of spin+k points and the
current spin+k point identifier (np
and ip
, respectively) in each
call to omm
.
Due to this internal data reuse, it is also necessary to inform libOMM
when a new overlap matrix $\mathbf{S}$ is being passed in (i.e., the
first SCF iteration of each new MD step). This is specified by new_S
.
It is important to note that each spin+k point is treated independently,
so, e.g., if the calculation is spin polarized and uses the same
$\mathbf{S}$ for the up- and down-spin Hamiltonians, it is necessary to
set new_S
to T
for both calls to omm
at the start of a new MD
step.
Finally, you should always remember to free all allocated memory at the
end of your program. In order to deallocate libOMM's internal matrices,
call omm
with dealloc
set to T
. This will deallocate the data for
all spin/k points; therefore, only the very last call to omm
should
do so.
If the energy-weighted density matrix is needed (typically only at the
end of the SCF cycle), it can be calculated in a separate call to omm
following a minimization call, by setting calc_ED
to T
(the default
value should be F
). The unmodified $\mathbf{C}$ matrix from the
previous call has to be passed in, and data stored internally will also
be reused.
Note that the output density matrix from the previous call is not
needed to calculate the energy-weighted density matrix; the latter will
be returned by omm
as D_min
.
The choice of flavour of the OMM functional can have a huge impact on performance. The basic flavour should be avoided when $\mathbf{S}$ is present. Cholesky factorization is usually the most efficient, although preconditioning will be better if kinetic energy ill-conditioning is significant.
The choice is controlled by flavour
. The options are:
This is selected by setting flavour
to 0
. The basic algorithm does
not provide any tensorial correction for non-orthogonal bases, which can
lead to a very ill-conditioned problem (especially for large
system/basis sizes) requiring many CG steps.
This is selected by setting flavour
to either 1
or 2
. Cholesky
factorization reduces the generalized eigenvalue problem to standard
form; the basic algorithm is then performed with no $\mathbf{S}$. For a
localized basis, this is usually the most important form of
ill-conditioning.
The user can provide either the $\mathbf{S}$ matrix (case 1
) or its
factorization into the upper triangular matrix $\mathbf{U}$ (case 2
).
In the former case, the factorization will be performed by libOMM, and
$\mathbf{S}$ will be overwritten by $\mathbf{U}$. In both cases,
$\mathbf{H}$ will be overwritten by
$\mathbf{U}^{-\mathrm{T}}\mathbf{H}\mathbf{U}^{-1}$.
A small note for case 1
: if the calculation is spin polarized and uses
the same $\mathbf{S}$ for the up- and down-spin Hamiltonians, the call
to omm
for the first spin point will replace $\mathbf{S}$ with
$\mathbf{U}$; the call for the second spin point should then set
flavour
to 2
. (These considerations are only important when new_S
is set to T
; otherwise, $\mathbf{S}$ is not referenced at all.)
This is selected by setting flavour
to 3
. The preconditioner is
suitable for a non-orthogonal localized orbital basis, and is defined
as^{6}:
$\mathbf{P} = \left ( \mathbf{S} + \frac{1}{\tau} \mathbf{T} \right )^{-1}$,
where $\mathbf{T}$ is the kinetic energy matrix, and $\tau$ sets the scale for the kinetic energy preconditioning. If $\mathbf{T}$ is not given, the preconditioner becomes a pure tensorial correction, completely equivalent to Cholesky factorization. The choice of $\tau$ can affect the speed of convergence; reducing it will result in more aggressive kinetic energy preconditioning^{5}.
The interface makes use of the public type matrix
defined by
MatrixSwitch.
subroutine omm( m, n, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation, mpi_rank )
m
(input) INTEGERn
(input) INTEGERH
(input/output) TYPE(MATRIX)m
,m
). This must
be provided already shifted such that
$\mathbf{H} \rightarrow \mathbf{H} - \eta \mathbf{S}$. If Cholesky
factorization is performed, on output it will be overwritten by
$\mathbf{U}^{-\mathrm{T}}\mathbf{H}\mathbf{U}^{-1}$.S
(input/output) TYPE(MATRIX)m
,m
). This matrix
is optional; if it is not needed, it should be passed uninitialized.
If Cholesky factorization is performed, on output it will be
overwritten by the factorized upper triangular matrix $\mathbf{U}$.new_S
(input) LOGICALT
: $\mathbf{S}$ is new (i.e., it has changed from the previous
call to omm
). Note that this only refers to calls with the same
value of ip
.F
: $\mathbf{S}$ has not changed from the previous call to omm
.e_min
(output) DOUBLE PRECISIOND_min
(input/output) TYPE(MATRIX)m
,m
).calc_ED
(input) LOGICALT
: The energy-weighted density matrix is calculated (must follow a
minimization call) and written to D_min
, and all other output
remains unchanged. No minimization is performed.F
: The minimization is performed as usual.eta
(input) DOUBLE PRECISIONC_min
(input/output) TYPE(MATRIX)n
,m
). This is used as the initial guess for the minimization,
and on output will be overwritten by the output coefficients at the
end of the minimization. The initial guess does not need to be
provided by the user; it will be generated automatically if init_C
is set to F
.init_C
(input) LOGICALT
: The contents of C_min
has been set (or modified since the
last call to omm
) by the user.F
: The contents of C_min
has not been modified by the user.T
(input/output) TYPE(MATRIX)m
,m
), used for
calculating the preconditioner. This matrix is optional; if it is
not needed, it should be passed uninitialized.scale_T
(input) DOUBLE PRECISIONT
is
uninitialized.flavour
(input) INTEGER0
: basic, can be used with or without $\mathbf{S}$1
: Cholesky factorization, $\mathbf{S}$ provided in S
2
: Cholesky factorization, $\mathbf{U}$ provided in S
(i.e.,
factorization of $\mathbf{S}$ performed by the user)3
: preconditioning, requires $\mathbf{S}$ and, optionally,
$\mathbf{T}$\np
(input) INTEGERip
(input) INTEGER1
and np
).cg_tol
(input) DOUBLE PRECISION1.0d-9
is used.long_out
(input) LOGICALT
: Verbose output.F
: Standard output.dealloc
(input) LOGICALT
: Deallocate all internal matrices. Typically should only be
selected for the very last call to omm
.F
: Internal matrices are saved and the data is reused for the next
call to omm
.m_storage
(input) CHARACTER*5m_operation
(input) CHARACTER*3The purpose of the wrappers is to be able to call omm
without using
the MatrixSwitch
Fortran module and the type matrix
defined within
in. This makes it straightforward to call the wrappers from C. Note that
libOMM should be compiled with the -DCBIND
flag when calling from C;
this will ensure that LOGICAL inputs are compatible with _Bool types in
C.
Each wrapper provides an interface for a particular combination of a matrix storage format and an implementation of the matrix operations. These are specified by the 5- and 3-character codes in the name of the wrapper.
subroutine omm_sdden_ref( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense real matrix storage (serial
distribution) and MatrixSwitch reference matrix operations. The
interface is identical to omm
with the following exceptions (note that
m_storage
, m_operation
are no longer present):
H_vals
(input/output) DOUBLE PRECISION array, dimension (m
,m
)S_present
(input) LOGICALT
: $\mathbf{S}$ is present in the calculation.F
: $\mathbf{S}$ is not present in the calculation. Note that
S_vals
must nevertheless be provided although it will not be
referenced.S_vals
(input/output) DOUBLE PRECISION array, dimension (m
,m
)D_min_vals
(input/output) DOUBLE PRECISION array, dimension
(m
,m
)C_min_vals
(input/output) DOUBLE PRECISION array, dimension
(n
,m
)T_present
(input) LOGICALT
: $\mathbf{T}$ is present in the calculation.F
: $\mathbf{T}$ is not present in the calculation. Note that
T_vals
must nevertheless be provided although it will not be
referenced.T_vals
(input/output) DOUBLE PRECISION array, dimension (m
,m
)subroutine omm_sdden_lap( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense real matrix storage (serial
distribution) and LAPACK matrix operations. The interface is identical
to omm_sdden_ref
.
subroutine omm_szden_ref( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense complex matrix storage (serial
distribution) and MatrixSwitch reference matrix operations. The
interface is identical to omm_sdden_ref
, except for the fact that
H_vals
, S_vals
, D_min_vals
, C_min_vals
, T_vals
are COMPLEX*16
arrays.
subroutine omm_szden_lap( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )
This wrapper uses a simple dense complex matrix storage (serial
distribution) and LAPACK matrix operations. The interface is identical
to omm_sdden_ref
, except for the fact that H_vals
, S_vals
,
D_min_vals
, C_min_vals
, T_vals
are COMPLEX*16 arrays.
subroutine omm_pddbc_lap( m, n, H_dim, H_vals, desc_H, S_present, S_dim, S_vals, desc_S, new_S, e_min, D_min_dim, D_min_vals, desc_D_min, calc_ED, eta, C_min_dim, C_min_vals, desc_C_min, init_C, T_present, T_dim, T_vals, desc_T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank, mpi_size, nprow, order, bs_def, icontxt )
This wrapper uses a dense block cyclic real matrix storage (parallel
distribution) and ScaLAPACK matrix operations. The interface is
identical to omm
with the following exceptions (note that m_storage
,
m_operation
are no longer present):
H_dim
(input) INTEGER array, dimension (2
)H_vals
.H_vals
(input/output) DOUBLE PRECISION array, dimension
(H_dim(1)
,H_dim(2)
)m
,m
).desc_H
(input) INTEGER array, dimension (9
)H_vals
.S_present
(input) LOGICALT
: $\mathbf{S}$ is present in the calculation.F
: $\mathbf{S}$ is not present in the calculation. Note that
S_dim
, S_vals
, desc_S
must nevertheless be provided although
they will not be referenced (in this case, it is suggested to use a
dummy array of size (1
,1
)).S_dim
(input) INTEGER array, dimension (2
)S_vals
.S_vals
(input/output) DOUBLE PRECISION array, dimension
(S_dim(1)
,S_dim(2)
)m
,m
).desc_S
(input) INTEGER array, dimension (9
)S_vals
.D_min_dim
(input) INTEGER array, dimension (2
)D_min_vals
.D_min_vals
(input/output) DOUBLE PRECISION array, dimension
(D_min_dim(1)
,D_min_dim(2)
)m
,m
).desc_D_min
(input) INTEGER array, dimension (9
)D_min_vals
.C_min_dim
(input) INTEGER array, dimension (2
)C_min_vals
.C_min_vals
(input/output) DOUBLE PRECISION array, dimension
(C_min_dim(1)
,C_min_dim(2)
)n
,m
).desc_C_min
(input) INTEGER array, dimension (9
)C_min_vals
.T_present
(input) LOGICALT
: $\mathbf{T}$ is present in the calculation.F
: $\mathbf{T}$ is not present in the calculation. Note that
T_dim
, T_vals
, desc_T
must nevertheless be provided although
they will not be referenced (in this case, it is suggested to use a
dummy array of size (1
,1
)).T_dim
(input) INTEGER array, dimension (2
)T_vals
.T_vals
(input/output) DOUBLE PRECISION array, dimension
(T_dim(1)
,T_dim(2)
)m
,m
).desc_T
(input) INTEGER array, dimension (9
)T_vals
.mpi_size
(input) INTEGERnprow
(input) INTEGERmpi_size
).order
(input) CHARACTER*1c
/C
: column-major orderingr
/R
/other: row-major orderingbs_def
(input) INTEGERp?dbc
matrices.icontxt
(input) INTEGERsubroutine omm_pzdbc_lap( m, n, H_dim, H_vals, desc_H, S_present, S_dim, S_vals, desc_S, new_S, e_min, D_min_dim, D_min_vals, desc_D_min, calc_ED, eta, C_min_dim, C_min_vals, desc_C_min, init_C, T_present, T_dim, T_vals, desc_T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank, mpi_size, nprow, order, bs_def, icontxt )
This wrapper uses a dense block cyclic complex matrix storage (parallel
distribution) and ScaLAPACK matrix operations. The interface is
identical to omm_pddbc_lap
, except for the fact that H_vals
,
S_vals
, D_min_vals
, C_min_vals
, T_vals
are COMPLEX*16 arrays.
F. Mauri, G. Galli, and R. Car, Orbital formulation for electronic-structure calculations with linear system-size scaling, Phys. Rev. B 47, 9973 (1993). DOI: 10.1103/PhysRevB.47.9973 ↩︎
F. Mauri and G. Galli, Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling, Phys. Rev. B 50, 4316 (1994). DOI: 10.1103/PhysRevB.50.4316 ↩︎
P. Ordejón, D. A. Drabold, M. P. Grumbach, and R. M. Martin, Unconstrained minimization approach for electronic computations that scales linearly with system size, Phys. Rev. B 48, 14646 (1993). DOI: 10.1103/PhysRevB.48.14646 ↩︎
P. Ordejón, D. A. Drabold, R. M. Martin, and M. P. Grumbach, Linear system-size scaling methods for electronic-structure calculations, Phys. Rev. B 51, 1456 (1995). DOI: 10.1103/PhysRevB.51.1456 ↩︎
F. Corsetti, The orbital minimization method for electronic structure calculations with finite-range atomic basis sets, Comput. Phys. Commun. 185, 873 (2014). DOI: 10.1016/j.cpc.2013.12.008 ↩︎
C. Gan, P. Haynes, and M. Payne, Preconditioned conjugate gradient method for the sparse generalized eigenvalue problem in electronic structure calculations, Comput. Phys. Comm. 134, 33 (2001). DOI: 10.1016/S0010-4655(00)00188-0 ↩︎
B. G. Pfrommer, J. Demmel, and H. Simon, Unconstrained energy functionals for electronic structure calculations, J. Comput. Phys. 150, 287 (1999). DOI: 10.1006/jcph.1998.6181 ↩︎
J. Kim, F. Mauri, and G. Galli, Total-energy global optimizations using nonorthogonal localized orbitals, Phys. Rev. B 52, 1640 (1995). DOI: 10.1103/PhysRevB.52.1640 ↩︎
D. Bowler and T. Miyazaki, ${\mathcal O}(N)$ methods in electronic structure calculations, Rep. Prog. Phys. 75, 036503 (2012). DOI: 10.1088/0034-4885/75/3/036503 ↩︎