License | Authors | Download |
---|---|---|
Simplified BSD | Fabiano Corsetti (Synopsys QuantumWise, Denmark), Irina Lebedeva (Simune Atomistics, Spain; linear-scaling OMM) | Gitlab |
libOMM solves the Kohn-Sham equation as a generalized eigenvalue problem for a fixed Hamiltonian. It implements the orbital minimization method (OMM), which works within a density matrix formalism. The basic strategy of the OMM is to find the set of Wannier functions (WFs) describing the occupied subspace by direct unconstrained minimization of an appropriately-constructed functional. The density matrix can then be calculated from the WFs. The solver is usually employed within an outer self-consistency (SCF) cycle. Therefore, the WFs resulting from one SCF iteration can be saved and then re-used as the initial guess for the next iteration.
Features:
libOMM makes use of MatrixSwitch for handling the matrix operations. This is included in the libOMM distribution (alternatively, an external version of MatrixSwitch can be used). The basic routines can be installed with only a Fortran compiler.
Optional requirements are:
lap
operations with the s?den
formatp?dbc
formatpdcsr
formatsrc
directory.make.inc.example
to make.inc
and modify it to suit your
needs. MSLIBPATH
should point to the MatrixSwitch directory
(default in make.inc.example
is for the version included in the
distribution). MatrixSwitch should be compiled with the -DCONV
flag. Available options for FPPFLAGS
are:
-DHAVE_MPI
: enable MPI parallel routines-DHAVE_LAPACK
: enable LAPACK routines-DHAVE_SCALAPACK
: enable ScaLAPACK routines (requires
-DHAVE_MPI
)-DHAVE_PSPBLAS
: enable PSPBLAS routines-DNORAND
: fixed seed for the random number generator. Enable
for testing purposes.-DCBIND
: use ISO_C_BINDING for LOGICAL inputs in the wrapper
interfaces. Enable for linking to C.make
.The examples
directory contains a number of small programs that make
use of libOMM with MatrixSwitch. These can be useful both for testing
the installation and for learning how to use the library. To compile
them:
examples
directory.make.inc.example
to make.inc
and modify it to suit your
needs. Be aware that make.inc
in the src
directory will also be
used.make
.Each example contains a header explaining what the program does and providing sample output to compare against.
The algorithms and implementation of the library are described here6.
The orbital minimization method (OMM) is a class of algorithms for solving the Kohn-Sham (KS) eigenproblem within a density matrix formalism. The basic strategy is to find the set of $n$ Wannier functions (WFs) describing the occupied subspace of a system by direct unconstrained minimization of an appropriately-constructed energy functional. The OMM was originally proposed in the context of linear-scaling DFT, by Mauri, Galli, and Car12, and Ordejón et al.34.
In this original formulation, the energy functional is given by
$\tilde{E} \left[ \mathbf{C} \right] = 2\mathrm{Tr} \left\{\left[ \mathbf{I}_n + \left( \mathbf{I}_n-\mathbf{S}_\mathrm{W} \right) \right] \mathbf{H}_\mathrm{W} \right\}$,
where $\mathbf{C}$ is the matrix of coefficients of the $n$ WFs in the original basis, $\mathbf{I}_n$ is the $n\times n$ identity matrix and $\mathbf{H}_\mathrm{W}$ and $\mathbf{S}_\mathrm{W}$ are the Hamiltonian and overlap matrices in the basis of WFs. It can be shown that, by minimizing $\tilde{E}$ with respect to $\mathbf{C}$, we obtain $\mathbf{S}_\mathrm{W}=\mathbf{I}_n$ and $\tilde{E} \left [ \mathbf{C}_0 \right ] = E_0$, where $\mathbf{C}_0$ describes the occupied subspace, and $E_0$ is the ground-state KS (band) energy.
The density matrix in this formalism is computed as
$\mathbf{D} = 2\mathbf{C}\left[ \mathbf{I}_n + \left( \mathbf{I}_n-\mathbf{S}_\mathrm{W} \right) \right] \mathbf{C}^\dagger$,
where $\mathbf{C}^\dagger$ is the Hermitian conjugate of the coefficient matrix. The energy-density matrix that is also needed to calculate stresses and forces is given by
$\mathbf{D}_E = 2\mathbf{C}\mathbf{H}_\mathrm{W} \mathbf{C}^\dagger$
Although the Ordejón-Mauri functional has found applications as a cubic-scaling solver86 (available in the libOMM library), the original linear-scaling implementations found serious problems with local minima, arising from the enforcement of spatial localization for the WFs. In order to solve this, more generalized/advanced versions of the OMM have been proposed, such as that of Kim, Mauri, and Galli5. A comprehensive review of these can be found here9.
The best way of learning how to use libOMM is by example. See the
examples in the examples
directory for this. The basic idea is very
simple: you call omm
, providing as input the Hamiltonian and overlap
matrices and the number of occupied bands, and receiving as output the
ground-state density matrix and Kohn-Sham (band) energy. However, there
are a number of important subtleties, due to the choice of different
flavours of the OMM functional (Cholesky factorization, preconditioning,
etc.), and the fact that the subroutine is typically called many times,
looping over MD and SCF iterations, as well as spins and k points.
The first thing to notice is that all matrices in libOMM are handled by
MatrixSwitch. This allows for easy usage of
different storage formats and implementations of the matrix operations.
The matrices have to therefore be passed to omm
as type matrix
. This
is quite easy! Also note that if you are already using one of the
allowed formats (e.g., the ScaLAPACK format), you can simply
register
your matrices with MatrixSwitch without needing to copy any data. You
will need to pass to libOMM the 5-character code denoting the storage
format and the 3-character
code denoting the implementation of the matrix
operations
to use.
There are no optional arguments in the call to omm
. Therefore,
matrices which are not needed should simpy be passed without having been
allocated by MatrixSwitch (m_allocate
routine). Other variables which
are not needed will be ignored.
If you are calling libOMM from C you cannot make use of type matrix
.
In this case you can instead call one of the available
wrapper.
The most important instance of data reuse is in the WFs describing the
occupied subspace (stored in the coefficients matrix $\mathbf{C}$). The
number of conjugate gradient (CG) steps needed by libOMM will fall
drastically in the initial few SCF iterations if the final $\mathbf{C}$
from one iteration is used as the starting guess for the next one. Since
this matrix is passed in and out of omm
, the user should typically
take care not to modify it between calls. Sometimes, however, it might
be advantageous to do so, e.g., if the user can provide an extrapolation
scheme for $\mathbf{C}$, or following an explicit diagonalization call
(in this case, the matrix can be populated by the occupied eigenstates).
If there is more than one spin/k point within an SCF iteration, a
separate $\mathbf{C}$ matrix for each point should be stored.
libOMM internally allocates and uses a number of matrices. Some of these
are saved between calls in order to maximize data reuse, and, hence, to
minimize the computational effort. The data is specific to each spin/k
point, and its reuse depends on passing the same $\mathbf{C}$ from the
previous SCF iteration at the same spin/k point. For this reason, it
is important to specify the total number of spin+k points and the
current spin+k point identifier (np
and ip
, respectively) in each
call to omm
.
Due to this internal data reuse, it is also necessary to inform libOMM
when a new overlap matrix $\mathbf{S}$ is being passed in (i.e., the
first SCF iteration of each new MD step). This is specified by new_S
.
It is important to note that each spin+k point is treated independently,
so, e.g., if the calculation is spin polarized and uses the same
$\mathbf{S}$ for the up- and down-spin Hamiltonians, it is necessary to
set new_S
to T
for both calls to omm
at the start of a new MD
step.
Finally, you should always remember to free all allocated memory at the
end of your program. In order to deallocate libOMM's internal matrices,
call omm
with dealloc
set to T
. This will deallocate the data for
all spin/k points; therefore, only the very last call to omm
should
do so.
If the energy-density matrix is needed (typically only at the
end of the SCF cycle to compute forces), it can be calculated in a
separate call to omm
following a minimization call, by setting
calc_ED
to T
(the default value should be F
). The unmodified
$\mathbf{C}$ matrix from the previous call has to be passed in, and
data stored internally will also be reused.
Note that the output density matrix from the previous call is not
needed to calculate the energy-density matrix; the latter will
be returned by omm
as D_min
.
The choice of flavour of the OMM functional can have a huge impact on performance. The basic flavour should be avoided when $\mathbf{S}$ is present. Cholesky factorization is usually the most efficient, although preconditioning will be better if kinetic energy ill-conditioning is significant.
The choice is controlled by flavour
. The options are:
This is selected by setting flavour
to 0
. The basic algorithm does
not provide any tensorial correction for non-orthogonal bases, which can
lead to a very ill-conditioned problem (especially for large
system/basis sizes) requiring many CG steps. This is the only falvour
available for linear-scaling calculations at the moment.
This is selected by setting flavour
to either 1
or 2
and applies
only for cubic-scaling calculations with dense matrices. Cholesky
factorization reduces the generalized eigenvalue problem to standard
form; the basic algorithm is then performed with no $\mathbf{S}$. For a
localized basis, this is usually the most important form of
ill-conditioning.
The user can provide either the $\mathbf{S}$ matrix (case 1
) or its
factorization into the upper triangular matrix $\mathbf{U}$ (case 2
).
In the former case, the factorization will be performed by libOMM, and
$\mathbf{S}$ will be overwritten by $\mathbf{U}$. In both cases,
$\mathbf{H}$ will be overwritten by
$\mathbf{U}^{-\mathrm{T}}\mathbf{H}\mathbf{U}^{-1}$.
A small note for case 1
: if the calculation is spin polarized and uses
the same $\mathbf{S}$ for the up- and down-spin Hamiltonians, the call
to omm
for the first spin point will replace $\mathbf{S}$ with
$\mathbf{U}$; the call for the second spin point should then set
flavour
to 2
. (These considerations are only important when new_S
is set to T
; otherwise, $\mathbf{S}$ is not referenced at all.)
This is selected by setting flavour
to 3
and also applies only
for cubic-scaling calculations with dense matrices. The preconditioner
is suitable for a non-orthogonal localized orbital basis, and is
defined as7:
$\mathbf{P} = \left ( \mathbf{S} + \frac{1}{\tau} \mathbf{T} \right )^{-1}$,
where $\mathbf{T}$ is the kinetic energy matrix, and $\tau$ sets the scale for the kinetic energy preconditioning. If $\mathbf{T}$ is not given, the preconditioner becomes a pure tensorial correction, completely equivalent to Cholesky factorization. The choice of $\tau$ can affect the speed of convergence; reducing it will result in more aggressive kinetic energy preconditioning6.
The interface makes use of the public type matrix
defined by
MatrixSwitch.
subroutine omm( m, n, n_occ, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation)
m
(input) INTEGERn
(input) INTEGERn_occ
(input) INTEGERH
(input/output) TYPE(MATRIX)m
,m
).
If Cholesky factorization is performed, on output it will be overwritten by
$\mathbf{U}^{-\mathrm{T}}\mathbf{H}\mathbf{U}^{-1}$.S
(input/output) TYPE(MATRIX)m
,m
). This matrix
is optional; if it is not needed, it should be passed uninitialized.
If Cholesky factorization is performed, on output it will be
overwritten by the factorized upper triangular matrix $\mathbf{U}$.new_S
(input) LOGICALT
: $\mathbf{S}$ is new (i.e., it has changed from the previous
call to omm
). Note that this only refers to calls with the same
value of ip
.F
: $\mathbf{S}$ has not changed from the previous call to omm
.e_min
(output) DOUBLE PRECISIOND_min
(input/output) TYPE(MATRIX)m
,m
).calc_ED
(input) LOGICALT
: The energy-density matrix is calculated (must follow a
minimization call) and written to D_min
, and all other output
remains unchanged. No minimization is performed.F
: The minimization is performed as usual.eta
(input) DOUBLE PRECISIONC_min
(input/output) TYPE(MATRIX)n
,m
). This is used as the initial guess
for the minimization, and on output will be overwritten by the
output coefficients at the end of the minimization. For dense matrices,
the initial guess does not need to be provided by the user;
it will be generated automatically if init_C
is set to F
.init_C
(input) LOGICALT
: The contents of C_min
has been set (or modified since the
last call to omm
) by the user.F
: The contents of C_min
has not been modified by the user.T
(input/output) TYPE(MATRIX)m
,m
), used for
calculating the preconditioner. This matrix is optional; if it is
not needed, it should be passed uninitialized.scale_T
(input) DOUBLE PRECISIONT
is
uninitialized.flavour
(input) INTEGER0
: basic, can be used with or without $\mathbf{S}$1
: Cholesky factorization, $\mathbf{S}$ provided in S
2
: Cholesky factorization, $\mathbf{U}$ provided in S
(i.e.,
factorization of $\mathbf{S}$ performed by the user)3
: preconditioning, requires $\mathbf{S}$ and, optionally,
$\mathbf{T}$np
(input) INTEGERip
(input) INTEGER1
and np
).cg_tol
(input) DOUBLE PRECISION1.0d-9
is used.long_out
(input) LOGICALT
: Verbose output.F
: Standard output.dealloc
(input) LOGICALT
: Deallocate all internal matrices. Typically should only be
selected for the very last call to omm
.F
: Internal matrices are saved and the data is reused for the next
call to omm
.m_storage
(input) CHARACTER*5m_operation
(input, optional) CHARACTER*3The purpose of the wrapper is to be able to call omm
without using
the MatrixSwitch
Fortran module and the type matrix
defined within
in. This makes it straightforward to call the wrapper from C. Note that
libOMM should be compiled with the -DCBIND
flag when calling from C;
this will ensure that LOGICAL inputs are compatible with _Bool types in
C.
subroutine omm_wrapper( m, n, n_occ, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation)
m
(input) INTEGERn
(input) INTEGERn_occ
(input) INTEGERH
(input/output) CHARACTERS
(input/output) CHARACTERnew_S
(input) LOGICALT
: $\mathbf{S}$ is new (i.e., it has changed from the previous
call to omm
). Note that this only refers to calls with the same
value of ip
.F
: $\mathbf{S}$ has not changed from the previous call to omm
.e_min
(output) DOUBLE PRECISIOND_min
(input/output) CHARACTERcalc_ED
(input) LOGICALT
: The energy-density matrix is calculated (must follow a
minimization call) and written to D_min
, and all other output
remains unchanged. No minimization is performed.F
: The minimization is performed as usual.eta
(input) DOUBLE PRECISIONC_min
(input/output) CHARACTERinit_C
(input) LOGICALT
: The contents of C_min
has been set (or modified since the
last call to omm
) by the user.F
: The contents of C_min
has not been modified by the user.T
(input/output) CHARACTERscale_T
(input) DOUBLE PRECISIONT
is
uninitialized.flavour
(input) INTEGER0
: basic, can be used with or without $\mathbf{S}$1
: Cholesky factorization, $\mathbf{S}$ provided in S
2
: Cholesky factorization, $\mathbf{U}$ provided in S
(i.e.,
factorization of $\mathbf{S}$ performed by the user)3
: preconditioning, requires $\mathbf{S}$ and, optionally,
$\mathbf{T}$np
(input) INTEGERip
(input) INTEGER1
and np
).cg_tol
(input) DOUBLE PRECISION1.0d-9
is used.long_out
(input) LOGICALT
: Verbose output.F
: Standard output.dealloc
(input) LOGICALT
: Deallocate all internal matrices. Typically should only be
selected for the very last call to omm
.F
: Internal matrices are saved and the data is reused for the next
call to omm
.m_storage
(input) CHARACTER*5m_operation
(input, optional) CHARACTER*3The purpose of the subroutine is to get the Hamiltonian during OMM calculations via a callback. For this subroutine, the input Hamiltonian and output energy should be shifted by $\eta$ externally ($\mathbf{H} \rightarrow \mathbf{H} - \eta \mathbf{S}$, $\tilde{E} = \tilde{E} + \eta n_\mathrm{occ}$, where $n_\mathrm{occ}$ is the number of occupied states).
subroutine omm_callback( m, n, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation)
m
(input) INTEGERn
(input) INTEGERH
(input/output) TYPE(MATRIX)S
(input/output) TYPE(MATRIX)m
,m
). This matrix
is optional; if it is not needed, it should be passed uninitialized.
If Cholesky factorization is performed, on output it will be
overwritten by the factorized upper triangular matrix $\mathbf{U}$.new_S
(input) LOGICALT
: $\mathbf{S}$ is new (i.e., it has changed from the previous
call to omm
). Note that this only refers to calls with the same
value of ip
.F
: $\mathbf{S}$ has not changed from the previous call to omm
.e_min
(output) DOUBLE PRECISIOND_min
(input/output) TYPE(MATRIX)m
,m
).calc_ED
(input) LOGICALT
: The energy-density matrix is calculated (must follow a
minimization call) and written to D_min
, and all other output
remains unchanged. No minimization is performed.F
: The minimization is performed as usual.eta
(input) DOUBLE PRECISIONC_min
(input/output) TYPE(MATRIX)n
,m
). This is used as the initial guess
for the minimization, and on output will be overwritten by the
output coefficients at the end of the minimization. For dense matrices,
the initial guess does not need to be provided by the user;
it will be generated automatically if init_C
is set to F
.init_C
(input) LOGICALT
: The contents of C_min
has been set (or modified since the
last call to omm
) by the user.F
: The contents of C_min
has not been modified by the user.T
(input/output) TYPE(MATRIX)m
,m
), used for
calculating the preconditioner. This matrix is optional; if it is
not needed, it should be passed uninitialized.scale_T
(input) DOUBLE PRECISIONT
is
uninitialized.flavour
(input) INTEGER0
: basic, can be used with or without $\mathbf{S}$1
: Cholesky factorization, $\mathbf{S}$ provided in S
2
: Cholesky factorization, $\mathbf{U}$ provided in S
(i.e.,
factorization of $\mathbf{S}$ performed by the user)3
: preconditioning, requires $\mathbf{S}$ and, optionally,
$\mathbf{T}$np
(input) INTEGERip
(input) INTEGER1
and np
).cg_tol
(input) DOUBLE PRECISION1.0d-9
is used.long_out
(input) LOGICALT
: Verbose output.F
: Standard output.dealloc
(input) LOGICALT
: Deallocate all internal matrices. Typically should only be
selected for the very last call to omm
.F
: Internal matrices are saved and the data is reused for the next
call to omm
.m_storage
(input) CHARACTER*5m_operation
(input, optional) CHARACTER*3F. Mauri, G. Galli, and R. Car, Orbital formulation for electronic-structure calculations with linear system-size scaling, Phys. Rev. B 47, 9973 (1993). DOI: 10.1103/PhysRevB.47.9973 ↩︎ ↩︎
F. Mauri and G. Galli, Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling, Phys. Rev. B 50, 4316 (1994). DOI: 10.1103/PhysRevB.50.4316 ↩︎ ↩︎
P. Ordejón, D. A. Drabold, M. P. Grumbach, and R. M. Martin, Unconstrained minimization approach for electronic computations that scales linearly with system size, Phys. Rev. B 48, 14646 (1993). DOI: 10.1103/PhysRevB.48.14646 ↩︎ ↩︎
P. Ordejón, D. A. Drabold, R. M. Martin, and M. P. Grumbach, Linear system-size scaling methods for electronic-structure calculations, Phys. Rev. B 51, 1456 (1995). DOI: 10.1103/PhysRevB.51.1456 ↩︎ ↩︎
J. Kim, F. Mauri, and G. Galli, Total-energy global optimizations using nonorthogonal localized orbitals, Phys. Rev. B 52, 1640 (1995). DOI: 10.1103/PhysRevB.52.1640 ↩︎ ↩︎ ↩︎ ↩︎ ↩︎
F. Corsetti, The orbital minimization method for electronic structure calculations with finite-range atomic basis sets, Comput. Phys. Commun. 185, 873 (2014). DOI: 10.1016/j.cpc.2013.12.008 ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎
C. Gan, P. Haynes, and M. Payne, Preconditioned conjugate gradient method for the sparse generalized eigenvalue problem in electronic structure calculations, Comput. Phys. Comm. 134, 33 (2001). DOI: 10.1016/S0010-4655(00)00188-0 ↩︎ ↩︎
B. G. Pfrommer, J. Demmel, and H. Simon, Unconstrained energy functionals for electronic structure calculations, J. Comput. Phys. 150, 287 (1999). DOI: 10.1006/jcph.1998.6181 ↩︎
D. Bowler and T. Miyazaki, ${\mathcal O}(N)$ methods in electronic structure calculations, Rep. Prog. Phys. 75, 036503 (2012). DOI: 10.1088/0034-4885/75/3/036503 ↩︎