# LibOMM

Simplified BSD Fabiano Corsetti (Synopsys QuantumWise, Denmark), Irina Lebedeva (Simune Atomistics, Spain; linear-scaling OMM) Gitlab

libOMM solves the Kohn-Sham equation as a generalized eigenvalue problem for a fixed Hamiltonian. It implements the orbital minimization method (OMM), which works within a density matrix formalism. The basic strategy of the OMM is to find the set of Wannier functions (WFs) describing the occupied subspace by direct unconstrained minimization of an appropriately-constructed functional. The density matrix can then be calculated from the WFs. The solver is usually employed within an outer self-consistency (SCF) cycle. Therefore, the WFs resulting from one SCF iteration can be saved and then re-used as the initial guess for the next iteration.

Features:

• Ordejón-Mauri1234 and Kim functionals5
• Real (Γ-point) and complex (arb. k point) Hamiltonians for cubic-scaling calculations with dense matrices
• Linear-scaling calculations with sparse matrices (Γ-point only)
• Efficient reuse of information within SCF loop and for multiple spins/k points6
• Simple (orthogonal basis) and generalized (non-orthogonal basis) eigenvalue problems
• Cholesky factorization of generalized problem for dense matrices
• Preconditioner for localized orbital basis7 for dense matrices

## Installation

### Prerequisites

libOMM makes use of MatrixSwitch for handling the matrix operations. This is included in the libOMM distribution (alternatively, an external version of MatrixSwitch can be used). The basic routines can be installed with only a Fortran compiler.

Optional requirements are:

• BLAS + LAPACK for lap operations with the s?den format
• MPI + BLAS + LAPACK + ScaLAPACK for the p?dbc format
• MPI + BLAS + LAPACK + DBCSR for the pdcsr format

### Instructions

1. Enter the src directory.
2. Copy make.inc.example to make.inc and modify it to suit your needs. MSLIBPATH should point to the MatrixSwitch directory (default in make.inc.example is for the version included in the distribution). MatrixSwitch should be compiled with the -DCONV flag. Available options for FPPFLAGS are:
• -DHAVE_MPI: enable MPI parallel routines
• -DHAVE_LAPACK: enable LAPACK routines
• -DHAVE_SCALAPACK: enable ScaLAPACK routines (requires -DHAVE_MPI)
• -DHAVE_PSPBLAS: enable PSPBLAS routines
• -DNORAND: fixed seed for the random number generator. Enable for testing purposes.
• -DCBIND: use ISO_C_BINDING for LOGICAL inputs in the wrapper interfaces. Enable for linking to C.
3. Type make.

### Tests

The examples directory contains a number of small programs that make use of libOMM with MatrixSwitch. These can be useful both for testing the installation and for learning how to use the library. To compile them:

1. Enter the examples directory.
2. Copy make.inc.example to make.inc and modify it to suit your needs. Be aware that make.inc in the src directory will also be used.
3. Type make.

Each example contains a header explaining what the program does and providing sample output to compare against.

## Publications

The algorithms and implementation of the library are described here6.

## Algorithm

The orbital minimization method (OMM) is a class of algorithms for solving the Kohn-Sham (KS) eigenproblem within a density matrix formalism. The basic strategy is to find the set of $n$ Wannier functions (WFs) describing the occupied subspace of a system by direct unconstrained minimization of an appropriately-constructed energy functional. The OMM was originally proposed in the context of linear-scaling DFT, by Mauri, Galli, and Car12, and Ordejón et al.34.

In this original formulation, the energy functional is given by

$\tilde{E} \left[ \mathbf{C} \right] = 2\mathrm{Tr} \left\{\left[ \mathbf{I}_n + \left( \mathbf{I}_n-\mathbf{S}_\mathrm{W} \right) \right] \mathbf{H}_\mathrm{W} \right\}$,

where $\mathbf{C}$ is the matrix of coefficients of the $n$ WFs in the original basis, $\mathbf{I}_n$ is the $n\times n$ identity matrix and $\mathbf{H}_\mathrm{W}$ and $\mathbf{S}_\mathrm{W}$ are the Hamiltonian and overlap matrices in the basis of WFs. It can be shown that, by minimizing $\tilde{E}$ with respect to $\mathbf{C}$, we obtain $\mathbf{S}_\mathrm{W}=\mathbf{I}_n$ and $\tilde{E} \left [ \mathbf{C}_0 \right ] = E_0$, where $\mathbf{C}_0$ describes the occupied subspace, and $E_0$ is the ground-state KS (band) energy.

The density matrix in this formalism is computed as

$\mathbf{D} = 2\mathbf{C}\left[ \mathbf{I}_n + \left( \mathbf{I}_n-\mathbf{S}_\mathrm{W} \right) \right] \mathbf{C}^\dagger$,

where $\mathbf{C}^\dagger$ is the Hermitian conjugate of the coefficient matrix. The energy-density matrix that is also needed to calculate stresses and forces is given by

$\mathbf{D}_E = 2\mathbf{C}\mathbf{H}_\mathrm{W} \mathbf{C}^\dagger$

Although the Ordejón-Mauri functional has found applications as a cubic-scaling solver86 (available in the libOMM library), the original linear-scaling implementations found serious problems with local minima, arising from the enforcement of spatial localization for the WFs. In order to solve this, more generalized/advanced versions of the OMM have been proposed, such as that of Kim, Mauri, and Galli5. A comprehensive review of these can be found here9.

## Documentation

The best way of learning how to use libOMM is by example. See the examples in the examples directory for this. The basic idea is very simple: you call omm, providing as input the Hamiltonian and overlap matrices and the number of occupied bands, and receiving as output the ground-state density matrix and Kohn-Sham (band) energy. However, there are a number of important subtleties, due to the choice of different flavours of the OMM functional (Cholesky factorization, preconditioning, etc.), and the fact that the subroutine is typically called many times, looping over MD and SCF iterations, as well as spins and k points.

### Matrices

The first thing to notice is that all matrices in libOMM are handled by MatrixSwitch. This allows for easy usage of different storage formats and implementations of the matrix operations. The matrices have to therefore be passed to omm as type matrix. This is quite easy! Also note that if you are already using one of the allowed formats (e.g., the ScaLAPACK format), you can simply register your matrices with MatrixSwitch without needing to copy any data. You will need to pass to libOMM the 5-character code denoting the storage format and the 3-character code denoting the implementation of the matrix operations to use.

There are no optional arguments in the call to omm. Therefore, matrices which are not needed should simpy be passed without having been allocated by MatrixSwitch (m_allocate routine). Other variables which are not needed will be ignored.

If you are calling libOMM from C you cannot make use of type matrix. In this case you can instead call one of the available wrapper.

### Reuse of data

The most important instance of data reuse is in the WFs describing the occupied subspace (stored in the coefficients matrix $\mathbf{C}$). The number of conjugate gradient (CG) steps needed by libOMM will fall drastically in the initial few SCF iterations if the final $\mathbf{C}$ from one iteration is used as the starting guess for the next one. Since this matrix is passed in and out of omm, the user should typically take care not to modify it between calls. Sometimes, however, it might be advantageous to do so, e.g., if the user can provide an extrapolation scheme for $\mathbf{C}$, or following an explicit diagonalization call (in this case, the matrix can be populated by the occupied eigenstates). If there is more than one spin/k point within an SCF iteration, a separate $\mathbf{C}$ matrix for each point should be stored.

libOMM internally allocates and uses a number of matrices. Some of these are saved between calls in order to maximize data reuse, and, hence, to minimize the computational effort. The data is specific to each spin/k point, and its reuse depends on passing the same $\mathbf{C}$ from the previous SCF iteration at the same spin/k point. For this reason, it is important to specify the total number of spin+k points and the current spin+k point identifier (np and ip, respectively) in each call to omm.

Due to this internal data reuse, it is also necessary to inform libOMM when a new overlap matrix $\mathbf{S}$ is being passed in (i.e., the first SCF iteration of each new MD step). This is specified by new_S. It is important to note that each spin+k point is treated independently, so, e.g., if the calculation is spin polarized and uses the same $\mathbf{S}$ for the up- and down-spin Hamiltonians, it is necessary to set new_S to T for both calls to omm at the start of a new MD step.

Finally, you should always remember to free all allocated memory at the end of your program. In order to deallocate libOMM's internal matrices, call omm with dealloc set to T. This will deallocate the data for all spin/k points; therefore, only the very last call to omm should do so.

### The energy-density matrix

If the energy-density matrix is needed (typically only at the end of the SCF cycle to compute forces), it can be calculated in a separate call to omm following a minimization call, by setting calc_ED to T (the default value should be F). The unmodified $\mathbf{C}$ matrix from the previous call has to be passed in, and data stored internally will also be reused.

Note that the output density matrix from the previous call is not needed to calculate the energy-density matrix; the latter will be returned by omm as D_min.

### OMM flavours

The choice of flavour of the OMM functional can have a huge impact on performance. The basic flavour should be avoided when $\mathbf{S}$ is present. Cholesky factorization is usually the most efficient, although preconditioning will be better if kinetic energy ill-conditioning is significant.

The choice is controlled by flavour. The options are:

#### Basic

This is selected by setting flavour to 0. The basic algorithm does not provide any tensorial correction for non-orthogonal bases, which can lead to a very ill-conditioned problem (especially for large system/basis sizes) requiring many CG steps. This is the only falvour available for linear-scaling calculations at the moment.

#### Cholesky factorization

This is selected by setting flavour to either 1 or 2 and applies only for cubic-scaling calculations with dense matrices. Cholesky factorization reduces the generalized eigenvalue problem to standard form; the basic algorithm is then performed with no $\mathbf{S}$. For a localized basis, this is usually the most important form of ill-conditioning.

The user can provide either the $\mathbf{S}$ matrix (case 1) or its factorization into the upper triangular matrix $\mathbf{U}$ (case 2). In the former case, the factorization will be performed by libOMM, and $\mathbf{S}$ will be overwritten by $\mathbf{U}$. In both cases, $\mathbf{H}$ will be overwritten by $\mathbf{U}^{-\mathrm{T}}\mathbf{H}\mathbf{U}^{-1}$.

A small note for case 1: if the calculation is spin polarized and uses the same $\mathbf{S}$ for the up- and down-spin Hamiltonians, the call to omm for the first spin point will replace $\mathbf{S}$ with $\mathbf{U}$; the call for the second spin point should then set flavour to 2. (These considerations are only important when new_S is set to T; otherwise, $\mathbf{S}$ is not referenced at all.)

#### Preconditioning

This is selected by setting flavour to 3 and also applies only for cubic-scaling calculations with dense matrices. The preconditioner is suitable for a non-orthogonal localized orbital basis, and is defined as7:

$\mathbf{P} = \left ( \mathbf{S} + \frac{1}{\tau} \mathbf{T} \right )^{-1}$,

where $\mathbf{T}$ is the kinetic energy matrix, and $\tau$ sets the scale for the kinetic energy preconditioning. If $\mathbf{T}$ is not given, the preconditioner becomes a pure tensorial correction, completely equivalent to Cholesky factorization. The choice of $\tau$ can affect the speed of convergence; reducing it will result in more aggressive kinetic energy preconditioning6.

### Future developments

• Plane-wave preconditioner
• Preconditioning for calculations with sparse matrices

## Programming interface

The interface makes use of the public type matrix defined by MatrixSwitch.

### Public subroutines

subroutine omm( m, n, n_occ, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation)

• m (input) INTEGER
The size of the basis.
• n (input) INTEGER
The number of considered eigenstates.
• n_occ (input) INTEGER
The number of occupied eigenstates.
• H (input/output) TYPE(MATRIX)
The input Hamiltonian matrix $\mathbf{H}$, size (m,m). If Cholesky factorization is performed, on output it will be overwritten by $\mathbf{U}^{-\mathrm{T}}\mathbf{H}\mathbf{U}^{-1}$.
• S (input/output) TYPE(MATRIX)
The input overlap matrix $\mathbf{S}$, size (m,m). This matrix is optional; if it is not needed, it should be passed uninitialized. If Cholesky factorization is performed, on output it will be overwritten by the factorized upper triangular matrix $\mathbf{U}$.
• new_S (input) LOGICAL
T: $\mathbf{S}$ is new (i.e., it has changed from the previous call to omm). Note that this only refers to calls with the same value of ip.
F: $\mathbf{S}$ has not changed from the previous call to omm.
• e_min (output) DOUBLE PRECISION
The total energy from the OMM functional $\tilde{E} \left [ \mathbf{C} \right ]$ at the end of the minimization (spin degeneracy factor not included).
• D_min (input/output) TYPE(MATRIX)
The output density matrix at the end of the minimization, size (m,m).
• calc_ED (input) LOGICAL
T: The energy-density matrix is calculated (must follow a minimization call) and written to D_min, and all other output remains unchanged. No minimization is performed.
F: The minimization is performed as usual.
• eta (input) DOUBLE PRECISION
The eigenspectrum shift parameter $\eta$5 (the input Hamiltonian should not be shifted).
• C_min (input/output) TYPE(MATRIX)
The Hermitian conjugate of the coefficients matrix $\mathbf{C}$ for the set of WFs, size (n,m). This is used as the initial guess for the minimization, and on output will be overwritten by the output coefficients at the end of the minimization. For dense matrices, the initial guess does not need to be provided by the user; it will be generated automatically if init_C is set to F.
• init_C (input) LOGICAL
T: The contents of C_min has been set (or modified since the last call to omm) by the user.
F: The contents of C_min has not been modified by the user.
• T (input/output) TYPE(MATRIX)
The input kinetic energy matrix, size (m,m), used for calculating the preconditioner. This matrix is optional; if it is not needed, it should be passed uninitialized.
• scale_T (input) DOUBLE PRECISION
Scale of the kinetic energy preconditioning $\tau$; a value around 10 Ry is recommended6 (note that the units should match those of the operator matrices provided). Not referenced if T is uninitialized.
• flavour (input) INTEGER
Flavour of the OMM functional:
0: basic, can be used with or without $\mathbf{S}$
1: Cholesky factorization, $\mathbf{S}$ provided in S
2: Cholesky factorization, $\mathbf{U}$ provided in S (i.e., factorization of $\mathbf{S}$ performed by the user)
3: preconditioning, requires $\mathbf{S}$ and, optionally, $\mathbf{T}$
• np (input) INTEGER
Total number of spin+k points for a single SCF iteration (i.e., the number of spins times the number of k points).
• ip (input) INTEGER
Spin+k point identifier (must be between 1 and np).
• cg_tol (input) DOUBLE PRECISION
Tolerance for the minimization. The stopping criterion is for the relative energy difference between subsequent line searches $2 \left ( \tilde{E} \left [ \mathbf{C}^\mathrm{new} \right ]-\tilde{E} \left [ \mathbf{C} \right ] \right ) / \left ( \tilde{E} \left [ \mathbf{C}^\mathrm{new} \right ]+\tilde{E} \left [ \mathbf{C} \right ] \right )$ to be smaller than or equal to this value. If negative, the default value of 1.0d-9 is used.
• long_out (input) LOGICAL
T: Verbose output.
F: Standard output.
• dealloc (input) LOGICAL
T: Deallocate all internal matrices. Typically should only be selected for the very last call to omm.
F: Internal matrices are saved and the data is reused for the next call to omm.
• m_storage (input) CHARACTER*5
MatrixSwitch storage format to use for internal matrices. See the list of available formats.
• m_operation (input, optional) CHARACTER*3
Implementation of the MatrixSwitch operations to use. See the list of available implementations.

### Wrapper/C interface

The purpose of the wrapper is to be able to call omm without using the MatrixSwitch Fortran module and the type matrix defined within in. This makes it straightforward to call the wrapper from C. Note that libOMM should be compiled with the -DCBIND flag when calling from C; this will ensure that LOGICAL inputs are compatible with _Bool types in C.

subroutine omm_wrapper( m, n, n_occ, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation)

• m (input) INTEGER
The size of the basis.
• n (input) INTEGER
The number of considered eigenstates.
• n_occ (input) INTEGER
The number of occupied eigenstates.
• H (input/output) CHARACTER
The symbol of the Hamiltonian matrix $\mathbf{H}$.
• S (input/output) CHARACTER
The symbol of the overlap matrix $\mathbf{S}$.
• new_S (input) LOGICAL
T: $\mathbf{S}$ is new (i.e., it has changed from the previous call to omm). Note that this only refers to calls with the same value of ip.
F: $\mathbf{S}$ has not changed from the previous call to omm.
• e_min (output) DOUBLE PRECISION
The total energy from the OMM functional $\tilde{E} \left [ \mathbf{C} \right ]$ at the end of the minimization (spin degeneracy factor not included).
• D_min (input/output) CHARACTER
The symbol of the (energy-)density matrix.
• calc_ED (input) LOGICAL
T: The energy-density matrix is calculated (must follow a minimization call) and written to D_min, and all other output remains unchanged. No minimization is performed.
F: The minimization is performed as usual.
• eta (input) DOUBLE PRECISION
The eigenspectrum shift parameter $\eta$5.
• C_min (input/output) CHARACTER
The name of the Hermitian conjugate of the coefficients matrix.
• init_C (input) LOGICAL
T: The contents of C_min has been set (or modified since the last call to omm) by the user.
F: The contents of C_min has not been modified by the user.
• T (input/output) CHARACTER
The name of the kinetic energy matrix.
• scale_T (input) DOUBLE PRECISION
Scale of the kinetic energy preconditioning $\tau$; a value around 10 Ry is recommended6 (note that the units should match those of the operator matrices provided). Not referenced if T is uninitialized.
• flavour (input) INTEGER
Flavour of the OMM functional:
0: basic, can be used with or without $\mathbf{S}$
1: Cholesky factorization, $\mathbf{S}$ provided in S
2: Cholesky factorization, $\mathbf{U}$ provided in S (i.e., factorization of $\mathbf{S}$ performed by the user)
3: preconditioning, requires $\mathbf{S}$ and, optionally, $\mathbf{T}$
• np (input) INTEGER
Total number of spin+k points for a single SCF iteration (i.e., the number of spins times the number of k points).
• ip (input) INTEGER
Spin+k point identifier (must be between 1 and np).
• cg_tol (input) DOUBLE PRECISION
Tolerance for the minimization. The stopping criterion is for the relative energy difference between subsequent line searches $2 \left ( \tilde{E} \left [ \mathbf{C}^\mathrm{new} \right ]-\tilde{E} \left [ \mathbf{C} \right ] \right ) / \left ( \tilde{E} \left [ \mathbf{C}^\mathrm{new} \right ]+\tilde{E} \left [ \mathbf{C} \right ] \right )$ to be smaller than or equal to this value. If negative, the default value of 1.0d-9 is used.
• long_out (input) LOGICAL
T: Verbose output.
F: Standard output.
• dealloc (input) LOGICAL
T: Deallocate all internal matrices. Typically should only be selected for the very last call to omm.
F: Internal matrices are saved and the data is reused for the next call to omm.
• m_storage (input) CHARACTER*5
MatrixSwitch storage format to use for internal matrices. See the list of available formats.
• m_operation (input, optional) CHARACTER*3
Implementation of the MatrixSwitch operations to use. See the list of available implementations.

### Callback

The purpose of the subroutine is to get the Hamiltonian during OMM calculations via a callback. For this subroutine, the input Hamiltonian and output energy should be shifted by $\eta$ externally ($\mathbf{H} \rightarrow \mathbf{H} - \eta \mathbf{S}$, $\tilde{E} = \tilde{E} + \eta n_\mathrm{occ}$, where $n_\mathrm{occ}$ is the number of occupied states).

subroutine omm_callback( m, n, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation)

• m (input) INTEGER
The size of the basis.
• n (input) INTEGER
The number of considered eigenstates.
• H (input/output) TYPE(MATRIX)
The Hamiltonian matrix.
• S (input/output) TYPE(MATRIX)
The input overlap matrix $\mathbf{S}$, size (m,m). This matrix is optional; if it is not needed, it should be passed uninitialized. If Cholesky factorization is performed, on output it will be overwritten by the factorized upper triangular matrix $\mathbf{U}$.
• new_S (input) LOGICAL
T: $\mathbf{S}$ is new (i.e., it has changed from the previous call to omm). Note that this only refers to calls with the same value of ip.
F: $\mathbf{S}$ has not changed from the previous call to omm.
• e_min (output) DOUBLE PRECISION
The total energy from the OMM functional $\tilde{E} \left [ \mathbf{C} \right ]$ at the end of the minimization (spin degeneracy factor not included).
• D_min (input/output) TYPE(MATRIX)
The output density matrix at the end of the minimization, size (m,m).
• calc_ED (input) LOGICAL
T: The energy-density matrix is calculated (must follow a minimization call) and written to D_min, and all other output remains unchanged. No minimization is performed.
F: The minimization is performed as usual.
• eta (input) DOUBLE PRECISION
The eigenspectrum shift parameter $\eta$5.
• C_min (input/output) TYPE(MATRIX)
The Hermitian conjugate of the coefficients matrix $\mathbf{C}$ for the set of WFs, size (n,m). This is used as the initial guess for the minimization, and on output will be overwritten by the output coefficients at the end of the minimization. For dense matrices, the initial guess does not need to be provided by the user; it will be generated automatically if init_C is set to F.
• init_C (input) LOGICAL
T: The contents of C_min has been set (or modified since the last call to omm) by the user.
F: The contents of C_min has not been modified by the user.
• T (input/output) TYPE(MATRIX)
The input kinetic energy matrix, size (m,m), used for calculating the preconditioner. This matrix is optional; if it is not needed, it should be passed uninitialized.
• scale_T (input) DOUBLE PRECISION
Scale of the kinetic energy preconditioning $\tau$; a value around 10 Ry is recommended6 (note that the units should match those of the operator matrices provided). Not referenced if T is uninitialized.
• flavour (input) INTEGER
Flavour of the OMM functional:
0: basic, can be used with or without $\mathbf{S}$
1: Cholesky factorization, $\mathbf{S}$ provided in S
2: Cholesky factorization, $\mathbf{U}$ provided in S (i.e., factorization of $\mathbf{S}$ performed by the user)
3: preconditioning, requires $\mathbf{S}$ and, optionally, $\mathbf{T}$
• np (input) INTEGER
Total number of spin+k points for a single SCF iteration (i.e., the number of spins times the number of k points).
• ip (input) INTEGER
Spin+k point identifier (must be between 1 and np).
• cg_tol (input) DOUBLE PRECISION
Tolerance for the minimization. The stopping criterion is for the relative energy difference between subsequent line searches $2 \left ( \tilde{E} \left [ \mathbf{C}^\mathrm{new} \right ]-\tilde{E} \left [ \mathbf{C} \right ] \right ) / \left ( \tilde{E} \left [ \mathbf{C}^\mathrm{new} \right ]+\tilde{E} \left [ \mathbf{C} \right ] \right )$ to be smaller than or equal to this value. If negative, the default value of 1.0d-9 is used.
• long_out (input) LOGICAL
T: Verbose output.
F: Standard output.
• dealloc (input) LOGICAL
T: Deallocate all internal matrices. Typically should only be selected for the very last call to omm.
F: Internal matrices are saved and the data is reused for the next call to omm.
• m_storage (input) CHARACTER*5
MatrixSwitch storage format to use for internal matrices. See the list of available formats.
• m_operation (input, optional) CHARACTER*3
Implementation of the MatrixSwitch operations to use. See the list of available implementations.

## References

1. F. Mauri, G. Galli, and R. Car, Orbital formulation for electronic-structure calculations with linear system-size scaling, Phys. Rev. B 47, 9973 (1993). DOI: 10.1103/PhysRevB.47.9973 ↩︎

2. F. Mauri and G. Galli, Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling, Phys. Rev. B 50, 4316 (1994). DOI: 10.1103/PhysRevB.50.4316 ↩︎

3. P. Ordejón, D. A. Drabold, M. P. Grumbach, and R. M. Martin, Unconstrained minimization approach for electronic computations that scales linearly with system size, Phys. Rev. B 48, 14646 (1993). DOI: 10.1103/PhysRevB.48.14646 ↩︎

4. P. Ordejón, D. A. Drabold, R. M. Martin, and M. P. Grumbach, Linear system-size scaling methods for electronic-structure calculations, Phys. Rev. B 51, 1456 (1995). DOI: 10.1103/PhysRevB.51.1456 ↩︎

5. J. Kim, F. Mauri, and G. Galli, Total-energy global optimizations using nonorthogonal localized orbitals, Phys. Rev. B 52, 1640 (1995). DOI: 10.1103/PhysRevB.52.1640 ↩︎

6. F. Corsetti, The orbital minimization method for electronic structure calculations with finite-range atomic basis sets, Comput. Phys. Commun. 185, 873 (2014). DOI: 10.1016/j.cpc.2013.12.008 ↩︎

7. C. Gan, P. Haynes, and M. Payne, Preconditioned conjugate gradient method for the sparse generalized eigenvalue problem in electronic structure calculations, Comput. Phys. Comm. 134, 33 (2001). DOI: 10.1016/S0010-4655(00)00188-0 ↩︎

8. B. G. Pfrommer, J. Demmel, and H. Simon, Unconstrained energy functionals for electronic structure calculations, J. Comput. Phys. 150, 287 (1999). DOI: 10.1006/jcph.1998.6181 ↩︎

9. D. Bowler and T. Miyazaki, ${\mathcal O}(N)$ methods in electronic structure calculations, Rep. Prog. Phys. 75, 036503 (2012). DOI: 10.1088/0034-4885/75/3/036503 ↩︎