LibOMM

From ESL
Jump to: navigation, search

Source authors:
Fabiano Corsetti, Synopsys QuantumWise, Denmark

License: Simplified BSD (see README.md in the source code)

Download: E-CAM Gitlab

Links to other ESL entries

Functionalities:

Software:

libOMM solves the Kohn-Sham equation as a generalized eigenvalue problem for a fixed Hamiltonian. It implements the orbital minimization method (OMM), which works within a density matrix formalism. The basic strategy of the OMM is to find the set of Wannier functions (WFs) describing the occupied subspace by direct unconstrained minimization of an appropriately-constructed functional. The density matrix can then be calculated from the WFs. The solver is usually employed within an outer self-consistency (SCF) cycle. Therefore, the WFs resulting from one SCF iteration can be saved and then re-used as the initial guess for the next iteration.

Features:

  • OMM functional of Mauri, Galli, and Car[1][2], and Ordejón et al.[3][4]
  • Real (Γ-point) and complex (arb. k point) Hamiltonians
  • Efficient reuse of information within SCF loop and for multiple spins/k points[5]
  • Simple (orthogonal basis) and generalized (non-orthogonal basis) eigenvalue problems
  • Cholesky factorization of generalized problem
  • Preconditioner for localized orbital basis[6]

Installation

Prerequisites

libOMM makes use of MatrixSwitch for handling the matrix operations. This is included in the libOMM distribution (alternatively, an external version of MatrixSwitch can be used). The basic routines can be installed with only a Fortran compiler.

Optional requirements are:

  • BLAS + LAPACK for lap operations with the s?den format
  • MPI + BLAS + LAPACK + ScaLAPACK for the p?dbc format

Instructions

  1. Enter the src directory.
  2. Copy make.inc.example to make.inc and modify it to suit your needs. MSLIBPATH should point to the MatrixSwitch directory (default in make.inc.example is for the version included in the distribution). MatrixSwitch should be compiled with the -DCONV flag. Available options for FPPFLAGS are:
    • -DMPI: enable MPI parallel routines
    • -DLAP: enable LAPACK routines (currently necessary for preconditioning/Cholesky factorization)
    • -DSLAP: enable ScaLAPACK routines (requires -DMPI)
    • -DNORAND: fixed seed for the random number generator. Enable for testing purposes.
    • -DCBIND: use ISO_C_BINDING for LOGICAL inputs in the wrapper interfaces. Enable for linking to C.
  3. Type make.

Tests

The examples directory contains a number of small programs that make use of libOMM with MatrixSwitch. These can be useful both for testing the installation and for learning how to use the library. To compile them:

  1. Enter the examples directory.
  2. Copy make.inc.example to make.inc and modify it to suit your needs. Be aware that make.inc in the src directory will also be used.
  3. Type make.

Each example contains a header explaining what the program does and providing sample output to compare against.

Publications

The algorithms and implementation of the library are described here[5].

Algorithm

The orbital minimization method (OMM) is a class of algorithms for solving the Kohn-Sham (KS) eigenproblem within a density matrix formalism. The basic strategy is to find the set of n Wannier functions (WFs) describing the occupied subspace of a system by direct unconstrained minimization of an appropriately-constructed energy functional. The OMM was originally proposed in the context of linear-scaling DFT, by Mauri, Galli, and Car[1][2], and Ordejón et al.[3][4].

In this original formulation, the energy functional is given by


{\tilde  {E}}\left[{\mathbf  {C}}\right]=2{\mathrm  {Tr}}\left\{\left[{\mathbf  {I}}_{n}+\left({\mathbf  {I}}_{n}-{\mathbf  {S}}_{{\mathrm  {W}}}\right)\right]{\mathbf  {H}}_{{\mathrm  {W}}}\right\},


where {\mathbf  {C}} is the matrix of coefficients of the n WFs in the original basis, and {\mathbf  {H}}_{{\mathrm  {W}}} and {\mathbf  {S}}_{{\mathrm  {W}}} are the Hamiltonian and overlap matrices in the basis of WFs. It can be shown that, by minimizing {\tilde  {E}} with respect to {\mathbf  {C}}, we obtain {\mathbf  {S}}_{{\mathrm  {W}}}={\mathbf  {I}}_{n} and {\tilde  {E}}\left[{\mathbf  {C}}_{0}\right]=E_{0}, where {\mathbf  {C}}_{0} describes the occupied subspace, and E_{0} is the ground-state KS (band) energy.

Although this functional has found applications as a cubic-scaling solver[7][5] (available in the libOMM library), the original linear-scaling implementations found serious problems with local minima, arising from the enforcement of spatial localization for the WFs. In order to solve this, more generalized/advanced versions of the OMM have been proposed, such as that of Kim, Mauri, and Galli[8]. A comprehensive review of these can be found here[9].

Documentation

The best way of learning how to use libOMM is by example. See the examples in the examples directory for this. The basic idea is very simple: you call omm, providing as input the Hamiltonian and overlap matrices and the number of occupied bands, and receiving as output the ground-state density matrix and Kohn-Sham (band) energy. However, there are a number of important subtleties, due to the choice of different flavours of the OMM functional (Cholesky factorization, preconditioning, etc.), and the fact that the subroutine is typically called many times, looping over MD and SCF iterations, as well as spins and k points.

Matrices

The first thing to notice is that all matrices in libOMM are handled by MatrixSwitch. This allows for easy usage of different storage formats and implementations of the matrix operations. The matrices have to therefore be passed to omm as type matrix. This is quite easy! Also note that if you are already using one of the allowed formats (e.g., the ScaLAPACK format), you can simply register your matrices with MatrixSwitch without needing to copy any data. You will need to pass to libOMM the 5-character code denoting the storage format and the 3-character code denoting the implementation of the matrix operations to use.

There are no optional arguments in the call to omm. Therefore, matrices which are not needed should simpy be passed without having been allocated by MatrixSwitch (m_allocate routine). Other variables which are not needed will be ignored.

If you are calling libOMM from C you cannot make use of type matrix. In this case you can instead call one of the available wrappers.

Reuse of data

The most important instance of data reuse is in the WFs describing the occupied subspace (stored in the coefficients matrix {\mathbf  {C}}). The number of conjugate gradient (CG) steps needed by libOMM will fall drastically in the initial few SCF iterations if the final {\mathbf  {C}} from one iteration is used as the starting guess for the next one. Since this matrix is passed in and out of omm, the user should typically take care not to modify it between calls. Sometimes, however, it might be advantageous to do so, e.g., if the user can provide an extrapolation scheme for {\mathbf  {C}}, or following an explicit diagonalization call (in this case, the matrix can be populated by the occupied eigenstates). If there is more than one spin/k point within an SCF iteration, a separate {\mathbf  {C}} matrix for each point should be stored.

libOMM internally allocates and uses a number of matrices. Some of these are saved between calls in order to maximize data reuse, and, hence, to minimize the computational effort. The data is specific to each spin/k point, and its reuse depends on passing the same {\mathbf  {C}} from the previous SCF iteration at the same spin/k point. For this reason, it is important to specify the total number of spin+k points and the current spin+k point identifier (np and ip, respectively) in each call to omm.

Due to this internal data reuse, it is also necessary to inform libOMM when a new overlap matrix {\mathbf  {S}} is being passed in (i.e., the first SCF iteration of each new MD step). This is specified by new_S. It is important to note that each spin+k point is treated independently, so, e.g., if the calculation is spin polarized and uses the same {\mathbf  {S}} for the up- and down-spin Hamiltonians, it is necessary to set new_S to T for both calls to omm at the start of a new MD step.

Finally, you should always remember to free all allocated memory at the end of your program. In order to deallocate libOMM's internal matrices, call omm with dealloc set to T. This will deallocate the data for all spin/k points; therefore, only the very last call to omm should do so.

The energy-weighted density matrix

If the energy-weighted density matrix is needed (typically only at the end of the SCF cycle), it can be calculated in a separate call to omm following a minimization call, by setting calc_ED to T (the default value should be F). The unmodified {\mathbf  {C}} matrix from the previous call has to be passed in, and data stored internally will also be reused.

Note that the output density matrix from the previous call is not needed to calculate the energy-weighted density matrix; the latter will be returned by omm as D_min.

OMM flavours

The choice of flavour of the OMM functional can have a huge impact on performance. The basic flavour should be avoided when {\mathbf  {S}} is present. Cholesky factorization is usually the most efficient, although preconditioning will be better if kinetic energy ill-conditioning is significant.

The choice is controlled by flavour. The options are:

Basic

This is selected by setting flavour to 0. The basic algorithm does not provide any tensorial correction for non-orthogonal bases, which can lead to a very ill-conditioned problem (especially for large system/basis sizes) requiring many CG steps.

Cholesky factorization

This is selected by setting flavour to either 1 or 2. Cholesky factorization reduces the generalized eigenvalue problem to standard form; the basic algorithm is then performed with no {\mathbf  {S}}. For a localized basis, this is usually the most important form of ill-conditioning.

The user can provide either the {\mathbf  {S}} matrix (case 1) or its factorization into the upper triangular matrix {\mathbf  {U}} (case 2). In the former case, the factorization will be performed by libOMM, and {\mathbf  {S}} will be overwritten by {\mathbf  {U}}. In both cases, {\mathbf  {H}} will be overwritten by {\mathbf  {U}}^{{-{\mathrm  {T}}}}{\mathbf  {H}}{\mathbf  {U}}^{{-1}}.

A small note for case 1: if the calculation is spin polarized and uses the same {\mathbf  {S}} for the up- and down-spin Hamiltonians, the call to omm for the first spin point will replace {\mathbf  {S}} with {\mathbf  {U}}; the call for the second spin point should then set flavour to 2. (These considerations are only important when new_S is set to T; otherwise, {\mathbf  {S}} is not referenced at all.)

Preconditioning

This is selected by setting flavour to 3. The preconditioner is suitable for a non-orthogonal localized orbital basis, and is defined as[6]:

{\mathbf  {P}}=\left({\mathbf  {S}}+{\frac  {1}{\tau }}{\mathbf  {T}}\right)^{{-1}},

where {\mathbf  {T}} is the kinetic energy matrix, and \tau sets the scale for the kinetic energy preconditioning. If {\mathbf  {T}} is not given, the preconditioner becomes a pure tensorial correction, completely equivalent to Cholesky factorization. The choice of \tau can affect the speed of convergence; reducing it will result in more aggressive kinetic energy preconditioning[5].

Future developments

  • Sparse/dense operations for cubic-scaling solver with localized orbitals
  • Plane-wave preconditioner
  • Linear-scaling solver (all sparse operations, Kim-Mauri-Galli functional[8], fixing of total electron number)

Programming interface

The interface makes use of the public type matrix defined by MatrixSwitch.

Public subroutines


subroutine omm( m, n, H, S, new_S, e_min, D_min, calc_ED, eta, C_min, init_C, T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, m_storage, m_operation, mpi_rank )

  • m (input) INTEGER
    The size of the basis.
  • n (input) INTEGER
    The number of occupied eigenstates.
  • H (input/output) TYPE(MATRIX)
    The input Hamiltonian matrix {\mathbf  {H}}, size (m,m). This must be provided already shifted such that {\mathbf  {H}}\rightarrow {\mathbf  {H}}-\eta {\mathbf  {S}}. If Cholesky factorization is performed, on output it will be overwritten by {\mathbf  {U}}^{{-{\mathrm  {T}}}}{\mathbf  {H}}{\mathbf  {U}}^{{-1}}.
  • S (input/output) TYPE(MATRIX)
    The input overlap matrix {\mathbf  {S}}, size (m,m). This matrix is optional; if it is not needed, it should be passed uninitialized. If Cholesky factorization is performed, on output it will be overwritten by the factorized upper triangular matrix {\mathbf  {U}}.
  • new_S (input) LOGICAL
    T: {\mathbf  {S}} is new (i.e., it has changed from the previous call to omm). Note that this only refers to calls with the same value of ip.
    F: {\mathbf  {S}} has not changed from the previous call to omm.
  • e_min (output) DOUBLE PRECISION
    The total energy from the OMM functional {\tilde  {E}}\left[{\mathbf  {C}}\right] at the end of the minimization (spin degeneracy factor not included).
  • D_min (input/output) TYPE(MATRIX)
    The output density matrix at the end of the minimization, size (m,m).
  • calc_ED (input) LOGICAL
    T: The energy-weighted density matrix is calculated (must follow a minimization call) and written to D_min, and all other output remains unchanged. No minimization is performed.
    F: The minimization is performed as usual.
  • eta (input) DOUBLE PRECISION
    The eigenspectrum shift parameter \eta [5].
  • C_min (input/output) TYPE(MATRIX)
    The coefficients matrix {\mathbf  {C}} for the set of WFs, size (n,m). This is used as the initial guess for the minimization, and on output will be overwritten by the output coefficients at the end of the minimization. The initial guess does not need to be provided by the user; it will be generated automatically if init_C is set to F.
  • init_C (input) LOGICAL
    T: The contents of C_min has been set (or modified since the last call to omm) by the user.
    F: The contents of C_min has not been modified by the user.
  • T (input/output) TYPE(MATRIX)
    The input kinetic energy matrix, size (m,m), used for calculating the preconditioner. This matrix is optional; if it is not needed, it should be passed uninitialized.
  • scale_T (input) DOUBLE PRECISION
    Scale of the kinetic energy preconditioning \tau ; a value around 10 Ry is recommended[5] (note that the units should match those of the operator matrices provided). Not referenced if T is uninitialized.
  • flavour (input) INTEGER
    Flavour of the OMM functional:
    0: basic, can be used with or without {\mathbf  {S}}
    1: Cholesky factorization, {\mathbf  {S}} provided in S
    2: Cholesky factorization, {\mathbf  {U}} provided in S (i.e., factorization of {\mathbf  {S}} performed by the user)
    3: preconditioning, requires {\mathbf  {S}} and, optionally, {\mathbf  {T}}
  • np (input) INTEGER
    Total number of spin+k points for a single SCF iteration (i.e., the number of spins times the number of k points).
  • ip (input) INTEGER
    Spin+k point identifier (must be between 1 and np).
  • cg_tol (input) DOUBLE PRECISION
    Tolerance for the minimization. The stopping criterion is for the relative energy difference between subsequent line searches 2\left({\tilde  {E}}\left[{\mathbf  {C}}^{{\mathrm  {new}}}\right]-{\tilde  {E}}\left[{\mathbf  {C}}\right]\right)/\left({\tilde  {E}}\left[{\mathbf  {C}}^{{\mathrm  {new}}}\right]+{\tilde  {E}}\left[{\mathbf  {C}}\right]\right) to be smaller than or equal to this value. If negative, the default value of 1.0d-9 is used.
  • long_out (input) LOGICAL
    T: Verbose output.
    F: Standard output.
  • dealloc (input) LOGICAL
    T: Deallocate all internal matrices. Typically should only be selected for the very last call to omm.
    F: Internal matrices are saved and the data is reused for the next call to omm.
  • m_storage (input) CHARACTER*5
    MatrixSwitch storage format to use for internal matrices. See the list of available formats.
  • m_operation (input) CHARACTER*3
    Implementation of the MatrixSwitch operations to use. See the list of available implementations.

Wrappers/C interface

The purpose of the wrappers is to be able to call omm without using the MatrixSwitch Fortran module and the type matrix defined within in. This makes it straightforward to call the wrappers from C. Note that libOMM should be compiled with the -DCBIND flag when calling from C; this will ensure that LOGICAL inputs are compatible with _Bool types in C.

Each wrapper provides an interface for a particular combination of a matrix storage format and an implementation of the matrix operations. These are specified by the 5- and 3-character codes in the name of the wrapper.


subroutine omm_sdden_ref( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )

This wrapper uses a simple dense real matrix storage (serial distribution) and MatrixSwitch reference matrix operations. The interface is identical to omm with the following exceptions (note that m_storage, m_operation are no longer present):

  • H_vals (input/output) DOUBLE PRECISION array, dimension (m,m)
    {\mathbf  {H}} stored as a two-dimensional array.
  • S_present (input) LOGICAL
    T: {\mathbf  {S}} is present in the calculation.
    F: {\mathbf  {S}} is not present in the calculation. Note that S_vals must nevertheless be provided although it will not be referenced.
  • S_vals (input/output) DOUBLE PRECISION array, dimension (m,m)
    {\mathbf  {S}} stored as a two-dimensional array.
  • D_min_vals (input/output) DOUBLE PRECISION array, dimension (m,m)
    The density matrix stored as a two-dimensional array.
  • C_min_vals (input/output) DOUBLE PRECISION array, dimension (n,m)
    {\mathbf  {C}} stored as a two-dimensional array.
  • T_present (input) LOGICAL
    T: {\mathbf  {T}} is present in the calculation.
    F: {\mathbf  {T}} is not present in the calculation. Note that T_vals must nevertheless be provided although it will not be referenced.
  • T_vals (input/output) DOUBLE PRECISION array, dimension (m,m)
    {\mathbf  {T}} stored as a two-dimensional array.

subroutine omm_sdden_lap( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )

This wrapper uses a simple dense real matrix storage (serial distribution) and LAPACK matrix operations. The interface is identical to omm_sdden_ref.


subroutine omm_szden_ref( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )

This wrapper uses a simple dense complex matrix storage (serial distribution) and MatrixSwitch reference matrix operations. The interface is identical to omm_sdden_ref, except for the fact that H_vals, S_vals, D_min_vals, C_min_vals, T_vals are COMPLEX*16 arrays.


subroutine omm_szden_lap( m, n, H_vals, S_present, S_vals, new_S, e_min, D_min_vals, calc_ED, eta, C_min_vals, init_C, T_present, T_vals, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank )

This wrapper uses a simple dense complex matrix storage (serial distribution) and LAPACK matrix operations. The interface is identical to omm_sdden_ref, except for the fact that H_vals, S_vals, D_min_vals, C_min_vals, T_vals are COMPLEX*16 arrays.


subroutine omm_pddbc_lap( m, n, H_dim, H_vals, desc_H, S_present, S_dim, S_vals, desc_S, new_S, e_min, D_min_dim, D_min_vals, desc_D_min, calc_ED, eta, C_min_dim, C_min_vals, desc_C_min, init_C, T_present, T_dim, T_vals, desc_T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank, mpi_size, nprow, order, bs_def, icontxt )

This wrapper uses a dense block cyclic real matrix storage (parallel distribution) and ScaLAPACK matrix operations. The interface is identical to omm with the following exceptions (note that m_storage, m_operation are no longer present):

  • H_dim (input) INTEGER array, dimension (2)
    Dimensions of H_vals.
  • H_vals (input/output) DOUBLE PRECISION array, dimension (H_dim(1),H_dim(2))
    Local elements of {\mathbf  {H}} stored as a two-dimensional array. The size of the global array is (m,m).
  • desc_H (input) INTEGER array, dimension (9)
    BLACS array descriptor for H_vals.
  • S_present (input) LOGICAL
    T: {\mathbf  {S}} is present in the calculation.
    F: {\mathbf  {S}} is not present in the calculation. Note that S_dim, S_vals, desc_S must nevertheless be provided although they will not be referenced (in this case, it is suggested to use a dummy array of size (1,1)).
  • S_dim (input) INTEGER array, dimension (2)
    Dimensions of S_vals.
  • S_vals (input/output) DOUBLE PRECISION array, dimension (S_dim(1),S_dim(2))
    Local elements of {\mathbf  {S}} stored as a two-dimensional array. The size of the global array is (m,m).
  • desc_S (input) INTEGER array, dimension (9)
    BLACS array descriptor for S_vals.
  • D_min_dim (input) INTEGER array, dimension (2)
    Dimensions of D_min_vals.
  • D_min_vals (input/output) DOUBLE PRECISION array, dimension (D_min_dim(1),D_min_dim(2))
    Local elements of the density matrix stored as a two-dimensional array. The size of the global array is (m,m).
  • desc_D_min (input) INTEGER array, dimension (9)
    BLACS array descriptor for D_min_vals.
  • C_min_dim (input) INTEGER array, dimension (2)
    Dimensions of C_min_vals.
  • C_min_vals (input/output) DOUBLE PRECISION array, dimension (C_min_dim(1),C_min_dim(2))
    Local elements of {\mathbf  {C}} stored as a two-dimensional array. The size of the global array is (n,m).
  • desc_C_min (input) INTEGER array, dimension (9)
    BLACS array descriptor for C_min_vals.
  • T_present (input) LOGICAL
    T: {\mathbf  {T}} is present in the calculation.
    F: {\mathbf  {T}} is not present in the calculation. Note that T_dim, T_vals, desc_T must nevertheless be provided although they will not be referenced (in this case, it is suggested to use a dummy array of size (1,1)).
  • T_dim (input) INTEGER array, dimension (2)
    Dimensions of T_vals.
  • T_vals (input/output) DOUBLE PRECISION array, dimension (T_dim(1),T_dim(2))
    Local elements of {\mathbf  {T}} stored as a two-dimensional array. The size of the global array is (m,m).
  • desc_T (input) INTEGER array, dimension (9)
    BLACS array descriptor for T_vals.
  • mpi_size (input) INTEGER
    The total number of MPI processes for the BLACS process grid.
  • nprow (input) INTEGER
    Row dimension of the process grid (has to be a divisor of mpi_size).
  • order (input) CHARACTER*1
    Ordering of the process grid:
    c/C: column-major ordering
    r/R/other: row-major ordering
  • bs_def (input) INTEGER
    Block size to use when allocating p?dbc matrices.
  • icontxt (input) INTEGER
    BLACS context handle.

subroutine omm_pzdbc_lap( m, n, H_dim, H_vals, desc_H, S_present, S_dim, S_vals, desc_S, new_S, e_min, D_min_dim, D_min_vals, desc_D_min, calc_ED, eta, C_min_dim, C_min_vals, desc_C_min, init_C, T_present, T_dim, T_vals, desc_T, scale_T, flavour, np, ip, cg_tol, long_out, dealloc, mpi_rank, mpi_size, nprow, order, bs_def, icontxt )

This wrapper uses a dense block cyclic complex matrix storage (parallel distribution) and ScaLAPACK matrix operations. The interface is identical to omm_pddbc_lap, except for the fact that H_vals, S_vals, D_min_vals, C_min_vals, T_vals are COMPLEX*16 arrays.


References

  1. 1.0 1.1 F. Mauri, G. Galli, and R. Car, Orbital formulation for electronic-structure calculations with linear system-size scaling, Phys. Rev. B 47, 9973 (1993). DOI: 10.1103/PhysRevB.47.9973
  2. 2.0 2.1 F. Mauri and G. Galli, Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling, Phys. Rev. B 50, 4316 (1994). DOI: 10.1103/PhysRevB.50.4316
  3. 3.0 3.1 P. Ordejón, D. A. Drabold, M. P. Grumbach, and R. M. Martin, Unconstrained minimization approach for electronic computations that scales linearly with system size, Phys. Rev. B 48, 14646 (1993). DOI: 10.1103/PhysRevB.48.14646
  4. 4.0 4.1 P. Ordejón, D. A. Drabold, R. M. Martin, and M. P. Grumbach, Linear system-size scaling methods for electronic-structure calculations, Phys. Rev. B 51, 1456 (1995). DOI: 10.1103/PhysRevB.51.1456
  5. 5.0 5.1 5.2 5.3 5.4 5.5 F. Corsetti, The orbital minimization method for electronic structure calculations with finite-range atomic basis sets, Comput. Phys. Commun. 185, 873 (2014). DOI: 10.1016/j.cpc.2013.12.008
  6. 6.0 6.1 C. Gan, P. Haynes, and M. Payne, Preconditioned conjugate gradient method for the sparse generalized eigenvalue problem in electronic structure calculations, Comput. Phys. Comm. 134, 33 (2001). DOI: 10.1016/S0010-4655(00)00188-0
  7. B. G. Pfrommer, J. Demmel, and H. Simon, Unconstrained energy functionals for electronic structure calculations, J. Comput. Phys. 150, 287 (1999). DOI: 10.1006/jcph.1998.6181
  8. 8.0 8.1 J. Kim, F. Mauri, and G. Galli, Total-energy global optimizations using nonorthogonal localized orbitals, Phys. Rev. B 52, 1640 (1995). DOI: 10.1103/PhysRevB.52.1640
  9. D. Bowler and T. Miyazaki, {{\mathcal  O}}(N) methods in electronic structure calculations, Rep. Prog. Phys. 75, 036503 (2012). DOI: 10.1088/0034-4885/75/3/036503