# Difference between revisions of "PEXSI"

The Pole EXpansion and Selected Inversion method PEXSI is a fast method for evaluating certain selected elements of a matrix function and can be used for solving the KS-equation without explicit diagonalization. PEXSI is particularly advantageous for large systems since it reduces the computational effort to

• ${\mathcal {O}}(N^{2})$ for 3D systems
• ${\mathcal {O}}(N^{3/2})$ for (quasi-) 2D systems
• ${\mathcal {O}}(N)$ for (quasi-) 1D systems

Furthermore it is highly scalable on distributed memory parallel machines.

The Selected Inversion by itself can be used for calculating specific elements of the inverse of a sparse matrix.

Most of the following documentation originates from the PEXSI-homepage.

## Theory

### Basics

Given a sparse square matrix $A$ and a certain function $f(\cdot )$ , the basic idea of PEXSI is to expand $f(A)$ using a small number of rational functions (pole expansion)

$f(A)\approx \sum _{l=1}^{P}\omega _{l}(A-z_{l}I)^{-1},$ and to efficiently evaluate $f(A)_{i,j}$ by evaluating selected elements $(A-z_{l}I)_{i,j}^{-1}$ (selected inversion).

The currently supported form of $f(\cdot )$ includes:

• $f(z)=z^{-1}$ : Matrix inversion. Since the matrix inversion is already represented as a single term of rational function (pole), no pole expansion is needed. The selected inversion method can be significantly faster than directly inverting the matrix and then extract the selected elements of the inverse.
• $f(z)={\frac {2}{1+e^{\beta (z-\mu )}}}$ : Fermi-Dirac function. This can be used as a "smeared" matrix sign function at $z=\mu$ , without assuming a spectral gap near $z=\mu$ . This can be used for evaluating the electron density for electronic structure calculation.

For sparse matrices, the PEXSI method can be more efficient than the widely used diagonalization method for evaluating matrix functions, especially when a relatively large number of eigenpairs are needed to be computed in the diagonalization method.

### Application to electronic structure calculations

PEXSI can be used to solve the generalized eigenvalue problem that has to be solved in each SCF-iteration of the KS-DFT formalism without explicit diagonalization. Instead the density matrix is computed directly. Other magnitudes (free energy, atomic forces, local density of states) are given by the same formalism with negligible effort. The PEXSI framework can also be used for calculating the total density of states very efficiently.

With the pole expansion of the Fermi function

$f(\varepsilon -\mu )\approx \Im \sum _{l=1}^{P}{\frac {\omega _{l}^{\rho }}{\varepsilon -(z_{l}+\mu )}}$ the density matrix (in case of a non-orthogonal basis) becomes

${\rho }_{P}=\Im \left(\sum _{l=1}^{P}{\frac {\omega _{l}}{H-(z_{l}+\mu )S}}\right)$ The chemical potential $\mu$ has to fulfill $\int {\rho }(x)dx=N_{e},$ with $N_{e}$ the total number of electrons and ${\rho }(x)$ the electron density in real space.

For finding $\mu$ PEXSI implements a hybrid scheme, combining a bisection and a Newton method. This allows determining $\mu$ in a few iterations, even when starting with a wide guess and for difficult cases. Systems with a gap are naturally even easier to solve. If there is a good guess for $\mu$ available, for example from a previous SCF iteration, the bisection can be turned off.

The PEXSI library provides two interfaces. A simple one, finding many of the parameters automatically, and another one offering extensive control of the behavior of the solver.

An extensive description of the implementation of PEXSI in the numerical atomic orbital code SIESTA can be found in 

### Parallelization

Besides the favorable weak scaling, PEXSI can also scale to large numbers of processors due to its two-level parallelism:

1. For accurate evaluation, the pole expansion usually takes 40-60 poles. The calculations on each of these poles are completely independent of each other, enabling an efficient parallelization on this level.
2. The calculations on each pole can also be parallelized, with the number of processors that can be used efficiently strongly depending on the problem size, the matrix sparsity, and the sparsity pattern. Typically tens up to thousands of processors per pole are used. An extensive description of the selected inversion and its parallelization can be found in 

Large examples need a minimum amount of processors to have enough memory available. Typically this number is much smaller than with dense direct methods like ScaLAPACK.

## Authors

Library authors

• Lin Lin, University of California - Berkeley and Lawrence Berkeley National Laboratory, Homepage
• Chao Yang, Lawrence Berkeley National Laboratory, Homepage

Responsible for ESL entry

## Publications

• Implementation of PEXSI into SIESTA
• PEXSI 
• (Parallel) Selected Inversion    
• Pole Expansion 

## Installation

### Prerequisites

PEXSI requires an external parallel $LU$ factorization or $LDL^{T}$ factorization routine, and an external parallel matrix reordering routine to reduce the fill-in of the factorization routine.
Currently we use SuperLU_DIST for the parallel $LU$ factorization, and ParMETIS for the parallel fill-in reducing reordering. It is also possible to use PT-Scotch for the reordering. But we recommend to first download ParMETIS.

### Instructions

Edit make.inc

Configuration of PEXSI is controlled by a single make.inc file. Examples of the make.inc file are given under the config/ directory.

Find make.inc with the most similar architecture, and copy to the main PEXSI directory (using Edison for example, the latest Intel computer at NERSC). ${PEXSI_DIR} stands for the main directory of PEXSI.  cd${PEXSI_DIR}
cp config/make.inc.edison.intel make.inc


Edit the variables in make.inc

 PEXSI_DIR     = Main directory for PEXSI
DSUPERLU_DIR  = Main directory for SuperLU_DIST
METIS_DIR     = Main directory for METIS
PARMETIS_DIR  = Main directory for ParMETIS
PTSCOTCH_DIR  = Main directory for PT-Scotch


Note:
PEXSI can be compiled using debug or release mode in by the variable COMPILE_MODE in make.inc. This variable mainly controls the compiling flag -DRELEASE. The debug mode introduces tracing of call stacks at all levels of functions, and may significantly slow down the code. For production runs, use release mode. The *.profile configuration files are for debugging purpose and can be ignored.

Build the PEXSI library

If make.inc is configured correctly,

 cd ${PEXSI_DIR} cd src make  should produce libpexsi_(suffix).a under src/. ### Tests Build examples After libpexsi_(suffix).a is built, all driver routines are readily to be compiled. For example, the selected inversion for a complex matrix has the test routine  cd${PEXSI_DIR}
cd examples
make driver_pselinv_complex


should produce driver_pselinv_complex, which can be executed with MPI.

Run tests

After driver_pselinv_complex is compiled,

 examples\$ mpirun -n 1 ./driver_pselinv_complex


should return the diagonal of the matrix

$(A+iI)^{-1}$ saved on the 0-th processor, where A is the five-point discretization of a Laplacian operator on a 2D domain. The result can be compared with examples/driver_pselinv_complex.out to check the correctness of the result.

## Programming interface

The full documentation of the C/C++ and Fortran interfaces can be found on the project's page:

## Further information

The Homepage contains:

## Future developments

• Generalization for k-points
• MPI-OpenMP hybrid version
• Port to Intel MIC