# PEXSI

The **P**ole **EX**pansion and **S**elected **I**nversion method **PEXSI**
is a fast method for evaluating certain selected elements
of a matrix function and can be used for solving the KS-equation without explicit diagonalization.
PEXSI is particularly advantageous for large systems since it reduces the computational effort to

- for 3D systems
- for (quasi-) 2D systems
- for (quasi-) 1D systems

Furthermore it is highly scalable on distributed memory parallel machines.

The **Selected Inversion** by itself can be used for calculating specific elements of the inverse of a sparse matrix.

Most of the following documentation originates from the PEXSI-homepage.

## Contents

## Theory

### Basics

Given a sparse square matrix and a certain function , the basic idea of PEXSI is to expand using a small number of rational functions (pole expansion)

and to efficiently evaluate by evaluating selected elements (selected inversion).

The currently supported form of includes:

- : Matrix inversion. Since the matrix inversion is already represented as a single term of rational function (pole), no pole expansion is needed. The selected inversion method can be significantly faster than directly inverting the matrix and then extract the selected elements of the inverse.

- : Fermi-Dirac function. This can be used as a "smeared" matrix sign function at , without assuming a spectral gap near . This can be used for evaluating the electron density for electronic structure calculation.

For sparse matrices, the PEXSI method can be more efficient than the widely used diagonalization method for evaluating matrix functions, especially when a relatively large number of eigenpairs are needed to be computed in the diagonalization method.

### Application to electronic structure calculations

PEXSI can be used to solve the generalized eigenvalue problem that has to be solved in each SCF-iteration of the KS-DFT formalism without explicit diagonalization. Instead the density matrix is computed directly. Other magnitudes (free energy, atomic forces, local density of states) are given by the same formalism with negligible effort. The PEXSI framework can also be used for calculating the total density of states very efficiently.

With the pole expansion of the Fermi function

the density matrix (in case of a non-orthogonal basis) becomes

The chemical potential has to fulfill

with the total number of electrons and the electron density in real space.

For finding PEXSI implements a hybrid scheme, combining a bisection and a Newton method. This allows determining in a few iterations, even when starting with a wide guess and for difficult cases. Systems with a gap are naturally even easier to solve. If there is a good guess for available, for example from a previous SCF iteration, the bisection can be turned off.

The PEXSI library provides two interfaces. A simple one, finding many of the parameters automatically, and another one offering extensive control of the behavior of the solver.

An extensive description of the implementation of PEXSI in the numerical atomic orbital code SIESTA can be found in ^{[1]}

### Parallelization

Besides the favorable weak scaling, PEXSI can also scale to large numbers of processors due to its two-level parallelism:

- For accurate evaluation, the pole expansion usually takes 40-60 poles. The calculations on each of these poles are completely independent of each other, enabling an efficient parallelization on this level.
- The calculations on each pole can also be parallelized, with the number of processors that can be used efficiently strongly depending on the problem size, the matrix sparsity, and the sparsity pattern. Typically tens up to thousands of processors per pole are used. An extensive description of the selected inversion and its parallelization can be found in
^{[2]}

Large examples need a minimum amount of processors to have enough memory available. Typically this number is much smaller than with dense direct methods like ScaLAPACK.

## Publications

- Implementation of PEXSI into SIESTA
^{[1]} - PEXSI
^{[3]} - (Parallel) Selected Inversion
^{[2]}^{[4]}^{[5]}^{[6]} - Pole Expansion
^{[7]}

## License

PEXSI is distributed under BSD license (modified by Lawrence Berkeley National Laboratory).

## Download

A source tarball can be downloaded directly from here.

## Installation

### Prerequisites

- LAPACK for basic linear algebra operations
- MPI
- SuperLU-Dist (download)
- ParMetis (download) or PT-Scotch (download)

PEXSI requires an external parallel factorization or factorization routine, and an external parallel matrix reordering routine to reduce the fill-in of the factorization routine.

Currently we use SuperLU_DIST for the parallel factorization, and ParMETIS for the parallel fill-in reducing reordering. It is also possible to use PT-Scotch for the reordering. But we recommend to first download ParMETIS.

### Instructions

**Edit make.inc**

Configuration of PEXSI is controlled by a single `make.inc` file. Examples of the `make.inc` file are given under the `config/` directory.

Find `make.inc` with the most similar architecture, and copy to the main PEXSI directory (using Edison for example, the latest Intel computer at NERSC). `${PEXSI_DIR}` stands for the main directory of PEXSI.

cd ${PEXSI_DIR} cp config/make.inc.edison.intel make.inc

Edit the variables in `make.inc`

PEXSI_DIR = Main directory for PEXSI DSUPERLU_DIR = Main directory for SuperLU_DIST METIS_DIR = Main directory for METIS PARMETIS_DIR = Main directory for ParMETIS PTSCOTCH_DIR = Main directory for PT-Scotch

Note:

PEXSI can be compiled using debug or release mode in by the variable `COMPILE_MODE` in `make.inc`. This variable mainly controls the compiling flag `-DRELEASE`. The debug mode introduces tracing of call stacks at all levels of functions, and may significantly slow down the code. For production runs, use release mode.
The `*.profile` configuration files are for debugging purpose and can be ignored.

**Build the PEXSI library**

If `make.inc` is configured correctly,

cd ${PEXSI_DIR} cd src make

should produce `libpexsi_(suffix).a` under `src/`.

### Tests

**Build examples**

After `libpexsi_(suffix).a` is built, all driver routines are readily to be compiled. For example, the selected inversion for a complex matrix has the test routine

cd ${PEXSI_DIR} cd examples make driver_pselinv_complex

should produce `driver_pselinv_complex`, which can be executed with MPI.

For more information on the examples, see Tutorial.

**Run tests**

After `driver_pselinv_complex` is compiled,

examples$ mpirun -n 1 ./driver_pselinv_complex

should return the diagonal of the matrix

saved on the 0-th processor, where A is the five-point discretization of a Laplacian operator on a 2D domain. The result can be compared with `examples/driver_pselinv_complex.out` to check the correctness of the result.

## Programming interface

The full documentation of the C/C++ and Fortran interfaces can be found on the project's page:

## Further information

The **Homepage** contains:

- Introduction
- Download
- Installation
- Tutorial
- Core Functionality
- Frequently asked questions
- Troubleshooting

## Future developments

- Generalization for k-points
- MPI-OpenMP hybrid version
- Port to Intel MIC

## References

- ↑
^{1.0}^{1.1}L. Lin, A. Garcia, G. Huhs and C. Yang, SIESTA-PEXSI: Massively parallel method for efficient and accurate ab initio materials simulation without matrix diagonalization, arXiv - ↑
^{2.0}^{2.1}M. Jacquelin, L. Lin and C. Yang, PSelInv -- A Distributed Memory Parallel Algorithm for Selected Inversion : the Symmetric Case arXiv - ↑ L. Lin, M. Chen, C. Yang and L. He, Accelerating atomic orbital-based electronic structure calculation via pole expansion and selected inversion, J. Phys. Condens. Matter 25, 295501, 2013 journal
- ↑ L. Lin, C. Yang, J. Meza, J. Lu, L. Ying and W. E, SelInv -- An algorithm for selected inversion of a sparse symmetric matrix, ACM Trans. Math. Software 37, 40, 2011 journal
- ↑ L. Lin, C. Yang, J. Lu, L. Ying and W. E, A Fast Parallel algorithm for selected inversion of structured sparse matrices with application to 2D electronic structure calculations, SIAM J. Sci. Comput. 33, 1329, 2011 journal
- ↑ L. Lin, J. Lu, L. Ying, R. Car and W. E, Fast algorithm for extracting the diagonal of the inverse matrix with application to the electronic structure analysis of metallic systems, Commun. Math. Sci. 7, 755, 2009 journal
- ↑ L. Lin, J. Lu, L. Ying and W. E, Pole-based approximation of the Fermi-Dirac function, Chin. Ann. Math. 30B, 729, 2009 journal