UPF

Introduction

Current version of the UPF specification: 2.0.1
Codes able to read the UPF format: Quantum Espresso, abinit, gpaw (incomplete list?)
Pseudopotential generators able to produce UPF files: Quantum Espresso (incomplete list?)
Official documentation of the UPF specification by P. Giannozzi can be found here

The following is an introduction to the latest version of UPF, stable since 2010. Note that all ultrasoft pseudopotentials written with UPF version 2.0.0 might be compromised due to a bug that was fixed in later versions.

Content

UPF is designed to store the majority of the element-specific information used in the Quantum Espresso (QE) suite. This includes norm-conserving and ultrasoft pseudopotentials, PAW datasets, 1/r potentials, and the core electron wavefunctions needed to reconstruct the all-electron density in ab initio magnetic resonance or X-ray absorption calculations.

Units

Consistent with the rest of Quantum Espresso, all quantities are in Rydberg atomic units ($e^2=2$, $m=1/2$, $\hbar=1$). Lengths are in Bohr radii and energies are in Ry. In UPF all potentials are multiplied by e so that they have units of energy.
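
Codes that work in other unit systems need to convert UPF quantities on read-in. The following is a minimal sketch of such a conversion; the function and constant names are illustrative and not part of UPF or QE, and the numerical factors are rounded CODATA values.

```python
# Conversion factors out of Rydberg atomic units (rounded CODATA values).
RY_TO_EV = 13.605693123          # 1 Ry in eV
RY_TO_HARTREE = 0.5              # 1 Ry in Hartree
BOHR_TO_ANGSTROM = 0.529177210903  # 1 Bohr radius in Angstrom


def ry_to_ev(energy_ry):
    """Convert an energy from Ry to eV."""
    return energy_ry * RY_TO_EV


def ry_to_hartree(energy_ry):
    """Convert an energy from Ry to Hartree."""
    return energy_ry * RY_TO_HARTREE


def bohr_to_angstrom(length_bohr):
    """Convert a length from Bohr radii to Angstrom."""
    return length_bohr * BOHR_TO_ANGSTROM


# Example: a 40 Ry wavefunction cut-off expressed in Hartree and eV.
cutoff_hartree = ry_to_hartree(40.0)
cutoff_ev = ry_to_ev(40.0)
```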

Format

Originally UPF files were designed to be compatible with the IOTK (input/output tool kit) library, a Fortran 90 library used to read and write text and binary files in QE. IOTK processes tagged data files in which each data block is wrapped within a tag. This main idea is shared with several markup languages such as XML; however, IOTK introduces some differences that are responsible for the "XML-like-but-not-quite" structure of UPF. An example of such a difference is the enumeration of elements in UPF, which does not fit the extensible nature of XML: <PP_BETA.1> <PP_BETA.2> etc.

An XML-tutorial can be found here. The following summarises some specific syntax rules followed in UPF:

An element FOO starts with a line containing <FOO> and ends with a line containing </FOO>:

<FOO>
...
</FOO>

or for an empty element

<FOO/>

Element names are case-sensitive as in XML, and trailing characters in the line after > must be ignored.

Elements may have attributes. Some attributes, such as type and size, are used to check whether the enclosed data is consistent with the attribute value. Closing tags are not allowed to have attributes, but empty-element tags are.
The root element has to be UPF and must contain a version attribute:

<UPF version ="2.0.1">
 ...
</UPF>

Data elements: UPF has special "data elements" that have the following optional attributes: "type" for the intrinsic type of the data, "size" for the size of the data, "columns" for simple formatting, "len" for strings, and "fmt" for a Fortran format specifier. Among these, "type" and "size", if present, must be consistent with the data enclosed in the element. If no type attribute is provided, the default Quantum Espresso reader of the UPF format (the IOTK library) enforces the expected intrinsic type. For example, the data entry "1" is read as the double-precision value 1.0 if the variable it is assigned to is of DP kind. IOTK also discards the columns and fmt attributes when reading.
Although it is not XML, the UPF format can be validated against a schema definition. Efforts towards building an XSD were initiated by L. Talirz and are continued here.
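
The data-element rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard xml.etree.ElementTree parser, which works on a well-formed snippet such as the one below but is not guaranteed to accept every real UPF file, and it handles only "real" and "integer" types. The helper name read_data_element and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data element with made-up values, for illustration only.
SNIPPET = '<PP_R type="real" size="4" columns="2">\n 0.0 0.1\n 0.2 0.3\n</PP_R>'

# Assumption: only these two intrinsic types are handled in this sketch.
CASTS = {"real": float, "integer": int}


def read_data_element(elem, expected=float):
    """Return the values of a data element, checking "type" and "size".

    If "type" is absent, the caller's expected type is enforced, which
    mirrors the behaviour described for the IOTK reader. The "columns"
    and "fmt" attributes are ignored.
    """
    type_name = elem.get("type")
    cast = CASTS[type_name] if type_name is not None else expected
    values = [cast(token) for token in (elem.text or "").split()]
    size = elem.get("size")
    if size is not None and int(size) != len(values):
        raise ValueError(
            f"{elem.tag}: size={size} but {len(values)} values were found"
        )
    return values


values = read_data_element(ET.fromstring(SNIPPET))
```

With no type attribute, the entry "1" is converted to whatever the caller expects, for instance read_data_element(ET.fromstring('<X> 1 </X>'), expected=float) gives a list holding the float 1.0.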

Specification of the elements

Here is a list of all elements and subelements that can occur in UPF (as written by QE).

PP_INFO

This is the human-readable part of the UPF file and is not designed for parsing. It is an optional element of UPF. It usually provides information on the following:

Generating software and version
Author
Generation date
Pseudopotential type: Allowed types are NC, US, PAW
Element label (only 2 characters allowed)
Exchange correlation functional used to generate
Suggested minimum cut-off for wavefunctions and density. In QE this information is obtained from Bessel function expansion during pseudopotential generation.
Relativistic treatment: Allowed are non-, scalar-, fully-relativistic treatments
Generation of an effective local ionic potential prior to unscreening of the valence electrons: In the standard pseudopotential scheme, the pseudopotential can be seen as a combination of two parts: a local part and a non-local part with angular momentum resolution. Ideally the local part should be selected such that, within the pseudization region, it reproduces the scattering behavior of all angular momentum channels higher than the ones included in the non-local part. One common strategy is to choose the local potential directly as the potential of lmax+1. Alternatively one can choose a smoothed all-electron potential. In UPF language, lloc = -1 or -2 correspond to two different recipes for smoothing the all-electron potential for this part. For lloc ≥ 0 one uses the potential of that l-channel as the local one. The matching radius is also given. In the PAW case, this potential corresponds to the zero-potential of the Kresse-Joubert reformulation. Note that these are the most common strategies for obtaining a local potential; however, UPF can also store a local potential generated with other recipes.
Whether pseudopotential has data regarding spin-orbit calculations (SO) or all-electron reconstruction used in magnetic resonance or X-ray absorption calculations (a.k.a. GIPAW reconstruction)
Valence and generation configurations: The local potential prior to unscreening is generated from the "Generation configuration", and its unscreening is performed with the "Valence configuration". In case the two differ, which may be preferable for single-projector pseudopotentials with semi-core states, the pseudization recipe is also given.
Further comments of the author
The input file used to generate the pseudopotential, if available, can be found here in a sub-element: <PP_INPUTFILE>

PP_HEADER

The header element is a special one in UPF: it is an empty element made up only of attributes. As attributes, it contains some fundamental information about the pseudopotential file that can also be found in the human-readable section. The values of these attributes affect the reading of other elements in the file, hence <PP_HEADER> should be processed before the rest of the parseable UPF. Most of the attributes are self-explanatory, yet information on some of them might be useful:

is_coulomb: boolean. UPF can also hold the bare Coulomb potential (1/r) instead of the pseudopotential.
has_wfc: boolean. Whether the all-electron partial waves corresponding to each projector are written.
paw_as_gipaw: boolean. Whether a different dataset is used for the GIPAW reconstruction or the PAW one is used instead.
lmax: maximum angular momentum in the pseudopotential; or for PAW, maximum l of the projectors used.
lmax_rho: maximum l contribution to the density. In principle, for standard cases, lmax_rho should be twice lmax.

Comparison with PAW-XML (PX)

<PP_HEADER> contains "element" and "z_valence" attributes corresponding to the "symbol" and "valence" attributes of the <atom> element in PX. Unlike PX, the number of core electrons is not necessarily included in UPF; however, the atomic number Z can be found as the "zmesh" attribute of the <PP_MESH> element. At first glance, having the atomic number in the mesh element seems counterintuitive. The reason is that UPF contains the result of an atomic calculation done on a radial logarithmic mesh, and the mesh interval is often a function of the atomic number, as higher atomic numbers require a finer resolution in space.
Functional names in PX, <xc_functional>, are taken from LibXC, while UPF uses the QE internal names or numbers.
The <generator> element in PX has a "type" attribute corresponding to the "relativistic" attribute of <PP_HEADER>; unfortunately the pseudo_type attribute of UPF refers instead to the NC/US/PAW/Coulomb types. Since PX only contains PAW information, this attribute of <PP_HEADER> has no correspondence in PX. Note, however, that the same information is also covered by the "is_paw", "is_ultrasoft" and "is_coulomb" boolean attributes of <PP_HEADER>, resulting in redundancy. Quantum Espresso only makes use of these booleans, hence the correct writing and parsing of "pseudo_type" is not critical for operability in QE, while that of the booleans is essential. The only exception to this rule is the obsolete semilocal norm-conserving pseudopotentials, which are distinguished by the "SL" value of the "pseudo_type" attribute in <PP_HEADER>.
The PX <ae_energy> and <core_energy> elements do not necessarily have a correspondence in UPF. Only the total energy of the pseudoatom is always given, in the "total_psenergy" attribute of <PP_HEADER>. In the case of PAW, the "core_energy" attribute of the <PP_PAW> element can be used together with the pseudoatom energy to obtain the all-electron energy.

Parsing UPF with <PP_HEADER> information

Attributes in this element determine the reading and existence of other elements of the same level in UPF, which makes this element particularly important for parsing. For example, in QE:

The default for the radial grid mesh size is taken from the "mesh_size" attribute in <PP_HEADER>, so that the "mesh" attribute can be missing from the <PP_MESH> element.
The number of projectors in the pseudopotential is given in <PP_HEADER> as "number_of_proj", and only this many projectors (PP_BETA) are read from a UPF file in QE. The same holds for the wavefunctions (PP_CHI) in <PP_PSWFC> and the "number_of_wfc" attribute.
is_ultrasoft and/or is_paw determine whether <PP_AUGMENTATION> is read within <PP_NONLOCAL> element.
The <PP_FULL_WFC> element is only read when "has_wfc" is true.
"has_so" is a boolean attribute for spin-orbit coupling, hence the reading of elements with relativistic contributions (such as <PP_SPIN_ORB>, <PP_AEWFC_REL>) should be consistent with its value.
PAW-related elements are only read if the is_paw attribute is true. The entire <PP_PAW> element and its sub-elements are an example of this.
Similarly GIPAW reconstruction information is only read if has_gipaw attribute is true, and a special assignment in reading is made if paw_as_gipaw is also true.
Strictly needed, not-defaulted attributes: "element", "pseudo_type", "relativistic", "is_ultrasoft", "is_paw", "core_correction", "functional", "z_valence", "mesh_size", "number_of_wfc", "number_of_proj".
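
The header-driven reading logic listed above can be sketched as follows. This is not the QE reader, only an illustration: the sample header is made up, the helper names (to_bool, reading_plan) are hypothetical, and the accepted spellings of boolean values are an assumption.

```python
import xml.etree.ElementTree as ET

# Attributes that have no default and therefore must be present.
REQUIRED = (
    "element", "pseudo_type", "relativistic", "is_ultrasoft", "is_paw",
    "core_correction", "functional", "z_valence", "mesh_size",
    "number_of_wfc", "number_of_proj",
)

# Hypothetical header for illustration only; the values are made up.
HEADER = (
    '<PP_HEADER element="Si" pseudo_type="NC" relativistic="scalar" '
    'is_ultrasoft="F" is_paw="F" core_correction="F" functional="PBE" '
    'z_valence="4.0" mesh_size="1141" number_of_wfc="2" number_of_proj="2"/>'
)


def to_bool(text):
    # Assumption: accept Fortran-style and plain spellings of "true".
    return text.strip().lower() in ("t", "true", ".true.")


def reading_plan(attrs):
    """Decide which sibling elements to read, based on PP_HEADER attributes."""
    missing = [name for name in REQUIRED if name not in attrs]
    if missing:
        raise KeyError("missing PP_HEADER attributes: " + ", ".join(missing))
    is_us = to_bool(attrs["is_ultrasoft"])
    is_paw = to_bool(attrs["is_paw"])
    return {
        "n_beta": int(attrs["number_of_proj"]),   # PP_BETA.1 .. PP_BETA.n
        "n_chi": int(attrs["number_of_wfc"]),     # PP_CHI.1 .. PP_CHI.n
        "read_augmentation": is_us or is_paw,     # PP_AUGMENTATION
        "read_paw": is_paw,                       # PP_PAW and sub-elements
        "read_full_wfc": to_bool(attrs.get("has_wfc", "F")),  # PP_FULL_WFC
        "read_nlcc": to_bool(attrs["core_correction"]),       # PP_NLCC
    }


plan = reading_plan(ET.fromstring(HEADER).attrib)
```

Processing the header first and deriving a plan like this keeps the rest of the parser free of scattered attribute checks.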

PP_MESH

Since UPF is designed primarily for spherically symmetric data, be it an atomic potential or a wavefunction, it assumes the use of a radial grid for such data elements. UPF accepts only one radial grid in a file, and all radial functions in the file are assumed to be represented on this grid. Therefore one of the first elements that has to be written and read in UPF is the definition of this radial grid. In UPF, this information is stored in the <PP_MESH> element, which must follow the <PP_HEADER> element.

<PP_MESH dx="dx" mesh="m" xmin="xmin" rmax="rmax" zmesh="Z">
  <PP_R type="real" size="m" columns="c">
     r(1) r(2) ...  r(m)
  </PP_R>
  <PP_RAB type="real" size="m" columns="c">
     rab(1) rab(2) ... rab(m)
  </PP_RAB>
</PP_MESH>

Analytic expression

<PP_MESH> is a complex element that only contains attributes and other elements but does not contain any text or data. It has the following attributes: "dx", "mesh", "xmin", "rmax", "zmesh". As mentioned, "zmesh" stands for the atomic number the radial mesh is prepared for (although its intrinsic type is real). "mesh" stands for the mesh size, i.e. the number of points in the mesh, whose default is set from the "mesh_size" attribute of <PP_HEADER>. "rmax" is the value of the grid at the maximum grid index, i.e. r(mesh)=rmax. These attributes allow regeneration of the mesh data using the relation $r_i = \frac{1}{Z} e^{x_{min}} e^{(i-1)dx}$ and its derivative (the integration weight) $dr_i/di = r_i\,dx$, although this information is not strictly needed because, as we will see, the numerical grid is also given in UPF.
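
As a minimal sketch of the regeneration relation above, the grid and its derivative can be rebuilt from the attributes alone. The function name and the attribute values in the example call are made up for illustration.

```python
import math


def regenerate_mesh(zmesh, xmin, dx, mesh):
    """Rebuild r_i and dr_i/di from the PP_MESH attributes.

    The index i runs from 1 to mesh in the formula, so the Python
    index k = i - 1 runs from 0 to mesh - 1.
    """
    r = [math.exp(xmin + k * dx) / zmesh for k in range(mesh)]
    rab = [r_k * dx for r_k in r]
    return r, rab


# Hypothetical attribute values, for illustration only.
r, rab = regenerate_mesh(zmesh=14.0, xmin=-7.0, dx=0.0125, mesh=1141)
```

The last point, r[-1], is what the "rmax" attribute is expected to hold when the file follows this analytic form.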

Numeric grid

The values of the grid points and their derivatives are given in the two sub-elements <PP_R> and <PP_RAB>. In Quantum Espresso, these numerical values are the only ones that are used, while the regeneration information is discarded. Hence the UPF format can support grids that originate from other analytical expressions, as long as the r and dr/di values are consistently given in the <PP_R> and <PP_RAB> elements respectively. For example, the shifted grid that includes the r=0 point as its first element, widely used in the pseudopotential generation community, can also be represented correctly in UPF, as can many other analytic expressions. The exceptions are the "atomic" code within QE, when a pre-generated UPF file is imported for testing, and the GIPAW code, which shares routines with the atomic code. These packages instead use the analytical formula described above to regenerate the grid.

Subelements

<PP_R> and <PP_RAB> are the first "data elements" found in a UPF file. As required by the data-element format, "size", if present, must be consistent with the data enclosed in the element, i.e. it must equal the "mesh" attribute of the <PP_MESH> parent element. See the related segment in the general format description to learn about data elements.

PP_NLCC

If <PP_HEADER> has the attribute core_correction="true", the <PP_NLCC> element must be present for read-in. PP_NLCC holds the radial core charge density, possibly pseudized, that can be used to calculate the non-linear core correction. The core charge density contained here, $\rho_{core}(r)$, is such that $\int 4\pi r^2 \rho_{core}(r)\,dr$ is equal to the total core charge without further modifications. Note that the core charge density mentioned here does not have to be the same as the all-electron core charge density, because a non-linear core correction scheme can just as well be implemented with a pseudo core charge density.

This is a data element of UPF and hence complies with the data-element format found here. If present, the size attribute must match the mesh size, consistently with the rest of the radial functions in the UPF file, as all radial functions on the radial grid should comply with the "mesh" attribute of the PP_MESH element defined earlier.

<PP_NLCC type="real" size="m" columns="c">
  rho_core(1) rho_core(2) ... rho_core(m)
</PP_NLCC>

Theory

Generation of a pseudopotential requires solving the atomic Hamiltonian for an all-electron system and then unscreening the valence electrons from the resulting atomic potential, such that the valence electrons can be treated as external to the pseudo-ion used in the calculation. This unscreening, within a linearity approximation, can be performed by removing the Hartree and exchange-correlation (XC) potentials of the valence electrons from the total potential:

$V_{PSion} = V_{atom} - V_H(\rho_{val}) - V_{xc}(\rho_{val})$.

However, since the XC potential is not linear in density, i.e.

$V_{xc}(\rho_{val} + \rho_{core}) \neq V_{xc}(\rho_{val}) + V_{xc}(\rho_{core})$

the equation above introduces errors in unscreening. A solution proposed by Louie et al. is to remove the XC potential due to all electrons, core and valence, during unscreening, and to calculate it again from both core and valence densities at run time:

$V_{PSion} = V_{atom} - V_H(\rho_{val}) - V_{xc}(\rho_{val} + \rho_{core} )$.

Hence one needs to store the core charge after the pseudopotential generation, and calculate the XC potential due to the total charge ($\rho_{val} + \rho_{core}$) in calculations with non-linear core correction. During the calculation, the core charge stored on the radial grid must therefore be transferred to the same grid as the valence electrons. This can introduce an additional computational cost for plane-wave codes, because the core charge can have high Fourier components close to the nucleus. This issue can be circumvented by noting that close to the nucleus there is often little overlap between core and valence densities, and that in practice, for local and semi-local functionals, the error corrected by the non-linear core correction is largest in the regions where the core and valence charge densities overlap. Hence for an approximate correction one does not necessarily need the exact core charge: a smooth charge density that matches the accurate one in the regions of overlap might suffice.
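
The non-linearity that motivates the core correction can be checked numerically with a simple functional. The sketch below uses the Slater (LDA) exchange potential only as an illustrative stand-in for $V_{xc}$, expressed in Ry; the density values are made up and the function name is hypothetical.

```python
import math


def slater_vx(rho):
    """Slater (LDA) exchange potential in Ry for a density in Bohr^-3.

    In Hartree units V_x = -(3 rho / pi)^(1/3); the factor 2 converts to Ry.
    """
    return -2.0 * (3.0 * rho / math.pi) ** (1.0 / 3.0)


# Hypothetical densities at a point where core and valence charge overlap.
rho_val = 0.05
rho_core = 0.20

v_total = slater_vx(rho_val + rho_core)                 # evaluated on the sum
v_separate = slater_vx(rho_val) + slater_vx(rho_core)   # sum of evaluations

# A non-zero difference shows that V_x is not additive in the density.
nonlinearity = v_total - v_separate
```

Because the potential scales as the cube root of the density, the difference only becomes small where one of the two densities is negligible, which is why a smooth core charge that is accurate in the overlap region can be enough.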