Current version of the UPF specification: 2.0.1
Codes able to read the UPF format: Quantum Espresso, abinit, gpaw (incomplete list?)
Pseudopotential generators able to produce UPF files: Quantum Espresso (incomplete list?)
Official documentation of the UPF specification by P. Giannozzi can be found here
The following is an introduction to the latest version of UPF, stable since 2010. It is important to note that all ultrasoft pseudopotentials with UPF version 2.0.0 might be compromised due to a bug that is fixed in later versions.
UPF is designed to store the majority of the element-specific information used in the Quantum Espresso (QE) suite. This includes norm-conserving and ultrasoft pseudopotentials, PAW datasets, 1/r potentials, and the core electron wavefunctions needed to reconstruct the all-electron density used in ab initio magnetic resonance or X-ray absorption calculations.
Consistent with the rest of Quantum Espresso, all quantities are in atomic Rydberg units ($e^2=2$, $m=1/2$, $\hbar=1$). Lengths are in Bohr radii and energies are in Ry. In UPF all potentials are multiplied by e so that they have units of energy.
Originally, UPF files were designed to be compatible with the IOTK (input/output tool kit) library, a Fortran 90 library used to read and write text and binary files in QE. IOTK processes tagged data files in which each data block is wrapped within a tag. This idea is shared with markup languages such as XML; however, IOTK introduces some differences that are responsible for the "XML-like-but-not-quite" structure of UPF. One example is the enumeration of elements in UPF, which does not comply with the extensible nature of XML: <PP_BETA.1> <PP_BETA.2> etc.
An XML-tutorial can be found here. The following summarises some specific syntax rules followed in UPF:
<FOO>
...
</FOO>
or for an empty element
<FOO/>
Element names are case-sensitive as in XML, and trailing characters in the line after > must be ignored.
<UPF version ="2.0.1">
...
</UPF>
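As a rough illustration of the syntax rules above, the following sketch reads a tiny UPF-like fragment with Python's standard xml.etree.ElementTree module and gathers the numbered <PP_BETA.n> elements into a dictionary. The fragment, its values, the placement of the elements and the helper name collect_numbered are hypothetical and chosen only for this example. Real UPF files may contain text that a strict XML parser rejects, so this should not be read as a complete reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily truncated UPF-like fragment (illustration only;
# element placement is simplified and the numbers are made up).
SAMPLE = """<UPF version="2.0.1">
  <PP_HEADER core_correction="true"/>
  <PP_BETA.1>0.0 0.1 0.2</PP_BETA.1>
  <PP_BETA.2>0.0 0.2 0.4</PP_BETA.2>
</UPF>"""


def collect_numbered(root, prefix):
    """Map the index n of every <prefix.n> element to its list of floats."""
    found = {}
    for child in root.iter():
        name, dot, index = child.tag.rpartition(".")
        if dot and name == prefix and index.isdigit():
            found[int(index)] = [float(tok) for tok in (child.text or "").split()]
    return found


root = ET.fromstring(SAMPLE)
version = root.get("version")            # attribute of the <UPF> root element
betas = collect_numbered(root, "PP_BETA")
print(version, sorted(betas))
```

Tag names are compared exactly as written, in line with the case-sensitivity rule stated above.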
Here is a list of all elements and subelements that can occur in UPF (as written by QE).
This is the human-readable part of the UPF file that is not designed for parsing. It is an optional element of UPF. It usually provides information on the following
The header element is a special one in UPF: it is a complex element that is empty and made up only of attributes. These attributes hold some fundamental information about the pseudopotential file that can also be found in the human-readable section. Because the values of these attributes affect how other elements in the file are read, <PP_HEADER> should be processed before the rest of the parseable UPF. Most of the attributes are self-explanatory, yet information on some of them might be useful:
'''Comparison with PAW-XML (PX) '''
Parsing UPF with <PP_HEADER> information
Attributes in this element determine the reading and existence of other elements of the same level in UPF, which makes this element particularly important for parsing operation. For example in QE
Since UPF is designed primarily for spherically symmetric data, be it an atomic potential or a wavefunction, it assumes the use of a radial grid for such data elements. UPF accepts only one radial grid per file, and all radial functions in the file are assumed to be represented on this grid. Therefore one of the first elements that has to be written and read in UPF is the definition of this radial grid. In UPF, this information is stored in the <PP_MESH> element, which must follow the <PP_HEADER> element.
<PP_MESH dx="dx" mesh="m" xmin="xmin" rmax="rmax" zmesh="Z">
<PP_R type="real" size="m" columns="c">
r(1) r(2) ... r(m)
</PP_R>
<PP_RAB type="real" size="m" columns="c">
rab(1) rab(2) ... rab(m)
</PP_RAB>
</PP_MESH>
Analytic expression
<PP_MESH> is a complex element that contains only attributes and other elements, with no text or data of its own. It has the following attributes: "dx", "mesh", "xmin", "rmax" and "zmesh". Here "zmesh" is the atomic number the radial mesh is prepared for (although its intrinsic type is real), and "mesh" is the mesh size, i.e. the number of points in the mesh, whose default is taken from the attribute of the same name in <PP_HEADER>. "rmax" is the value of the grid at the maximum grid index, i.e. r(mesh)=rmax. These attributes allow the mesh data to be regenerated using the relation $r_i = \frac{1}{Z} e^{xmin} e^{(i-1)dx}$ and its derivative (the volume element of integration) $dr_i/di = r_i\,dx$. This information is not strictly needed, however, because the numerical grid is also given in UPF.
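The regeneration relation quoted above can be sketched in a few lines of Python. The function name regenerate_mesh and the attribute values passed to it are hypothetical and serve only as an illustration. The sketch assumes that the grid index starts at 1, as in the relation above, so the zero-based loop index i corresponds to (i-1).

```python
import math


def regenerate_mesh(zmesh, xmin, dx, mesh):
    """Rebuild r_i = exp(xmin) * exp((i-1)*dx) / Z and dr_i/di = r_i * dx."""
    # The zero-based index i plays the role of (i-1) in the one-based formula.
    r = [math.exp(xmin + i * dx) / zmesh for i in range(mesh)]
    rab = [ri * dx for ri in r]
    return r, rab


# Hypothetical attribute values, chosen only for illustration.
r, rab = regenerate_mesh(zmesh=14.0, xmin=-7.0, dx=0.0125, mesh=1141)
rmax = r[-1]  # by definition r(mesh) = rmax
print(len(r), r[0], rmax)
```

Consecutive grid points on such a mesh differ by the constant factor exp(dx), which gives a quick consistency check against the stored <PP_R> values.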
Numeric grid
The values of the grid points and their derivatives are given in the two sub-elements <PP_R> and <PP_RAB>. In Quantum Espresso, these numerical values are the only ones that are used, while the regeneration information is discarded. Hence the UPF format can support grids that originate from other analytic expressions, as long as the r and dr/di values are given consistently in the <PP_R> and <PP_RAB> elements respectively. For example, the shifted grid that includes the r=0 point as its first element, which is widely used in the pseudopotential generation community, can be represented correctly in UPF, as can many other analytic expressions. The exceptions are the "atomic" code within QE, when it imports a pre-generated UPF file for testing, and the GIPAW code, which shares routines with the atomic code. These packages instead regenerate the grid from the analytic formula described above.
Subelements
<PP_R> and <PP_RAB> are the first "data elements" found in a UPF file. As the data-element format requires, the "size" attribute, if present, must be consistent with the data enclosed in the element, i.e. it must equal the "mesh" attribute of the parent <PP_MESH> element. See the related segment in the general format description to learn about data elements.
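The size rule for data elements can be checked while reading. The sketch below parses a hypothetical four-point <PP_MESH> fragment with xml.etree.ElementTree and verifies that <PP_R> and <PP_RAB> both hold exactly "mesh" values. The helper name read_data_element and all numbers are assumptions made for this example, and treating the element text as whitespace-separated floats is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical 4-point mesh (zmesh=1, xmin=-1, dx=0.5), for illustration only.
MESH_SAMPLE = """<PP_MESH dx="0.5" mesh="4" xmin="-1.0" rmax="1.64872127" zmesh="1.0">
  <PP_R type="real" size="4" columns="2">
    0.36787944 0.60653066
    1.00000000 1.64872127
  </PP_R>
  <PP_RAB type="real" size="4" columns="2">
    0.18393972 0.30326533
    0.50000000 0.82436064
  </PP_RAB>
</PP_MESH>"""


def read_data_element(parent, tag):
    """Return the floats stored in <tag>, checking its optional size attribute."""
    node = parent.find(tag)
    if node is None:
        raise ValueError(f"missing <{tag}> element")
    values = [float(tok) for tok in (node.text or "").split()]
    size = node.get("size")
    if size is not None and int(size) != len(values):
        raise ValueError(f"<{tag}> declares size={size} but holds {len(values)} values")
    return values


mesh_node = ET.fromstring(MESH_SAMPLE)
n = int(mesh_node.get("mesh"))
r = read_data_element(mesh_node, "PP_R")
rab = read_data_element(mesh_node, "PP_RAB")
if len(r) != n or len(rab) != n:
    raise ValueError("radial data do not match the mesh attribute of <PP_MESH>")
print(n, r[-1])
```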
If <PP_HEADER> has the attribute core_correction="true", the <PP_NLCC> element must be present for read-in. PP_NLCC holds the radial core charge density, possibly pseudized, that is used to calculate the non-linear core correction. The core charge density contained here, rho_core(r), is such that $\int 4\pi r^2 \rho_{core}(r)\,dr$ is equal to the total core charge without further modifications. Note that this core charge density does not have to be the same as the all-electron core charge density, because a non-linear core correction scheme can equally well be implemented with a pseudo core charge density.
This is a data element of UPF and hence complies with the data-element format found here. If present, the size attribute must match the mesh size, since all radial functions on the radial grid must comply with the "mesh" attribute of the PP_MESH element defined earlier.
<PP_NLCC type="real" size="m" columns="c">
rho_core(1) rho_core(2) ... rho_core(m)
</PP_NLCC>
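The dependence of <PP_NLCC> on the header can be sketched as follows. The function read_nlcc, the sample fragment and the set of accepted spellings for a true flag are assumptions made for this illustration. The text above only states the value "true", and the extra spellings are a guess at what files in the wild might use.

```python
import xml.etree.ElementTree as ET

# Assumption: spellings treated as logical true. Only "true" is given in the text.
TRUE_SPELLINGS = {"true", ".true.", "t"}

# Hypothetical, stripped-down fragment (illustration only).
NLCC_SAMPLE = """<UPF version="2.0.1">
  <PP_HEADER core_correction="true"/>
  <PP_MESH mesh="3"/>
  <PP_NLCC type="real" size="3" columns="3">
    0.9 0.5 0.1
  </PP_NLCC>
</UPF>"""


def read_nlcc(root):
    """Return rho_core as a list of floats, or None if no core correction is flagged."""
    header = root.find("PP_HEADER")
    mesh = root.find("PP_MESH")
    if header is None or mesh is None:
        raise ValueError("<PP_HEADER> and <PP_MESH> must be read first")
    flag = (header.get("core_correction") or "").strip().lower()
    if flag not in TRUE_SPELLINGS:
        return None
    node = root.find("PP_NLCC")
    if node is None:
        raise ValueError("core_correction is set but <PP_NLCC> is missing")
    rho_core = [float(tok) for tok in (node.text or "").split()]
    n = int(mesh.get("mesh"))
    size = node.get("size")
    if len(rho_core) != n or (size is not None and int(size) != n):
        raise ValueError("<PP_NLCC> does not match the mesh size")
    return rho_core


rho_core = read_nlcc(ET.fromstring(NLCC_SAMPLE))
print(rho_core)
```

Reading the header first and only then deciding whether <PP_NLCC> is required mirrors the processing order described for <PP_HEADER>.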
Theory
Generating a pseudopotential requires solving the atomic Hamiltonian for the all-electron system and then unscreening the valence electrons from the resulting atomic potential, so that the valence electrons can be treated as external to the pseudo-ion used in the calculation. Under a linearity approximation, this unscreening can be performed by removing the Hartree and exchange-correlation potentials of the valence electrons from the total potential:
$V_{PSion} = V_{atom} - V_H(\rho_{val}) - V_{xc}(\rho_{val})$.
However, since the XC potential is not linear in the density, i.e.
$V_{xc}(\rho_{val} + \rho_{core} ) \neq V_{xc}( \rho_{val} ) + V_{xc}( \rho_{core} )$
the equation above introduces errors in the unscreening. A solution proposed by Louie et al. is to unscreen the XC potential due to all electrons, and to calculate it again at run time, considering both core and valence electrons:
$V_{PSion} = V_{atom} - V_H(\rho_{val}) - V_{xc}(\rho_{val} + \rho_{core} )$.
Hence one needs to store the core charge after the pseudopotential generation, and to calculate the XC potential due to the total charge ($\rho_{val} + \rho_{core}$) in calculations with non-linear core correction. During the calculation, therefore, the core charge stored on the radial grid must be transferred to the same grid as the valence electrons. This can introduce an additional computational cost for plane-wave codes, because the core charge can have high Fourier components close to the nucleus. The issue can be circumvented by noting that close to the nucleus there is often little overlap between the core and valence densities. In practice, for local and semi-local functionals, the error corrected by the non-linear core correction is largest in the regions where the core and valence charge densities overlap. Hence for an approximate correction one does not necessarily need the exact core charge; a smooth charge density that matches the accurate one in the regions of overlap might suffice.
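The non-linearity that motivates the correction can be made concrete with a small numerical sketch. It uses only the LDA exchange potential, which in Rydberg units is $-2(3\rho/\pi)^{1/3}$, as a stand-in for a full XC functional. That simplification, the function names and the sample density values are assumptions made for this illustration and do not describe what any particular code does.

```python
import math


def v_x_lda(rho):
    """LDA exchange potential in Ry for a density rho in bohr^-3 (exchange only)."""
    return -2.0 * (3.0 * rho / math.pi) ** (1.0 / 3.0)


def linearization_mismatch(rho_val, rho_core):
    """V_x(rho_val + rho_core) - V_x(rho_val) - V_x(rho_core)."""
    return v_x_lda(rho_val + rho_core) - v_x_lda(rho_val) - v_x_lda(rho_core)


# Hypothetical densities at three radii: strong, weak and no core/valence overlap.
strong = linearization_mismatch(0.05, 0.05)
weak = linearization_mismatch(0.05, 0.0005)
none = linearization_mismatch(0.05, 0.0)
print(strong, weak, none)
```

In this toy model the mismatch vanishes where the core density is zero and grows where both densities are sizeable, in line with the remark above that the correction matters most in the regions of overlap.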