ESCDF - Electronic Structure Common Data Format
Data representation standards based on HDF5 for large data sets (wave-functions, expansion coefficients, electron density, local density of states, etc.).
Tools that simplify reading and writing these representations are implemented in the libescdf library, which is under development.
Both the standards and the tools originate from a first standardisation effort done in the European Nanoquanta Network of Excellence, which resulted in the ETSF File Format Specifications.
The main objectives of this data format are the following:
- enable a platform-independent exchange of data between electronic-structure programs;
- provide specifications which are both flexible and suitable for High-Performance Computing (HPC);
- hide the gory details of the I/O implementation, in particular the way parallelism is handled;
- facilitate, strengthen and extend interdisciplinary collaborations within and beyond the electronic-structure community.
The current version of the format is intended to support the following use cases:
- restarting a calculation;
- exchanging data between two codes in a multi-step calculation.
The current file format version number is 0.1.
The ESCDF file format is based on HDF5. An HDF5 file contains mainly two types of objects: groups and datasets (see the HDF5 documentation for further details). The groups are arranged in a way that is similar to a file system. The root group of an HDF5 file is denoted as "/", while "/foo" refers to a group named "foo" that is stored in the file root group.
Data written according to the ESCDF specifications should be stored within a single group, hereafter referred to as the ESCDF root group. Quite often this will be the actual root group of the HDF5 file, but it can also be any other group within the file. This means that a given HDF5 file might contain more than one ESCDF root group, and thus store information about several independent systems or calculations.
The contents of the ESCDF root group are further divided into global variables and groups. The global variables are used for a general description of the file, mainly the file format convention, while the groups contain the actual data. The names of the allowed groups are the following:
As suggested by their names, the groups are used to store different types of data. A detailed description of the global variables and the groups can be found in the following sections.
The simplest use case consists of storing the information generated by a single calculation for a given system, but one might also want to store information about several calculations and/or systems in the same file. Here is what the file structure could look like for different use cases:
- One system, one calculation:
    /system
    /densities
    /basis_sets
    /basis_sets/foo
    /basis_sets/bar
In this case the file contains the description of one system and the data of one density, but two basis sets named foo and bar.
- Many systems, one calculation:
    /system/foo
    /system/bar
    /densities
In this case the file contains the description of two systems named foo and bar, but only one density. This could correspond to a case where the total density of two systems is obtained from a single calculation.
- Many systems, one calculation per system:
    /id1/system
    /id1/basis_sets
    /id1/densities
    /id2/system
    /id2/basis_sets
    /id2/densities
In this case the file contains two ESCDF root groups named id1 and id2. Each one of these groups contains the description of one system, one basis set and the data of one density.
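The three layouts above can be told apart mechanically. The following is only a sketch, not part of the specification or of libescdf: the helper name find_escdf_roots is hypothetical, and it assumes that an ESCDF root group can be recognised by the presence of a group named "system" directly beneath it, which is a heuristic drawn from the examples rather than a stated rule.

```python
def find_escdf_roots(paths, marker="system"):
    """Return the candidate ESCDF root groups for a list of HDF5 object paths.

    Assumption (not from the specification): a group is treated as an ESCDF
    root group if it directly contains a group called `marker`.
    """
    roots = set()
    for path in paths:
        # Split "/id1/system/foo" into ["id1", "system", "foo"].
        parts = [part for part in path.split("/") if part]
        if marker in parts:
            # Everything before the first marker component is the root group.
            index = parts.index(marker)
            roots.add("/" + "/".join(parts[:index]))
    return sorted(roots)


single = ["/system", "/densities", "/basis_sets/foo", "/basis_sets/bar"]
multiple = ["/id1/system", "/id1/densities", "/id2/system", "/id2/densities"]

print(find_escdf_roots(single))
print(find_escdf_roots(multiple))
```

With the first list the only candidate is the HDF5 root group "/", while the second list yields one candidate per calculation, "/id1" and "/id2".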
The ESCDF root group must have the following attributes:
The ESCDF root group may also have the following optional attributes:
Global attributes provide general information on the file format being used, as well as the contents and history of the file.
file_format: char(80) (always
The name of the data standard.
file_format_version: float (e.g.,
Version number for the data standard.
Conventions: char(80) (e.g.,
Where the data standard specifications can be found on the Internet.
Each code that writes or modifies the file is encouraged to append a line about itself to the history attribute. The char(1024) type leaves room for 12 additions of at most 80 characters each.
A short description of the contents of the file (i.e., the physical system).
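The size limits on the history attribute mentioned above could be enforced by a small helper before the attribute is written. This is a minimal sketch under stated assumptions: the function append_history is hypothetical, and the use of a newline as separator between entries is an assumption, since the text only gives the 80-character and 1024-character limits.

```python
def append_history(history, entry, max_line=80, max_total=1024):
    """Return `history` with `entry` appended as a new line.

    Assumptions (not from the specification): entries are separated by a
    newline character, and an oversized entry is rejected, not truncated.
    """
    if "\n" in entry:
        raise ValueError("a history entry must be a single line")
    if len(entry) > max_line:
        raise ValueError(f"a history entry is limited to {max_line} characters")
    updated = entry if not history else history + "\n" + entry
    if len(updated) > max_total:
        raise ValueError(f"the history is limited to {max_total} characters")
    return updated


history = ""
history = append_history(history, "written by code A")
history = append_history(history, "densities updated by code B")
print(history)
```

Under these assumptions, twelve full 80-character entries plus their separators occupy 971 characters, so a thirteenth full-length entry would be rejected.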
HDF5 does not provide a boolean datatype. Flag-like variables should be stored as char(3), with allowed values yes and no. When such attributes are written, they should be written in full and in lowercase. When they are read, only the first character needs to be checked (i.e., y or n).
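The flag convention can be illustrated with a pair of conversion helpers. This is a sketch only: encode_flag and decode_flag are hypothetical names, and it assumes that the two allowed values are "yes" and "no", which is what the char(3) type and the first-character rule suggest.

```python
def encode_flag(value):
    """Convert a Python boolean to its full-length, lowercase string form."""
    return "yes" if value else "no"


def decode_flag(text):
    """Convert a stored flag string back to a boolean.

    Only the first character is inspected, as the convention allows.
    Assumption: anything not starting with "y" or "n" is an error.
    """
    first = text[:1]
    if first == "y":
        return True
    if first == "n":
        return False
    raise ValueError(f"not a valid flag value: {text!r}")


print(encode_flag(True), decode_flag("yes"))
print(encode_flag(False), decode_flag("no"))
```

Checking only the first character makes the reader indifferent to how a fixed-length string is padded, since "no" followed by a space or a null character decodes the same way.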
Dimensional variables (physical units)
Most variables are dimensionless. If a variable does have physical dimensions, the default is to use Hartree atomic units. However, if the variable is a dataset, different units can be specified by attaching two optional attributes to it:
The value in atomic units is obtained by multiplying the number stored in the variable by this scaling factor. For example, if an energy variable is recorded in eV, scale_to_atomic_units should be set to approximately 0.0367493, the value of 1 eV in Hartree.
The name or definition of the units being used. This attribute is only used for informative purposes; only scale_to_atomic_units should be used to read the file.
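The scaling rule amounts to a single multiplication, with a factor of one when the attribute is absent. The sketch below is illustrative only: to_atomic_units is a hypothetical helper, the attributes are modelled as a plain dictionary and not read from an HDF5 dataset, and the Hartree energy in eV is the CODATA 2018 value, which is not quoted in this document.

```python
# CODATA 2018 value of the Hartree energy in eV (assumption: not taken
# from the ESCDF specification itself).
HARTREE_IN_EV = 27.211386245988


def to_atomic_units(value, attributes):
    """Convert a stored value to Hartree atomic units.

    `attributes` stands in for the attributes of a dataset. If
    scale_to_atomic_units is missing, the value is assumed to be in
    atomic units already, which is the documented default.
    """
    scale = attributes.get("scale_to_atomic_units", 1.0)
    return value * scale


# An energy recorded in eV: the scaling factor is 1 eV expressed in Hartree.
energy_attributes = {
    "scale_to_atomic_units": 1.0 / HARTREE_IN_EV,
    "units": "eV",  # informative only, never used for the conversion
}

print(to_atomic_units(13.6, energy_attributes))
print(to_atomic_units(0.5, {}))
```

The units entry is deliberately ignored by the conversion, in line with the rule that only scale_to_atomic_units should be used when reading the file.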
- Specifications for the systems
- Specifications for the basis sets
- Specifications for the densities
- Specifications for the potentials
- Specifications for the states
- Specifications for the extensions
Links of interest
The following links constitute useful inspirations for the development of the ESCDF specifications, API, and library: