ETSF_IO

One of the objectives of the European Theoretical Spectroscopy Facility is to specify file formats for the contents that are relevant to the scientific activity of its constituting nodes. The present document describes detailed NetCDF specifications for selected contents (crystallographic/density/wavefunctions). It is hoped that these specifications will be implemented in many different software packages, or (at least) will be the basis of even better file format specifications.

Introduction

This document has the goal of informing the electronic structure community of the agreed ETSF specifications, hereafter referred to as SpecFF_ETSF3, in view of further discussions and implementations. It is expected that the file format specifications present in this document be subject to revision and improvements. The first version of this document, named SpecFFNQ1 (NQ for Nanoquanta, the precursor of the ETSF), was frozen around June 2006. The current version of the file format, and associated information, can be found at http://www.etsf.eu/fileformats.

The document is organized in sections:

Section General considerations presents general considerations concerning the present file format specifications.
Section General specifications presents general specifications concerning ETSF NetCDF file formats.
Section Specification for files containing crystallographic data deals with files containing crystallographic data, and presents a rather detailed NetCDF specification.
Section Specification for files containing a density or a potential deals with files containing density/potential, with the same level of detail.
Section Specification for files containing the wavefunctions deals with files containing wavefunctions, with the same level of detail.
Section Dielectric function deals with files containing dielectric functions and related data.

General considerations

The set of data to be included in each type of file must be considered separately from its representation. Concerning the latter, one encounters simple text files, binary files, XML-structured files, NetCDF files, etc. The ETSF decided to evolve towards formats that deal appropriately with the self-description issue, i.e. XML and NetCDF. The inherent flexibility of these representations also allows specific versions of each type of file to evolve progressively, and earlier working proposals to be refined. The same direction has been adopted by several groups of code developers that we know of.

Information on NetCDF and XML can be obtained from the official Web sites:
http://www.unidata.ucar.edu/software/netcdf/ and http://www.w3.org/XML/

There are numerous other presentations of these formats on the Web, or in books.

Concerning XML
(A) The XML format is best suited to the structured representation of relatively small quantities of data, as it is not compressed.
(B) It is a very flexible format.

Concerning NetCDF
(A) Several groups of developers inside the ETSF have already a good experience of using it, for the representation of binary data (large files).
(B) Although there is no clear advantage of NetCDF compared to HDF (another possibility for large binary files), this experience inside the ETSF network is the main reason for preferring it.
(C) File size limitations of NetCDF exist, see Appendix A, but should be limited to old architectures.

Thanks to the flexibility of NetCDF, the content of a NetCDF file format suitable for use by ETSF software might be of four different types:
(1) The actual numerical data (that defines a file for wavefunctions, or a density file, etc.), whose NetCDF description would have been agreed.
(2) The auxiliary data that are mandatory to make proper usage of the actual numerical data. The NetCDF description of these auxiliary data should also be agreed.
(3) The auxiliary data that are not mandatory, but whose NetCDF description has been agreed, in a larger context.
(4) Other data, typically code-dependent, whose existence might help the use of the file for a specific code. The name of these variables should be different from the names chosen for agreed variables (1)-(3). Such other data might even be redundant with (1)-(3).

Such content is compatible with a file format being complete for use by many codes, though adapted for the specific usage by one code. The ETSF file descriptions to be provided later (sections General specifications to Dielectric function) are based on this generic classification of data that can be integrated in such a NetCDF file.

In order to address the 2 GB limit (see Appendix A), as well as the use of NetCDF files for parallel calculations, one file can actually be split into several partial files. Selected variables should describe the differing content of each of them. As an example, in Specification for files containing a density or a potential, a file containing a set of wavefunctions can be split into different files containing selected bands and/or k-points, while being identical in every other respect.

Some technical details concerning the use of NetCDF files apply to all formats specified in the ETSF framework:

concerning the variable names, long names should be chosen, as close as possible to natural language (so inherently self-descriptive).
all variable names are lower case, except "Conventions" - a name agreed by the NetCDF community
underscores are used to replace blanks separating words
in the tables, the slow indices are left-most, and the fast indices are right-most, so that the order of indices has to be reversed in Fortran

General specifications

Global attributes

Global attributes are used for a general description of the file, mainly the file format convention. Important data is not contained in attributes, but rather in variables.

The length of character attributes is the maximum length this attribute may take. This is relevant for reading, where sufficient space must be provided. In writing, the defined length may be reduced to the real length of the attribute.

Mandatory attributes

This table gathers specifications for required attributes in any ETSF NetCDF files.

Attributes	Type (length)	Notes
file_format	char (80)	"ETSF"
file_format_version	real	1.1 or 1.2 or 2.0 ...
Conventions	char (80)	"http://www.etsf.eu/fileformats"

file_format: Name of the file format for ETSF wavefunctions.
file_format_version: Real version number for file format (only one period, e.g. 1.2 ).
Conventions: NetCDF recommended attribute specifying where the conventions for the file can be found on the Internet.
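
As a minimal sketch of how a reader might verify these three attributes, the following Python function checks a plain dictionary standing in for the global attributes of a file. The function name, the dictionary layout, and the strict comparison against the values quoted in the table are illustrative assumptions, not part of the specification.

```python
# Value quoted in the table of mandatory attributes.
ETSF_CONVENTIONS_URL = "http://www.etsf.eu/fileformats"


def check_mandatory_attributes(global_attributes):
    """Return a list of problems found in the mandatory global attributes.

    `global_attributes` is a plain dict used here as a stand-in for the
    attributes read from a NetCDF file (hypothetical layout).
    """
    problems = []
    if global_attributes.get("file_format") != "ETSF":
        problems.append('file_format should be "ETSF"')
    # The version is a real number with a single period, e.g. 1.2.
    if not isinstance(global_attributes.get("file_format_version"), float):
        problems.append("file_format_version should be a real number")
    if global_attributes.get("Conventions") != ETSF_CONVENTIONS_URL:
        problems.append("Conventions should be " + ETSF_CONVENTIONS_URL)
    return problems


example = {
    "file_format": "ETSF",
    "file_format_version": 1.2,
    "Conventions": ETSF_CONVENTIONS_URL,
}
problems = check_mandatory_attributes(example)
```

An empty list means that the three attributes are present with the expected values.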

Optional attributes

This table presents optional attributes for ETSF NetCDF files.

Attributes	Type (length)	Notes
history	char (1024)
title	char (80)

history: NetCDF recommended attribute: each code modifying/writing this file is encouraged to add a line about itself in the history attribute. char(1024) allows for 12 additions of at most 80 characters.
title: Short description of the content (system) of the file.

Generic attributes of variables

A few attributes might apply to a large number of variables. The following table presents the generic attributes that might be mandatory for selected variables in ETSF NetCDF files.

Attributes	Type (length)	Notes
units	char (80)	required for variables that carry units
scale_to_atomic_units	double	required for units other than "atomic units"

units: It is one of the NetCDF recommended attributes, but it only applies to a few variables in our case, since most are dimensionless. For dimensional variables, it is required. The use of atomic units (corresponding to the string "atomic units") is advised throughout for portability. If other units are used, the definition of an appropriate scaling factor to atomic units is mandatory. Actually, the definition of the name "units" in the ETSF files is only informative: the "scale_to_atomic_units" information should be the only one used to read the file by machines.
scale_to_atomic_units: If "units" is something other than the character string "atomic units" (based on Hartree for energies, Bohr for lengths) we request the definition of an appropriate scaling factor. The appropriate value in atomic units is obtained by multiplying the number found in the variable by the scaling factor. Examples:
units="eV" $\rightarrow$ scale_to_atomic_units = 0.036749326
units="angstrom" $\rightarrow$ scale_to_atomic_units = 1.8897261
units="parsec" $\rightarrow$ scale_to_atomic_units = 5.8310856e+26
This can be used to deal with unknown units. Note that the recommended values for the fundamental constants can be found at NIST.
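
The conversion rule can be sketched as follows in Python. The helper name `to_atomic_units` and the use of a plain dictionary for the attributes of a variable are assumptions made for illustration. In line with the text above, the scaling factor is the only information used for the conversion, and the "units" string is consulted only to decide whether a missing factor is acceptable.

```python
def to_atomic_units(value, attributes):
    """Convert a stored number to atomic units.

    `attributes` is a plain dict standing in for the NetCDF attributes of
    a dimensional variable (hypothetical layout, for illustration only).
    """
    scale = attributes.get("scale_to_atomic_units")
    if scale is not None:
        # Multiply the stored number by the scaling factor.
        return value * scale
    if attributes.get("units") != "atomic units":
        raise ValueError("scale_to_atomic_units is required for other units")
    return value


# A cutoff stored as 20 eV, converted to Hartree.
cutoff = to_atomic_units(20.0, {"units": "eV", "scale_to_atomic_units": 0.036749326})
```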

Flag-like attributes

"Flag-like" attributes can take the values "yes" and "no". When such attributes are written, they should be written in full length and small letters. When they are read, only the first character needs to be checked (i.e. "y" or "n" – this simplifies life a lot).

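A possible reading and writing convention for flag-like attributes is sketched below in Python. The function names are hypothetical. Only the first character is inspected on reading, as described above.

```python
def read_flag(text):
    """Interpret a flag-like attribute by looking at its first character only."""
    first = text[:1]
    if first == "y":
        return True
    if first == "n":
        return False
    raise ValueError("flag-like attribute should start with 'y' or 'n'")


def write_flag(value):
    """Return the full-length, lower-case string to store for a flag."""
    return "yes" if value else "no"


symmorphic = read_flag("yes")
```
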
Dimensions

Dimensions are used for one- or multidimensional variables. It is very important to remember that the NetCDF interface adapts the dimension ordering to the programming language used. The notation here is C-like, i.e. the last index varies fastest. In Fortran, the order is reversed. When implementing new reading interfaces, the dimension names can be used to check the dimension ordering. The dimension names help also to identify the meaning of certain dimensions in cases where the number alone is not sufficient.
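
As an illustration of the ordering issue, the short Python sketch below reverses a C-like list of dimension names to obtain the order seen from Fortran, and uses the names to detect which ordering a reader is facing. The function names are assumptions; the example variable is reduced_atom_positions, whose C-like shape is [number_of_atoms][number_of_reduced_dimensions].

```python
def fortran_dimension_order(c_dimension_names):
    """Return the dimension names in the order seen from Fortran."""
    return tuple(reversed(c_dimension_names))


def ordering_of(found_names, specified_c_names):
    """Tell whether the names found follow the C-like or the Fortran order."""
    found = tuple(found_names)
    if found == tuple(specified_c_names):
        return "C"
    if found == fortran_dimension_order(specified_c_names):
        return "Fortran"
    raise ValueError("dimension names do not match the specification")


spec = ("number_of_atoms", "number_of_reduced_dimensions")
fortran_view = fortran_dimension_order(spec)
```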

The variables that specify dimensions in ETSF files are divided into two lists: one for the dimensions that are not supposed to lead to a splitting, and another for the dimensions that might be used to define a splitting (e.g. in case of parallelism).

Dimensions that cannot be split

This table lists the dimensions that are not supposed to lead to a splitting.

Dimensions	Type (index order as in C)	Notes
character_string_length	integer	Always ==80
real_or_complex_coefficients	integer	Either ==1 or 2
real_or_complex_density	integer	Either ==1 or 2
real_or_complex_gw_corrections	integer	Either ==1 or 2
real_or_complex_potential	integer	Either ==1 or 2
real_or_complex_wavefunctions	integer	Either ==1 or 2
number_of_cartesian_directions	integer	Always ==3
number_of_reduced_dimensions	integer	Always ==3
number_of_vectors	integer	Always ==3
number_of_symmetry_operations	integer
number_of_atoms	integer
number_of_atom_species	integer
symbol_length	integer	Always ==2

character_string_length: The maximum length of string variables (attributes may be longer).
real_or_complex_coefficients: To specify whether the variable coefficients_of_wavefunctions (Wavefunctions) is real or complex.
real_or_complex_density: To specify whether the variable density (Density) is real or complex.
real_or_complex_gw_corrections: To specify whether the variable gw_corrections (BSE/GW) is real or complex.
real_or_complex_potential: To specify whether the variables exchange_potential, correlation_potential, and exchange_correlation_potential (Exchange and correlation) are real or complex.
real_or_complex_wavefunctions: To specify whether the variable real_space_wavefunctions (Wavefunctions) is real or complex.
number_of_cartesian_directions: Used for absolute coordinates.
number_of_reduced_dimensions: Used for reduced (also called relative) coordinates in reciprocal or real space.
number_of_vectors: Used to distinguish the vectors when defining their relative / reduced coordinates.
number_of_symmetry_operations: The number of symmetry operations.
number_of_atoms: The number of atoms in the unit cell.
number_of_atom_species: The number of different atom species in the unit cell.
symbol_length: Maximum number of characters for the chemical symbols.

Dimensions that can be split

This table lists the dimensions that might be used to define a splitting (e.g. in case of parallelism). For the auxiliary variables needed in case of splitting, see Splitting.

Dimensions	Type (index order as in C)	Notes
max_number_of_states	integer
number_of_kpoints	integer
number_of_spins	integer	Either ==1 or 2
number_of_spinor_components	integer	Either ==1 or 2
number_of_components	integer	Either ==1, 2 or 4
max_number_of_coefficients	integer
number_of_grid_points_vector1	integer
number_of_grid_points_vector2	integer
number_of_grid_points_vector3	integer
max_number_of_basis_grid_points	integer	For wavelets. Ranges from 1 to number_of_grid_points_vector1 * number_of_grid_points_vector2 * number_of_grid_points_vector3
number_of_localization_regions	integer	Always 1.

max_number_of_states: The maximum number of states.
number_of_kpoints: The number of kpoints.
number_of_spins: Used to distinguish collinear spin-up and spin-down components:
1 for non-spin-polarized or spinor wavefunctions
2 for collinear spin (spin-up and spin-down).
number_of_spinor_components: For non-spinor wavefunctions, this dimension must be present and equal to 1. For spinor wavefunctions, this dimension must be equal to 2.
number_of_components: Used for the spin components of spin-density matrices:
1 for non-spin-polarized
2 for collinear spin (spin-up and spin-down)
4 for non-collinear spin (average density, then magnetization vector in cartesian coordinates x,y and z).
max_number_of_coefficients: The (maximum) number of coefficients for the basis functions at each k-point, except in the case of real space grids (see next lines).
number_of_grid_points_vector1: The number of grid points along direction 1 in the unit cell in real space, for dimensioning the wavefunction coefficients (an alternative to max_number_of_coefficients).
number_of_grid_points_vector2: Same as number_of_grid_points_vector1, for the second direction.
number_of_grid_points_vector3: Same as number_of_grid_points_vector1, for the third direction.
max_number_of_basis_grid_points: For wavelets. The number of relevant points from the regular mesh in real space used to store coefficients. This value is the maximum of number of basis set grid points over all localization region(s). Currently, number_of_localization_regions is always 1, so max_number_of_basis_grid_points is simply the number of basis set grid points.
number_of_localization_regions: For wavelets. This dimension will be used later to define one or several localized basis sets.

To clarify the interplay between number_of_spins, number_of_components, and number_of_spinor_components, note the following magnetic and non-magnetic cases:
Non-spin-polarized:
number_of_spins=1 , number_of_spinor_components=1, number_of_components=1
Collinear spin-polarized:
number_of_spins=2, number_of_spinor_components=1, number_of_components=2
Non-collinear spin-polarized:
number_of_spins=1, number_of_spinor_components=2, number_of_components=4
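
The three cases can be encoded as a small lookup, sketched here in Python. The function name and the decision to reject any other combination are assumptions; the sketch applies to files that define all three dimensions.

```python
# Keys are (number_of_spins, number_of_spinor_components, number_of_components).
SPIN_CASES = {
    (1, 1, 1): "non-spin-polarized",
    (2, 1, 2): "collinear spin-polarized",
    (1, 2, 4): "non-collinear spin-polarized",
}


def spin_case(number_of_spins, number_of_spinor_components, number_of_components):
    """Name the magnetic case, or raise if the combination is not listed."""
    key = (number_of_spins, number_of_spinor_components, number_of_components)
    if key not in SPIN_CASES:
        raise ValueError("inconsistent spin dimensions: %r" % (key,))
    return SPIN_CASES[key]


case = spin_case(1, 2, 4)
```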

Splitting

We now turn to the specification of the (optional) splitting of files into partial files. Such splitting might be done in many different ways. In order to allow for very general, flexible splittings, but still rely on a simple system, we set up different pairs of variables, one for each possible splitting. These pairs of variables are described in Auxiliary dimensions for splitting and Auxiliary variables for splitting. If a program cannot cope with file splitting, it should simply check that no file splitting is done, and stop if it finds that the file is split.

Let us work out an example.
Suppose we split the file according to the kpoints. The full set might have 10 kpoints, of which 3 kpoints (number 1, 2 and 5) might be contained in a first file, 3 other kpoints (number 3, 6 and 9) might be contained in a second file, and the 4 remaining kpoints (number 4, 7, 8 and 10) might be contained in the third file.

Then, the first file will contain:
number_of_kpoints = 10 , my_number_of_kpoints = 3 , my_kpoints=(1,2,5)

The second file will contain:
number_of_kpoints = 10 , my_number_of_kpoints = 3 , my_kpoints=(3,6,9)

The third file will contain:
number_of_kpoints = 10 , my_number_of_kpoints = 4 , my_kpoints=(4,7,8,10)
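
A consistency check for this example might look like the Python sketch below. It assumes that a set of partial files is meant to contain every kpoint exactly once, which is true of the example but is not stated as a general requirement. Each partial file is represented by a plain dictionary, a hypothetical stand-in for the values read from the file.

```python
def covers_all_kpoints(partial_files):
    """Check that the partial files together hold each kpoint exactly once."""
    total = partial_files[0]["number_of_kpoints"]
    seen = []
    for content in partial_files:
        if content["number_of_kpoints"] != total:
            raise ValueError("partial files disagree on number_of_kpoints")
        if content["my_number_of_kpoints"] != len(content["my_kpoints"]):
            raise ValueError("my_number_of_kpoints does not match my_kpoints")
        seen.extend(content["my_kpoints"])
    # kpoints are numbered from 1 to number_of_kpoints.
    return sorted(seen) == list(range(1, total + 1))


files = [
    {"number_of_kpoints": 10, "my_number_of_kpoints": 3, "my_kpoints": (1, 2, 5)},
    {"number_of_kpoints": 10, "my_number_of_kpoints": 3, "my_kpoints": (3, 6, 9)},
    {"number_of_kpoints": 10, "my_number_of_kpoints": 4, "my_kpoints": (4, 7, 8, 10)},
]
complete = covers_all_kpoints(files)
```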

If more than one splitting is done, the file will contain the intersection of the split data. As an example, suppose we split the file according to the kpoints and the spins. The full set of kpoints might have 4 kpoints, and there would be two spins. We perform two splittings, one separating kpoints 1 and 2 from kpoints 3 and 4, and one separating the spins.
The first file might contain:
number_of_kpoints = 4 , my_number_of_kpoints = 2 , my_kpoints=(1,2)
number_of_spins = 2 , my_number_of_spins = 1 , my_spins=(1)

The second file might contain:
number_of_kpoints = 4 , my_number_of_kpoints = 2 , my_kpoints=(3,4)
number_of_spins = 2 , my_number_of_spins = 1 , my_spins=(1)

The third file might contain:
number_of_kpoints = 4 , my_number_of_kpoints = 2 , my_kpoints=(1,2)
number_of_spins = 2 , my_number_of_spins = 1 , my_spins=(2)

The fourth file might contain:
number_of_kpoints = 4 , my_number_of_kpoints = 2 , my_kpoints=(3,4)
number_of_spins = 2 , my_number_of_spins = 1 , my_spins=(2)
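
The intersection rule for this example can be sketched in Python as the Cartesian product of the index arrays of each file. The function name and the dictionary representation of a partial file are illustrative assumptions.

```python
from itertools import product


def stored_blocks(content):
    """List the (kpoint, spin) pairs held by one partial file."""
    return list(product(content["my_kpoints"], content["my_spins"]))


files = [
    {"my_kpoints": (1, 2), "my_spins": (1,)},
    {"my_kpoints": (3, 4), "my_spins": (1,)},
    {"my_kpoints": (1, 2), "my_spins": (2,)},
    {"my_kpoints": (3, 4), "my_spins": (2,)},
]

# Taken together, the four files hold all 4 x 2 combinations.
all_blocks = sorted(block for content in files for block in stored_blocks(content))
```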

Different variables might change their sizes when splitting is used. The list of variables whose size might change compared to non-split files will have to be specified.

Auxiliary dimensions for splitting

Dimensions of variables to specify the (optional) splitting of one file into different partial files. These dimensions and associated variables (see Auxiliary variables for splitting) are defined in pairs (one integer, and one integer array). Any one of these pairs can be used to split the files, and several of these pairs can be used as well. In case several pairs are used, the content of the file is defined by the intersection of the different integer arrays. The detailed description of these variables follows from that of the corresponding variables in Dimensions that can be split.

Dimensions	Type (index order as in C)	Notes
my_max_number_of_states	integer	At least 1, at most max_number_of_states
my_number_of_kpoints	integer	At least 1, at most number_of_kpoints
my_number_of_spins	integer	At least 1, at most number_of_spins
my_number_of_spinor_components	integer	At least 1, at most number_of_spinor_components
my_number_of_components	integer	At least 1, at most number_of_components
my_number_of_grid_points_vector1	integer	At least 1, at most number_of_grid_points_vector1
my_number_of_grid_points_vector2	integer	At least 1, at most number_of_grid_points_vector2
my_number_of_grid_points_vector3	integer	At least 1, at most number_of_grid_points_vector3
my_max_number_of_coefficients	integer	At least 1, at most max_number_of_coefficients

Auxiliary variables for splitting

Variables to specify the (optional) splitting of one file into different partial files. See the explanation in Auxiliary dimensions for splitting. The detailed description of these variables follows from that of the corresponding variables in Dimensions that can be split.

Variables	Type (index order as in C)	Notes
my_states	integer [my_max_number_of_states]
my_kpoints	integer [my_number_of_kpoints]
my_spins	integer [my_number_of_spins]
my_spinor_components	integer [my_number_of_spinor_components]
my_components	integer [my_number_of_components]
my_grid_points_vector1	integer [my_number_of_grid_points_vector1]
my_grid_points_vector2	integer [my_number_of_grid_points_vector2]
my_grid_points_vector3	integer [my_number_of_grid_points_vector3]
my_coefficients	integer [my_max_number_of_coefficients]

Optional variables

In order to avoid the “divergence of the formats in the additional data”, we propose names and formats for some information that is likely to be written to the files. This section will grow in future format versions. Please report any variable you miss here, so we can add it to the list. None of these data is mandatory for the file formats to be described later. Some of the proposed variables contain redundant information.

All optional variables must be defined BEFORE the largest size array of the file, otherwise this array will be restricted to 4GB. Examples of such arrays are coefficients_of_wavefunctions or real_space_wavefunctions (see later).

These optional variables are grouped with respect to their physical relevance: atomic information, electronic structure, and reciprocal space.

Atomic information

Variables	Type (index order as in C)	Notes
valence_charges	double [number_of_atom_species]
pseudopotential_types	char [number_of_atom_species][character_string_length]

valence_charges: Ionic charges for each atom species.
pseudopotential_types: Type of pseudopotential scheme = "bachelet-hamann-schlueter", "troullier-martins", "hamann", "hartwigsen-goedecker-hutter", "goedecker-teter-hutter"... A standardized list should be found or established.

Electronic structure

Variables	Type (index order as in C)	Notes
number_of_electrons	integer
exchange_functional	char [character_string_length]
correlation_functional	char [character_string_length]
fermi_energy	double	Units attribute required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.
smearing_scheme	char [character_string_length]
smearing_width	double	Units attribute required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.

number_of_electrons: Number of electrons in the elementary cell.
exchange_functional: String describing the functional used for exchange: names should be taken from the ETSF XC library specifications (at present, under construction).
correlation_functional: String describing the functional used for correlation: Lee Yang Parr or Colle-Salvetti etc... names should be taken from the ETSF XC library specifications.
fermi_energy: Fermi energy corresponding to occupation numbers.
smearing_scheme: Smearing scheme used for metallic or finite temperature occupation numbers = "gaussian", "fermi-dirac", "cold-smearing", "methfessel-paxton-n" for n=1 ... 10.
smearing_width: Smearing width used with scheme above.

Reciprocal space

Variables	Type (index order as in C)	Notes
kinetic_energy_cutoff	double	Units attribute required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.
kpoint_grid_shift	double [number_of_reduced_dimensions]
kpoint_grid_vectors	double [number_of_vectors] [number_of_reduced_dimensions]
monkhorst_pack_folding	integer [number_of_vectors]

kinetic_energy_cutoff: Cutoff used to generate the plane-wave basis set.
kpoint_grid_vectors: Basis vectors for kpoint grid if it is homogeneous. Given in the coordinates of reciprocal space primitive vectors.
kpoint_grid_shift: Shift for offset of grid of kpoints. Used with both kpoint_grid_vectors and monkhorst_pack_folding.
monkhorst_pack_folding: This indicates the “folding” for regular kpoint grids (e.g. Monkhorst-Pack Phys. Rev. B 13, 5188 (1976)). An alternative to kpoint_grid_vectors.

Naming conventions

NetCDF files that respect the ETSF specifications described in the present document should be easily recognized. We suggest appending the string "-etsf.nc" to their names. The suffix ".nc" is a standard convention for naming NetCDF files, see:
http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#filename . Some filesystems are case-insensitive, and this motivates the lower-case choice. Finally, a dash is to be preferred to an underscore to allow the files to be referenced by a Web search engine.
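
The naming suggestion amounts to the following Python sketch. The function names and the example file stem are hypothetical.

```python
ETSF_SUFFIX = "-etsf.nc"


def etsf_file_name(stem):
    """Append the suggested suffix to a file name stem."""
    return stem + ETSF_SUFFIX


def looks_like_etsf_file(name):
    """Tell whether a file name follows the suggested naming convention."""
    return name.endswith(ETSF_SUFFIX)


name = etsf_file_name("silicon_density")
```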

Specification for files containing crystallographic data

Specification

An ETSF NetCDF file for crystallographic data should contain the following set of mandatory information:

The three attributes defined in Mandatory attributes
The following dimensions from Dimensions that cannot be split:
- number_of_cartesian_directions
- number_of_vectors
- number_of_atoms
- number_of_atom_species
- number_of_symmetry_operations
The following variables defined in Atomic structure and symmetry operations:
- primitive_vectors
- reduced_symmetry_matrices
- reduced_symmetry_translations
- space_group
- atom_species
- reduced_atom_positions
At least one of the following variables defined in Atomic structure and symmetry operations, to specify the kind of atoms:
- atomic_numbers
- atom_species_names
- chemical_symbols

The use of atomic_numbers is preferred. If atomic_numbers is not available, atom_species_names will be preferred over chemical_symbols. In case more than one such variable is present in a file, the same order of preference should be followed by the reading program.
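
A reading program could implement this order of preference along the lines of the Python sketch below. The function name is an assumption, and the variables of the file are represented by a plain dictionary for illustration.

```python
# Order of preference stated in the specification.
SPECIES_VARIABLES = ("atomic_numbers", "atom_species_names", "chemical_symbols")


def species_identifier(variables):
    """Return the name and content of the preferred species variable."""
    for name in SPECIES_VARIABLES:
        if name in variables:
            return name, variables[name]
    raise KeyError("no variable specifying the kind of atoms was found")


# Hypothetical file content where two of the three variables are present.
found = species_identifier({
    "chemical_symbols": ["Ga", "As"],
    "atom_species_names": ["Ga-semicore", "As"],
})
```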

As mentioned in General considerations and General specifications, such a file might contain additional information agreed within ETSF, such as any of the variables specified in General specifications. It might even contain enough information to be declared an ETSF NetCDF file "containing the density" or "containing the wavefunctions", or both. Such a file might also contain additional information specific to the software that generated the file. It is not expected that this other software-specific information be used by other software.

It is not expected that the above-mentioned information be distributed among different files (unlike for density/potential/wavefunction files, see later).

Atomic structure and symmetry operations

Variables and attributes to specify the atomic structure and symmetry operations.

Variables	Type (index order as in C)	Notes
primitive_vectors	double [number_of_vectors][number_of_cartesian_directions]	By default, given in Bohr.
reduced_symmetry_matrices	integer [number_of_symmetry_operations][number_of_reduced_dimensions][number_of_reduced_dimensions]	The "symmorphic" attribute is needed.
reduced_symmetry_translations	double [number_of_symmetry_operations][number_of_reduced_dimensions]	The "symmorphic" attribute is needed.
space_group	integer	Between 1 and 232.
atom_species	integer [number_of_atoms]	Between 1 and "number_of_atom_species".
reduced_atom_positions	double [number_of_atoms][number_of_reduced_dimensions]
atomic_numbers	double [number_of_atom_species]
atom_species_names	char [number_of_atom_species][character_string_length]
chemical_symbols	char [number_of_atom_species][symbol_length]

Attributes	Type	Notes
symmorphic	char(80)	flag-type attribute, see Flag-like attributes.

primitive_vectors: The primitive vectors, expressed in cartesian coordinates.
Symmetry operations are defined in real space, with reduced coordinates. A symmetry operation in real space sends the input point r to the output point r', with

${r'}^{red}_{\alpha} = \sum_{\beta} S^{red}_{\alpha \beta} \, r^{red}_{\beta} + t^{red}_{\alpha}$

The array reduced_symmetry_matrices contains the matrices S, in reduced coordinates, while the vector t, in reduced coordinates, is contained in the array reduced_symmetry_translations of the same Table. There might be a confusion between the two dimensions number_of_reduced_dimensions of this variable. In the C ordering, the last one corresponds to the beta index in the above-mentioned formula.

The first symmetry operation must always be unity with translation vector (0,0,0). If all translations are zero, the attribute symmorphic for reduced_symmetry_matrices should be set to "yes".
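
The action of one symmetry operation on a point in reduced coordinates can be sketched in Python as follows, with the matrix summed over its last index (beta) and the translation component taken along the output index (alpha). The function names and the sample operation, a two-fold rotation about the third axis followed by a half translation, are illustrative assumptions.

```python
def apply_symmetry(matrix, translation, reduced_position):
    """Apply one symmetry operation to a point given in reduced coordinates."""
    return [
        sum(matrix[a][b] * reduced_position[b] for b in range(3)) + translation[a]
        for a in range(3)
    ]


def is_symmorphic(translations):
    """True when every translation vector is zero."""
    return all(component == 0 for vector in translations for component in vector)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rotation = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
translations = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.5]]

image = apply_symmetry(rotation, translations[1], [0.25, 0.1, 0.0])
symmorphic = "yes" if is_symmorphic(translations) else "no"
```

Here the second operation carries a non-zero translation, so the symmorphic attribute would be "no".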

space_group: Space group number according to international tables of crystallography (from 1 to 232).
atom_species: Types of each atom in the unit cell. Note that the first type of atom has number "1", and the last type of atom has number "number_of_atom_species".
reduced_atom_positions: Positions of the different atoms in the unit cell in relative / reduced coordinates.
atomic_numbers: Atomic number for each atom species. If it does not refer to a "usual" atom (e.g. fractional charge atoms or similar), a non-integer number or zero may be used, but it is strongly advised then to also specify the atom_species_names variable.
atom_species_names: Descriptive name for each atom species = "H" "Ga" plus variants like "Ga-semicore" "C-1s-corehole" "C-sp2" "C1".
chemical_symbols: Chemical symbol for each atom species (as in periodic table). If not appropriate (fractional charge atoms or similar), "X" may be used.
symmorphic: Flag-type attribute (see Flag-like attributes), needed for the variables reduced_symmetry_matrices and reduced_symmetry_translations.

Specification for files containing a density or a potential

Specification

An ETSF NetCDF file for a density should contain the following set of mandatory information:

The three attributes defined in Mandatory attributes
The following dimensions from Dimensions that cannot be split:
- number_of_cartesian_directions
- number_of_vectors
- real_or_complex_density and/or real_or_complex_potential
The following dimensions from Dimensions that can be split:
- number_of_components
- number_of_grid_points_vector1
- number_of_grid_points_vector2
- number_of_grid_points_vector3
The primitive vectors of the cell, as defined in Atomic structure and symmetry operations.
The density or potential, as defined in Density or Exchange and correlation. This variable must be the last, in order not to be limited to 4 GB.

As mentioned in General considerations and General specifications, such a file might contain additional information agreed within ETSF, such as any of the variables specified in General specifications. It might even contain enough information to be declared an ETSF NetCDF file "containing crystallographic data" or "containing the wavefunctions", or both. Such a file might also contain additional information specific to the software that generated the file. It is not expected that this other software-specific information be used by other software.

An ETSF NetCDF exchange, correlation, or exchange-correlation potential file should contain at least one variable among the three presented in Exchange and correlation in replacement of the specification of the density. The type and size of such variables are similar to those of the density. The other variables required for a density are also required for a potential file. Additional ETSF or software-specific information might be added, as described previously.

The information might be distributed among different files, thanks to the use of splitting of data for variables:

number_of_components
number_of_grid_points_vector1
number_of_grid_points_vector2
number_of_grid_points_vector3

In case the splitting related to one of these variables is activated, then the corresponding variables in Auxiliary variables for splitting must be defined. Accordingly, the dimensions of the variables in Density and/or Exchange and correlation will be changed, to accommodate only the segment of data effectively contained in the file.

Density

Variables	Type (index order as in C)	Notes
density	double[number_of_components][number_of_grid_points_vector3][number_of_grid_points_vector2][number_of_grid_points_vector1][real_or_complex_density]	This is a pseudo-density. Note in case of PAW, the augmentation contribution is missing. By default, the density is given in atomic units, that is, number of electrons per Bohr^3. The "units" attribute is required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.

A density in such a format (represented on a 3D homogeneous grid) is suited for the representation of smooth densities, as obtained naturally from pseudopotential calculations using plane waves.
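
To make the index ordering of the density array concrete, the Python sketch below computes the position of one element in the flattened, C-ordered array. The function name and the use of zero-based indices are assumptions made for illustration. The last index (real or imaginary part) varies fastest, followed by the grid index along vector 1.

```python
def density_offset(shape, component, i3, i2, i1, part=0):
    """Position of one density element in the flattened C-ordered array.

    `shape` is (number_of_components, number_of_grid_points_vector3,
    number_of_grid_points_vector2, number_of_grid_points_vector1,
    real_or_complex_density). All indices are zero-based.
    """
    _, n3, n2, n1, n_parts = shape
    return (((component * n3 + i3) * n2 + i2) * n1 + i1) * n_parts + part


# Real density with one component on a 2 x 3 x 4 grid (vector1, vector2, vector3).
shape = (1, 4, 3, 2, 1)
step_along_vector1 = density_offset(shape, 0, 0, 0, 1)
step_along_vector2 = density_offset(shape, 0, 0, 1, 0)
step_along_vector3 = density_offset(shape, 0, 1, 0, 0)
```

Moving one grid point along vector 1 advances by one element, along vector 2 by number_of_grid_points_vector1 elements, and along vector 3 by the product of the first two grid sizes.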

This specification for a density can also accommodate the response densities of Density-Functional Perturbation Theory.

Exchange and correlation

Variables	Type (index order as in C)	Notes
correlation_potential	double[number_of_components][number_of_grid_points_vector3][number_of_grid_points_vector2][number_of_grid_points_vector1][real_or_complex_potential]	Note in case of PAW, the augmentation contribution is missing. Units attribute required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.
exchange_potential	double[number_of_components][number_of_grid_points_vector3][number_of_grid_points_vector2][number_of_grid_points_vector1][real_or_complex_potential]	Note in case of PAW, the augmentation contribution is missing. Units attribute required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.
exchange_correlation_potential	double[number_of_components][number_of_grid_points_vector3][number_of_grid_points_vector2][number_of_grid_points_vector1][real_or_complex_potential]	Note in case of PAW, the augmentation contribution is missing. Units attribute required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.

Specification for files containing the wavefunctions

Specification

An ETSF NetCDF file "containing the wavefunctions" should contain at least the information needed to build the density from this file. Also, since the eigenvalues are intimately linked to the eigenfunctions, such a file is expected to contain the eigenvalues. Of course, a file might contain less information than required, but still follow the ETSF naming convention. It might also contain more information, of the kind specified in other tables of the present document.

An ETSF NetCDF file "containing the wavefunctions" should contain the following set of mandatory information:

The three attributes defined in Mandatory attributes
The following dimensions from Dimensions that cannot be split:
- character_string_length
- number_of_cartesian_directions
- number_of_vectors
- real_or_complex_coefficients and/or real_or_complex_wavefunctions
- number_of_symmetry_operations
- number_of_reduced_dimensions
The following dimensions from Dimensions that can be split:
- max_number_of_states
- number_of_kpoints
- number_of_spins
- number_of_spinor_components
In case of real-space wavefunctions, the following dimensions from Dimensions that can be split:
- number_of_grid_points_vector1
- number_of_grid_points_vector2
- number_of_grid_points_vector3
In case of a wavefunction given in terms of a basis set, the following dimensions from Dimensions that can be split:
- max_number_of_coefficients
In case of a wavefunction given in terms of a Daubechies wavelet basis set, the following dimensions from Dimensions that can be split:
- max_number_of_basis_grid_points
- number_of_localization_regions
The primitive vectors of the cell, as defined in Atomic structure and symmetry operations (variable primitive_vectors)
The symmetry operations, as defined in Atomic structure and symmetry operations (given by the variables reduced_symmetry_translations and reduced_symmetry_matrices)
The information related to each kpoint, as defined in K-points.
The information related to each state (including eigenenergies and occupation numbers), as defined in States.
In case of basis set representation, the information related to the basis set, and the variable coefficients_of_wavefunctions, as defined in Wavefunctions.
For basis_set equal to "plane_waves", the following variable is required from Wavefunctions:
- reduced_coordinates_of_plane_waves
For basis_set equal to "daubechies_wavelets", the following variables are required from Wavefunctions:
- coordinates_of_basis_grid_points
- number_of_coefficients_per_grid_point
In case of real-space representation, the following variable from Wavefunctions:
- real_space_wavefunctions
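
The list of mandatory dimensions above can be turned into a simple conformance check. The sketch below is only illustrative: the representation labels ("real_space", "plane_waves", "daubechies_wavelets" used as dictionary keys), the function name `missing_dimensions`, and the choice of pairing real_or_complex_wavefunctions with the real-space case and real_or_complex_coefficients with the basis-set cases are assumptions, not statements of the specification.

```python
# Dimensions required whatever the representation of the wavefunctions.
COMMON_DIMENSIONS = {
    "character_string_length",
    "number_of_cartesian_directions",
    "number_of_vectors",
    "number_of_symmetry_operations",
    "number_of_reduced_dimensions",
    "max_number_of_states",
    "number_of_kpoints",
    "number_of_spins",
    "number_of_spinor_components",
}

# Additional dimensions per representation (keys are hypothetical labels).
REPRESENTATION_DIMENSIONS = {
    "real_space": {
        "real_or_complex_wavefunctions",
        "number_of_grid_points_vector1",
        "number_of_grid_points_vector2",
        "number_of_grid_points_vector3",
    },
    "plane_waves": {
        "real_or_complex_coefficients",
        "max_number_of_coefficients",
    },
    "daubechies_wavelets": {
        "real_or_complex_coefficients",
        "max_number_of_coefficients",
        "max_number_of_basis_grid_points",
        "number_of_localization_regions",
    },
}


def missing_dimensions(present, representation):
    """Return the sorted names of required dimensions absent from `present`."""
    required = COMMON_DIMENSIONS | REPRESENTATION_DIMENSIONS[representation]
    return sorted(required - set(present))
```

A reader would typically pass the names of the dimensions found in the file; an empty result means that, as far as dimensions are concerned, nothing mandatory is missing.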

As mentioned in General considerations and General specifications, such a file might contain additional information agreed on within ETSF, such as any of the variables specified in General specifications. It might even contain enough information to be declared an ETSF NetCDF file "containing crystallographic data" or "containing the density", or both. Such a file might also contain additional information specific to the software that generated the file. It is not expected that this software-specific information be used by other software.

The information might be distributed among different files, thanks to the use of splitting of data for variables:

max_number_of_states
number_of_kpoints
number_of_spins

And, either

number_of_grid_points_vector1
number_of_grid_points_vector2
number_of_grid_points_vector3

or

max_number_of_coefficients

In case the splitting related to one of these variables is activated, then the corresponding variables in Split wavefunctions must be defined. Accordingly, the dimensions of the variables in K-points, States, Wavefunctions, and BSE/GW might have to be changed, to accommodate only the segment of data effectively contained in the file.

K-points

Variables	Type (index order as in C)	Notes
reduced_coordinates_of_kpoints	double[number_of_kpoints][number_of_reduced_dimensions]	See possible changes for split files in Split wavefunctions.
kpoint_weights	double[number_of_kpoints]	See Construction of the density. See also possible changes for split files in Split wavefunctions.

reduced_coordinates_of_kpoints: k-point in relative/reduced coordinates.
kpoint_weights: k-point integration weights. The weights must sum to 1. See Construction of the density.
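
The requirement that the weights sum to 1 is easy to verify when reading or writing a file. The following is a minimal sketch; the function name `weights_sum_to_one` and the default tolerance are assumptions, since the specification does not fix a tolerance for comparing double-precision numbers.

```python
import math


def weights_sum_to_one(kpoint_weights, tolerance=1e-8):
    """Check that the k-point integration weights sum to 1.

    The tolerance is an arbitrary choice of this sketch.
    """
    # math.fsum limits the accumulation of rounding errors.
    return abs(math.fsum(kpoint_weights) - 1.0) <= tolerance
```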

States

Variables	Type (index order as in C)	Notes
number_of_states	integer[number_of_spins][number_of_kpoints]	The attribute "k_dependent" must be defined.
eigenvalues	double[number_of_spins][number_of_kpoints][max_number_of_states]	The "units" attribute is required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables. See also possible changes for split files in Split wavefunctions.
occupations	double[number_of_spins][number_of_kpoints][max_number_of_states]	See also possible changes for split files in Split wavefunctions.
Attributes	Type	Notes
k_dependent	char(80)	Attribute of number_of_states, flag-type, see Flag-like attributes.

number_of_states: Number of states for each kpoint, if varying (the attribute k_dependent must be set to yes). Otherwise (the attribute k_dependent must be set to no), might not contain any information, the actual number of states being set to max_number_of_states.
eigenvalues: One-particle eigenvalues/eigenenergies. Should be 0 if unknown.
occupations: Occupation numbers. Full occupation for spin-unpolarized cases (number_of_spins = 1 AND number_of_spinor_components = 1) is 2, otherwise it is 1. See Construction of the density.
k_dependent: Flag-type attribute (see Flag-like attributes), needed for the variables number_of_states, number_of_coefficients, and reduced_coordinates_of_plane_waves.
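
The two rules above (how k_dependent governs number_of_states, and what "full occupation" means) can be sketched as follows. The function names are hypothetical, and the flag is assumed to have already been read as the string "yes" or "no".

```python
def states_per_kpoint(k_dependent, number_of_states, max_number_of_states,
                      number_of_spins, number_of_kpoints):
    """Return the number of states for each [spin][kpoint].

    If k_dependent is "yes", the variable number_of_states is trusted;
    otherwise every entry is max_number_of_states.
    """
    if k_dependent == "yes":
        return [list(row) for row in number_of_states]
    return [[max_number_of_states] * number_of_kpoints
            for _ in range(number_of_spins)]


def full_occupation(number_of_spins, number_of_spinor_components):
    """Occupation number of a completely filled state."""
    if number_of_spins == 1 and number_of_spinor_components == 1:
        return 2.0  # spin-unpolarized case
    return 1.0
```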

Wavefunctions

Variables	Type (index order as in C)	Notes
basis_set	char(character_string_length)	"plane_waves" if a plane-wave basis set is used. "Daubechies_wavelets" if a Daubechies wavelet basis set is used.
number_of_coefficients	integer[number_of_kpoints]	The attribute "k_dependent" must be defined (see States). Possible splitting, see Split wavefunctions.
coefficients_of_wavefunctions	double[number_of_spins][number_of_kpoints][max_number_of_states][number_of_spinor_components][max_number_of_coefficients][real_or_complex_coefficients]	For both plane-wave basis set and Daubechies wavelet basis set. Normalization for plane waves: 1 per unit cell. See also possible modifications for split files in Split wavefunctions. The attribute used_time_reversal_at_gamma might be defined.
reduced_coordinates_of_plane_waves	integer[number_of_kpoints][max_number_of_coefficients][number_of_reduced_dimensions]	The attribute "k_dependent" must be defined (see States). See possible modifications for split files in Split wavefunctions. The attribute used_time_reversal_at_gamma might be defined.
coordinates_of_basis_grid_points	integer[number_of_localization_regions][max_number_of_basis_grid_points][number_of_reduced_dimensions]	For wavelets.
number_of_coefficients_per_grid_point	integer[number_of_localization_regions][max_number_of_basis_grid_points]	For wavelets.
order_of_Daubechies_wavelets	integer	For wavelets.
real_space_wavefunctions	double[number_of_spins][number_of_kpoints][max_number_of_states][number_of_spinor_components][number_of_grid_points_vector3][number_of_grid_points_vector2][number_of_grid_points_vector1][real_or_complex_wavefunctions]	Normalization: 1 per unit cell. See possible modifications for split files in Split wavefunctions.
Attributes	Type	Notes
used_time_reversal_at_gamma	char(80)	Attribute of reduced_coordinates_of_plane_waves and coefficients_of_wavefunctions, flag-type, see Flag-like attributes.

basis_set: Type of basis set used if not in a real-space grid. At present, either "plane_waves" or "Daubechies_wavelets".
number_of_coefficients: Number of basis function coefficients for each kpoint, if varying (the attribute k_dependent must be set to yes). Otherwise (the attribute k_dependent must be set to no), might not contain any information, the actual number of coefficients being set to max_number_of_coefficients.
reduced_coordinates_of_plane_waves: Plane-wave G-vectors in relative/reduced coordinates. If the attribute k_dependent is set to no, then the dimension [number_of_kpoints] must be omitted. On the other hand, if the attribute used_time_reversal_at_gamma is set to yes (only allowed for the plane-wave basis set), then, for the Gamma k-point (reduced_coordinates_of_kpoints being equal to (0 0 0)), time-reversal symmetry has been used to nearly halve the number of plane waves, the coefficient of the wavefunction for a particular reciprocal vector being the complex conjugate of the coefficient at minus this reciprocal vector. So, apart from the origin, the coefficient of only one out of each pair of corresponding plane waves ought to be specified. Note that in the present version of this specification, spatial symmetries should not be used to decrease the number of plane waves. Note also that the dimension max_number_of_coefficients actually governs the size of reduced_coordinates_of_plane_waves, so the size of the file will effectively be reduced by a factor of two only when the Gamma k-point is the only k-point present.
coefficients_of_wavefunctions: Wavefunction coefficients. The wavefunctions must be normalized to 1, i.e. the sum of the absolute square of the coefficients of one wavefunction must be 1. See Construction of the density. The attribute used_time_reversal_at_gamma must be used in the same way as for the variable reduced_coordinates_of_plane_waves.
coordinates_of_basis_grid_points: For wavelets. Coordinates of the grid points where coefficients can be stored. This is used to define a real-space basis set where a reduced set of points is used. This array may be used in conjunction with the variable number_of_coefficients_per_grid_point.
number_of_coefficients_per_grid_point: For wavelets. This array gives the number of coefficients stored on basis set grid points. The coordinates of the corresponding grid points are given in the array coordinates_of_basis_grid_points.
order_of_Daubechies_wavelets: For wavelets. This number gives the order of the Daubechies wavelet basis, e.g. if order_of_Daubechies_wavelets is 14, the Daubechies wavelets are made from piecewise polynomials of order 14.
used_time_reversal_at_gamma: Flag-type attribute (see Flag-like attributes), that can be used for the variables reduced_coordinates_of_plane_waves and coefficients_of_wavefunctions
real_space_wavefunctions: Wavefunction coefficients. Unlike for an explicit basis set, the wavefunctions must be normalized to 1 per unit cell, i.e. the sum of the absolute square of the coefficients of one wavefunction, for all points in the grid, divided by the number of points must be 1. See Construction of the density. Note that this array has a number of dimensions that exceeds the maximum allowed in Fortran (that is, seven). This leads to practical problems only if the software that reads/writes this array attempts to do so in one shot. Our suggestion is instead to read/write parts of this array sequentially, e.g. to write the spin-up part of it, and then add the spin-down part. This can be done using Fortran arrays with at most seven dimensions.
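
As an illustration of the definitions above, the sketch below rebuilds the full plane-wave set at Gamma from a half set stored with used_time_reversal_at_gamma set to yes, and evaluates the two normalization sums quoted for coefficients_of_wavefunctions and real_space_wavefunctions. The function names are hypothetical, and the (real, imaginary) pairs of the file are assumed to have already been combined into Python complex numbers.

```python
def expand_time_reversal(gvectors, coefficients):
    """Rebuild the full set of plane-wave coefficients at Gamma.

    gvectors: reduced integer coordinates (g1, g2, g3) of the stored plane waves.
    coefficients: the matching complex coefficients of one wavefunction.
    Returns a dict mapping each G-vector (stored or deduced) to its coefficient.
    """
    full = {}
    for g, c in zip(gvectors, coefficients):
        g = tuple(g)
        full[g] = c
        minus_g = tuple(-x for x in g)
        if minus_g != g:  # the origin is its own partner
            # c(-G) is the complex conjugate of c(G).
            full[minus_g] = c.conjugate()
    return full


def coefficient_norm(coefficients):
    """Sum of the absolute squares of the coefficients (expected to be 1)."""
    return sum(abs(c) ** 2 for c in coefficients)


def real_space_norm(values):
    """Sum of |psi|^2 over all grid points, divided by the number of points."""
    return sum(abs(v) ** 2 for v in values) / len(values)
```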

When the variable basis_set is set to "daubechies_wavelets", the basis set consists of a reduced set of grid points, each of which can host one or several coefficients. The following explanation assumes a two-level resolution, but it can be adapted to other values. In the two-resolution case, all quantities other than the wavefunctions (such as the density) are usually expressed on the finest grid, i.e. the grid for the density is twice as fine as the grid for the wavefunctions. Since the dimensions number_of_grid_points_vector<i> are used to define the scalar variables, the coordinates_of_basis_grid_points must be even numbers in the two-resolution case. The wavefunctions are expanded in real space on an incomplete uniform grid. The grid points used for the basis set are listed in the variable coordinates_of_basis_grid_points. Each basis grid point can host one or eight coefficients, as stored in the variable number_of_coefficients_per_grid_point. In that case, the dimension max_number_of_coefficients is the sum over the basis grid points of the values of number_of_coefficients_per_grid_point. To build the wavefunctions from the values stored in coefficients_of_wavefunctions, one must read for each basis grid point the required number of coefficients. When one coefficient is given, it is the coefficient of a product of 1-dimensional Daubechies scaling functions centered on the basis grid point. When eight values are given, they are the coefficients of products of both scaling functions and wavelet functions ($\phi$ denotes Daubechies scaling functions and $\psi$ Daubechies wavelet functions):

$\phi (x) \phi (y) \phi (z)$
$\psi (x) \phi (y) \phi (z)$
$\phi (x) \psi (y) \phi (z)$
$\psi (x) \psi (y) \phi (z)$
$\phi (x) \phi (y) \psi (z)$
$\psi (x) \phi (y) \psi (z)$
$\phi (x) \psi (y) \psi (z)$
$\psi (x) \psi (y) \psi (z)$
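
The bookkeeping described above, with one or eight coefficients per basis grid point and max_number_of_coefficients equal to their sum, can be sketched as follows for a single localization region in the two-resolution case. The function name is hypothetical, and the sketch assumes that the coefficients of successive grid points are stored one after another in coefficients_of_wavefunctions, which the text suggests but does not state explicitly.

```python
def coefficient_offsets(number_of_coefficients_per_grid_point):
    """Locate the coefficients of each basis grid point.

    Returns (offsets, total): offsets[i] is the position of the first
    coefficient of grid point i, and total is the sum of all counts,
    i.e. the value expected for max_number_of_coefficients.
    """
    offsets = []
    total = 0
    for count in number_of_coefficients_per_grid_point:
        # Two-resolution case: scaling function only (1), or scaling
        # function plus the seven wavelet products (8).
        if count not in (1, 8):
            raise ValueError(f"unexpected number of coefficients: {count}")
        offsets.append(total)
        total += count
    return offsets, total
```

For grid points hosting 1, 8 and 1 coefficients, the offsets are 0, 1 and 9, and ten coefficients are stored in total.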

For a review on wavelets, including the description of Daubechies wavelets, see, e.g., Wavelets and Their Application for the Solution of Partial Differential Equations in Physics, Presses Polytechniques et Universitaires Romandes, Lausanne, (1998) by S. Goedecker.

Note that this specification for the wavefunctions can accommodate the response wavefunctions of Density-Functional Perturbation Theory. By contrast, the response eigenenergies (actually a Hermitian matrix of Lagrange multipliers) cannot be accommodated by the "eigenvalues" array of States.

Split wavefunctions

Different variables see their dimensions modified, in case the file is split, as described in Splitting (see Dimensions that can be split and Auxiliary variables for splitting). In the following table we have gathered the variables whose dimensions will change. We have also dimensioned them as if the splitting was done on all the possible dimensions. This will rarely be the case, but intermediate situations can easily be deduced from the data gathered in the table.

Variables	Type (index order as in C)	Notes
reduced_coordinates_of_kpoints	double[my_number_of_kpoints][number_of_reduced_dimensions]
number_of_coefficients	integer[my_number_of_kpoints]
kpoint_weights	double[my_number_of_kpoints]
occupations	double[my_number_of_spins][my_number_of_kpoints][my_max_number_of_states]
eigenvalues	double[my_number_of_spins][my_number_of_kpoints][my_max_number_of_states]	The "units" attribute is required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables.
real_space_wavefunctions	double[my_number_of_spins][my_number_of_kpoints][my_max_number_of_states][my_number_of_spinor_components][my_number_of_grid_points_vector1][my_number_of_grid_points_vector2][my_number_of_grid_points_vector3][real_or_complex_wavefunctions]
coefficients_of_wavefunctions	double[my_number_of_spins][my_number_of_kpoints][my_max_number_of_states][my_number_of_spinor_components][my_max_number_of_coefficients][real_or_complex_coefficients]
reduced_coordinates_of_plane_waves	integer[my_number_of_kpoints][my_max_number_of_coefficients][number_of_reduced_dimensions]

BSE/GW

The variables mentioned in this table are optional. They have been introduced in the present specification in anticipation of their use by some GW/BSE software, and might be subject to (heavy?) revisions in future versions of the specification.

Dimensions	Type	Notes
max_number_of_angular_momenta	integer
max_number_of_projectors	integer
Variables	Type (index order as in C)	Notes
gw_corrections	double[number_of_spins][number_of_kpoints][max_number_of_states][real_or_complex_gw_corrections]	The "units" attribute is required. The attribute "scale_to_atomic_units" might also be mandatory, see Generic attributes of variables. See also possible changes for split files, as in Split wavefunctions.
kb_formfactor_sign	integer[number_of_atom_species][max_number_of_angular_momenta][max_number_of_projectors]
kb_formfactors	double[number_of_atom_species][max_number_of_angular_momenta][max_number_of_projectors][number_of_kpoints][max_number_of_coefficients]	Possible changes for split files, as in Split wavefunctions.
kb_formfactor_derivative	double[number_of_atom_species][max_number_of_angular_momenta][max_number_of_projectors][number_of_kpoints][max_number_of_coefficients]	Possible changes for split files, as in Split wavefunctions.

gw_corrections: GW-corrections to one-particle eigenvalues (see Split wavefunctions). Imaginary part (originating from the non-hermiticity) is optional. Should be 0 if unknown.
max_number_of_angular_momenta: The maximum number of angular momenta to be considered for non-local Kleinman-Bylander separable norm-conserving pseudopotentials. If there is no non-local part, set it to 0. If the s channel is the highest angular momentum channel over all atomic species, then set it to 1. If the p channel (resp. d or f) is the highest, set it to 2 (resp. 3 or 4).
max_number_of_projectors: The maximum number of projectors for non-local Kleinman-Bylander separable norm-conserving pseudopotentials, over all angular momenta and all atomic species. If there is no non-local part, set it to 0. Most separable norm-conserving pseudopotentials have only one projector per angular momentum channel.
kb_formfactor_sign: An array of integers whose values depend on the specific atomic species, angular momentum, and projector. It can have three values: when 0, it means that there is no projector defined for that channel. When +1 or -1, it gives the sign of the Kleinman-Bylander projector for that channel.
kb_formfactors: Kleinman-Bylander form factors in reciprocal space.
kb_formfactor_derivative: Derivatives of the Kleinman-Bylander form factors in reciprocal space.

On the Kleinman-Bylander form factors, we note that one can always write the non-local part of Kleinman-Bylander pseudopotential (reciprocal space) in the following way:

$v^{KB}_{nonloc} (\vec{K},\vec{K'}) = \sum_s \left[ \sum_{a(s)} e^{-i(\vec{K}-\vec{K'})\vec{\tau_a}}\right] \left[ \sum_{lp} P_l(\hat{K} \cdot \hat{K'}) F^{\star}_{slp}(K) S_{slp} F_{slp}(K') \right]$

with $\vec{K} = \vec{k} + \vec{G}$, where $\vec{k}$ is one of the k-points (see K-points) and $\vec{G}$ is a vector of the reciprocal lattice, the reduced coordinates of which are listed in the variable reduced_coordinates_of_plane_waves of Wavefunctions. $K$ is the modulus of $\vec{K}$ and $\hat{K}$ its direction. $\vec{\tau_a}$ is the atomic position of atom $a$ belonging to species $s$. $P_l (x)$ is the Legendre polynomial of order $l$. $F_{slp} (K)$ is the Kleinman-Bylander form factor for species $s$, angular momentum $l$, and projector number $p$. $S_{slp}$ is the sign of the dyadic product $F_{slp}^{\star}(K) F_{slp}(K')$. The sum on $a(s)$ runs over all atoms of atomic species $s$, $l$ runs over all the pseudopotential angular momentum components of the atomic species $s$, and $p$ runs over the number of projectors allowed for a specific angular momentum channel of atomic species $s$. The additional variable kb_formfactor_derivative is equal to $d F_{slp}(K) / dK$.
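
The expression above can be evaluated directly once the form factors at $K$ and $K'$ are known. The following sketch is only an illustration under several assumptions: all vectors are given in Cartesian coordinates, the form factors have already been extracted from kb_formfactors for the two plane waves of interest, the index of the angular momentum array equals $l$, and the angle factor is set to 1 when one of the vectors vanishes. The names `legendre` and `kb_nonlocal` are hypothetical.

```python
import cmath
import math


def legendre(l, x):
    """Legendre polynomial P_l(x), from the standard three-term recurrence."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kb_nonlocal(K, Kp, positions_by_species, F_K, F_Kp, signs):
    """Evaluate the non-local Kleinman-Bylander element between K and K'.

    positions_by_species[s]: Cartesian positions of the atoms of species s.
    F_K[s][l][p], F_Kp[s][l][p]: form factors at |K| and |K'|.
    signs[s][l][p]: +1 or -1, or 0 when no projector is defined.
    """
    norm_K, norm_Kp = math.sqrt(dot(K, K)), math.sqrt(dot(Kp, Kp))
    if norm_K == 0.0 or norm_Kp == 0.0:
        cos_angle = 1.0  # direction undefined: arbitrary choice of this sketch
    else:
        cos_angle = dot(K, Kp) / (norm_K * norm_Kp)
    dK = [a - b for a, b in zip(K, Kp)]
    total = 0j
    for s, positions in enumerate(positions_by_species):
        # Sum over the atoms a(s) of the phase factors.
        structure = sum(cmath.exp(-1j * dot(dK, tau)) for tau in positions)
        radial = 0j
        for l, projectors in enumerate(signs[s]):
            for p, sign in enumerate(projectors):
                if sign == 0:
                    continue
                radial += (legendre(l, cos_angle)
                           * complex(F_K[s][l][p]).conjugate()
                           * sign * F_Kp[s][l][p])
        total += structure * radial
    return total
```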

Construction of the density

Supposing $\rho_{n, k} (r)$ to be the partial density at point r (in real space, using reduced coordinates) due to band n at k-point k (in reciprocal space, using reduced coordinates), then the full density at point r is obtained from:

$\rho(r^{red}_\alpha) = \sum_{s \in sym} \sum_k w_k \sum_n f_{n,k} \rho_{n, k} \left( S^{red}_{s,\alpha \beta} (r^{red}_\beta-t^{red}_{s,\beta}) \right)$,

where $w_k$ is contained in the array "kpoint_weights" of K-points, and $f_{n, k}$ is contained in the array "occupations" of States. This relation generalizes to the collinear spin-polarized case, as well as the non-collinear case by taking into account the "number_of_components" defined in Dimensions that can be split, and the direction of the magnetization vector.
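
The summation above can be sketched in a few lines for the spin-unpolarized case. The code follows the formula literally (no extra normalization is applied) and rests on assumptions that are not part of the specification: the partial densities are sampled on the same regular grid as the result, the symmetry operations map grid points onto grid points, and the translations are given exactly (for instance as `fractions.Fraction` values). The function name `symmetrized_density` is hypothetical.

```python
from fractions import Fraction


def symmetrized_density(partial, weights, occupations,
                        sym_matrices, sym_translations, shape):
    """Accumulate the density on a regular grid of reduced coordinates.

    partial[k][n]: dict mapping a grid index (i1, i2, i3) to rho_{n,k}.
    weights[k]: w_k; occupations[k][n]: f_{n,k}.
    sym_matrices[s]: 3x3 integer matrix S in reduced coordinates.
    sym_translations[s]: reduced translation t (exact values).
    shape: (n1, n2, n3), the number of grid points along each vector.
    """
    density = {}
    for i1 in range(shape[0]):
        for i2 in range(shape[1]):
            for i3 in range(shape[2]):
                r = (Fraction(i1, shape[0]), Fraction(i2, shape[1]),
                     Fraction(i3, shape[2]))
                value = 0.0
                for S, t in zip(sym_matrices, sym_translations):
                    shifted = [r[b] - Fraction(t[b]) for b in range(3)]
                    index = []
                    for a in range(3):
                        image = sum(S[a][b] * shifted[b] for b in range(3))
                        scaled = image * shape[a]
                        if scaled.denominator != 1:
                            raise ValueError("grid incompatible with symmetry")
                        # Periodicity brings the point back into the cell.
                        index.append(int(scaled) % shape[a])
                    index = tuple(index)
                    for k, w in enumerate(weights):
                        for n, f in enumerate(occupations[k]):
                            value += w * f * partial[k][n][index]
                density[(i1, i2, i3)] = value
    return density
```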

Appendix A: Some information on the NetCDF size limitation

To summarize:

The 2GB limit is first of all a FILE-SIZE limit of operating systems on 32-bit machines (and of some old, non-updated 64-bit operating systems). It cannot be overcome, even by splitting the wavefunctions into nbands*nkpoints variables.
Assuming your machine can store files larger than 2GB, NetCDF has in general a limit of 4GB per variable. However, even with the current version, at least one variable (the last one) can grow up to terabytes, and this will probably be extended to the other variables in the future.

NetCDF 64-bit Offset Format Limitations

Although the 64-bit offset format allows the creation of much larger NetCDF files than was possible with the classic format, there are still some restrictions on the size of variables. It is important to note that without Large File Support (LFS) in the operating system, it is impossible to create any file larger than 2 GBytes. Assuming an operating system with LFS, the following restrictions apply to the NetCDF 64-bit offset format:

No fixed-size variable can require more than 2^32^ - 4 bytes (i.e. 4GB - 4 bytes, or 4,294,967,292 bytes) of storage for its data, unless it is the last fixed-size variable and there are no record variables. When there are no record variables, the last fixed-size variable can be any size supported by the file system, e.g. terabytes.
A 64-bit offset format NetCDF file can have up to 2^32^ - 1 fixed sized variables, each under 4GB in size. If there are no record variables in the file the last fixed variable can be any size. No record variable can require more than 2^32^ - 4 bytes of storage for each record’s worth of data, unless it is the last record variable. A 64-bit offset format NetCDF file can have up to 2^32^ - 1 records, of up to 2^32^ - 1 variables, as long as the size of one record’s data for each record variable except the last is less than 4 GB - 4.

Note also that all NetCDF variables and records are padded to 4-byte boundaries.
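
The rule on fixed-size variables can be checked before a file is written. The sketch below is a simplified illustration: the function names are hypothetical, sizes are given in bytes in the order in which the variables are defined, and only the fixed-size variable rule quoted above is tested (record variables are only taken into account through a flag).

```python
MAX_FIXED_VARIABLE_BYTES = 2**32 - 4  # 4,294,967,292 bytes


def padded_size(n_bytes):
    """Round a size up to the next 4-byte boundary."""
    return (n_bytes + 3) // 4 * 4


def array_bytes(shape, bytes_per_value=8):
    """Storage needed for an array of doubles (8 bytes each by default)."""
    total = bytes_per_value
    for n in shape:
        total *= n
    return total


def fits_64bit_offset(fixed_variable_sizes, has_record_variables=False):
    """Check the size rule for fixed-size variables, in file order."""
    last = len(fixed_variable_sizes) - 1
    for position, size in enumerate(fixed_variable_sizes):
        # Only the last fixed-size variable may exceed the limit, and
        # only when the file holds no record variables.
        exempt = position == last and not has_record_variables
        if size > MAX_FIXED_VARIABLE_BYTES and not exempt:
            return False
    return True
```

For example, a real_space_wavefunctions array with 1 spin, 10 k-points, 20 states, 1 spinor component and a 64x64x64 grid of complex values needs `array_bytes((1, 10, 20, 1, 64, 64, 64, 2))` bytes, which stays below the limit; larger cases would have to be defined last or split.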

Appendix B: List of things under debate

Should formulate specification for other susceptibilities (spin, frequency, real or reciprocal representation), electron-phonon, dynamical matrices
Tolerances / thresholds for the equality of two double numbers (e.g. k-points, when given explicitly); one might define a tolerance in the specification, or define some tolerance variables?
Specification should be clarified about Monkhorst-Pack sampling in case where the original article refers to conventional reciprocal lattice, and not primitive one
Should discuss symmetries in case of magnetization (collinear and non-collinear).
Should plan a PAW / USPP generalisation, perhaps LAPW?
Should debate the interest of a pseudopotential specification, perhaps PAW/USPP atomic data.

Appendix C: List of ETSF NetCDF agreed names

Here is a list of all the agreed variable, attribute, and dimension names, in alphabetical order.

Note: all the variables/dimensions beginning with "my_" refer to split files, and are explained in Auxiliary dimensions for splitting and Auxiliary variables for splitting.

Name	Type	Table
atom_species	Variable	Atomic structure and symmetry operations
atom_species_names	Variable	Atomic structure and symmetry operations
atomic_numbers	Variable	Atomic structure and symmetry operations
basis_set	Variable	Wavefunctions
character_string_length	Dimension	Dimensions that cannot be split
chemical_symbols	Variable	Atomic structure and symmetry operations
coefficients_of_wavefunctions	Variable	Wavefunctions
Conventions	Global attribute	Mandatory attributes
coordinates_of_basis_grid_points	Variable	Wavefunctions
correlation_functional	Variable	Electronic structure
correlation_potential	Variable	Exchange and correlation
density	Variable	Density
eigenvalues	Variable	States
exchange_correlation_potential	Variable	Exchange and correlation
exchange_functional	Variable	Electronic structure
exchange_potential	Variable	Exchange and correlation
fermi_energy	Variable	Electronic structure
file_format	Global attribute	Mandatory attributes
file_format_version	Global attribute	Mandatory attributes
gw_corrections	Variable	BSE/GW
history	Global attribute	Optional attributes
k_dependent	Attribute	States
kb_formfactor_sign	Variable	BSE/GW
kb_formfactors	Variable	BSE/GW
kb_formfactor_derivative	Variable	BSE/GW
kinetic_energy_cutoff	Variable	Reciprocal space
kpoint_grid_shift	Variable	Reciprocal space
kpoint_grid_vectors	Variable	Reciprocal space
kpoint_weights	Variable	K-points
max_number_of_angular_momenta	Dimension	BSE/GW
max_number_of_basis_grid_points	Dimension	Dimensions that can be split
max_number_of_coefficients	Dimension	Dimensions that can be split
max_number_of_projectors	Dimension	BSE/GW
max_number_of_states	Dimension	Dimensions that can be split
monkhorst_pack_folding	Variable	Reciprocal space
number_of_atoms	Dimension	Dimensions that cannot be split
number_of_atom_species	Dimension	Dimensions that cannot be split
number_of_cartesian_directions	Dimension	Dimensions that cannot be split
number_of_coefficients	Variable	Wavefunctions
number_of_coefficients_per_grid_point	Variable	Wavefunctions
number_of_components	Dimension	Dimensions that can be split
number_of_electrons	Variable	Electronic structure
number_of_grid_points_vector1	Dimension	Dimensions that can be split
number_of_grid_points_vector2	Dimension	Dimensions that can be split
number_of_grid_points_vector3	Dimension	Dimensions that can be split
number_of_kpoints	Dimension	Dimensions that can be split
number_of_localization_regions	Dimension	Dimensions that can be split
number_of_reduced_dimensions	Dimension	Dimensions that cannot be split
number_of_spinor_components	Dimension	Dimensions that can be split
number_of_spins	Dimension	Dimensions that can be split
number_of_states	Variable	States
number_of_symmetry_operations	Dimension	Dimensions that cannot be split
number_of_vectors	Dimension	Dimensions that cannot be split
occupations	Variable	States
order_of_Daubechies_wavelets	Variable	Wavefunctions
primitive_vectors	Variable	Atomic structure and symmetry operations
pseudopotential_types	Variable	Atomic information
real_or_complex_coefficients	Dimension	Dimensions that cannot be split
real_or_complex_density	Dimension	Dimensions that cannot be split
real_or_complex_gw_corrections	Dimension	Dimensions that cannot be split
real_or_complex_potential	Dimension	Dimensions that cannot be split
real_or_complex_wavefunctions	Dimension	Dimensions that cannot be split
real_space_wavefunctions	Variable	Wavefunctions
reduced_atom_positions	Variable	Atomic structure and symmetry operations
reduced_coordinates_of_kpoints	Variable	K-points
reduced_coordinates_of_plane_waves	Variable	Wavefunctions
reduced_symmetry_matrices	Variable	Atomic structure and symmetry operations
reduced_symmetry_translations	Variable	Atomic structure and symmetry operations
scale_to_atomic_units	Attribute	Generic attributes of variables
smearing_scheme	Variable	Electronic structure
smearing_width	Variable	Electronic structure
space_group	Variable	Atomic structure and symmetry operations
symbol_length	Dimension	Dimensions that cannot be split
symmorphic	Attribute	Atomic structure and symmetry operations
title	Global attribute	Optional attributes
units	Attribute	Generic attributes of variables
used_time_reversal_at_gamma	Attribute	Wavefunctions
valence_charges	Variable	Atomic information