License | Authors | Download |
---|---|---|
LGPL v3 | D. Caliste, F. Corsetti, J. Minar, M. Oliveira, Y. Pouillon, T. Ruh and D. Strubbe | Gitlab |
libescdf is a library containing tools for reading and writing massive data structures related to electronic structure calculations, following the standards defined in ESCDF - Electronic Structure Common Data Format. It is under development.
Platform-independence, parallel I/O, ...
While the API may present multidimensional arrays to the user, all arrays are internally stored as one-dimensional vectors. This circumvents a strong limitation of Fortran only allowing 7 dimensions at most, an upper bound that can easily be exceeded by variables such as wavefunctions.
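The flattening described above can be illustrated with a small, self-contained sketch. It assumes row-major (C) ordering, which the text does not specify, and the names `flat_index` and `flat_size` are hypothetical helpers, not part of libescdf.

```c
#include <assert.h>
#include <stddef.h>

/* Map a multidimensional index onto a position in a flat 1-D buffer.
 * Assumption: row-major (C) ordering, i.e. the last index varies fastest.
 * ndims may exceed 7, since no native N-dimensional array is involved. */
size_t flat_index(size_t ndims, const size_t *dims, const size_t *idx)
{
    size_t offset = 0;
    for (size_t d = 0; d < ndims; d++) {
        assert(idx[d] < dims[d]);   /* each index must be in range */
        offset = offset * dims[d] + idx[d];
    }
    return offset;
}

/* Total number of elements the flat buffer must hold. */
size_t flat_size(size_t ndims, const size_t *dims)
{
    size_t n = 1;
    for (size_t d = 0; d < ndims; d++)
        n *= dims[d];
    return n;
}
```

For instance, with dimensions {2, 3} the index {1, 2} maps to position 1*3 + 2 = 5 of a buffer of 6 elements. The same two functions work unchanged for 8 or more dimensions.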
To make it as flexible as possible, the API was designed with the following points in mind:
It would have been nice to prevent users from creating incorrect or incomplete files. Unfortunately, because of the dependencies between different chunks of data, the only way to achieve this would be to write a group to disk only once all the information required to perform the validation is available. This is not practical and would have required keeping track of the synchronization between the data stored in memory and the data stored on disk.
When initializing the library, either by opening an existing ESCDF file or by creating a new one, the library returns a handle. This handle is an abstract reference to that file, and all access to a file through libescdf is done using that handle.
The C structure is named escdf_handle_t and contains the following data:

hid_t file_id
: the HDF5 identifier for the file.

hid_t group_id
: the HDF5 identifier for the root group.

The following functions are provided:
escdf_handle_t * escdf_open(const char *filename, const char *path)
This function creates an instance of escdf_handle_t by allocating
the memory and opens an existing file. Optionally, it considers the
root group to be given by path if it is not NULL. Note that this
will return an error if the file does not exist.
escdf_handle_t * escdf_create(const char *filename, const char *path)
This function creates an instance of escdf_handle_t by allocating
the memory and creates a file. Optionally, it considers the root
group to be given by path if it is not NULL.
escdf_errno_t escdf_close(escdf_handle_t *handle)
This function closes a previously opened file.
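The open/create/close lifecycle described above can be sketched without HDF5. In the following mock, a plain `FILE *` stands in for the HDF5 identifiers, and all `demo_*` names are hypothetical. Signalling failure with a NULL handle, truncating an existing file on creation, and freeing the handle on close are assumptions of this sketch, not documented behaviour of libescdf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for escdf_handle_t: a FILE* replaces the
 * HDF5 identifiers so that the sketch needs no external library. */
typedef struct {
    FILE *fp;
} demo_handle_t;

/* Shared helper: allocate the handle, then open the file in the given mode. */
static demo_handle_t *demo_handle_new(const char *filename, const char *mode)
{
    demo_handle_t *handle = malloc(sizeof *handle);
    if (handle == NULL)
        return NULL;
    handle->fp = fopen(filename, mode);
    if (handle->fp == NULL) {   /* do not leak the handle on failure */
        free(handle);
        return NULL;
    }
    return handle;
}

/* Mirrors escdf_open: fails (here: returns NULL) if the file is missing. */
demo_handle_t *demo_open(const char *filename)
{
    return demo_handle_new(filename, "rb");
}

/* Mirrors escdf_create: in this sketch an existing file is truncated. */
demo_handle_t *demo_create(const char *filename)
{
    return demo_handle_new(filename, "wb");
}

/* Mirrors escdf_close: closes the file and, in this sketch, also
 * releases the handle. Returns 0 on success, nonzero on error. */
int demo_close(demo_handle_t *handle)
{
    if (handle == NULL)
        return 1;
    int status = (fclose(handle->fp) == 0) ? 0 : 1;
    free(handle);
    return status;
}
```

A caller would check the returned pointer before using it, and pass the handle to every later call until `demo_close` is reached.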
For each group allowed by the specifications (system, densities, etc.), there is a C structure plus several functions. These are described below, using the system group as an example.

The C structure is named escdf_system and contains the following data:

hid_t group_id
: the HDF5 identifier for the group.

Furthermore, the structure is private: it is defined in the C
file, and only a typedef struct escdf_system escdf_system_t
declaration can be found in the corresponding header file.
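The private-structure pattern can be sketched in a single file as follows. The comments mark which part would live in the header and which in the C file. The `demo_*` names, the `n_items` field, and the default values are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- Public header part: only the type name is visible to callers. ---- */
typedef struct demo_system demo_system_t;

demo_system_t *demo_system_new(void);
void demo_system_free(demo_system_t *system);
int demo_system_get_n_items(const demo_system_t *system);

/* ---- Private part, normally in the .c file: the full definition. ---- */
struct demo_system {
    long group_id;   /* stand-in for the hid_t group identifier */
    int n_items;     /* hypothetical piece of metadata */
};

/* Allocate an instance and set every field to a default value. */
demo_system_t *demo_system_new(void)
{
    demo_system_t *system = malloc(sizeof *system);
    if (system != NULL) {
        system->group_id = -1;   /* assumed "no group open" marker */
        system->n_items = 0;
    }
    return system;
}

/* Release the instance itself; free(NULL) is a harmless no-op. */
void demo_system_free(demo_system_t *system)
{
    free(system);
}

/* Callers reach the fields only through accessor functions. */
int demo_system_get_n_items(const demo_system_t *system)
{
    assert(system != NULL);
    return system->n_items;
}
```

Because user code only ever sees the incomplete type, the layout of the structure can change without breaking programs compiled against the header.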
The difference between what is called metadata and what is called data is not immediately obvious from the specifications, but the way they are handled by the library is different. The metadata is stored on disk and there is a copy of it in the C structure. As for the data, it is never explicitly stored in the structure. Instead, it is always directly written/read to/from the file on disk.
The following low-level functions are provided:
escdf_system_t * escdf_system_new()
This function takes care of creating an instance of escdf_system_t
by allocating the memory and it also initializes all its contents to
the default values.
void escdf_system_free(escdf_system_t *system)
This function frees all the memory associated with the instance of
the structure, including the instance itself.
escdf_errno_t escdf_system_open_group(escdf_system_t *system, escdf_handle_t *handle, const char *path)
This function opens a group from the file managed by the handle. If
path is NULL, the group path is system, otherwise it is
system/path. Note that this will return an error if the group
does not exist.
escdf_errno_t escdf_system_create_group(escdf_system_t *system, escdf_handle_t *handle, const char *path)
This function creates a group within the file managed by the handle.
If path is NULL, the group path is system, otherwise it is
system/path.
escdf_errno_t escdf_system_close_group(escdf_system_t *system)
This function closes the group.
The library provides the following high-level creators and destructors:
escdf_system_t * escdf_system_open(escdf_handle_t *handle)
This function performs the following tasks:

1. Call escdf_system_new to create an instance of the structure.
2. Call escdf_system_open_group. Note that this function will return an error if the group does not exist.
3. Call escdf_system_read_metadata to read all the metadata from the file and store it in memory.
4. Call escdf_system_is_correct and escdf_system_is_complete to verify that the data is valid. Return an error code if not.

escdf_system_t * escdf_system_create(escdf_handle_t *handle)
This function performs the following tasks:

1. Call escdf_system_new to create an instance of the structure.
2. Call escdf_system_create_group. Note that this function will delete all previous contents of the group.

escdf_errno_t escdf_system_close(escdf_system_t *group)
This function performs the following tasks:

1. Call escdf_system_is_correct and escdf_system_is_complete.
2. Call escdf_system_close_group to close the group.
3. Call escdf_system_free to free all memory.

escdf_errno_t escdf_system_read_metadata(escdf_system_t *system)
This function reads all the metadata from the file on disk and
stores it in memory. Note: it is the responsibility of the user to
call this function whenever the contents of the file change.
escdf_errno_t escdf_system_set_*
The setters should start by writing the data to the disk. Once the
data is successfully written to the file, it is copied to the
structure in memory. It is recommended that different metadata that
only make sense when taken together be set by calling a single set
function rather than by calling several different set functions.
escdf_errno_t escdf_system_get_*
Getters should simply return the values stored in memory.
escdf_errno_t escdf_system_copy_metadata(const escdf_system_t *src, escdf_system_t *dst)
This function copies the content of the metadata from one
escdf_system_t structure to another. Once done, it writes the
metadata to the file of the destination group. Note that it is the
responsibility of the user to modify the destination group in any
necessary way to make it valid.
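The setter and getter conventions described above (write to disk first, update the in-memory copy only on success, read back from memory) can be sketched as follows. A `FILE *` stands in for the HDF5 group, and the `demo_*` names and the `n_sites` attribute are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical group: one metadata value plus the file backing it. */
typedef struct {
    FILE *fp;      /* stand-in for the HDF5 group on disk */
    int n_sites;   /* in-memory copy of the metadata */
} demo_group_t;

/* Setter: write to disk first; touch the in-memory copy only on success.
 * Returns 0 on success, nonzero on error. */
int demo_group_set_n_sites(demo_group_t *group, int n_sites)
{
    if (group == NULL || group->fp == NULL)
        return 1;
    rewind(group->fp);
    if (fwrite(&n_sites, sizeof n_sites, 1, group->fp) != 1)
        return 1;
    if (fflush(group->fp) != 0)
        return 1;
    group->n_sites = n_sites;   /* the disk write succeeded */
    return 0;
}

/* Getter: no disk access, just return the value cached in memory. */
int demo_group_get_n_sites(const demo_group_t *group, int *n_sites)
{
    if (group == NULL || n_sites == NULL)
        return 1;
    *n_sites = group->n_sites;
    return 0;
}
```

With this ordering, a failed write leaves the cached value untouched, so the structure in memory never gets ahead of the file.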
escdf_system_write_*
These functions should take as argument a buffer containing all or
part of the data to be written to a given dataset. Any attributes of
the dataset, like units, should be passed as arguments of the
function.
escdf_system_read_*
These functions read all or part of the data stored in the dataset
and copy it to a buffer passed as argument. Any attributes of the
dataset, like units, should be returned as arguments of the
function.
Both the read and write functions should take care of any necessary data reordering to read/write in parallel.
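As a rough illustration of the read interface described above, the following sketch copies part of a dataset into a caller-supplied buffer and returns the units through an output argument. The dataset lives in memory here instead of in an HDF5 file, no parallel reordering is attempted, and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dataset: values plus one attribute (the units). */
typedef struct {
    const double *values;   /* stand-in for the dataset stored on disk */
    size_t size;            /* number of stored values */
    const char *units;      /* attribute attached to the dataset */
} demo_dataset_t;

/* Copy count values starting at position start into buffer, and hand
 * the units back through an output argument.
 * Returns 0 on success, nonzero if the request is invalid. */
int demo_read_slice(const demo_dataset_t *ds, size_t start, size_t count,
                    double *buffer, const char **units)
{
    if (ds == NULL || buffer == NULL || units == NULL)
        return 1;
    if (start > ds->size || count > ds->size - start)
        return 1;   /* requested range lies outside the dataset */
    memcpy(buffer, ds->values + start, count * sizeof *buffer);
    *units = ds->units;
    return 0;
}
```

A write function would mirror this signature, taking the buffer and the units as input arguments.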
Coming soon...
There are basically two types of validation that can be performed on the content of the group: correctness and completeness. For a group to be considered as obeying the ESCDF specifications it must be both complete and correct.
The content of the group is considered to be correct if all the pieces of metadata and data that are set or present are correct. The correctness of some metadata or data may depend on the values of other metadata or data. In that case, those checks should only be performed when all the corresponding metadata and/or data are present. Note that if a mandatory piece of metadata or data is not present, the file will never be considered valid, as it will fail the completeness test.
The content of the group is said to be complete if all the attributes and datasets that the specifications say are mandatory are set or present.
Therefore, the library provides these two functions:
bool escdf_system_is_correct(escdf_system_t *system)
This function checks that all the data and metadata that are set are
correct, that is, that they satisfy all the ranges, constraints,
dimensions, etc. that are mentioned in the ESCDF specifications. If
two pieces of data/metadata have some sort of dependence, then that
dependence is only checked if both are present/set.
bool escdf_system_is_complete(escdf_system_t *system)
This function checks that all the attributes and datasets that the
specifications say are mandatory are set or present.
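The distinction between the two checks can be sketched with a mock group holding two attributes. The attributes, their ranges, and the dependency rule (no more species than sites) are hypothetical and are not taken from the ESCDF specifications. All `demo_*` names are likewise assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical metadata: each value carries a flag telling whether it is set. */
typedef struct {
    bool has_n_sites;
    int n_sites;
    bool has_n_species;
    int n_species;
} demo_meta_t;

/* Complete: every mandatory attribute is set. */
bool demo_is_complete(const demo_meta_t *m)
{
    return m->has_n_sites && m->has_n_species;
}

/* Correct: whatever is set satisfies its constraints. Unset values are
 * skipped, and the dependent check runs only when both values are set. */
bool demo_is_correct(const demo_meta_t *m)
{
    if (m->has_n_sites && m->n_sites < 1)
        return false;
    if (m->has_n_species && m->n_species < 1)
        return false;
    if (m->has_n_sites && m->has_n_species && m->n_species > m->n_sites)
        return false;
    return true;
}

/* Valid with respect to the (mock) specification: both checks pass. */
bool demo_is_valid(const demo_meta_t *m)
{
    return demo_is_complete(m) && demo_is_correct(m);
}
```

An empty group is therefore correct but incomplete, which is why a partially written file can pass the correctness check while still not being valid.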