Recently, I updated our meshfree partitioner to support writing files in the Hierarchical Data Format 5 (HDF5) format. At the time of writing this post, the partitioner handled more than six different file formats.
As the complexity grew, we needed a single standard format that many solvers could consume.
The solution? Standardize the output format on HDF5.
We chose HDF5 because a significant portion of the high-performance computing (HPC) and computational fluid dynamics (CFD) communities already prefer it.
While Kumar has already implemented it in the Fortran 90 solver, I’ll be talking about the Python version here.
Requirements
- Python 3.x (I’m using 3.8.5)
- h5py (the Python wrapper library for HDF5)
The File Structure
The example below is from a grid file containing 6415 points. You may download the h5 file from here.
To view the file, you can use the `h5dump` command-line tool to print its contents to the console (for example, `h5dump -H` prints only the header information, without the data).
How is the file structured?
An HDF5 file consists of groups, attributes, and datasets. Groups contain datasets (and other groups), much like folders contain files in a filesystem. Similarly, attributes are comparable to the metadata that files and folders carry in a filesystem.
In our case, since we use METIS to decompose our point distribution, we have several partitioned grids. Each partitioned grid contains a set of points, and each point has a set of neighbor points known as its connectivity set. Points are identified by their index within the partitioned grid.
These points are further classified into `local` and `ghost`. `local` points are points whose neighbor points all exist in the same partition; `ghost` points are points that have at least one neighbor point in another partition.
For any HDF5 file, there exists a `/` group, also known as the root group. This is similar to how the filesystem hierarchy works on Linux-based systems. For example, if we have two groups `group1` and `group2`, they are accessed as `/group1` and `/group2`. Alternatively, if `group2` is a subgroup of `group1`, we access it as `/group1/group2`.
Now, for every partition, we create a group whose name corresponds to its index (one-based) in the partitioned grid. We assign attributes named `local`, `ghost`, and `total`, which hold the number of local, ghost, and local+ghost points in that partition.
We then create a dataset called `local`, a (number of local points, 30) array. This array contains our list of local points and the corresponding data for each of those points. Likewise, we create another dataset called `ghost`, a (number of ghost points, 4) array.
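The layout described above can be reproduced with a short h5py sketch. The file name and point counts here are made up for illustration; a real grid file holds one such group per partition.

```python
import h5py
import numpy as np

# Hypothetical sizes for a single partition
n_local, n_ghost = 5, 2

with h5py.File("sketch_grid.h5", "w") as f:
    part = f.create_group("1")               # one-based partition index
    part.attrs["local"] = n_local            # number of local points
    part.attrs["ghost"] = n_ghost            # number of ghost points
    part.attrs["total"] = n_local + n_ghost  # local + ghost
    part.create_dataset("local", data=np.zeros((n_local, 30)))
    part.create_dataset("ghost", data=np.zeros((n_ghost, 4)))
```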
The Code
We first import the `h5py` module.
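The import is just one line:

```python
# h5py is a third-party package: install it with `pip install h5py`
import h5py
```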
Now, we open the HDF5 file using the `File` constructor, in read mode (`"r"`).
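A minimal sketch of the open call; `partGrid.h5` is a placeholder name, and the write step exists only so the snippet is runnable on its own:

```python
import h5py

# Create an empty stand-in file so the read-mode open below has
# something to work with; with a real grid file, skip this step.
with h5py.File("partGrid.h5", "w"):
    pass

f = h5py.File("partGrid.h5", "r")  # "r" opens read-only; the file must exist
mode = f.mode
f.close()
```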
We read the number of partitions in the file using the `keys()` method.
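Since every top-level group corresponds to one partition, counting the root group's keys gives the partition count. A self-contained sketch with a fabricated three-partition file:

```python
import h5py

# Stand-in file with three partition groups named "1", "2", "3"
with h5py.File("keys_demo.h5", "w") as out:
    for i in range(1, 4):
        out.create_group(str(i))

f = h5py.File("keys_demo.h5", "r")
partitions = len(f.keys())  # every top-level group is one partition
f.close()
```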
We now use `range()` to loop from 1 to the number of partitions, inclusive.
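Because the partition groups use one-based names, the loop must run from 1 to `partitions` inclusive. A sketch over a fabricated two-partition file:

```python
import h5py

# Stand-in file with two partition groups
with h5py.File("loop_demo.h5", "w") as out:
    out.create_group("1")
    out.create_group("2")

f = h5py.File("loop_demo.h5", "r")
partitions = len(f.keys())

visited = []
# Group names are one-based indices, so loop from 1 to partitions inclusive
for idx in range(1, partitions + 1):
    visited.append(f[str(idx)].name)
f.close()
```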
Using the `get()` method, we can access a dataset provided we give its path. The path for accessing the `local` points of, say, the first partition would be `/1/local`.
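A sketch of the `get()` call, again against a fabricated single-partition file with four local points:

```python
import h5py
import numpy as np

# Stand-in file: partition "1" with four local points
with h5py.File("path_demo.h5", "w") as out:
    out.create_group("1").create_dataset("local", data=np.zeros((4, 30)))

f = h5py.File("path_demo.h5", "r")
local = f.get("/1/local")  # absolute path: /<partition index>/<dataset name>
points = local[()]         # read the whole dataset into a NumPy array
f.close()
```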
Use the `shape` attribute to get the shape of the dataset.
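Reading the shape back looks like this; the six-point dataset here is fabricated so the snippet runs on its own:

```python
import h5py
import numpy as np

# Stand-in file: partition "1" with six local points
with h5py.File("shape_demo.h5", "w") as out:
    out.create_group("1").create_dataset("local", data=np.zeros((6, 30)))

f = h5py.File("shape_demo.h5", "r")
shape = f["/1/local"].shape  # (number of local points, 30)
f.close()
```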
Similarly, replacing `local` with `ghost` in the path gives us access to the `ghost` dataset.
Finally, we close the HDF5 file handle with the `close()` method.
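A sketch of the close call. As a side note, opening the file with a `with` block closes the handle automatically, which is generally the more idiomatic pattern:

```python
import h5py

# Empty stand-in file so both open calls below succeed
with h5py.File("close_demo.h5", "w"):
    pass

f = h5py.File("close_demo.h5", "r")
f.close()  # release the file handle explicitly

# Alternative: a context manager closes the file automatically on exit
with h5py.File("close_demo.h5", "r") as g:
    is_open = bool(g)  # an open h5py File is truthy
```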
The entire code can be summed up as follows.
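The steps above can be sketched end to end. The write section at the top fabricates a small two-partition grid (with made-up point counts) purely so the read walk-through is runnable; with a real partitioner output you would start at the read call:

```python
import h5py
import numpy as np

# Build a small stand-in grid file mirroring the structure described above
with h5py.File("partGrid.h5", "w") as out:
    for i, (n_local, n_ghost) in enumerate([(4, 2), (3, 1)], start=1):
        g = out.create_group(str(i))
        g.attrs["local"] = n_local
        g.attrs["ghost"] = n_ghost
        g.attrs["total"] = n_local + n_ghost
        g.create_dataset("local", data=np.zeros((n_local, 30)))
        g.create_dataset("ghost", data=np.zeros((n_ghost, 4)))

# Read it back: one top-level group per partition, named "1".."N"
f = h5py.File("partGrid.h5", "r")
partitions = len(f.keys())

shapes = {}
for idx in range(1, partitions + 1):
    local = f.get("/" + str(idx) + "/local")
    ghost = f.get("/" + str(idx) + "/ghost")
    shapes[idx] = (local.shape, ghost.shape)
f.close()
```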
Summary
It’s quite simple to read HDF5 files in Python. Furthermore, HDF5 offers some nifty features such as chunking, data compression, and an easy-to-understand hierarchical structure.
Additionally, an HDF5 file created in Python can easily be read by a program written in another language such as C++ or Fortran. This makes HDF5 extremely flexible when it comes to working across different languages.