HDF5, the XML for numerical data

At my workplace, the Gear department of the WZL at RWTH Aachen, I successfully advertised the use of HDF5 as a storage container for numerical data. It replaces massive ASCII text files that are read and written with custom parsers and emitters, which have to be developed and, more importantly, maintained.

I told people to think of HDF5 as XML for numerical data. It makes time-consuming and error-prone parser and emitter development obsolete. This is especially important for us since data is exchanged between Fortran and Java programs. Hence for a given ASCII file there are two implementations of each emitter and parser, which makes $(1+2)*2 = 6$ combinations of entities that can go wrong (for each of the 2 emitters, the emitter itself and 2 parsers are involved). And that is only the production code: we also use some C++ to create plugins for proprietary applications, Matlab for rapid algorithm development and debugging, and Python for data visualization with Mayavi2 where Matlab doesn’t fit our needs. With five languages that makes $(1+5)*5 = 30$. If there are N ASCII formats for exchanging data, that makes $30*N$ ways to screw up just for exchanging data.
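To make that counting concrete, here is a tiny sketch that reproduces the arithmetic above. The function name `failure_points` is made up for this post; it only encodes the rule "each emitter plus one parser per language".

```python
def failure_points(languages: int) -> int:
    """Count the emitter/parser combinations for ONE ASCII format.

    Following the counting used in the text: for each of the
    `languages` emitters there is the emitter itself plus
    `languages` parsers that have to understand its output.
    """
    return (1 + languages) * languages


# Fortran + Java only, then all five languages mentioned above.
for n in (2, 5):
    print(n, "languages ->", failure_points(n), "things that can go wrong")
```

Multiply the result by the number of ASCII formats to get the total.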

With HDF5, parsing and emitting are taken care of. Matlab supports HDF5 natively (though only a subset), and for Python there is h5py.
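As a rough illustration of how little code is needed on the Python side, here is a minimal h5py round trip. The file name, the dataset path `flank/stress` and the `unit` attribute are invented for this example and are not taken from our actual data; the sketch assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name and sample data, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "gear_results.h5")
stress = np.linspace(0.0, 1.0, 12).reshape(3, 4)

# Write: no hand-written emitter, intermediate groups are created on demand.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("flank/stress", data=stress)
    dset.attrs["unit"] = "MPa"  # metadata travels with the data

# Read: no hand-written parser, shape and dtype come from the file.
with h5py.File(path, "r") as f:
    loaded = f["flank/stress"][...]
    unit = f["flank/stress"].attrs["unit"]

print(loaded.shape, loaded.dtype, unit)
```

The same file can then be opened from any other language with an HDF5 binding, which is the whole point.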

The downside is that HDF5 adds complexity to the build process, since it's no longer just Fortran code that needs to be built. But this works out pretty well, since I’ve also put a lot of effort into the Fortran 90 support in CMake and successfully advertised its use for our Fortran builds. But that's a story for another posting ^^.