HDF5 C++ interface: writing dynamic 2D arrays

【Question title】: HDF5 C++ interface: writing dynamic 2D arrays
【Posted】: 2011-11-16 17:50:22
【Question】:

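I am using the HDF5 C++ API to write 2D array dataset files. The HDF Group has an example that creates an HDF5 file from an array whose size is defined statically, which I have modified to suit my needs below. However, I need a dynamic array, where both NX and NY are determined at runtime. I found another solution that creates 2D arrays using the "new" keyword, which helps to create a dynamic array. Here is what I have: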

#include "StdAfx.h"
#include "H5Cpp.h"
using namespace H5;

const H5std_string FILE_NAME("C:\\SDS.h5");
const H5std_string DATASET_NAME("FloatArray");
const int NX = 5; // dataset dimensions
const int NY = 6;

int main (void)
{
    // Create a 2D array using "new" method
    double **data = new double*[NX];
    for (int j = 0; j < NX; j++)         // 0 1 2 3 4 5
    {                                    // 1 2 3 4 5 6
        data[j] = new double[NY];        // 2 3 4 5 6 7
        for (int i = 0; i < NY; i++)     // 3 4 5 6 7 8
            data[j][i] = (float)(i + j); // 4 5 6 7 8 9
    }

    // Create HDF5 file and dataset
    H5File file(FILE_NAME, H5F_ACC_TRUNC);
    hsize_t dimsf[2] = {NX, NY};
    DataSpace dataspace(2, dimsf);
    DataSet dataset = file.createDataSet(DATASET_NAME, PredType::NATIVE_DOUBLE,
                                            dataspace);
    // Attempt to write data to HDF5 file
    dataset.write(data, PredType::NATIVE_DOUBLE);

    // Clean up
    for(int j = 0; j < NX; j++)
        delete [] data[j];
    delete [] data;
    return 0;
}

However, the resulting file is not what I expected (output of h5dump):

HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "FloatArray" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
      DATA {
      (0,0): 4.76465e-307, 4.76541e-307, -7.84591e+298, -2.53017e-098, 0,
      (0,5): 3.8981e-308,
      (1,0): 4.76454e-307, 0, 2.122e-314, -7.84591e+298, 0, 1,
      (2,0): 2, 3, 4, 5, -2.53017e-098, -2.65698e+303,
      (3,0): 0, 3.89814e-308, 4.76492e-307, 0, 2.122e-314, -7.84591e+298,
      (4,0): 1, 2, 3, 4, 5, 6
      }
   }
}
}

The problem stems from how the 2D array is created (this example works with the static array approach). As I understand it from this email thread:

The HDF5 library expects a contiguous array of elements, not pointers to elements in lower dimensions

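Since I am relatively new to C++/HDF5, I am not sure how to create a dynamically sized array at runtime that is a contiguous array of elements. I would rather not use the "hyperslab" approach described in the email thread, as it looks overly complicated. Any help is appreciated.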
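
One way to see why the double** version produces garbage: dataset.write reads NX * NY doubles starting at the address it is given, but data points at an array of row pointers, and each row is a separate allocation. The sketch below uses only the standard library (no HDF5 calls) to check whether a table of row pointers describes one unbroken block. The names rows_are_contiguous and row_table are hypothetical helpers introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns true when every row starts exactly where the previous one ends,
// i.e. the rows together form one unbroken block of nx * ny doubles.
bool rows_are_contiguous(const double* const* rows, std::size_t nx, std::size_t ny)
{
    for (std::size_t j = 0; j + 1 < nx; ++j)
        if (rows[j + 1] != rows[j] + ny)
            return false;
    return true;
}

// Hypothetical helper: builds a table of row pointers into one flat buffer.
std::vector<const double*> row_table(const std::vector<double>& flat,
                                     std::size_t nx, std::size_t ny)
{
    std::vector<const double*> rows(nx);
    for (std::size_t j = 0; j < nx; ++j)
        rows[j] = flat.data() + j * ny;
    return rows;
}
```

Rows allocated one at a time with new double[NY] are generally not laid out this way, which is why a single write starting from the pointer table cannot work.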

【Question discussion】:

Tags: c++ multidimensional-array dynamic-data hdf5

【Solution 1】:

Well, I don't know anything about HDF5, but a dynamic 2D array with a contiguous buffer can be simulated in C++ using a 1D array of size NX * NY. For example:

Allocation:

double *data = new double[NX*NY];

Element access:

 data[j*NY + i]

(instead of data[j][i])
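
The allocation and indexing above can be sketched with std::vector in place of a raw new[], so the buffer is released automatically. This is a minimal illustration, not code from the answer; flat_index and make_grid are hypothetical names, and the fill pattern i + j is the one used in the question.

```cpp
#include <cstddef>
#include <vector>

// Row-major offset of element (j, i) in an nx-by-ny grid stored as one 1D buffer.
inline std::size_t flat_index(std::size_t j, std::size_t i, std::size_t ny)
{
    return j * ny + i;
}

// Allocates a contiguous nx-by-ny grid at run time and fills it with i + j.
std::vector<double> make_grid(std::size_t nx, std::size_t ny)
{
    std::vector<double> data(nx * ny);
    for (std::size_t j = 0; j < nx; ++j)
        for (std::size_t i = 0; i < ny; ++i)
            data[flat_index(j, i, ny)] = static_cast<double>(i + j);
    return data;
}
```

The pointer returned by the vector's data() member refers to one contiguous block, so it can be passed where the question passes data to dataset.write.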

【Discussion】:

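This solution was simple to implement and scales well (I am using arrays of 21.9M elements). The results verified perfectly in the HDF5 file output.
Works like a charm. Thanks.
I know this is quite old, but I think there is a mistake in the last sentence. It should be: (instead of data[i][j])
@pablo_worker: the last sentence is correct if j is in the range 0..(NX-1) and i is in the range 0..(NY-1), as in the OP's question.
@DocBrown You are right. I always use i for rows and j for columns, and I did not notice that the OP used a different notation. Sorry.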

【Solution 2】:

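In scientific programming it is common to represent a multidimensional array as one large 1D array and to compute the corresponding offset from the multidimensional indices, as seen in Doc Brown's answer.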

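Alternatively, you can overload the subscript operator (operator[]()) to provide an interface that allows multidimensional indexing backed by a 1D array. Or better yet, use a library that does this, such as Boost multi_array. Or, if your 2D arrays are matrices, you could use a nice C++ linear algebra library such as Eigen.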
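
As a rough sketch of the operator[] idea, the class below keeps one contiguous buffer and returns a pointer to the start of a row, so grid[j][i] reads like a built-in 2D array. Grid2D is a hypothetical name introduced here, bounds checking is omitted, and a library such as Boost multi_array covers far more cases.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal 2D interface over one contiguous, row-major buffer.
class Grid2D
{
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), buf_(rows * cols, 0.0) {}

    // Returns a pointer to the first element of a row, so grid[j][i] works.
    double*       operator[](std::size_t row)       { return buf_.data() + row * cols_; }
    const double* operator[](std::size_t row) const { return buf_.data() + row * cols_; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Contiguous storage, suitable for APIs that expect a flat array.
    const double* data() const { return buf_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> buf_;
};
```

With this, an assignment such as grid[2][3] = 7.5 lands at offset 2 * cols + 3 of the buffer returned by data().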

【Discussion】:

【Solution 3】:

Here is how to write N-dimensional arrays in HDF5 format.

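It is much better to use the boost multi_array class. This is the equivalent of using std::vector rather than raw arrays: it does all the memory management for you, and you can access elements just as efficiently as with raw arrays, using familiar subscripting (e.g. data[12][13] = 46).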

Here is a short example:

#include <algorithm>
#include <boost/multi_array.hpp>
#include "H5Cpp.h"
using boost::multi_array;
using boost::extents;

// dataset dimensions set at run time
int NX = 5,  NY = 6,  NZ = 7;


// allocate array using the "extents" helper. 
// This makes it easier to see how big the array is
multi_array<double, 3>  float_data(extents[NX][NY][NZ]);

// use resize to change size when necessary
// float_data.resize(extents[NX + 5][NY + 4][NZ + 3]);


// This is how you would fill the entire array with a value (e.g. 3.0)
std::fill_n(float_data.data(), float_data.num_elements(), 3.0);

// initialise the array to some values
for (int ii = 0; ii != NX; ii++)
    for (int jj = 0; jj != NY; jj++)
        for (int kk = 0; kk != NZ; kk++)
            float_data[ii][jj][kk]  = ii + jj + kk;

// write to HDF5 format
H5::H5File file("SDS.h5", H5F_ACC_TRUNC);
write_hdf5(file, "doubleArray", float_data );

The last line calls a function that can write multi_arrays of any dimension and any standard numeric type (ints, chars, floats, etc.).

Here is the code for write_hdf5().

First, we have to map C++ types to HDF5 types (from the H5 C++ API):

#include <cstdint>
#include <string>
#include <vector>
#include <boost/multi_array.hpp>
#include "H5Cpp.h"

//!_______________________________________________________________________________________
//!     
//!     map types to HDF5 types
//!         
//!     
//!     \author lg (04 March 2013)
//!_______________________________________________________________________________________ 

template<typename T> struct get_hdf5_data_type
{   static H5::PredType type()  
    {   
        //static_assert(false, "Unknown HDF5 data type"); 
        return H5::PredType::NATIVE_DOUBLE; 
    }
};
// Only one explicit specialization may exist per distinct type. Fixed-width aliases
// such as int32_t are typedefs for built-in types, so which of the lines below must
// stay commented out depends on the platform.
template<> struct get_hdf5_data_type<char>                  {   H5::IntType type    {   H5::PredType::NATIVE_CHAR       };  };
//template<> struct get_hdf5_data_type<unsigned char>       {   H5::IntType type    {   H5::PredType::NATIVE_UCHAR      };  };
//template<> struct get_hdf5_data_type<short>               {   H5::IntType type    {   H5::PredType::NATIVE_SHORT      };  };
//template<> struct get_hdf5_data_type<unsigned short>      {   H5::IntType type    {   H5::PredType::NATIVE_USHORT     };  };
//template<> struct get_hdf5_data_type<int>                 {   H5::IntType type    {   H5::PredType::NATIVE_INT        };  };
//template<> struct get_hdf5_data_type<unsigned int>        {   H5::IntType type    {   H5::PredType::NATIVE_UINT       };  };
//template<> struct get_hdf5_data_type<long>                {   H5::IntType type    {   H5::PredType::NATIVE_LONG       };  };
//template<> struct get_hdf5_data_type<unsigned long>       {   H5::IntType type    {   H5::PredType::NATIVE_ULONG      };  };
template<> struct get_hdf5_data_type<long long>             {   H5::IntType type    {   H5::PredType::NATIVE_LLONG      };  };
template<> struct get_hdf5_data_type<unsigned long long>    {   H5::IntType type    {   H5::PredType::NATIVE_ULLONG     };  };
template<> struct get_hdf5_data_type<int8_t>                {   H5::IntType type    {   H5::PredType::NATIVE_INT8       };  };
template<> struct get_hdf5_data_type<uint8_t>               {   H5::IntType type    {   H5::PredType::NATIVE_UINT8      };  };
template<> struct get_hdf5_data_type<int16_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT16      };  };
template<> struct get_hdf5_data_type<uint16_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT16     };  };
template<> struct get_hdf5_data_type<int32_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT32      };  };
template<> struct get_hdf5_data_type<uint32_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT32     };  };
template<> struct get_hdf5_data_type<int64_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT64      };  };
template<> struct get_hdf5_data_type<uint64_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT64     };  };
template<> struct get_hdf5_data_type<float>                 {   H5::FloatType type  {   H5::PredType::NATIVE_FLOAT      };  };
template<> struct get_hdf5_data_type<double>                {   H5::FloatType type  {   H5::PredType::NATIVE_DOUBLE     };  };
template<> struct get_hdf5_data_type<long double>           {   H5::FloatType type  {   H5::PredType::NATIVE_LDOUBLE    };  };

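Then we can use a bit of template forwarding magic to create a function of the right type to output our data. Since this is template code, it needs to live in a header file if you are going to output HDF5 arrays from multiple source files in your program: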

//!_______________________________________________________________________________________
//!     
//!     write_hdf5 multi_array
//!         
//!     \author leo Goodstadt (04 March 2013)
//!     
//!_______________________________________________________________________________________
template<typename T, std::size_t DIMENSIONS, typename hdf5_data_type>
void do_write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data, hdf5_data_type& datatype)
{
    // Little endian for x86
    //FloatType datatype(get_hdf5_data_type<T>::type());
    datatype.setOrder(H5T_ORDER_LE);

    std::vector<hsize_t> dimensions(data.shape(), data.shape() + DIMENSIONS);
    H5::DataSpace dataspace(DIMENSIONS, dimensions.data());

    H5::DataSet dataset = file.createDataSet(data_set_name, datatype, dataspace);

    dataset.write(data.data(), datatype);
}

template<typename T, std::size_t DIMENSIONS>
void write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data )
{

    get_hdf5_data_type<T> hdf_data_type;
    do_write_hdf5(file, data_set_name, data, hdf_data_type.type);
}

【Discussion】:

【Solution 4】:

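I have also been struggling with a similar problem. For certain reasons I needed to process the data stream in C++, but ultimately I wanted to analyze the resulting HDF file in Python, using the goodness of numpy and matplotlib. The solution was simpler than expected. First, I declare a dataspace of whatever shape I really need.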

hsize_t dims[2] = {rows, cols};         
dataspace = new DataSpace(2, dims);
dataset = new DataSet(group->createDataSet("data", PredType::STD_U16LE, *dataspace));

Next I use a 1D dynamic array and fill it, keeping in mind that element [i][j] is at position [i * cols + j]:

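unsigned short* hits = new unsigned short[cols * rows];
(...)
hits[i * cols + j] = foo;
(...)

Now the interesting part. Since DataSet.write accepts a void*, it does not care what you pass in. It only needs a contiguous array of elements, whose shape is interpreted according to the DataSpace definition. Since our dynamic array is contiguous and has the correct overall size and element order, you can simply write it.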

dataset->write(hits, PredType::STD_U16LE);

If you later read the HDF5 file, the resulting array is correctly interpreted as 2D.
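
The reason the reader recovers the 2D shape is that the row-major mapping is invertible once the number of columns is known. A minimal sketch of both directions follows; to_offset and to_row_col are hypothetical helper names, not part of the HDF5 API.

```cpp
#include <cstddef>
#include <utility>

// Offset of element (i, j) in a row-major buffer with `cols` columns.
inline std::size_t to_offset(std::size_t i, std::size_t j, std::size_t cols)
{
    return i * cols + j;
}

// Inverse mapping: the (row, column) pair that corresponds to a flat offset.
inline std::pair<std::size_t, std::size_t> to_row_col(std::size_t offset, std::size_t cols)
{
    return { offset / cols, offset % cols };
}
```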

【Discussion】:

【Solution 5】:

Actually, the "hyperslab" approach is not very complicated to implement. Only the "write" part needs to be modified:

dataset.write(data, PredType::NATIVE_DOUBLE);

Select a hyperslab in the dataspace before writing:

#include "H5Cpp.h"
using namespace H5;

const H5std_string FILE_NAME("SDS.h5");
const H5std_string DATASET_NAME("FloatArray");
const int NX = 5; // dataset dimensions
const int NY = 6;

int main ()
{
    // Create a 2D array using "new" method
    double **data = new double*[NX];
    for (int j = 0; j < NX; j++)         // 0 1 2 3 4 5
    {                                    // 1 2 3 4 5 6
        data[j] = new double[NY];        // 2 3 4 5 6 7
        for (int i = 0; i < NY; i++)     // 3 4 5 6 7 8
            data[j][i] = (float)(i + j); // 4 5 6 7 8 9
    }

    // Create HDF5 file and dataset
    H5File file(FILE_NAME, H5F_ACC_TRUNC);
    hsize_t dimsf[2] = {NX, NY};
    DataSpace dataspace(2, dimsf);
    DataSet dataset = file.createDataSet(DATASET_NAME, PredType::NATIVE_DOUBLE,
                                             dataspace);
    
    // The code above is the same as in the question.

    hsize_t start[2]={0, 0}, count[2]={1, NY};
    // Create memory space for one line
    DataSpace memspace(2, count);

    for(int k=0; k<NX; k++)
    {
        start[0] = k;

        // select the hyperslab for one line
        dataspace.selectHyperslab(H5S_SELECT_SET, count, start, NULL, NULL);

        // Attempt to write data to HDF5 file
        dataset.write(data[k], PredType::NATIVE_DOUBLE, memspace, dataspace);
        /*
        * memspace: dataspace specifying the size of the memory that needs to be written
        * dataspace: dataspace specifying the portion of the dataset that needs to be written
        */

        // Reset the selection for the dataspace.
        dataspace.selectNone();
    }

    // Clean up
    for(int j = 0; j < NX; j++)
        delete [] data[j];
    delete [] data;
    return 0;
}

The resulting file is correct:

HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "FloatArray" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
      DATA {
      (0,0): 0, 1, 2, 3, 4, 5,
      (1,0): 1, 2, 3, 4, 5, 6,
      (2,0): 2, 3, 4, 5, 6, 7,
      (3,0): 3, 4, 5, 6, 7, 8,
      (4,0): 4, 5, 6, 7, 8, 9
      }
   }
}
}
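
If the data already lives in a double** and row-by-row writes are not wanted, another option is to copy the rows into one contiguous buffer and issue a single write from it. This is a standard-library sketch of that copy step, offered as an assumption rather than as part of the answer above; flatten_rows is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Copies nx separately allocated rows of ny doubles into one row-major buffer.
std::vector<double> flatten_rows(const double* const* rows,
                                 std::size_t nx, std::size_t ny)
{
    std::vector<double> flat(nx * ny);
    for (std::size_t j = 0; j < nx; ++j)
        std::copy(rows[j], rows[j] + ny, flat.begin() + j * ny);
    return flat;
}
```

The trade-off is one extra copy of the data in memory in exchange for a single write call instead of NX hyperslab selections.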

【Discussion】: