【Question title】: Reading a string array from an HDF5 dataset
【Posted】: 2014-06-11 07:28:07
【Question】:

I am trying to read a string dataset from an HDF5 file into a string array in C#. I was able to read in a dataset using the following code:

//read the no of rows and columns
var datasetID = H5D.open(fileId,"dimensions");
var dataTypeId = H5D.getType(datasetID);
var dataType = H5T.getClass(dataTypeId);
var length = H5T.getSize(dataTypeId);
int[] dDim = new int[length];

H5D.read(datasetID, dataTypeId, new H5Array<int>(dDim));

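I tried the same approach for the string dataset, but all the values came back initialized to null. So I referred to this link (https://www.mail-archive.com/hdf-forum@hdfgroup.org/msg02980.html). I can now read the strings as bytes, but I don't know what size the byte array should be initialized to. My current code for reading the strings looks like this: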

//read string
datasetID = H5D.open(fileId, "names");
var dataSpaceId = H5D.getSpace(datasetID);
long[] dims = H5S.getSimpleExtentDims(dataSpaceId);
dataTypeId = H5T.copy(H5T.H5Type.C_S1);

//hard coding the no of string to read (213)
byte[] buffer = new byte[dims[0]*213]; 
Console.WriteLine(dims[0]);
H5D.read(datasetID, dataTypeId, new H5Array<byte>(buffer));
Console.WriteLine(System.Text.ASCIIEncoding.ASCII.GetString(buffer));

【Comments】:

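  • I don't understand what exactly your question is. Does it work? If not, what is the error? Please elaborate...
  • I can't find the number of strings in the dataset.
  • It works, but I have to hard-code the number of strings. Is there a way to find out the size of the byte array I need to initialize without hard-coding it?
  • Could you show how the strings are defined? Shouldn't the output of H5T.getSize() work as shown in the example?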

Tags: c# .net hdf5


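【Solution 1】:

If you don't know in advance what your data type is, try the following code. The set of data types it handles is incomplete, but it is easy to extend: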

public static Array Read1DArray(this H5FileId fileId, string dataSetName)
    {
        var dataset = H5D.open(fileId, dataSetName);
        var space = H5D.getSpace(dataset);
        var dims = H5S.getSimpleExtentDims(space);
        var dtype = H5D.getType(dataset);

        var size = H5T.getSize(dtype);
        var classID = H5T.getClass(dtype);

        var rank = H5S.getSimpleExtentNDims(space);
        var status = H5S.getSimpleExtentDims(space);

        // Read data into byte array
        var dataArray = new Byte[status[0]*size];
        var wrapArray = new H5Array<Byte>(dataArray);
        H5D.read(dataset, dtype, wrapArray);

        // Convert types
        Array returnArray = null;
        Type dataType = null;

        switch (classID)
        {
            case H5T.H5TClass.STRING:
                dataType = typeof(string);
                break;

            case H5T.H5TClass.FLOAT:
                if (size == 4)
                    dataType = typeof(float);
                else if (size == 8)
                    dataType = typeof(double);
                break;

            case H5T.H5TClass.INTEGER:
                if (size == 2)
                    dataType = typeof(Int16);
                else if (size == 4)
                    dataType = typeof(Int32);
                else if (size == 8)
                    dataType = typeof(Int64);
                break;

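        }

        // Fail early with a clear message instead of passing a null type on below.
        if (dataType == null)
            throw new NotSupportedException("Unsupported HDF5 type: class " + classID + ", size " + size);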

        if (dataType == typeof (string))
        {
            var cSet = H5T.get_cset(dtype);

            string[] stringArray = new String[status[0]];

            for (int i = 0; i < status[0]; i++)
            {
                byte[] buffer = new byte[size];
                Array.Copy(dataArray, i*size, buffer, 0, size);

                Encoding enc = null;
                switch (cSet)
                {
                    case H5T.CharSet.ASCII:
                        enc = new ASCIIEncoding();
                        break;
                    case H5T.CharSet.UTF8:
                        enc = new UTF8Encoding();
                        break;
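                    default:
                        // Without this, enc stays null and GetString below throws a NullReferenceException.
                        throw new NotSupportedException("Unsupported character set: " + cSet);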
                }

                stringArray[i] = enc.GetString(buffer).TrimEnd('\0');
            }

            returnArray = stringArray;
        }
        else
        {
            returnArray = Array.CreateInstance(dataType, status[0]);
            Buffer.BlockCopy(dataArray, 0, returnArray, 0, (int) status[0]*size);
        }

        H5S.close(space);
        H5T.close(dtype);
        H5D.close(dataset);

        return returnArray;
    }
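
The string branch above comes down to simple arithmetic: the buffer holds `count * size` bytes, and record `i` starts at byte `i * size`. Here is a minimal sketch of that decoding step on its own, with no HDF5 dependency, so it can be tried without a file. The class name `FixedLengthStrings`, the method `Decode`, and the sample buffer are hypothetical and not part of HDF5DotNet; the sketch also assumes the strings are null-padded, as the `TrimEnd('\0')` call above does.

```csharp
using System;
using System.Text;

public static class FixedLengthStrings
{
    // Splits a buffer of `count` records, each `size` bytes wide, into strings.
    // Each record is decoded from its raw bytes, so record boundaries are
    // byte offsets rather than character positions.
    public static string[] Decode(byte[] buffer, int count, int size, Encoding encoding)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (encoding == null) throw new ArgumentNullException("encoding");
        if (count < 0 || size <= 0 || buffer.Length < (long)count * size)
            throw new ArgumentException("Invalid count or size, or buffer smaller than count * size.");

        var result = new string[count];
        for (int i = 0; i < count; i++)
        {
            // Decode one record and drop the trailing null padding.
            result[i] = encoding.GetString(buffer, i * size, size).TrimEnd('\0');
        }
        return result;
    }

    public static void Main()
    {
        // Two 4-byte, null-padded records standing in for an HDF5 read buffer.
        byte[] buffer = { (byte)'a', (byte)'b', 0, 0, (byte)'x', (byte)'y', (byte)'z', 0 };
        string[] names = Decode(buffer, 2, 4, Encoding.ASCII);
        Console.WriteLine(string.Join(",", names)); // prints "ab,xyz"
    }
}
```

In the reader above, `count` would correspond to `status[0]` and `size` to the value returned by `H5T.getSize(dtype)`.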

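【Comments】:

    【Solution 2】:

    Your start was very helpful! With it and some help from the HDF5 Example code, I was able to come up with some generic extensions that reduce your code to: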

    //read string
    string[] datasetValue = fileId.Read1DArray<string>("names");
    

    The extensions look like this (this is exactly the same as in the referenced question):

    public static class HdfExtensions
    {
        // thank you https://stackoverflow.com/questions/4133377/splitting-a-string-number-every-nth-character-number
        public static IEnumerable<String> SplitInParts(this String s, Int32 partLength)
        {
            if (s == null)
                throw new ArgumentNullException("s");
            if (partLength <= 0)
                throw new ArgumentException("Part length has to be positive.", "partLength");
    
            for (var i = 0; i < s.Length; i += partLength)
                yield return s.Substring(i, Math.Min(partLength, s.Length - i));
        }
    
        public static T[] Read1DArray<T>(this H5FileId fileId, string dataSetName)
        {
            var dataset = H5D.open(fileId, dataSetName);
            var space = H5D.getSpace(dataset);
            var dims = H5S.getSimpleExtentDims(space);
            var dataType = H5D.getType(dataset);
            if (typeof(T) == typeof(string))
            {
                int stringLength = H5T.getSize(dataType);
                byte[] buffer = new byte[dims[0] * stringLength];
                H5D.read(dataset, dataType, new H5Array<byte>(buffer));
                string stuff = System.Text.ASCIIEncoding.ASCII.GetString(buffer);
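                // Fixed-length strings are padded to stringLength, so strip trailing nulls from each part.
                return stuff.SplitInParts(stringLength).Select(ss => (T)(object)ss.TrimEnd('\0')).ToArray();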
            }
            T[] dataArray = new T[dims[0]];
            var wrapArray = new H5Array<T>(dataArray);
            H5D.read(dataset, dataType, wrapArray);
            return dataArray;
        }
    
        public static T[,] Read2DArray<T>(this H5FileId fileId, string dataSetName)
        {
            var dataset = H5D.open(fileId, dataSetName);
            var space = H5D.getSpace(dataset);
            var dims = H5S.getSimpleExtentDims(space);
            var dataType = H5D.getType(dataset);
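            if (typeof(T) == typeof(string))
            {
                // Not implemented: strings need the same byte-buffer handling as in Read1DArray.
                throw new NotSupportedException("Read2DArray does not support string datasets yet.");
            }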
            T[,] dataArray = new T[dims[0], dims[1]];
            var wrapArray = new H5Array<T>(dataArray);
            H5D.read(dataset, dataType, wrapArray);
            return dataArray;
        }
    }
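
    One thing to keep in mind with the string path: SplitInParts cuts on character positions and keeps whatever padding each chunk contains, so trimming is a separate step. Cutting by characters also only lines up with the HDF5 records when every character is one byte, as with ASCII. The sketch below illustrates this on a made-up decoded string; the class name `SplitDemo` and the sample data are hypothetical, and the chunking logic is repeated only so the example compiles on its own.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class SplitDemo
    {
        // Same chunking logic as SplitInParts, repeated here so the sketch is self-contained.
        public static IEnumerable<string> SplitInParts(string s, int partLength)
        {
            for (var i = 0; i < s.Length; i += partLength)
                yield return s.Substring(i, Math.Min(partLength, s.Length - i));
        }

        public static void Main()
        {
            // Two 4-character records, null-padded as in a fixed-length string dataset.
            string decoded = "ab\0\0" + "xyz\0";
            string[] raw = SplitInParts(decoded, 4).ToArray();
            string[] trimmed = raw.Select(p => p.TrimEnd('\0')).ToArray();

            Console.WriteLine(raw[0].Length);             // prints 4 (padding still present)
            Console.WriteLine(trimmed[0].Length);         // prints 2
            Console.WriteLine(string.Join(",", trimmed)); // prints "ab,xyz"
        }
    }
    ```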
    

    【Comments】:
