How do I read a matrix from a file? Answers

【Question Title】: How do I read a matrix from a file?
【Posted】: 2016-05-10 19:59:06
【Question Description】:

I know the same question has been asked a hundred times, but the answers either didn't help me or there weren't any.

I want to read a text file containing integers in the following format:

1;50
2;40
3;180

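This file can be arbitrarily long, so I can't create a fixed-size array. Here is a summary of what I've done so far (the full code isn't exactly like this: I check that the file isn't empty, create the file pointers, put things into separate functions, etc.):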

int **mymatrix;
mymatrix =(int **) malloc(sizeof(int*)*1);
fscanf(file, "%d", &mymatrix[0]);
fscanf(file, ";%d", &mymatrix[1]);

Then I print them:

printf("%d",  *mymatrix[0]);
printf(" %d", *mymatrix[0]);

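I've looked at some similar questions and got the malloc line from them. I've tried fscanf(file, "%d;%d", something), replacing something with every possible combination of *, **, &, &&, and also tried [0] and [1], but I still can't read anything.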

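My code doesn't actually need the printing part (I tried every possible combination there too, with no luck). I put breakpoints after the scanf, but Visual Studio shows mymatrix as .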

So there must be some scanf combination I haven't tried yet. I'd be grateful if someone could help me with this.

【Question Discussion】:

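1) There is no matrix (aka 2D array) in your code, nor anything that could be used as one. 2) Don't cast the result of malloc & friends in C.
Are the two numbers in your file int or unsigned, are they limited to the range of short, or can they be long or int64_t? Knowing that will help design the right solution.
@DavidC.Rankin They are all integers.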

Tags: c file matrix

【Solution 1】:

Like this:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("matrix.txt", "r");
    if(!fp){
        perror("fopen");
        return 1;
    }

    int d1, d2, rows = 0;
    while(2 == fscanf(fp, "%d;%d", &d1, &d2))
        ++rows;
    int **matrix = malloc(rows * sizeof(*matrix));
    if(rows > 0 && !matrix){
        perror("malloc");
        fclose(fp);
        return 1;
    }

    rewind(fp);
    rows = 0;
    while(2 == fscanf(fp, "%d;%d", &d1, &d2)){
        matrix[rows] = malloc(2 * sizeof(**matrix));
        matrix[rows][0] = d1;
        matrix[rows++][1] = d2;
    }
    fclose(fp);
    // print and free
    for(int r = 0; r < rows; ++r){
        printf("%d %d\n", matrix[r][0], matrix[r][1]);
        free(matrix[r]);
    }
    free(matrix);
    return 0;
}

The realloc version:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp;
    int **matrix = NULL;
    int d1, d2, r, rows = 0;

    fp = fopen("data.txt", "r");
    if(!fp){
        perror("fopen");
        return 1;
    }

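    while(2 == fscanf(fp, "%d;%d", &d1, &d2)){
        /* Growing by one row per call is simple but slow for large files;
           growing by a larger chunk reduces the number of realloc calls. */
        int **tmp = realloc(matrix, (rows+1)*sizeof(*matrix));
        if(!tmp){                    /* keep the old block so it can still be freed */
            perror("realloc");
            break;
        }
        matrix = tmp;
        matrix[rows] = malloc(2 * sizeof(**matrix));
        if(!matrix[rows]){
            perror("malloc");
            break;
        }
        matrix[rows][0] = d1;
        matrix[rows][1] = d2;
        ++rows;
    }
    fclose(fp);
    // print and free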
    for(r = 0; r < rows; ++r){
        printf("%d %d\n", matrix[r][0], matrix[r][1]);
        free(matrix[r]);
    }
    free(matrix);
    return 0;
}

【Discussion】:

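For large files, realloc is faster than reading the file twice.
I'm compiling with Visual Studio, so I think I should use @Priyansh Goel's malloc definition. VS keeps giving me a "CL.exe not found" error, so I have to fix that, which will take more than an hour. I'm accepting this answer, but thanks to everyone who helped.
Also, does fscanf return 2 because it read 2 different variables?
@acon__ It means that two items were successfully read and assigned.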
If there is a free(matrix); at the end, do I still need free(matrix[r]);?

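【Solution 2】:

First of all, int **mymatrix; is not a matrix/2D array and cannot represent one.

That said, you should use a pointer to a 1D array (one row of the matrix):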

// avoid magic numbers!
#define COLS 2

// Points to the matrix. Starts without matrix
int (*mymatrix)[COLS] = NULL;

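Despite its type, it can point to a 2D array. As a general rule, a "pointer to an N-dimensional array" can be used to address an "(N+1)-dimensional array".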

// the amount to grow the array (min. 1, but that is inefficient)
#define GROW_LENGTH 10

// this holds the total number of rows in the array
size_t length = 0;

// row to store next entry
size_t row = 0;

// buffer for input data
int buffer[COLS];

// read data until failure
while ( scanf("%d;%d", &buffer[0], &buffer[1]) == 2 ) {

    if ( row >= length ) {

        // enlarge the array for another block
        int (*p)[COLS] = realloc(mymatrix,
                sizeof(*mymatrix) * (length + GROW_LENGTH));

        if ( p == NULL ) {
            // realloc failed

            // release the matrix and terminate (can be changed to more intelligent behaviour)
            free(mymatrix);
            exit(1);
        }

        // update variables
        mymatrix = p;
        length += GROW_LENGTH;
    }

    // store the data into the matrix
    mymatrix[row][0] = buffer[0];
    mymatrix[row][1] = buffer[1];

    // next row in the matrix
    row++;
}

if ( mymatrix == NULL ) {
    // nothing has been read
}

// process the data. 
// `row` contains the number of rows with data

Don't forget to free the array when you are done:

free(mymatrix);

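The code above is a snippet. Of course, it needs some standard headers and a function around it. The best approach is to wrap the reading part in its own function with a clean interface for the caller. It also reads from standard input; switching to fscanf is trivial.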

The printing part is also simple: iterate over all rows and print each column.

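Note that this code allocates at most GROW_LENGTH - 1 unused rows. Setting it to 1 avoids that overhead entirely, but is less efficient because realloc is then called for every row. The best balance depends on the application, the OS, etc.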

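【Discussion】:

@Christophe: The OP clearly uses a fixed column layout. Your version doesn't handle variable widths either. That is still no reason for a downvote. The typos are corrected, and this is clearly a snippet. Your code doesn't compile as-is either.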

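【Solution 3】:

First of all, the arguments to fscanf don't match the format string. mymatrix[0] is an int *, so &mymatrix[0] is an int **. Compiling with -Wall -Wextra (gcc/clang) would warn you about this.

Also, you allocate space for a 1-element array of int *, but then you never fill in that pointer.

You need to allocate an array of 2 ints, assign it to the first element of mymatrix, and then pass the address of each element to fscanf: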

int **mymatrix;
mymatrix = malloc(sizeof(int*)*1);    // don't cast the return value of malloc
mymatrix[0] = malloc(sizeof(int)*2);  // same here
fscanf(file, "%d", &mymatrix[0][0]);
fscanf(file, ";%d", &mymatrix[0][1]);

Then you print them like this:

printf("%d",  mymatrix[0][0]);
printf(" %d", mymatrix[0][1]);

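When reading each subsequent line, you need realloc instead of malloc, and you have to keep track of how many rows you have and which row you are on.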

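【Discussion】:

I'm using VS, and it won't let me omit the cast on malloc's return value.
@acon__ If that's the case, you are probably compiling with a C++ compiler rather than a C compiler.
@acon__: Never compile C code with a C++ compiler! They are different languages. Even where some constructs share the same syntax and grammar, they can have different semantics.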

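【Solution 4】:

In addition to all the other good answers, why not use a pointer to an array of 2 int (or whatever size your elements are)? In your case this is an optimal approach. There is no need to use a pointer to pointer to int, which complicates allocating and freeing the memory. A pointer to array gives you a single allocation for the whole block, a single free for all of the allocated memory, and the 2D indexing you want.

If you are reading and storing a collection of integer pairs from a file, simply use a pointer to array, e.g.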

    int (*arr)[2] = NULL;

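This lets a single call to malloc or calloc allocate storage for an initial number of pairs, and a single call to free release the memory. For example, if you have a variable maxn set to 64, then to allocate a block of memory to hold the first 64 integer pairs read from the file, all you need is: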

     arr = calloc (maxn, sizeof *arr);

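There is no need for separate calls to allocate storage for each pair of integers; when you reach the initial limit of 64, you simply realloc your array and keep going. The following uses the constant MAXN to realloc an additional 64 pairs each time the current index idx reaches the limit maxn (the new block of memory is zeroed as well):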

    if (++idx == maxn) {
        printf ("\n  reallocating %zu to %zu\n", maxn, maxn + MAXN);
        size_t szelem = sizeof *arr;
        void *tmp = realloc (arr, (maxn + MAXN) * szelem);
        if (!tmp) {
            fprintf (stderr, "realloc() error: virtual memory exhausted.\n");
            exit (EXIT_FAILURE);                
        }
        arr = tmp;
        memset (arr + maxn, 0, MAXN * szelem);  /* arr + maxn already skips maxn pairs */
        maxn += MAXN;
    }

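Putting all the pieces together, and adding a few simple error-checking helper functions for convenience, you can do something like the following: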

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* constants for number of columns, buffer chars, and initial allocation */
enum { NCOL = 2, MAXC = 32, MAXN = 64 };

void *xcalloc (size_t nmemb, size_t sz);
void *xrealloc (void *ptr, size_t psz, size_t *nelem);
FILE *xfopen (const char *fn, const char *mode);

int main (int argc, char **argv) {

    char buf[MAXC] = {0};
    const char *fmt = "%d;%d";
    int (*arr)[NCOL] = NULL;
    size_t i, idx = 0, maxn = MAXN;
    FILE *fp = argc > 1 ? xfopen (argv[1], "r") : stdin;

    /* alloc mem for array of MAXN elements */
    arr = xcalloc (maxn, sizeof *arr);

    while (fgets (buf, MAXC, fp)) {     /* read each line of input */
        int a, b;                       /* parse line for values */
        if (sscanf (buf, fmt, &a, &b) != NCOL) continue;
        arr[idx][0] = a, arr[idx][1] = b;
        if (++idx == maxn)              /* realloc as needed  */
            arr = xrealloc (arr, sizeof *arr, &maxn);
    }
    if (fp != stdin) fclose (fp);       /* close if not stdin */

    for (i = 0; i < idx; i++)
        printf (" array[%3zu][0] : %4d    [1] : %d\n",
                i, arr[i][0], arr[i][1]);

    free (arr);     /* free allocated memory */

    return 0;
}

/** xcalloc allocates memory using calloc and validates the return. */
void *xcalloc (size_t nmemb, size_t sz)
{   register void *memptr = calloc (nmemb, sz);
    if (!memptr) {
        fprintf (stderr, "xcalloc() error: virtual memory exhausted.\n");
        exit (EXIT_FAILURE);
    }
    return memptr;
}

/** realloc 'ptr', an array of '*nelem' elements of size 'psz', to hold
 *  '*nelem + MAXN' elements, zero the new elements and update '*nelem'. */
void *xrealloc (void *ptr, size_t psz, size_t *nelem)
{   void *tmp = realloc (ptr, (*nelem + MAXN) * psz);
    if (!tmp) {
        fprintf (stderr, "realloc() error: virtual memory exhausted.\n");
        exit (EXIT_FAILURE);
    }
    /* arithmetic on void * is not standard C, so offset through char * */
    memset ((char *)tmp + *nelem * psz, 0, MAXN * psz);  /* zero new memory */
    *nelem += MAXN;
    return tmp;
}

/** fopen with error checking - short version */
FILE *xfopen (const char *fn, const char *mode)
{   FILE *fp = fopen (fn, mode);
    if (!fp) {
        fprintf (stderr, "xfopen() error: file open failed '%s'.\n", fn);
        // return NULL;      /* choose appropriate action */
        exit (EXIT_FAILURE);
    }
    return fp;
}

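Example Input

With an initial allocation of 64 pairs, a reallocation is forced in order to read the entire file. (You could set the initial size to 1 and realloc on every iteration, but that is very inefficient. MAXN must be at least 1 and should be set to a reasonable expected number of elements given your data.)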

$ cat dat/2d_data.txt
1;354
2;160
3;205
4;342
...
98;464
99;130
100;424

Example Use/Output

$ ./bin/array_ptr2array_realloc <dat/2d_data.txt
 array[  0][0] :    1    [1] : 354
 array[  1][0] :    2    [1] : 160
 array[  2][0] :    3    [1] : 205
 array[  3][0] :    4    [1] : 342
...
 array[ 97][0] :   98    [1] : 464
 array[ 98][0] :   99    [1] : 130
 array[ 99][0] :  100    [1] : 424

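Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block, so that (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you haven't written beyond or outside your allocated block of memory, attempted to read or base a jump on an uninitialized value, and finally to confirm that you have freed all the memory you allocated. On Linux, valgrind is the normal choice.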

$ valgrind ./bin/array_ptr2array_realloc <dat/2d_data.txt
==2796== Memcheck, a memory error detector
==2796== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==2796== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==2796== Command: ./bin/array_ptr2array_realloc
==2796==
 array[  0][0] :    1    [1] : 354
 array[  1][0] :    2    [1] : 160
 array[  2][0] :    3    [1] : 205
 array[  3][0] :    4    [1] : 342
...
 array[ 97][0] :   98    [1] : 464
 array[ 98][0] :   99    [1] : 130
 array[ 99][0] :  100    [1] : 424
==2796==
==2796== HEAP SUMMARY:
==2796==     in use at exit: 0 bytes in 0 blocks
==2796==   total heap usage: 2 allocs, 2 frees, 1,536 bytes allocated
==2796==
==2796== All heap blocks were freed -- no leaks are possible
==2796==
==2796== For counts of detected and suppressed errors, rerun with: -v
==2796== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)

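Always confirm that you see "All heap blocks were freed -- no leaks are possible" and, equally important, "ERROR SUMMARY: 0 errors from 0 contexts".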
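The heap summary can be checked by hand, assuming a 4-byte int so that each pair (sizeof *arr) takes 8 bytes. calloc reserves the first 64 pairs, and the single realloc triggered by the 100-line input grows the block to 64 + 64 = 128 pairs. Valgrind counts that realloc as one new allocation plus one free, which, together with the final free (arr), gives the reported 2 allocs and 2 frees:

```latex
\underbrace{64 \cdot 8}_{\texttt{calloc}} + \underbrace{(64 + 64) \cdot 8}_{\texttt{realloc}} = 512 + 1024 = 1536 \text{ bytes allocated}
```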

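Look over all the answers and let me know if you have any further questions. In many cases like this one, using a pointer to array makes a lot of sense. It simplifies allocating and freeing the memory and preserves your 2D indexing.

【Discussion】:

Thanks for suggesting a pointer to array. I'm relatively new to programming, so it didn't occur to me. Your example is a bit above my level, but I learned a lot. Thanks for taking the time.
Nothing fancy, it's just int *array[n]; (an array of pointers, n of them) versus int (*array)[n]; (a pointer to an array of n int). The first is n pointers to int; the second is a single pointer to an array of n int. No magic, it just follows the C operator precedence rules: you read what is inside (...) first. So in int (*array)[];, you read *array (a pointer) and then [] (to an array).
Got it. But the fancy part is the stuff I haven't learned yet, like size_t, enum, register and some other features.
Fair enough. size_t is simply the unsigned type used for sizes and counts (you can't have a negative length, etc.); a global enum lets you create several constants at once instead of three separate lines like #define NCOL 2, #define MAXC 32, #define MAXN 64; and register is just a hint to the compiler that the variable will be used heavily, so it should try to keep it in a CPU register rather than in memory (the compiler is free to ignore the hint). Take your time, there is a lot to learn, but learning it right the first time is much less work :)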

【Solution 5】:

If mymatrix is defined as int mymatrix[3][2], you can read the first line with:

fscanf(file, "%d;%d", &mymatrix[0][0], &mymatrix[0][1]);

The same code also works for mymatrix defined as int **mymatrix, provided it points to a dynamically initialized array of pointers to arrays of int, like this:

mymatrix=(int**)calloc(lines, sizeof(int*)); // initialize the array of row pointers
for (int i=0; i<lines; i++) 
    mymatrix[i]=(int*)calloc(cols, sizeof(int));  // initialize each row array

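The number of lines and cols of the matrix can then be defined at runtime.

【Discussion】:

int **mymatrix is a pointer to pointer, not an "array of arrays".
@Olaf Technically you are of course right, since it is a pointer to pointer. But if mymatrix represents a matrix, it must point to an array of int*, each of which must point to an array of int. I've slightly modified the wording to avoid confusion.
No, it doesn't! See my answer.
@Olaf, you are entitled to propose a solution with a fixed number of columns. But I don't think it's fair to downvote a correct answer that uses a different approach.
Your code contains other flaws (unnecessary casts, magic numbers) and doesn't cover the relevant part: how to enlarge the matrix. Also, as I already wrote, your statement "array of arrays" is clearly wrong. A pointer is not an array! Try sizeof(mymatrix).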

【Solution 6】:

You have to malloc memory for mymatrix[0].

mymatrix[0]= (int *) malloc ( sizeof (int) *2);

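Also, in the malloc for mymatrix, the 1 is actually the maximum number of rows you will have. So you can have at most 1 row.

【Discussion】:

I'm curious why you cast the return value of malloc() to (int *). It's allowed, but what value do you see in doing it?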