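【Question Title】: How to read number of words in a line in C txt file
【Posted】: 2021-03-10 20:34:59
【Question】:

Hello, I have an array of pixels that I wrote to a text file using fprintf. I am trying to get the number of rows and columns, but I noticed that fscanf does not take line breaks into account, so when I use it I only get the total count of numbers. Is there another way to get the number of rows and columns?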

100 255 244 200
999  11  23  41
234   0  23 111

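【Comments】:

  • Take a look at getline, it might help: here
  • For the number of rows, fgets; for the number of columns, perhaps strtok
  • Can you tell the maximum possible number of columns?
  • Two of your lines end with a whitespace character and the last one does not. Is that flexibility in the input, or can you guarantee that every line ends with one whitespace character and only the last line does not?
  • This could be a one-liner, given a strict input format. Please provide the data structure in which you intend to store the values you read. Filling it might take another line...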
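
The fgets plus strtok suggestion from the comments could look roughly like the sketch below. The helper name `count_tokens` and the delimiter set are assumptions made for illustration, not something from the thread. It counts the whitespace-separated tokens in a single line, so runs of padding spaces and leading or trailing spaces do not affect the result.

```c
#include <string.h>

/* Hypothetical helper: counts whitespace-separated tokens in one line.
   strtok modifies its argument, so the line must be a writable buffer. */
size_t count_tokens(char *line)
{
    const char *delims = " \t\r\n";   /* assumed set of separators */
    size_t n = 0;

    for (char *tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims))
    {
        n++;
    }
    return n;
}
```

Calling this on each buffer filled by fgets gives the column count of that line, provided the buffer is large enough to hold the whole line.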

Tags: c file scanf


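【Answer 1】:

The number of rows equals the number of \n characters plus one (if the last byte is not \n). The number of columns equals the number of single spaces plus one (extra padding spaces between columns are not counted). I read the entire contents of the file into a character array, then count the \n characters and the spaces to find the number of rows and columns.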

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>

int main()
{
    int fd;
    char arr[1024]; // Increase size according to the file size.
    ssize_t count;
    int isLineReach = 0;
    int row = 0, cols =0;
    fd = open("RowCol.txt", O_RDONLY);
    if (fd >= 0)
    {
        count = read(fd, arr, sizeof arr - 1); // read the file into the array, leaving room for '\0'
        if (count == -1)
        {
            printf("Error %d\n", errno);
            close(fd);
            exit(EXIT_FAILURE);
        }
        arr[count] = '\0';
    }
    else
    {
        printf("Error opening a file %d\n", errno);
        exit(EXIT_FAILURE);
    }

    for(int i=0; i<count-1; i++) // count--> Total number of characters including `\n` and `' '` space
    {
        if(arr[i] == '\n') // Checking for number of lines
        {
            row++;
            isLineReach = 1;
        }

        else if(arr[i] == ' ' && isLineReach != 1 && arr[i+1] !=' ') // Checking for number of spaces in a line
        {
            cols++;
        }
        
    }

    printf("The number of rows are %d\n",(row+1));
    printf("The number of columns are %d\n",(cols+1));

    close(fd);
    return 0;
}

Contents of RowCol.txt:

100 255 244 200
999  11  23  41
234   0  23 111
123 234 123 0
112 230 43 12
123 234 43 10
133 321 23 12
100 123 67 89
102 34 45 67
104 123 43 54
120 165 23 23

Output:

The number of rows are 11
The number of columns are 4

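【Comments】:

  • The standard library is perfectly capable of doing this, so why use non-standard functions?
  • What if the file is larger than 1024 bytes? Also, this counts spaces to determine the number of columns; what if the first line is 234 0 23 111
  • arr[i + 1] may read out of bounds with the condition i < count; change the condition to i < count - 1
  • The number of rows equals the number of \n plus one. Why add one if the last byte is a newline?
  • @KrishnaKanthYenumula: OK. For the column count, watch out for leading and trailing spaces: your method fails on these, counting space 1 space newline as 3 columns.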
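
The point about the trailing newline can be made concrete with a small sketch. The function name `count_rows_in_buffer` is hypothetical. It counts every \n and adds one more row only when the data is non-empty and does not already end with \n, so an empty buffer yields zero rows.

```c
#include <stddef.h>

/* Hypothetical helper: counts rows in a block of text already in memory. */
size_t count_rows_in_buffer(const char *buf, size_t len)
{
    size_t rows = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == '\n')
        {
            rows++;
        }
    }
    /* A final line with no terminating newline is still a row */
    if (len > 0 && buf[len - 1] != '\n')
    {
        rows++;
    }
    return rows;
}
```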
【Answer 2】:

For rows, fgets is enough to read the file line by line

size_t rows = 0;
char buffer[256] = {0};
FILE* f = fopen("test.txt", "r");
if (!f)
{
    fprintf(stderr, "Could not open file\n");
    return 1;
}
while (fgets(buffer, sizeof buffer, f) != NULL)
{
    if (strchr(buffer, '\n') != NULL)
    {
        // Increment row counter if a newline is present in the string
        rows++;
    }
    else if (feof(f))
    {
        // Increment even if there's no newline but EOF has been reached
        rows++;
    }
}
fclose(f);
printf("rows: %zu\n", rows); // rows is a size_t, so use %zu
return 0;

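fgets reads at most sizeof(buffer) - 1 characters, or up to and including the first newline it encounters, whichever comes first.

This means that a line longer than the buffer size (256 in this case) will not be read in full by a single call. So we need to check with strchr that a newline is actually present in the string before incrementing.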
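
To illustrate how a long line arrives in several fgets chunks, here is a sketch of a helper that glues the chunks back together. The name `read_full_line` and the deliberately small 64-byte chunk size are assumptions chosen for the example; the caller owns the returned memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the next line without its
   trailing '\n', or NULL at end of file or on allocation failure. */
char *read_full_line(FILE *f)
{
    char chunk[64];      /* small on purpose, to force several fgets calls */
    char *line = NULL;
    size_t len = 0;

    while (fgets(chunk, sizeof chunk, f) != NULL)
    {
        size_t n = strlen(chunk);
        char *tmp = realloc(line, len + n + 1);
        if (tmp == NULL)
        {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);   /* copy the chunk and its '\0' */
        len += n;
        if (len > 0 && line[len - 1] == '\n')
        {
            line[len - 1] = '\0';           /* line is complete, drop the newline */
            return line;
        }
    }
    return line;   /* last line without '\n', or NULL if nothing was read */
}
```

Each non-NULL result is exactly one row regardless of its length, so counting the results gives the row count; remember to free each returned string.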

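For columns, assuming every row has the same number of columns, you can simply count the number of (non-contiguous) spaces in the buffer that contains the full line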

size_t columns = 0;
char buffer[256] = {0};
FILE* f = fopen("test.txt", "r");
if (!f)
{
    fprintf(stderr, "Could not open file\n");
    return 1;
}
// Read the first line in full, keep trying until a newline is encountered
while (strchr(buffer, '\n') == NULL && fgets(buffer, sizeof buffer, f) != NULL)
{
    // Keep track of whether or not actual column data has been encountered
    bool data_encountered = false;
    for (size_t i = 0; i < strlen(buffer) - 1; i++)
    {
        if (buffer[i] != ' ')
        {
            // NOTE: This assumes any non space character is valid column data
            data_encountered = true;
        }
        else if (data_encountered)
        {
            // Encountered space, if column data had been encountered prior - increment count
            columns++;
            // Reset data_encountered
            data_encountered = false;
        }
    }
}
// Increment columns one last time if the line ended with a non space character
size_t bufferlen = strlen(buffer);
// Ignore the trailing newline, if there is one
if (bufferlen > 0 && buffer[bufferlen - 1] == '\n')
{
    bufferlen--;
}
// Guard against an empty line or an empty file before indexing the buffer
if (bufferlen > 0)
{
    // Increment column count if the last character (excluding newline) is valid column data
    columns += (buffer[bufferlen - 1] != ' ');
}
fclose(f);
printf("columns: %zu\n", columns); // columns is a size_t, so use %zu
return 0;

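The loop keeps calling fgets until a newline appears in the buffer, i.e. until a full line has been read. Inside the loop, for each buffer, the number of (non-contiguous) spaces is added to the counter, which represents the columns.

If you know in advance an upper bound on the number of columns, or even on the number of characters per line, you will not need all of these safeguards. But in cases where you cannot guess, this will be reliable.

Now, how do you combine them? I would suggest putting them in separate functions, one for counting rows and another for counting columns. Don't worry about performance; if the compiler sees the two functions being called near each other, it will take care of it.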
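
As a rough sketch of that split (not the answer's own code), the two functions below each rewind the stream and scan it with fgetc, so neither depends on a line buffer size. The names `count_rows` and `count_columns` are assumptions, and the column count is taken from the first line only, as in the rest of this answer.

```c
#include <ctype.h>
#include <stdio.h>

/* Counts lines; a final line without '\n' still counts as a row. */
size_t count_rows(FILE *f)
{
    size_t rows = 0;
    int c, last = '\n';

    rewind(f);
    while ((c = fgetc(f)) != EOF)
    {
        if (c == '\n')
        {
            rows++;
        }
        last = c;
    }
    if (last != '\n')
    {
        rows++;
    }
    return rows;
}

/* Counts whitespace-separated fields on the first line only. */
size_t count_columns(FILE *f)
{
    size_t columns = 0;
    int c, in_field = 0;

    rewind(f);
    while ((c = fgetc(f)) != EOF && c != '\n')
    {
        if (isspace(c))
        {
            in_field = 0;
        }
        else if (!in_field)
        {
            in_field = 1;   /* first character of a new field */
            columns++;
        }
    }
    return columns;
}
```

Both functions take a stream opened for reading with fopen, for example `count_rows(f)` followed by `count_columns(f)`; the order does not matter because each one rewinds first.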

But if you insist on doing it all in the same function, here is a working implementation:

int columns = 0, rows = 0;
char buffer[256] = { 0 };
FILE* f = fopen("test.txt", "r");
if (!f)
{
    fprintf(stderr, "Could not open file\n");
    return 1;
}
// Extract the first line and count the columns
while (strchr(buffer, '\n') == NULL && fgets(buffer, sizeof buffer, f) != NULL)
{
    // Keep track of whether or not actual column data has been encountered
    bool data_encountered = false;
    for (size_t i = 0; i < strlen(buffer) - 1; i++)
    {
        if (buffer[i] != ' ')
        {
            // NOTE: This assumes any non space character is valid column data
            data_encountered = true;
        }
        else if (data_encountered)
        {
            // Encountered space, if column data had been encountered prior - increment count
            columns++;
            // Reset data_encountered
            data_encountered = false;
        }
    }
}
// Increment columns one last time if the line ended with a non space character
size_t bufferlen = strlen(buffer);
// Ignore the trailing newline, if there is one
if (bufferlen > 0 && buffer[bufferlen - 1] == '\n')
{
    bufferlen--;
}
// Guard against an empty line or an empty file before indexing the buffer
if (bufferlen > 0)
{
    // Increment column count if the last character (excluding newline) is valid column data
    columns += (buffer[bufferlen - 1] != ' ');
}
// Increment rows by one, since one line has been read already
rows++;
// Reset all cells in the buffer to 0
memset(buffer, 0, sizeof buffer);
// Count the rest of the lines
while (fgets(buffer, sizeof buffer, f) != NULL)
{
    if (strchr(buffer, '\n'))
    {
        rows++;
    }
    else if (feof(f))
    {
        rows++;
    }
}
fclose(f);
printf("rows: %d\n", rows);
printf("columns: %d\n", columns);

Note: the headers to include in the code are:

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

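【Comments】:

  • A file consisting of 2 bytes, a space and 1, would be analyzed as rows: 0 cols: 2, which seems incorrect on both counts.
  • @chqrlie Well, I did mention that edge case and wanted to leave some fun for the OP, but I will definitely edit in a complete solution that handles all cases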
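
For reference, one possible way to cover the edge cases raised here in a single pass is sketched below. It is not the edit the answer's author promised; the function name `count_rows_cols`, the decision to skip lines that contain only whitespace, and the -1 return for rows of unequal width are all assumptions.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper. Returns 0 on success, or -1 if two non-blank lines
   have a different number of fields. Blank lines are skipped (assumption). */
int count_rows_cols(FILE *f, size_t *rows, size_t *cols)
{
    size_t fields = 0;   /* fields seen so far on the current line */
    int in_field = 0;
    int c;

    *rows = 0;
    *cols = 0;
    for (;;)
    {
        c = fgetc(f);
        if (c == '\n' || c == EOF)
        {
            if (fields > 0)
            {
                if (*rows > 0 && fields != *cols)
                {
                    return -1;   /* ragged input */
                }
                *cols = fields;
                (*rows)++;
            }
            if (c == EOF)
            {
                return 0;
            }
            fields = 0;
            in_field = 0;
        }
        else if (isspace(c))
        {
            in_field = 0;
        }
        else if (!in_field)
        {
            in_field = 1;   /* first character of a new field */
            fields++;
        }
    }
}
```

With this approach the 2-byte file made of a space and 1 is reported as 1 row and 1 column, and a trailing newline does not add an extra row.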