Read more than one word from string with sscanf: answer

【Question Title】: Read more than one word from string with sscanf
【Posted】: 2015-05-24 00:48:18
【Question】:

I'm trying to read formatted content from a file. To do so, I read it line by line with fgets() and parse each line with sscanf().

The file content is supposed to be a table. A line looks like the following example:

456    2    39    chained_words    62.5    // comment with more than one word

To read it, I use:

fgets(temp,MAXLINELENGTH,file);
sscanf(temp,"%d %d %d %s %f // %s",&num1,&num2,&num3,word,&num4,comment);

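The first five elements plus the first word after // are read correctly, but the problem is that I need to store the whole comment in the comment char * variable. I've tried several solutions proposed in other posts, such as specifying a format that excludes certain characters, but nothing worked.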

I'd appreciate any hint to solve the problem!

【Question Comments】:

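This is one of the many things suggested in other questions that didn't work for me.
sscanf() may not be the right tool for the job.
Writing the example that way was a mistake, sorry. I use %f in my program. @GregHewgill, do you have an alternative?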
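As long as the chained words really are chained (no spaces inside them), sscanf(temp,"%d %d %d %s %f // %[^\n]%*c",&num1,&num2,&num3,word,&num4,comment); should do it. Note: the %*c is not "technically" needed, since %d skips all whitespace including '\n', but it's good practice to account for every character in the line.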
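A minimal sketch of the single-call approach from the comment above. It adds two things that are assumptions of this sketch, not part of the comment. First, field widths on %s and %[ so that sscanf cannot overflow the buffers. Second, a return-value check so that a malformed line is reported instead of silently leaving variables unset. The scanset also excludes '\r' in case the file has CRLF line endings. The helper name parse_row and the buffer sizes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define WORD_MAX    64    /* hypothetical buffer sizes for this sketch */
#define COMMENT_MAX 128

/* Parse one row of the form "int int int word float // comment ..." with a
 * single sscanf call. word must hold WORD_MAX bytes, comment COMMENT_MAX.
 * Returns 1 if all six fields were converted, 0 otherwise. */
int parse_row(const char *line, int *n1, int *n2, int *n3,
              char *word, float *f, char *comment)
{
    assert(line != NULL);

    /* %63s and %127[...] leave room for the terminating '\0'.
     * The blanks around "//" skip any whitespace before and after it.
     * A scanset also matches spaces, so every word of the comment is kept;
     * excluding '\r' avoids a stray CR from CRLF files. */
    int converted = sscanf(line, "%d %d %d %63s %f // %127[^\r\n]",
                           n1, n2, n3, word, f, comment);
    return converted == 6;
}
```

On the sample line, comment receives "comment with more than one word". If a number can follow the comment, as the answer below discusses, this single call would also pull that number into comment.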
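Then you have a few options. The simplest is to read with fgets (or getline), then read the integers with strtol, using the returned endptr to advance to the next character (or the newline). Test whether that character starts a number or text: if it's a number, call strtol again; if it's text, read it character by character until you reach the next number, word, or newline. Repeat.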
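A rough sketch of the strtol walk described in the comment above, run on the buffer filled by fgets. The function name collect_ints is hypothetical. So is the rule that a token counts as a number only if strtol consumes all of it. Under that rule a float such as 62.5 is skipped as text here, and you would reach for strtod to read it.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Walk a line with strtol: whenever a whitespace-delimited token is a
 * complete integer, store it; otherwise step over the token as text.
 * Out-of-range values are also treated as text in this sketch.
 * Returns the number of integers stored (at most max). */
size_t collect_ints(const char *line, long *out, size_t max)
{
    assert(line != NULL && out != NULL);
    size_t count = 0;
    const char *p = line;

    while (*p != '\0' && count < max) {
        while (isspace((unsigned char)*p))    /* skip blanks between tokens */
            p++;
        if (*p == '\0')
            break;

        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);
        if (end != p && errno == 0 &&
            (*end == '\0' || isspace((unsigned char)*end))) {
            out[count++] = v;                 /* whole token was an integer */
            p = end;                          /* endptr marks where to resume */
        } else {
            while (*p != '\0' && !isspace((unsigned char)*p))
                p++;                          /* text token: step over it */
        }
    }
    return count;
}
```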

Tags: c string scanf

【Solution 1】:

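Based on your comment, if you are going to add another number after the existing comment, that makes things a bit more complicated. Because comment contains several words, there is no distinct end you can search for.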

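However, C rarely lets you down. Whenever you need to parse data from a line or buffer, look at the format of the data and ask: "What can I use as a reference point for the start or end of what I need?" Here, since nothing marks where the comment ends, we need to use the end of the buffer as our reference and work backwards.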

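We'll assume the value is the last thing on the line before the newline (no tabs or spaces after it). We could loop backwards to find the last non-whitespace character and validate it, but for the purposes here we'll just rely on that assumption.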
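If you'd rather not rely on that assumption, the backward loop mentioned above could look like the sketch below. trailing_int is a hypothetical helper and is not part of this answer's program. It uses strtol rather than sscanf so it can confirm that the whole last token is an integer. It tolerates trailing blanks and the '\n' that fgets leaves in the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse the integer that ends a line, skipping trailing whitespace.
 * On success stores the value in *out and returns a pointer to the first
 * character of that number; returns NULL if the last token is not an
 * integer (or the line is blank). */
const char *trailing_int(const char *line, long *out)
{
    assert(line != NULL && out != NULL);
    const char *end = line + strlen(line);

    while (end > line && isspace((unsigned char)end[-1]))
        end--;                          /* back over trailing whitespace */
    const char *start = end;
    while (start > line && !isspace((unsigned char)start[-1]))
        start--;                        /* back to the start of the token */
    if (start == end)
        return NULL;                    /* empty or all-blank line */

    char *stop;
    errno = 0;
    long v = strtol(start, &stop, 10);
    if (stop != end || errno != 0)
        return NULL;                    /* token is not a complete integer */
    *out = v;
    return start;
}
```

The returned pointer also tells you where the comment ends: everything between the comment marker and that pointer (minus blanks) is the comment text.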

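To handle this, we split parsing the line into two parts. Everything before the comment can be read reliably with the original sscanf call, so we'll call everything in the first part of the line (up to and including the float) part 1, and everything after the comment characters // part 2. Read/parse part 1 as usual: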

        sscanf (line, "%d %d %d %s %f", &d1, &d2, &d3, word, &f1);
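A possible variation on the part 1 call, offered as a sketch. Checking sscanf's return value tells you whether the line really had the expected layout. The standard %n conversion records how many characters were consumed, so a later search for the comment can start after the float instead of at the beginning of the line. parse_part1 and the 128-byte word buffer are assumptions for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Read "part 1" (three ints, a word, a float) and report where it ended.
 * word must hold at least 128 bytes. Returns a pointer just past the
 * float, or NULL if part 1 did not match. */
const char *parse_part1(const char *line, int *d1, int *d2, int *d3,
                        char *word, float *f1)
{
    assert(line != NULL);
    int used = 0;                       /* %n: characters consumed so far */
    if (sscanf(line, "%d %d %d %127s %f%n",
               d1, d2, d3, word, f1, &used) != 5)
        return NULL;                    /* %n is not counted in the result */
    return line + used;                 /* part 2 starts here */
}
```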

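To search a line for a specific character, we can always do a manual character-by-character comparison, but string.h also provides strchr and strrchr, which search a string for the first (strchr) or last (strrchr) occurrence of a given character. Both return a pointer to that character in the string, or NULL if it does not occur.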
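To make the difference concrete, here is a small sketch that reports where each function lands in a string containing //. find_first_last is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offsets of the first and last occurrence of ch in s, or -1 if absent.
 * strchr scans from the front, strrchr from the back; both return NULL
 * when the character does not occur. */
void find_first_last(const char *s, int ch, ptrdiff_t *first, ptrdiff_t *last)
{
    assert(s != NULL && first != NULL && last != NULL);
    const char *f = strchr(s, ch);
    const char *l = strrchr(s, ch);
    *first = f ? f - s : -1;
    *last  = l ? l - s : -1;
}
```

In "62.5 // note", strchr stops on the first '/' (offset 5) and strrchr on the second (offset 6). That is why the format string used with the strrchr pointer starts with a single '/'.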

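Working backwards from the end of the line with strrchr, if we find a '/', we have a pointer (an address within the string) to the last '/' before the start of the comment. We then use that pointer to read the remainder of the line into comment (the value and all).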

        p = strrchr (line, '/');            /* find last '/' in line    */
        sscanf (p, "/ %[^\n]%*c", comment); /* read comment and value   */
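One caveat: strrchr finds the last '/' anywhere on the line, so a comment that itself contains a slash would be cut short. For example, "ratio and/or total 229" would come back as "or total 229". If that can happen in your data, searching for the first "//" with strstr is a safer anchor. A sketch, using a hypothetical helper extract_comment and a 128-byte buffer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy everything after the first "//" (minus leading blanks and the
 * newline) into comment, which must hold at least 128 bytes.
 * Returns 1 on success, 0 if there is no "//" or nothing follows it. */
int extract_comment(const char *line, char *comment)
{
    assert(line != NULL && comment != NULL);
    const char *p = strstr(line, "//");  /* first "//", not last '/' */
    if (p == NULL)
        return 0;
    /* Skip the two slashes; the leading blank in the format skips spaces. */
    return sscanf(p + 2, " %127[^\n]", comment) == 1;
}
```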

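From here on we work only with comment (not line). If we work backwards from the end of comment looking for a space ' ', we land just before our last value and can read it. After reading it, since the pointer points at the space just before the value, we can null-terminate comment at the pointer to finish the parsing.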

        p = strrchr (comment, ' ');         /* find last space in comment */
        sscanf (p, " %d", &d4);             /* read last value into d4  */
        *p = 0;                             /* null-terminate comment   */

(Note: if needed, you can check for and remove any trailing whitespace in comment, but we leave that out here.)
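If you do want the trailing whitespace gone, a small in-place trim is enough. rtrim is a hypothetical helper name. It also removes a '\r' or '\n', so it could double as cleanup for lines read with fgets.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Remove trailing whitespace (spaces, tabs, '\r', '\n') in place.
 * Returns s for convenience. */
char *rtrim(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
    return s;
}
```

Calling rtrim(comment) right after the *p = 0; line would drop the run of spaces that sat between the comment and 227 in the first sample line.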

Putting it all together, you get something like this:

Quick example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXS 128

int main (int argc, char **argv) {

    if (argc < 2 ) {                /* check for at least 1 argument    */
        fprintf (stderr, "error: insufficient input, usage: %s filename\n", 
                argv[0]);
        return 1;
    }

    char line[MAXS] = {0};
    char word[MAXS] = {0};
    char comment[MAXS] = {0};
    char *p = NULL;
    size_t idx = 0;
    int d1, d2, d3, d4;
    float f1 = 0.0;
    FILE *fp = NULL;

    d1 = d2 = d3 = d4 = 0;

    if (!(fp = fopen (argv[1], "r"))) {  /* open/validate file   */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

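    while (fgets (line, MAXS, fp) != NULL)  /* read each line in file */
    {
        /* read buffer through first float; widths protect word/comment */
        if (sscanf (line, "%d %d %d %127s %f", &d1, &d2, &d3, word, &f1) != 5) {
            fprintf (stderr, "warning: line %zu: unexpected format.\n", idx++);
            continue;
        }

        /* find last '/' in line, then read comment and value */
        if (!(p = strrchr (line, '/')) ||
            sscanf (p, "/ %127[^\n]%*c", comment) != 1) {
            fprintf (stderr, "warning: line %zu: no comment found.\n", idx++);
            continue;
        }

        /* find last space in comment, then read last value into d4 */
        if (!(p = strrchr (comment, ' ')) || sscanf (p, " %d", &d4) != 1) {
            fprintf (stderr, "warning: line %zu: no trailing value.\n", idx++);
            continue;
        }
        *p = 0;                             /* null-terminate comment   */

        printf ("\nline : %zu\n\n %s\n", idx, line);
        printf ("   d1 : %d\n   d2 : %d\n   d3 : %d\n   d4 : %d\n   f1 : %.2f\n",
                d1, d2, d3, d4, f1);
        printf ("   chained : %s\n   comment : %s\n", word, comment);

        idx++;
    }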

    fclose (fp);

    return 0;
}

Input

$ cat dat/strwcmt.txt
456    2    39    chained_words    62.5    // comment with more than one word    227
457    2    42    more_chained_w   64.5    // another comment    228
458 3 45 s_n_a_f_u 66.5 // this is still another comment 229

Output

$ ./bin/str_rd_mixed dat/strwcmt.txt

line : 0

 456    2    39    chained_words    62.5    // comment with more than one word    227

   d1 : 456
   d2 : 2
   d3 : 39
   d4 : 227
   f1 : 62.50
   chained : chained_words
   comment : comment with more than one word

line : 1

 457    2    42    more_chained_w   64.5    // another comment    228

   d1 : 457
   d2 : 2
   d3 : 42
   d4 : 228
   f1 : 64.50
   chained : more_chained_w
   comment : another comment

line : 2

 458 3 45 s_n_a_f_u 66.5 // this is still another comment 229

   d1 : 458
   d2 : 3
   d3 : 45
   d4 : 229
   f1 : 66.50
   chained : s_n_a_f_u
   comment : this is still another comment

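Note: there are many different ways to handle this; this is just one. Another is to tokenize the whole line into individual words, check whether each word starts with a digit (and contains a '.' for a float), then simply convert all the numbers and concatenate the non-numeric words as needed. It's up to you. The bigger your toolbox, the more ways you'll see to approach a problem.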
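The tokenizing alternative mentioned above could be sketched as shown below. It follows the note's rule: any token beginning with a digit is numeric, and a '.' marks a float. It also assumes the // marker is a separate token. Fields, split_line, MAXS, and MAXN are hypothetical names for this illustration. All non-numeric words go into one string, so the chained word and the comment end up joined. That is fine if you only need the text, but you lose the boundary between them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXS 128
#define MAXN 8

/* Hypothetical container for the pieces of one line. */
typedef struct {
    long   ints[MAXN];
    int    n_ints;
    double flts[MAXN];
    int    n_flts;
    char   text[MAXS];    /* non-numeric words joined by single spaces */
} Fields;

/* Tokenize a line on whitespace; a token starting with a digit is a number
 * (a float if it contains '.'), anything else is text. "//" is dropped. */
void split_line(const char *line, Fields *out)
{
    char buf[MAXS];

    assert(line != NULL && out != NULL);
    snprintf(buf, sizeof buf, "%s", line);   /* strtok writes into buf */
    out->n_ints = out->n_flts = 0;
    out->text[0] = '\0';

    for (char *tok = strtok(buf, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        if (isdigit((unsigned char)tok[0])) {
            if (strchr(tok, '.') != NULL) {
                if (out->n_flts < MAXN)
                    out->flts[out->n_flts++] = strtod(tok, NULL);
            } else if (out->n_ints < MAXN) {
                out->ints[out->n_ints++] = strtol(tok, NULL, 10);
            }
        } else if (strcmp(tok, "//") != 0) {
            /* room left in text, keeping one byte for '\0' */
            size_t room = sizeof out->text - strlen(out->text) - 1;
            if (out->text[0] != '\0' && room > 0) {
                strncat(out->text, " ", room);
                room--;
            }
            strncat(out->text, tok, room);
        }
    }
}
```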

【Discussion】:

Thanks for the very thorough answer, I learned a lot! I'll start running some tests right away.