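【Question title】: C: search for words in a string
【Posted】: 2016-04-26 17:43:42
【Question description】:

I hope someone can help me. I think this is a simple problem: I want to write a program that searches for words in a file.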

char *such = "Ingo";
char *fund;
FILE *datei;
char text[100];

datei = fopen("names.txt", "r");

if (datei == NULL) {
    printf("Fehler\n");
}
else 
{
    fscanf(datei, "%100c", text);
    text[100] = '\0';
    //i think this dont work
    if (fgets(text, 100, datei) != NULL)
    {
        printf("%s \n", text);
    }   
}

return 0;

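The file contains the following:

Ingo Test Test 123 Test Ingo Ingo

Now I want to count how often the name "Ingo" occurs in the file.

Is it also possible to search for several words, say "ingo" and "test", and count them?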

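【Question comments】:

  • Hello, I tried to store the file in a string, but that does not work.
  • Then I tried using "fscan" and "fgets". ://
  • Where is the variable "text" defined?
  • Use fscanf to scan words from the file, and use strcmp from string.h to compare each scanned word with the word you are looking for.
  • There is a function called strstr that would help: stackoverflow.com/questions/27303062/…. Also, text[100] = '\0'; writes past the end of your array.

Tags: c string file search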


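【Solution 1】:

There are two very simple ways to do this:

  1. In a loop, use fscanf to read words from the file until EOF is reached, and use strcmp (string compare) from string.h to check whether each word is the one you are looking for.

  2. Use two loops: in the outer loop, read characters with fgetc until you reach a separator such as a space, \n, or \t; in the inner loop, check whether the word you just scanned is the one you are looking for. You need a temporary char array for this.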

【Discussion】:

    【Solution 2】:
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    
    int main(void) {
        char *such = "Ingo";
        FILE *datei;
        char word[100];
        int counter = 0;
    
        datei = fopen("names.txt", "r");
    
        if (datei == NULL) {
            printf("Fehler\n");
        }
        else 
        {
            while (fscanf(datei, "%99s", word) == 1) {   /* read word by word */
                /* capitalize the first letter so "ingo" also matches "Ingo" */
                word[0] = (char)toupper((unsigned char)word[0]);
                if (strcmp(word, such) == 0){
                    ++counter;
                }
            }
            fclose(datei);
            if (counter != 0){
                printf("number of '%s' is %d\n", such, counter);
            }   
    
        }
    
        return 0;
    }
    

    【Discussion】:

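      【Solution 3】:

      You should test quite a few conditions to make sure you match only whole words. Below is one approach that searches for jury and matches only jury and jury's, not occurrences embedded in a longer word. You should also consider whether you want to match plural forms (e.g. review and reviews). Below, a single set of delimiters (delim) is used to ensure whole-word matches. If you want to match plurals or other suffixes, you can easily split it into two sets, one for the start of the word and one for the end.

      The code expects the file to search as the first argument and the search term (sterm) as the second. (With no arguments, it searches the text on stdin for 'the'.) It reads each line of the file into a temporary buffer named line, then scans every character of line for the first character of sterm. When it finds one, it checks that the preceding character is a delimiter and that the character following the word (slen characters later, where slen is the length of sterm) is also a delimiter. If the candidate starts with the same character as sterm and has delimiters on both sides, its contents are compared using strncmp.

      If all conditions are met, count is incremented and the result is printed along with the zero-based position of the match within line. This is just a basic, unoptimized whole-word search, but it should give you a starting point for distinguishing whole words from substrings contained in longer words (i.e. a search for 'the' will not also match 'them', 'then', 'they', etc.). You could also turn this code into a function that saves the line number and position of each match in an array of structs and returns a pointer to it. That way you can parse the text and get back a pointer to an array holding the line and position of every match. (That is for another day.)

      Look over the code and let me know if you have any questions. If you do not care about matching whole words only, you can simply call strstr repeatedly on each line, advancing a pointer as you go, to count occurrences of the search term. Use whichever best fits your needs.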

      #include <stdio.h>
      #include <string.h>
      
      #define MAXS 256
      
      int main (int argc, char **argv)
      {
          char line[MAXS] = {0};  /* line buffer for fgets */
          FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
          char *sterm = argc > 2 ? argv[2] : "the";
          char *delim = " \t\n\'\".";
          size_t count = 0, idx = 0, slen = strlen (sterm);
      
          if (!fp) {
              fprintf (stderr, "error: file open failed '%s'\n", argv[1]);
              return 1;
          }
      
          while (fgets (line, MAXS, fp))
          {
              size_t i, llen = strlen (line);
              idx++;
      
              if (llen < slen + 1)
                  continue;       /* line not longer than search term + \n */
      
              for (i = 0; i < llen - slen + 1; i++) {
      
                  if (line[i] != *sterm)
                      continue;   /* char != first char in sterm  */
                  if (i && !strchr (delim, line[i-1]))
                      continue;   /* prior char is not a delim    */
                  if (!strchr (delim, line[i+slen]))
                      continue;   /* next char is not a delim     */
                  if (strncmp (&line[i], sterm, slen))
                      continue;   /* chars don't match sterm      */
      
                  printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                          idx, ++count, sterm, i);   /* i is the zero-based offset in line */
              }
          }
          if (fp != stdin) fclose (fp);
      
          printf ("\n total occurrences of '%s' in '%s' : %zu\n\n",
                  sterm, argc > 1 ? argv[1] : "stdin", count);
      
          return 0;
      }
      

      Example file

      $ cat dat/damages.txt
      Personal injury damage awards are unliquidated
      and are not capable of certain measurement; thus, the
      jury has broad discretion in assessing the amount of
      damages in a personal injury case. Yet, at the same
      time, a factual sufficiency review insures that the
      evidence supports the jury's award; and, although
      difficult, the law requires appellate courts to conduct
      factual sufficiency reviews on damage awards in
      personal injury cases. Thus, while a jury has latitude in
      assessing intangible damages in personal injury cases,
      a jury's damage award does not escape the scrutiny of
      appellate review.
      
      Because Texas law applies no physical manifestation
      rule to restrict wrongful death recoveries, a
      trial court in a death case is prudent when it chooses
      to submit the issues of mental anguish and loss of
      society and companionship. While there is a
      presumption of mental anguish for the wrongful death
      beneficiary, the Texas Supreme Court has not indicated
      that reviewing courts should presume that the mental
      anguish is sufficient to support a large award. Testimony
      that proves the beneficiary suffered severe mental
      anguish or severe grief should be a significant and
      sometimes determining factor in a factual sufficiency
      analysis of large non-pecuniary damage awards.
      

      Output

      $ ./bin/searchterm dat/damages.txt jury
       line[ 3] match  1. 'jury' at location 0
       line[ 6] match  2. 'jury' at location 22
       line[ 9] match  3. 'jury' at location 37
       line[11] match  4. 'jury' at location 2
      
       total occurrences of 'jury' in 'dat/damages.txt' : 4
      

      $ ./bin/searchterm <dat/damages.txt
       line[ 2] match  1. 'the' at location 50
       line[ 3] match  2. 'the' at location 39
       line[ 4] match  3. 'the' at location 43
       line[ 5] match  4. 'the' at location 48
       line[ 6] match  5. 'the' at location 18
       line[ 7] match  6. 'the' at location 11
       line[11] match  7. 'the' at location 38
       line[17] match  8. 'the' at location 10
       line[19] match  9. 'the' at location 34
       line[20] match 10. 'the' at location 13
       line[21] match 11. 'the' at location 42
       line[23] match 12. 'the' at location 12
      
       total occurrences of 'the' in 'stdin' : 12
      

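      Using pointers instead of array-index notation

      You may find pointer notation a bit more natural than array-index notation (e.g. using char *p = line; and advancing p, rather than using line[X]). If so, you can replace the read loop with the following: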

          while (fgets (line, MAXS, fp))
          {
              char *p = line;
              size_t llen = strlen (line);
              idx++;
      
              if (llen < slen + 1)
                  continue;       /* line not longer than search term + \n */
      
              for (;p < (line + llen - slen + 1); p++) {
      
                  if (*p != *sterm)
                      continue;   /* char != first char in sterm  */
                  if (p > line && !strchr (delim, *(p - 1)))
                      continue;   /* prior char is not a delim    */
                  if (!strchr (delim, *(p + slen)))
                      continue;   /* next char is not a delim     */
                  if (strncmp (p, sterm, slen))
                      continue;   /* chars don't match sterm      */
      
                  printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                          idx, ++count, sterm, (size_t)(p - line));
              }
          }
      

      Pointer notation can feel a bit more natural in C. Let me know if you have any questions.

      【Discussion】:
