How to extract words from parentheses in C? Answers

【Question title】: How to extract words from parentheses in C language?
【Posted】: 2022-01-05 07:33:39
【Question description】:

I am trying to extract words from a string like this:

(octopus kitten) (game cake) (soccer football)

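I tried to do this with strtok (I call strcpy only so that the original token/string is not modified; I also tried memcpy, but the result was the same in my case).

Main function: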

int main(int argc, char * argv[]) {

  char row[] = "(octopus kitten) (game cake) (soccer football)";
  char * pch;
  pch = strtok(row, "(");

  while (pch != NULL) {

    pch[strcspn(pch, ")")] = '\0';
    get_and_print_word(pch);
    pch = strtok(NULL, "(");

  }

  return 0;
}

Function that gets and prints each word:

void get_and_print_word(char str[]) {

  char r[4000];

// for not modifying the original string
  strcpy(r, str);

  char * c = strtok(r, " ");
  for (int i = 0; i < 2; i++) {

    printf("%s\n", c);
    c = strtok(NULL, " ");
  }
}

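It works fine on the first iteration, but after that pch starts pointing to a different memory address (whereas it should point to the address of the letter "g").

If we remove get_and_print_word(pch), it works perfectly (it just prints the strings inside the parentheses):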

int main(int argc, char * argv[]) {

  char row[] = "(octopus kitten) (game cake) (soccer football)";
  char * pch;
  pch = strtok(row, "(");

  while (pch != NULL) {

    pch[strcspn(pch, ")")] = '\0';
    printf("%s\n", pch);
    pch = strtok(NULL, "(");

  }

  return 0;
}

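But that is not what I want: I need to get each word, not just a string of two words with a space between them.

Using pch = strtok(NULL, " )(") is also not suitable in my case, because I need to store each pair of words (each word, of course, as a separate string) in some struct, so I definitely need this function.

How can I solve this, and why does it happen?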

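【Question discussion】:

Read more about parsing and the first chapter of the Dragon book. If allowed, consider using GNU bison. Also consider strings with unusual letters, such as Être ou ne pas être. In 2021, UTF-8 is everywhere.
Maybe you could code your own recursive descent parser. Define your grammar in EBNF notation with pencil and paper. Be aware of pushdown automata.
Should you parse (a (very (strange)) cat) (eating a (piece of cake))?
@BasileStarynkevitch No, I only need to parse pairs of words inside parentheses, in the way I wrote.
@BasileStarynkevitch The strings should contain only English letters, so we won't run into any unusual letters.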
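
For the flat grammar described in the comments above (pairs of English words, no nesting), a hand-written scanner can stay very small. The following is only a sketch under those assumptions; the names scan_word and parse_pair and the cap parameter are hypothetical and do not come from the question.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skip spaces, then copy the next run of letters at *p into buf
   (truncated to cap - 1 characters). Returns the number of characters stored. */
static size_t scan_word(const char **p, char *buf, size_t cap)
{
    size_t n = 0;

    while (**p == ' ')
        (*p)++;
    while (isalpha((unsigned char)**p)) {
        if (n + 1 < cap)
            buf[n++] = **p;
        (*p)++;
    }
    buf[n] = '\0';
    return n;
}

/* Parse one "(word word)" group starting at *p and advance *p past it.
   Returns 1 on success, 0 if the input does not start with such a group. */
int parse_pair(const char **p, char *first, char *second, size_t cap)
{
    assert(p != NULL && *p != NULL && first != NULL && second != NULL && cap > 0);

    while (**p == ' ')
        (*p)++;
    if (**p != '(')
        return 0;
    (*p)++;
    if (scan_word(p, first, cap) == 0 || scan_word(p, second, cap) == 0)
        return 0;
    while (**p == ' ')
        (*p)++;
    if (**p != ')')
        return 0;
    (*p)++;
    return 1;
}
```

Calling parse_pair in a loop until it returns 0 walks the whole input without modifying it, so no copy of the string is needed.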

Tags: c string pointers strtok

【Solution 1】:

Why not use a regular expression:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>
int main (int argc, char *argv[])
{
   int err;
   regex_t preg;
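   if (argc < 3) {
      /* both the input string and the pattern are required */
      fprintf (stderr, "usage: %s STRING REGEX\n", argv[0]);
      return 1;
   }
   const char *str_request = argv[1];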
   const char *str_regex = argv[2];
   err = regcomp (&preg, str_regex, REG_EXTENDED);
   if (err == 0) {
      int match;
      size_t nmatch = 0;
      regmatch_t *pmatch = NULL;
      /* pmatch[0] holds the whole match, so reserve re_nsub + 1 slots */
      nmatch = preg.re_nsub + 1;
      pmatch = malloc (sizeof (*pmatch) * nmatch);
      char *buffer;
      if (pmatch) {
         buffer = (char *) str_request;
         match = regexec (&preg, buffer, nmatch, pmatch, 0);
         while (match == 0) {
            char *found = NULL;
            size_t size ;
            int start, end;
            start = pmatch[0].rm_so;
            end = pmatch[0].rm_eo;
            size = end - start;
            found = malloc (sizeof (*found) * (size + 1));
            if (found) {
               strncpy (found, &buffer[start], size);
               found[size] = '\0';
               printf ("found : %s\n", found);
               free (found);
            }
            // search for the next occurrence; stop on an empty match to avoid looping forever
            if (end == start) break;
            match = regexec (&preg, (buffer += end), nmatch, pmatch, 0);
         }
         free (pmatch);
      }
      regfree (&preg);
   }
   return 0;
}

[puppet@damageinc regex]$ ./regex "(octopus kitten) (game cake) (soccer football)" "([a-z]+)"
found : octopus
found : kitten
found : game
found : cake
found : soccer
found : football
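
Since the question asks for each pair to end up in a struct, the same POSIX regex approach can capture both words of a pair at once by using two groups. This is a minimal sketch, not part of the original answer: struct word_pair, WORD_MAX, copy_match and extract_pairs are hypothetical names, and the pattern assumes lowercase words separated by exactly one space.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define WORD_MAX 64  /* assumed upper bound on word length, including '\0' */

/* Hypothetical container for one parenthesised pair of words. */
struct word_pair {
    char first[WORD_MAX];
    char second[WORD_MAX];
};

/* Copy the span described by m out of s into dst, truncating if needed. */
static void copy_match(char *dst, const char *s, const regmatch_t *m)
{
    size_t len = (size_t)(m->rm_eo - m->rm_so);

    if (len >= WORD_MAX)
        len = WORD_MAX - 1;
    memcpy(dst, s + m->rm_so, len);
    dst[len] = '\0';
}

/* Store up to max pairs found in s into out[].
   Returns the number of pairs stored, or -1 if the pattern fails to compile. */
int extract_pairs(const char *s, struct word_pair *out, size_t max)
{
    regex_t re;
    regmatch_t m[3];  /* m[0]: whole "(a b)", m[1] and m[2]: the two words */
    size_t n = 0;

    assert(s != NULL && out != NULL);
    if (regcomp(&re, "\\(([a-z]+) ([a-z]+)\\)", REG_EXTENDED) != 0)
        return -1;
    while (n < max && regexec(&re, s, 3, m, 0) == 0) {
        copy_match(out[n].first, s, &m[1]);
        copy_match(out[n].second, s, &m[2]);
        n++;
        s += m[0].rm_eo;  /* continue after this pair; a match is never empty */
    }
    regfree(&re);
    return (int)n;
}
```

The input string is only read, never modified, so no strcpy of the original is required.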

【Discussion】:
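
As for why the original code breaks: strtok keeps a single hidden position between calls, so the inner strtok(r, " ") in get_and_print_word overwrites the state of the outer loop, and the next strtok(NULL, "(") continues inside the local copy instead of row. One way to keep the two-level structure is the reentrant strtok_r, which stores its position in a caller-supplied pointer. The sketch below assumes a POSIX system; print_words and print_all_words are hypothetical names, and both functions modify their argument in place.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split one "word word" group on spaces and print each word.
   Returns the number of words printed. */
int print_words(char *group)
{
    char *save_inner;  /* tokenizer state for this level only */
    int count = 0;

    assert(group != NULL);
    for (char *w = strtok_r(group, " ", &save_inner); w != NULL;
         w = strtok_r(NULL, " ", &save_inner)) {
        printf("%s\n", w);
        count++;
    }
    return count;
}

/* Walk every "(...)" group in row and print its words.
   Returns the total number of words printed. */
int print_all_words(char *row)
{
    char *save_outer;  /* independent state for the outer level */
    int total = 0;

    assert(row != NULL);
    for (char *g = strtok_r(row, "(", &save_outer); g != NULL;
         g = strtok_r(NULL, "(", &save_outer)) {
        g[strcspn(g, ")")] = '\0';  /* cut the group at its closing parenthesis */
        total += print_words(g);
    }
    return total;
}
```

Because each level has its own save pointer, the inner tokenizing no longer disturbs the outer loop, and the strcpy into a 4000-byte buffer becomes unnecessary.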