【Question title】: How to split a string with a delimiter larger than one single char?
【Posted】: 2011-02-01 15:53:13
【Question】:

Suppose I have this:

"foo bar 1 and foo bar 2"

How can I split it into:

foo bar 1
foo bar 2

?

I tried strtok() and strsep(), but neither worked. They do not recognize "and" as a delimiter; instead they treat "a", "n", and "d" as three separate delimiters.

Is there a function that can help me with this, or will I have to split on spaces and do some string manipulation?
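The behaviour described above is easy to reproduce. The following is a minimal sketch, not part of the original question: the helper name `strtok_demo` and the fixed buffer sizes are assumptions made for illustration. It shows that strtok() treats its second argument as a set of single-character delimiters rather than as one multi-character separator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes a copy of `input` with strtok() and
   stores up to `max` tokens (each truncated to 31 characters) in `out`.
   Returns the number of tokens stored. */
size_t strtok_demo(const char *input, const char *delims,
                   char out[][32], size_t max)
{
    char copy[128];
    size_t n = 0;

    /* strtok() modifies its argument, so work on a bounded copy. */
    strncpy(copy, input, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    for (char *tok = strtok(copy, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        strncpy(out[n], tok, 31);
        out[n][31] = '\0';
        ++n;
    }
    return n;
}
```

Called with "foo bar 1 and foo bar 2" and the delimiter string "and", this yields four tokens: "foo b", "r 1 ", " foo b" and "r 2", because every 'a', 'n' and 'd' ends a token, including the 'a' in "bar".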

【Comments】:

    Tags: c string string-split


    【Answer 1】:

    You can use strstr() to find the first " and ", and "tokenize" the string yourself by skipping forward that many characters and then doing it again.
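A minimal sketch of that loop follows. It is only one possible reading of the suggestion: the function name `next_piece` and its cursor-based interface are hypothetical, and pieces are reported as pointer plus length because they are not '\0'-terminated inside the input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator: returns the piece starting at *cursor and stores
   its length in *len. *cursor is advanced past the separator, or set to
   NULL once the last piece has been returned. Returns NULL when done.
   An empty separator is treated as "no separator present". */
const char *next_piece(const char **cursor, const char *sep, size_t *len)
{
    const char *piece = *cursor;
    const char *hit;

    if (piece == NULL)
        return NULL;                    /* iteration already finished */

    hit = (*sep != '\0') ? strstr(piece, sep) : NULL;
    if (hit != NULL) {
        *len = (size_t)(hit - piece);
        *cursor = hit + strlen(sep);    /* skip over the whole separator */
    } else {
        *len = strlen(piece);           /* last piece: rest of the string */
        *cursor = NULL;
    }
    return piece;
}

/* Typical use (printing each piece with a precision-limited %s):
 *
 *     const char *cur = "foo bar 1 and foo bar 2";
 *     const char *p;
 *     size_t len;
 *     while ((p = next_piece(&cur, " and ", &len)) != NULL)
 *         printf("%.*s\n", (int)len, p);
 */
```

Nothing is copied or allocated here; if the pieces have to outlive the input string, they need to be duplicated.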

    【Comments】:

      【Answer 2】:

      Here is a short example I just wrote that shows how to use strstr to split a string on a given string:

      #include <string.h>
      #include <stdio.h>
      
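      void split(const char *phrase, const char *delimiter)
      {
          const char *loc = strstr(phrase, delimiter);
          if (loc == NULL)
          {
              printf("Could not find delimiter\n");
          }
          else
          {
              char buf[256]; /* malloc would be much more robust here */
              size_t prefix = (size_t)(loc - phrase);
              if (prefix >= sizeof buf)
                  prefix = sizeof buf - 1; /* truncate rather than overflow */
              memcpy(buf, phrase, prefix);
              buf[prefix] = '\0'; /* strncpy alone would leave buf unterminated */
              printf("Before delimiter: '%s'\n", buf);
              printf("After delimiter: '%s'\n", loc + strlen(delimiter));
          }
      }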
      
      int main()
      {
          split("foo bar 1 and foo bar 2", "and");
          printf("-----\n");
          split("foo bar 1 and foo bar 2", "quux");
          return 0;
      }
      

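      Output:

      Before delimiter: 'foo bar 1 '
      After delimiter: ' foo bar 2'
      -----
      Could not find delimiter

      The spaces around the delimiter are kept because the delimiter passed in is "and", not " and ".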

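      Of course, I haven't fully tested it, and the fixed-size buffer means the usual string-length problems need care with longer inputs; but it is at least a demonstrable example.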
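The code comment above notes that malloc would be more robust than a fixed buffer. A minimal sketch of that variant might look like the following. The name `split_once` and the `rest` out-parameter are assumptions for illustration, not part of the answer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the text before the
   first occurrence of `delimiter` and points *rest at the text after it
   (inside `phrase`, not copied). Returns NULL if the delimiter is not
   found or the allocation fails; *rest is left untouched in that case.
   The caller frees the returned string. */
char *split_once(const char *phrase, const char *delimiter, const char **rest)
{
    const char *loc = strstr(phrase, delimiter);
    size_t prefix;
    char *before;

    if (loc == NULL)
        return NULL;

    prefix = (size_t)(loc - phrase);
    before = malloc(prefix + 1);        /* sized to fit, so no truncation */
    if (before == NULL)
        return NULL;

    memcpy(before, phrase, prefix);
    before[prefix] = '\0';
    *rest = loc + strlen(delimiter);
    return before;
}
```

With the delimiter " and " (spaces included), the input "foo bar 1 and foo bar 2" gives "foo bar 1" as the returned string and "foo bar 2" through `rest`.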

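      【Comments】:

        【Answer 3】:

        The main problem with splitting strings in C is that it inevitably results in some dynamic memory management, and the standard library tends to avoid that whenever possible. That is why none of the standard C functions deal with dynamic memory allocation; only malloc/calloc/realloc do that.

        But it is not too difficult to do this oneself. Let me walk you through it.

        We need to return a number of strings, and the easiest way to do that is to return an array of pointers to strings, terminated by a NULL item. Except for the final NULL, each element in the array points at a dynamically allocated string.

        First we need a couple of helper functions to deal with such arrays. The easiest one computes the number of strings (the elements before the final NULL):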

        /* Return length of a NULL-delimited array of strings. */
        size_t str_array_len(char **array)
        {
            size_t len;
        
            for (len = 0; array[len] != NULL; ++len)
                continue;
            return len;
        }
        

        Another easy one is the function for freeing the array:

        /* Free a dynamic array of dynamic strings. */
        void str_array_free(char **array)
        {
            if (array == NULL)
                return;
            for (size_t i = 0; array[i] != NULL; ++i)
                free(array[i]);
            free(array);
        }
        

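        Somewhat more complicated is the function that adds a copy of a string to the array. It needs to handle a few special cases, such as when the array does not exist yet (the entire array is NULL). It also needs to handle strings that are not terminated with '\0', so that our actual splitting function can append just a part of the input string.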

        /* Append an item to a dynamically allocated array of strings. On failure,
           return NULL, in which case the original array is intact. The item
           string is dynamically copied. If the array is NULL, allocate a new
           array. Otherwise, extend the array. Make sure the array is always
           NULL-terminated. Input string might not be '\0'-terminated. */
        char **str_array_append(char **array, size_t nitems, const char *item, 
                                size_t itemlen)
        {
            /* Make a dynamic copy of the item. */
            char *copy;
            if (item == NULL)
                copy = NULL;
            else {
                copy = malloc(itemlen + 1);
                if (copy == NULL)
                    return NULL;
                memcpy(copy, item, itemlen);
                copy[itemlen] = '\0';
            }
        
            /* Extend array with one element. Except extend it by two elements, 
               in case it did not yet exist. This might mean it is a teeny bit
               too big, but we don't care. */
            array = realloc(array, (nitems + 2) * sizeof(array[0]));
            if (array == NULL) {
                free(copy);
                return NULL;
            }
        
            /* Add copy of item to array, and return it. */
            array[nitems] = copy;
            array[nitems+1] = NULL;
            return array;
        }
        

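        That's a mouthful. For really good style, it would have been better to split off the making of a dynamic copy of the input item into its own function, but I'll leave that as an exercise for the reader.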
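One way that exercise could be approached is sketched below. The helper name `str_copy_n` is hypothetical, and the rewritten `str_array_append` is only an illustration of how the two pieces might fit together; it is intended to behave the same as the version above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd, '\0'-terminated copy of the
   first `itemlen` bytes of `item`, or NULL if the allocation fails. */
char *str_copy_n(const char *item, size_t itemlen)
{
    char *copy = malloc(itemlen + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, item, itemlen);
    copy[itemlen] = '\0';
    return copy;
}

/* Same contract as the answer's str_array_append, with the copying
   delegated to str_copy_n. On failure the original array is intact. */
char **str_array_append(char **array, size_t nitems, const char *item,
                        size_t itemlen)
{
    char *copy = NULL;
    char **grown;

    if (item != NULL) {
        copy = str_copy_n(item, itemlen);
        if (copy == NULL)
            return NULL;
    }

    /* Room for the new element plus the terminating NULL. */
    grown = realloc(array, (nitems + 2) * sizeof(grown[0]));
    if (grown == NULL) {
        free(copy);
        return NULL;
    }

    grown[nitems] = copy;
    grown[nitems + 1] = NULL;
    return grown;
}
```

Keeping the copy in its own function also makes it easy to test the '\0'-termination logic separately from the array growth.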

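        Finally, we have the actual splitting function. It, too, needs to handle some special cases:

        • The input string might start or end with the separator.
        • There might be separators right next to each other.
        • The input string might not include the separator at all.

        I have opted to add an empty string to the result if the separator is right next to the start or end of the input string, or right next to another separator. If you need something else, you will need to tweak the code.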
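If empty strings are not wanted in the result, one possible tweak is to filter them out after splitting rather than changing the splitter itself. The following is a sketch under that assumption; the function name `str_array_drop_empty` is hypothetical, and it expects the same NULL-terminated array of malloc'd strings described above.

```c
#include <stdlib.h>

/* Hypothetical helper: removes empty strings from a NULL-terminated array
   of malloc'd strings, in place. Removed strings are freed, the remaining
   ones keep their order. Returns the number of strings kept. */
size_t str_array_drop_empty(char **array)
{
    size_t kept = 0;

    for (size_t i = 0; array[i] != NULL; ++i) {
        if (array[i][0] == '\0')
            free(array[i]);             /* drop the empty item */
        else
            array[kept++] = array[i];   /* compact towards the front */
    }
    array[kept] = NULL;                 /* re-terminate the shorter array */
    return kept;
}
```

The array itself is not shrunk with realloc, which keeps the helper free of failure paths at the cost of a few unused slots.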

        Other than the special cases and some error handling, the splitting is now reasonably straightforward.

        /* Split a string into substrings. Return dynamic array of dynamically
           allocated substrings, or NULL if there was an error. Caller is
           expected to free the memory, for example with str_array_free. */
        char **str_split(const char *input, const char *sep)
        {
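            size_t nitems = 0;
            char **array = NULL;
            const char *start = input;
            const char *next;
            size_t seplen = strlen(sep);
            const char *item;
            size_t itemlen;

            /* An empty separator matches at every position and would never
               advance, so treat it as an error. */
            if (seplen == 0)
                return NULL;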
        
            for (;;) {
                next = strstr(start, sep);
                if (next == NULL) {
                    /* Add the remaining string (or an empty string, if the
                       input ends with the separator). */
                    char **new = str_array_append(array, nitems, start, strlen(start));
                    if (new == NULL) {
                        str_array_free(array);
                        return NULL;
                    }
                    array = new;
                    ++nitems;
                    break;
                } else if (next == input) {
                    /* Input starts with separator. */
                    item = "";
                    itemlen = 0;
                } else {
                    item = start;
                    itemlen = next - item;
                }
                char **new = str_array_append(array, nitems, item, itemlen);
                if (new == NULL) {
                    str_array_free(array);
                    return NULL;
                }
                array = new;
                ++nitems;
                start = next + seplen;
            }
        
            if (nitems == 0) {
                /* Input does not contain separator at all. */
                assert(array == NULL);
                array = str_array_append(array, nitems, input, strlen(input));
            }
        
            return array;
        }
        

        Here is the whole program in one piece. It also includes a main program that runs some test cases.

        #include <assert.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        
        
        /* Append an item to a dynamically allocated array of strings. On failure,
           return NULL, in which case the original array is intact. The item
           string is dynamically copied. If the array is NULL, allocate a new
           array. Otherwise, extend the array. Make sure the array is always
           NULL-terminated. Input string might not be '\0'-terminated. */
        char **str_array_append(char **array, size_t nitems, const char *item, 
                                size_t itemlen)
        {
            /* Make a dynamic copy of the item. */
            char *copy;
            if (item == NULL)
                copy = NULL;
            else {
                copy = malloc(itemlen + 1);
                if (copy == NULL)
                    return NULL;
                memcpy(copy, item, itemlen);
                copy[itemlen] = '\0';
            }
        
            /* Extend array with one element. Except extend it by two elements, 
               in case it did not yet exist. This might mean it is a teeny bit
               too big, but we don't care. */
            array = realloc(array, (nitems + 2) * sizeof(array[0]));
            if (array == NULL) {
                free(copy);
                return NULL;
            }
        
            /* Add copy of item to array, and return it. */
            array[nitems] = copy;
            array[nitems+1] = NULL;
            return array;
        }
        
        
        /* Free a dynamic array of dynamic strings. */
        void str_array_free(char **array)
        {
            if (array == NULL)
                return;
            for (size_t i = 0; array[i] != NULL; ++i)
                free(array[i]);
            free(array);
        }
        
        
        /* Split a string into substrings. Return dynamic array of dynamically
           allocated substrings, or NULL if there was an error. Caller is
           expected to free the memory, for example with str_array_free. */
        char **str_split(const char *input, const char *sep)
        {
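            size_t nitems = 0;
            char **array = NULL;
            const char *start = input;
            const char *next;
            size_t seplen = strlen(sep);
            const char *item;
            size_t itemlen;

            /* An empty separator matches at every position and would never
               advance, so treat it as an error. */
            if (seplen == 0)
                return NULL;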
        
            for (;;) {
                next = strstr(start, sep);
                if (next == NULL) {
                    /* Add the remaining string (or an empty string, if the
                       input ends with the separator). */
                    char **new = str_array_append(array, nitems, start, strlen(start));
                    if (new == NULL) {
                        str_array_free(array);
                        return NULL;
                    }
                    array = new;
                    ++nitems;
                    break;
                } else if (next == input) {
                    /* Input starts with separator. */
                    item = "";
                    itemlen = 0;
                } else {
                    item = start;
                    itemlen = next - item;
                }
                char **new = str_array_append(array, nitems, item, itemlen);
                if (new == NULL) {
                    str_array_free(array);
                    return NULL;
                }
                array = new;
                ++nitems;
                start = next + seplen;
            }
        
            if (nitems == 0) {
                /* Input does not contain separator at all. */
                assert(array == NULL);
                array = str_array_append(array, nitems, input, strlen(input));
            }
        
            return array;
        }
        
        
        /* Return length of a NULL-delimited array of strings. */
        size_t str_array_len(char **array)
        {
            size_t len;
        
            for (len = 0; array[len] != NULL; ++len)
                continue;
            return len;
        }
        
        
        #define MAX_OUTPUT 20
        
        
        int main(void)
        {
            struct {
                const char *input;
                const char *sep;
                char *output[MAX_OUTPUT];
            } tab[] = {
                /* Input is empty string. Output should be a list with an empty 
                   string. */
                {
                    "",
                    "and",
                    {
                        "",
                        NULL,
                    },
                },
                /* Input is exactly the separator. Output should be two empty 
                   strings. */
                {
                    "and",
                    "and",
                    {
                        "",
                        "",
                        NULL,
                    },
                },
                /* Input is non-empty, but does not have separator. Output should
                   be the same string. */
                {
                    "foo",
                    "and",
                    {
                        "foo",
                        NULL,
                    },
                },
                /* Input is non-empty, and does have separator. */
                {
                    "foo bar 1 and foo bar 2",
                    " and ",
                    {
                        "foo bar 1",
                        "foo bar 2",
                        NULL,
                    },
                },
            };
            const int tab_len = sizeof(tab) / sizeof(tab[0]);
            bool errors;
        
            errors = false;
        
            for (int i = 0; i < tab_len; ++i) {
                printf("test %d\n", i);
        
                char **output = str_split(tab[i].input, tab[i].sep);
                if (output == NULL) {
                    fprintf(stderr, "output is NULL\n");
                    errors = true;
                    break;
                }
                size_t num_output = str_array_len(output);
                printf("num_output %lu\n", (unsigned long) num_output);
        
                size_t num_correct = str_array_len(tab[i].output);
                if (num_output != num_correct) {
                    fprintf(stderr, "wrong number of outputs (%lu, not %lu)\n",
                            (unsigned long) num_output, (unsigned long) num_correct);
                    errors = true;
                } else {
                    for (size_t j = 0; j < num_output; ++j) {
                        if (strcmp(tab[i].output[j], output[j]) != 0) {
                            fprintf(stderr, "output[%lu] is '%s' not '%s'\n",
                                    (unsigned long) j, output[j], tab[i].output[j]);
                            errors = true;
                            break;
                        }
                    }
                }
        
                str_array_free(output);
                printf("\n");
            }
        
            if (errors)
                return EXIT_FAILURE;   
            return 0;
        }
        

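        【Comments】:

        • Thank you very much for this surprisingly comprehensive and practical code.
        【Answer 4】:

        If you know the type of delimiter, for example a comma or a semicolon, you can try this: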

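        #include <stdio.h>

        int main(void)
        {
          int i = 0, temp = 0, temp1 = 0, temp2 = 0;
          char buff[12] = "123;456;789";

          /* First field: digits up to the first ';'. */
          for (; buff[i] != ';' && buff[i] != '\0'; i++)
          {
            temp = temp * 10 + (buff[i] - '0');
          }
          if (buff[i] == ';')
            i++; /* step over the delimiter instead of restarting at 0 */

          /* Second field: digits up to the next ';'. */
          for (; buff[i] != ';' && buff[i] != '\0'; i++)
          {
            temp1 = temp1 * 10 + (buff[i] - '0');
          }
          if (buff[i] == ';')
            i++;

          /* Third field: the remaining digits. */
          for (; buff[i] != '\0'; i++)
          {
            temp2 = temp2 * 10 + (buff[i] - '0');
          }
          printf("temp=%d temp1=%d temp2=%d\n", temp, temp1, temp2);
          return 0;
        }
        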

        Output:

        temp=123 temp1=456 temp2=789
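For numeric fields like these, the standard strtol() function can do the digit conversion and report where it stopped, which removes the need for hand-written digit loops. The following is a sketch of that alternative, not the answer's own method; the function name `parse_int_fields` and its parameters are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: parses up to `max` base-10 integers from `text`,
   separated by the single character `delim`, into `out`. Returns the
   number of integers parsed. Parsing stops at the first field that does
   not start with a number or is not followed by the delimiter. */
size_t parse_int_fields(const char *text, char delim, long *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (n < max) {
        char *end;
        long value = strtol(p, &end, 10);

        if (end == p)
            break;          /* no digits at this position */
        out[n++] = value;
        if (*end != delim)
            break;          /* end of string or unexpected character */
        p = end + 1;        /* step over the delimiter */
    }
    return n;
}
```

For "123;456;789" with ';' as the delimiter this fills the array with 123, 456 and 789. Like the answer's code, it only handles a single-character delimiter.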
        

        【Comments】:
