String tokenization in C++ throws a seg fault

【Question title】: String tokenization in C++ throws a seg fault
【Posted】: 2018-11-19 02:19:14
【Question】:

I want to write a function that breaks up a string by tokens. So far I have come up with the following:

#include <cstring>
#include <iostream>
#include <vector>
#define MAXLEN 20

void mytoken(std::string input, std::vector<std::string> & out);

int main() 
{
    std::vector<std::string> out;
    std::string txt = "XXXXXX-CA";
    mytoken(txt, out);
    std::cout << "0: " << out[0] <<std::endl;
    std::cout << "1: " << out[1] <<std::endl;
}

void mytoken(std::string instr, std::vector<std::string> & out) {
    std::vector<std::string> vec;
    char input[MAXLEN] = {0};
    strcpy(input, instr.c_str());
    char *token = std::strtok(input, "-");
    while (token != NULL) {
        std::cout << token << '\n';
        token = std::strtok(NULL, "-");
        out.push_back(token);
    }    
}

It produces the following output:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
XXXXXX
CA
bash: line 7: 21987 Aborted                 (core dumped) ./a.out

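I would like to know why this happens.

【Comments】:

So you have a core dump. What does your debugger say about it?
It looks like you throw away the first token, search for - again, find none, and then try to construct a string from NULL. Are you sure you don't want to move the push_back above the second strtok?
@cerr "I want to write a function that breaks up a string by tokens" -- using strtok is one of the worst ways to do that.
@cerr strtok 1) destroys the original input, 2) is not thread-safe, and 3) cannot be used two or more times in nested calls, because it uses a static buffer. It is a function best left to newbies and C coders who want to deal with that mess. You are using C++, and there are many better ways to tokenize a string with what C++ provides.
Yes, none of these solutions seem to use strtok. I bet even C programmers wouldn't mind seeing strtok die (though of course a lot of code would break).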

Tags: c++ segmentation-fault token strtok

【Solution 1】:

It is better to use "C++-style" functions; the result is simpler and more readable:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

void mytoken(std::string instr, std::vector<std::string> & out)
{
    std::istringstream ss(instr);
    std::string token;
    while(std::getline(ss, token, '-'))
    {
        std::cout << token << '\n';
        out.push_back(token);
    }
}
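
For reference, here is a minimal sketch of the same std::getline idea written as a function that returns its result. The name split and the delim parameter are made up for illustration and are not part of the original post. Note one behavioral difference from strtok: consecutive delimiters yield empty tokens here, whereas strtok skips them.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits `input` on `delim`
// and returns the pieces instead of filling an out-parameter.
std::vector<std::string> split(const std::string& input, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream ss(input);
    std::string token;
    // std::getline returns the stream, which tests false once nothing more
    // can be extracted, so the loop ends without any null-pointer check.
    while (std::getline(ss, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Calling split("XXXXXX-CA", '-') should give the two tokens from the question, and since no fixed-size char array is involved, the input length is not limited by MAXLEN.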

To make your original example work, you need to change the order of operations inside the loop:

//...
while(token != NULL)
{
    out.push_back(token);
    std::cout << token << '\n';
    token = std::strtok(NULL, "-");
}

【Comments】: