Storing variable sized strings in structures: answers

【Question title】: Storing variable sized strings in structures
【Posted】: 2009-12-01 18:35:48
【Question description】:

I'm reading a file in C++ using streams, specifically fstream, not ifstream.

blah blah blah\n
blah blah\n
blah blah blah blah \n
end

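This pattern repeats over and over, with:

a variable number of blahs on each line,
a constant number of lines between each end; end is the delimiter here.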

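I want to read one set of data at a time and store it in a character array, in a C-style structure. I first tried getline(), but its delimiter can only be a single character, not three. I obviously can't just read() a fixed number of bytes, because the byte count differs from set to set.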

So I'm unsure what the simplest (and most robust) approach is here. Should I call getline until I find an "end" string, appending each line as I go?

I tried a 2D char array, but copying into it is a bit of a pain. Can I use strncpy here? I don't think this works:

char buf[10][10];
strncpy(buf[1], "blah blah",10);

I have a few ideas here, but I'm just not sure which one (or which one I haven't thought of) is best.

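Edit: this is for a network application, so the size of the char array (or string) should always be the same. Also, there should be no pointers in the structure.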
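To make the constraint in the edit concrete, here is a minimal sketch of a fixed-size, pointer-free record filled from strings that were read with getline. The names Record and pack_record and the sizes kMaxLines and kMaxLineLen are assumptions made up for illustration; the question does not give real limits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical limits, chosen only for illustration.
const std::size_t kMaxLines = 3;     // lines per group (constant, per the question)
const std::size_t kMaxLineLen = 32;  // bytes per line, including the terminating '\0'

// Plain struct: no pointers, and every record has the same size.
struct Record {
    char lines[kMaxLines][kMaxLineLen];
};

// Copies up to kMaxLines strings into rec, truncating lines that are too long.
// Returns false if anything had to be truncated or dropped.
bool pack_record(const std::vector<std::string>& src, Record& rec) {
    std::memset(&rec, 0, sizeof rec);  // unused bytes are zero, so the bytes sent are deterministic
    bool complete = src.size() <= kMaxLines;
    for (std::size_t i = 0; i < src.size() && i < kMaxLines; ++i) {
        if (src[i].size() >= kMaxLineLen) {
            complete = false;  // this line will be cut short
        }
        // Copy at most kMaxLineLen - 1 bytes; the memset above supplies the '\0'.
        std::strncpy(rec.lines[i], src[i].c_str(), kMaxLineLen - 1);
        assert(rec.lines[i][kMaxLineLen - 1] == '\0');
    }
    return complete;
}
```

The trade-off is the usual one for fixed-size records: short lines waste space and long lines get truncated, so the limits have to be agreed on by both ends of the connection.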

Related question: are a char array and a std::string stored the same way in memory? I always thought std::string had some overhead.

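【Question discussion】:

You can use the c_str method to get a C-style string, i.e. a char pointer, for read-only access to the contents of a std::string. The data method works too, but (before C++11) it isn't guaranteed to be null-terminated. Exactly how a string is stored in memory is an implementation detail (some implementations do reference counting, for example), but by far the most common approach boils down to a char array. Even so, it's still easier to use std::string and let it manage the array.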
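As a small illustration of the comment above, the following sketch checks that the pointer returned by c_str is null-terminated and matches the bytes exposed by data. The function name c_str_round_trips is made up for this example.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Returns true when the C-style view of s describes the same text as s itself.
bool c_str_round_trips(const std::string& s) {
    const char* p = s.c_str();            // read-only and always '\0'-terminated
    assert(p[s.size()] == '\0');
    return std::strlen(p) == s.size()     // fails if s contains an embedded '\0'
        && std::memcmp(p, s.data(), s.size()) == 0;
}
```

On the overhead question: sizeof(std::string) is implementation-defined and is typically larger than a pointer, since the object also tracks size and capacity, so a std::string member is not a drop-in replacement for a fixed char array inside a struct that gets sent over the wire.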

Tags: c++ string

【Solution 1】:

Well, you said "C-style structure", but maybe you can just use std::string?

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(void)
{
    std::fstream file("main.cpp");
    std::vector<std::string> lines;

    std::string line;
    while (getline(file, line))
    {
        if (line == "end")
        {
            break;
        }

        std::cout << line << std::endl;
        lines.push_back(line);
    }

    // lines now has all the lines up-to
    // and not including "end"

/* this is for reading the file
end

some stuff that'll never get printed
or added blah blah
*/
}

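【Discussion】:

Your file-reading logic is wrong - consider what happens if the file is empty. Coincidentally, I've just written a blog post about this very issue at punchlet.wordpress.com
Yay, Neil commented! :D Hello, Mr. Butterworth.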

【Solution 2】:

I'd suggest using strings instead of char arrays.

【Discussion】:

【Solution 3】:

(My push_back utility is described at the bottom.)

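#include <fstream>
#include <string>
#include <vector>

// push_back(container) is the helper template defined at the bottom of this
// answer; it must be declared before main for this to compile.
typedef std::vector<std::string> Block;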

int main() {
  using namespace std;

  vector<Block> blocks;
  string const end = "end";

  // no real difference from using ifstream, btw
  for (fstream file("filename", fstream::in); file;) {
    Block& block = push_back(blocks);
    for (string line; getline(file, line);) {
      if (line == end) {
        break;
      }
      push_back(block).swap(line);
    }
    if (!file && block.empty()) {
      // no lines read, block is a dummy not represented in the file
      blocks.pop_back();
    }
  }

  return 0;
}

Serialization example:

template<class OutIter>
void bencode_block(Block const& block, OutIter dest) {
  int len = 0;
  for (Block::const_iterator i = block.begin(); i != block.end(); ++i) {
    len += i->size() + 1; // include newline
  }
  *dest++ = len;
  *dest++ = ':';
  for (Block::const_iterator i = block.begin(); i != block.end(); ++i) {
    *dest++ = *i;
    *dest++ = '\n';
  }
}

I used a simple bencoding serialization format. An example of a suitable output iterator, which simply writes to a stream:

struct WriteStream {
  std::ostream& out;
  WriteStream(std::ostream& out) : out(out) {}

  WriteStream& operator++() { return *this; }
  WriteStream& operator++(int) { return *this; }
  WriteStream& operator*() { return *this; }

  template<class T>
  void operator=(T const& value) {
    out << value;
  }
};

Usage example:

bencode_block(block, WriteStream(std::cout));

Another possible output iterator, which writes to a file descriptor (e.g. a network socket):

#include <cerrno>     // errno
#include <cstdio>     // sprintf
#include <cstring>    // strerror
#include <stdexcept>  // std::runtime_error
#include <string>
#include <unistd.h>   // ::write (POSIX)

struct WriteFD {
  int out;
  WriteFD(int out) : out(out) {}

  WriteFD& operator++() { return *this; }
  WriteFD& operator++(int) { return *this; }
  WriteFD& operator*() { return *this; }

  template<class T>
  void operator=(T const& value) {
    if (write(value) == -1) {
      throw std::runtime_error(std::strerror(errno));
    }
  }

  //NOTE: write methods don't currently handle writing fewer bytes than provided
  // ::write is qualified because the member overloads hide the POSIX function
  int write(char value) {
    return ::write(out, &value, 1);
  }
  int write(std::string const& value) {
    return ::write(out, value.data(), value.size());
  }
  int write(int value) {
    char buf[20];
    // handles INT_MAX up to   9999999999999999999
    // handles INT_MIN down to -999999999999999999
    // that's 19 and 18 nines, respectively (you did count, right? :P)
    int len = std::sprintf(buf, "%d", value);
    return ::write(out, buf, len);
  }
};

Poor man's move semantics:

template<class C>
typename C::value_type& push_back(C& container) {
  container.push_back(typename C::value_type());
  return container.back();
}

This makes it easy to use move semantics to avoid unnecessary copies:

container.push_back(value); // copies
// becomes:
// (C is the type of container)
container.push_back(C::value_type()); // add empty
container.back().swap(value); // swap contents

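【Discussion】:

I really like this answer, but how would I use it with a network application? If I cast &blocks to void*, can I simply cast it back on the receiver and expect it to work?
No. You would send each block in whatever format you use, with its lines concatenated. A simplified example is included above.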
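For the receiving side, here is a rough sketch of how a block written by bencode_block (decimal length, a colon, then each line followed by a newline) could be turned back into a Block. The function decode_block is not part of the answer; it is a hypothetical counterpart that assumes the bytes received so far have already been collected into a std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Block;

// Parses one "<len>:<payload>" message starting at pos in buf.
// On success, appends the payload's lines to block, moves pos past the
// message and returns true. Returns false, leaving pos and block untouched,
// if the message is malformed or has not been fully received yet.
// (Sketch only: an absurdly long length field could overflow len.)
bool decode_block(const std::string& buf, std::size_t& pos, Block& block) {
    std::size_t colon = buf.find(':', pos);
    if (colon == std::string::npos || colon == pos) return false;

    std::size_t len = 0;
    for (std::size_t i = pos; i < colon; ++i) {
        if (buf[i] < '0' || buf[i] > '9') return false;
        len = len * 10 + static_cast<std::size_t>(buf[i] - '0');
    }

    std::size_t start = colon + 1;
    if (buf.size() - start < len) return false;  // payload incomplete
    std::size_t end = start + len;

    Block lines;
    while (start < end) {
        std::size_t nl = buf.find('\n', start);
        if (nl == std::string::npos || nl >= end) return false;  // each line must end in '\n'
        lines.push_back(buf.substr(start, nl - start));
        start = nl + 1;
    }
    assert(start == end);

    block.insert(block.end(), lines.begin(), lines.end());
    pos = end;
    return true;
}
```

Because the length prefix says exactly how many bytes belong to the block, the receiver never has to guess where one block stops and the next begins, which is what casting a pointer to the vector cannot give you.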

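【Solution 4】:

What you're describing is really a parsing problem. Once you recognize that, you're most of the way to solving it.

It's hard to be more specific, since you haven't really described what you need to do with this data. But typically you can do simple inline parsing. In this case, you probably want a small routine that recognizes "blah", EOL and "end", and tells you which of them it found at a given position in the string.

Then you can have a parse_line routine that recognizes a whole line (expecting any number of "blah"s terminated by EOL).

Then you can have a parse routine that calls parse_line however many times you expect (10?), and then reports an error if "end" isn't found.

【Discussion】: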
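The three routines described above could look roughly like this. It is only a sketch: the names Token, next_token and parse_group are invented here (only parse_line comes from the answer), and it assumes the whole input is already in a std::string with tokens separated by spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum Token { TOK_BLAH, TOK_EOL, TOK_END, TOK_ERROR };

// Identifies the token at pos (after skipping spaces) and moves pos past it.
Token next_token(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && s[pos] == ' ') ++pos;
    if (pos >= s.size()) return TOK_ERROR;  // ran out of input
    if (s[pos] == '\n') { ++pos; return TOK_EOL; }
    if (s.compare(pos, 4, "blah") == 0) { pos += 4; return TOK_BLAH; }
    if (s.compare(pos, 3, "end") == 0) { pos += 3; return TOK_END; }
    return TOK_ERROR;
}

// Recognizes one whole line: any number of "blah" tokens followed by EOL.
bool parse_line(const std::string& s, std::size_t& pos) {
    for (;;) {
        Token t = next_token(s, pos);
        if (t == TOK_EOL) return true;
        if (t != TOK_BLAH) return false;
    }
}

// Recognizes one group: exactly line_count lines, then "end".
bool parse_group(const std::string& s, std::size_t& pos, int line_count) {
    assert(line_count >= 0);
    for (int i = 0; i < line_count; ++i) {
        if (!parse_line(s, pos)) return false;
    }
    return next_token(s, pos) == TOK_END;
}
```

To read a file with many groups, a caller would invoke parse_group repeatedly and consume the newline after each "end" itself. A real parser would also record the text of each line rather than just validating it.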