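Avoiding the first newline in a C++11 raw string literal?

【Question title】: Avoiding the first newline in a C++11 raw string literal?
【Posted】: 2014-09-12 19:17:15
【Question】:

Raw string literals in C++11 are very nice, except that the obvious way to format them leaves a redundant newline \n as the first character.

Consider this example: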

    some_code();
    std::string text = R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

The obvious workaround looks ugly:

    some_code();
    std::string text = R"(This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

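Has anyone found an elegant solution?

【Comments】:

I don't remember whether \\n is replaced by a space or whether the lines are simply joined without any whitespace.
All characters in a raw string, including newlines and `\`, are interpreted literally.
Put the R"( on the next line.
@chris To be fair, that note by itself doesn't directly specify anything. It is in 2.5 [lex.pptoken], though: "If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified."
There is nothing elegant about a string literal that isn't indented with the rest of the code, but if you want a multiline raw string literal like this (for whatever reason not following Bryan's sane advice), the less insane way to get what you want is = 1 + R"(....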

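Tags: c++ c++11

【Solution 1】:

Yes, this is annoying. Maybe there should be raw literals (R"PREFIX(") and multiline raw literals (M"PREFIX).

I came up with this alternative, which is almost self-describing: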

#include <cassert>  // assert
#include <iterator> // std::next
#include <string>   // std::string
...
{
    ...
    ...
    std::string atoms_text = 
std::next/*_line*/(R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");
    assert( atoms_text[0] != '\n' );
    ...
}

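Limitations:

If the raw literal is empty, this produces an invalid string. But that should be obvious.
If the raw literal does not start with a newline, it eats the first character.
std::next is constexpr only from C++17; otherwise you can use 1+(char const*)R"XYZ(", but it is less clear and may produce a warning.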

constexpr auto atom_text = 1 + (R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");

Also, no guarantees ;). After all, I don't know whether it is legal to do arithmetic with pointers to static data.
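On that doubt: adding 1 to the pointer a string literal decays to stays within the literal's array (which always holds at least the terminating '\0'), so the arithmetic itself is well-defined; only dereferencing the result for an empty literal would be out of bounds. A minimal sketch of how to let the compiler confirm this, assuming a shortened data set and the hypothetical name kAtomText:

```cpp
#include <cassert>
#include <cstring>

// kAtomText is a hypothetical name used only for this illustration.
constexpr const char* kAtomText = 1 + R"XYZ(
  O123  12.48
  O123  13.10
)XYZ";

// Constant evaluation must reject undefined behavior, so the fact that
// these checks compile is evidence that the pointer arithmetic is valid.
static_assert(kAtomText[0] == ' ', "the leading newline was skipped");
static_assert(kAtomText[2] == 'O', "the first record starts after two spaces");
```

Because the checks run at compile time, an accidental out-of-bounds read would show up as a build error rather than as silent misbehavior.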

Another advantage of the + 1 approach is that it can be placed at the end:

constexpr auto atom_text = R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ" + 1;

The possibilities are endless:

constexpr auto atom_text = &R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ"[1];

constexpr auto atom_text = &1[R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ"];

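【Comments】:

【Solution 2】:

You can get a pointer to the second character, skipping the leading newline, by adding 1 to the const char* that the string literal automatically converts to: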

    some_code();
    std::string text = 1 + R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

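IMHO, the above is flawed in that it breaks the indentation of the surrounding code. Some languages provide a built-in or library function that does the following:

removes an empty leading line, and
looks at the indentation of the second line and removes the same amount of indentation from all other lines

This allows usage like the following: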

some_code();
std::string text = unindent(R"(
    This is the first line.
    This is the second line.
    This is the third line.
    )");
more_code();

Writing such a function is relatively simple...

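#include <cctype>   // std::isspace
#include <cstddef>  // std::size_t
#include <string>   // std::string

// Drops one leading newline, then strips the first line's indentation from
// every following line that starts with the same character sequence.
std::string unindent(const char* p)
{
    std::string result;
    if (*p == '\n') ++p;
    const char* p_leading = p;
    // Cast to unsigned char: std::isspace is undefined for negative values.
    while (std::isspace(static_cast<unsigned char>(*p)) && *p != '\n')
        ++p;
    std::size_t leading_len = p - p_leading;
    while (*p)
    {
        result += *p;
        if (*p++ == '\n')
        {
            // Skip the indentation only if it matches the first line's exactly.
            for (std::size_t i = 0; i < leading_len; ++i)
                if (p[i] != p_leading[i])
                    goto dont_skip_leading;
            p += leading_len;
        }
      dont_skip_leading: ;
    }
    return result;
}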

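(The slightly odd p_leading[i] approach is meant to make life no harder for people who mix tabs and spaces than they already make it for themselves ;-P, as long as the lines start with the same sequence.)

【Comments】:

Is it 1 character? Or 2?
@LightnessRacesinOrbit: In the context under discussion, one. The compiler may arrange for distinct carriage return and line feed characters to be generated for each '\n' during output, but I believe the code and techniques presented above are portable regardless of any later translation. If you have reason to believe otherwise, please share.
I'm really just raising a topic. Are we sure there is only a newline? Is that guaranteed? Or does it depend on the encoding of the source file? If the latter, your [first] solution is overly optimistic.
The answer is here. It is 1 character, a single '\n'.
Upvoted for using goto :-)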
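The "single '\n'" claim from the comments above can be checked at compile time. The following is a small sketch, not part of any answer; the constant names are hypothetical, and it assumes a conforming compiler that maps the source file's end-of-line indicator to one new-line character:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, used only for this illustration.
// The raw literal below contains nothing but one line break.
constexpr std::size_t kNewlineLiteralSize = sizeof(R"(
)");
constexpr char kFirstChar = R"(
)"[0];

static_assert(kNewlineLiteralSize == 2, "expected one newline plus the terminating null");
static_assert(kFirstChar == '\n', "the only character should be a line feed");
```

If a toolchain kept a carriage return inside the literal, the first static_assert would fail, which makes this a cheap portability check for the 1 + R"(... technique.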

【Solution 3】:

The accepted answer triggers the clang-tidy warning cppcoreguidelines-pro-bounds-constant-array-index. For details, see Pro.bounds: Bounds safety profile.

If you don't have std::span but are compiling with at least C++17, consider:

constexpr auto text = std::string_view(R"(
This is the first line.
This is the second line.
This is the third line.
)").substr(1);

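The main advantages are readability (IMHO) and that you can keep that clang-tidy warning turned on in the rest of your code.

With gcc, if someone accidentally reduces the raw string to an empty string, this approach gives a compiler error (demo), whereas the accepted approach either produces nothing (demo) or, depending on your compiler settings, an "outside bounds of constant string" warning.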
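If a hard error on an empty literal is not wanted, the same std::string_view idea can be wrapped in a helper that strips the newline only when it is actually there. This is a sketch under that assumption; skip_leading_newline and kText are hypothetical names, not something from the answers, and C++17 is required:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: drops one leading '\n' if present,
// otherwise returns the view unchanged (including when it is empty).
constexpr std::string_view skip_leading_newline(std::string_view s) {
    return (!s.empty() && s.front() == '\n') ? s.substr(1) : s;
}

constexpr auto kText = skip_leading_newline(R"(
This is the first line.
This is the second line.
)");

static_assert(kText.front() == 'T');
static_assert(skip_leading_newline("").empty());
static_assert(skip_leading_newline("abc") == "abc");
```

The trade-off is that a literal which accidentally lost its leading newline is no longer flagged; it is simply passed through untouched.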

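【Comments】:

"or, depending on your compiler settings, an 'outside bounds of constant string' warning." - What compiler settings would produce such a warning? An empty string is still guaranteed (by the C++ standard) to be null-terminated....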

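【Solution 4】:

I ran into the same problem, and I think the following solution is the best of all those above. I hope it helps you too (see the example in the comment):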

/**
 * Strips a multi-line string's indentation prefix.
 *
 * Example:
 * \code
 *   string s = R"(|line one
 *                 |line two
 *                 |line three
 *                 |)"_multiline;
 *   std::cout << s;
 * \endcode
 *
 * This prints three lines: @c "line one\nline two\nline three\n"
 *
 * @author Christian Parpart <christian@parpart.family>
 */

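// The length parameter of a string literal operator must be std::size_t;
// unsigned long only happens to match on some platforms.
inline std::string operator ""_multiline(const char* text, std::size_t size) {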
  if (!*text)
    return {};

  enum class State {
    LineData,
    SkipUntilPrefix,
  };

  constexpr char LF = '\n';
  State state = State::LineData;
  std::stringstream sstr;
  char sep = *text++;

  while (*text) {
    switch (state) {
      case State::LineData: {
        if (*text == LF) {
          state = State::SkipUntilPrefix;
          sstr << *text++;
        } else {
          sstr << *text++;
        }
        break;
      }
      case State::SkipUntilPrefix: {
        if (*text == sep) {
          state = State::LineData;
          text++;
        } else {
          text++;
        }
        break;
      }
    }
  }

  return sstr.str();
}

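【Comments】:

I don't see _multiline in the example; has the code diverged from the comments?
Nice code. I still prefer the current 1 + solution, because it can also be used to assign to a constexpr char*.
@Hugues, you could still use a constexpr_string class (instead of std::stringstream) in that code, but I believe this feature definitely belongs in the C++ standard library.

【Solution 5】:

The closest I can see is: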

std::string text = ""
R"(This is the first line.
This is the second line.
This is the third line.
)";

It would be a bit better if whitespace were allowed in the delimiter sequence. Give or take the indentation:

std::string text = R"
    (This is the first line.
This is the second line.
This is the third line.
)
    ";

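My preprocessor will give you a warning about this, but unfortunately it is a bit useless. Clang and GCC get thrown off completely.

【Comments】:

【Solution 6】:

I recommend @Brian's answer, especially if you only need a few lines of text, or text that you can handle with your text editor. If that is not the case, I have an alternative.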

    std::string text =
"\
This is the first line." R"(
This is the second line.
This is the third line.)";

Live example

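Raw string literals can still be concatenated with "normal" string literals, as the code shows. The "\ at the start is meant to "eliminate" the " character from the first line and place it on a line of its own instead.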
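To make the mechanics concrete: the backslash-newline inside the ordinary literal is removed by line splicing, while the line breaks inside the raw literal are kept. A minimal check of the resulting string, with kText as a hypothetical name and shortened text:

```cpp
#include <cassert>
#include <string>

// kText is a hypothetical name used only for this illustration.
// The "\ + line break is spliced away, so the ordinary literal is "first line.".
const std::string kText =
"\
first line." R"(
second line.
third line.)";
```

Note that the trick depends on nothing following the backslash on its line; trailing whitespace there would break the splice.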

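Still, if it were up to me, I would put such a lot of text into a separate file and load it at runtime. No pressure on you, though :-).

Also, this is some of the ugliest code I have written in the last few days.

【Comments】:

【Solution 7】:

This may not be what you want, but just in case, you should be aware of automatic string literal concatenation: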

    std::string text =
"This is the first line.\n"
"This is the second line.\n"
"This is the third line.\n";

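【Comments】:

Why not indent it? ;)
Note that it also works with raw literals, so R"(...)\n" R"(...)\n" ... is also a possibility.
@MatthieuM. The space between the delimiting quote and the parenthesis is reserved for the delimiter character sequence.
@MatthieuM. It looks like you meant to put the \n there and have it encoded into the actual string.
@Potatoswatter: Ah! Yes, indeed, the \n should be interpreted, so R"(...)""\n" R"(...)""\n" ...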
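The corrected form from the last comment can be sketched as follows. The point is that \n is only interpreted inside the ordinary literals, while backslashes inside the raw literals stay as they are. The name kPaths and the path strings are made up for the illustration:

```cpp
#include <cassert>
#include <string>

// kPaths is a hypothetical name; the paths are example data only.
// Adjacent literals are concatenated; only the "\n" pieces are escape-processed.
const std::string kPaths =
    R"(C:\new\one)" "\n"
    R"(C:\new\two)" "\n";
```

Here the \n inside C:\new stays a backslash followed by n, and each line still ends with a real line feed, which also keeps the code indentable.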