Reading comments from .proto files using a Protocol Buffers descriptor object

【Question title】: Reading comments from .proto files using a Protocol Buffers descriptor object
【Posted】: 2015-09-23 14:40:01
【Question】:

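I am currently revisiting a project that uses Google Protocol Buffers.

In this project I want to make use of the Descriptors and Reflection features of Protocol Buffers.

The official documentation states that the comments in .proto files can be read:

With the function DebugStringWithOptions(), called on a message or descriptor.
With the function GetSourceLocation(), called on a descriptor.

I cannot retrieve the comments, so I assume that I am doing something wrong or that the feature is not yet fully implemented in Protocol Buffers.

Here are some code snippets: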

google::protobuf::DebugStringOptions options;
options.include_comments = true;
std::cout << "google::protobuf::Descriptor::DebugStringWithOptions(): "
          << message.descriptor()->DebugStringWithOptions(options) << std::endl
          << std::endl;

const google::protobuf::FieldDescriptor* field_descriptor{
    message.descriptor()->field(1)};

// TODO(wolters): Why doesn't this work?
google::protobuf::SourceLocation source_location;

// GetSourceLocation() returns false if no source information is available.
if (field_descriptor->GetSourceLocation(&source_location)) {
  std::cout << "start_line: " << source_location.start_line << std::endl;
  std::cout << "end_line: " << source_location.end_line << std::endl;
  std::cout << "start_column: " << source_location.start_column << std::endl;
  std::cout << "end_column: " << source_location.end_column << std::endl;
  std::cout << "leading_comments: " << source_location.leading_comments
            << std::endl;
  std::cout << "trailing_comments: " << source_location.trailing_comments
            << std::endl;
} else {
  std::cout << "No source location available." << std::endl;
}

I have tried both of the following syntaxes for comments in the .proto file, but neither seems to work:

MessageHeader header = 1;  // The header of this `Message`.

/**
 * The header of this `Message`.
 */
MessageHeader header = 1;

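I am using GCC 4.7.1 (with C++11 support enabled) and the latest Protocol Buffers version, 3.0.0-alpha-4.1.

Can someone point me in the right direction and/or provide a working example?

Edit 2015-09-24:

After working through the Self Describing Messages section of the official documentation and testing a lot of things, I think I have a better understanding of protobuf descriptors.

Please correct me if one or more of the following statements are incorrect:

The SelfDescribingMessage proto is only useful if the other end does not know the .proto definitions.
The only way to access the comments of a proto definition is to create a .desc file with the protoc application.
To get the comments, the GetSourceLocation member function can only be used if the "top" element is a FileDescriptorSet, FileDescriptorProto, or FileDescriptor. If that is correct, Protocol Buffers has a poor API design, because the google::protobuf::Message class is a God class (it provides access to the complete file descriptor API, but the values are not provided at all).
Calling concrete_message.descriptor()->file() does not (and cannot) include the source comment information, because that information is not part of the compiled code.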

It seems to me that the only way to get this working is:

Call protoc for the Message.proto file (which references all other messages) with the arguments:
```
--include_imports --include_source_info --descriptor_set_out=message.desc
```
Ship the message.desc file together with the application/library so that it can be read at runtime (see below).
Create a google::protobuf::FileDescriptorSet from that file.
Iterate over all google::protobuf::FileDescriptorProto entries in the FileDescriptorSet.
Convert each FileDescriptorProto into a google::protobuf::FileDescriptor using google::protobuf::DescriptorPool::BuildFile().
Find the message and/or field with one of the Find… functions, applied to the FileDescriptor instance.
Call GetSourceLocation on the message/field descriptor instance.
Read the comments via google::protobuf::SourceLocation::leading_comments and google::protobuf::SourceLocation::trailing_comments.

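This seems quite complicated to me, so I have two more questions:

Is there a way to include the source information without using a FileDescriptorSet?
Is it possible to "connect"/set a FileDescriptorSet with a concrete Message class/instance, since that would simplify things a lot?

Edit 2015-09-25:

By God class I mean that the Message class and/or the descriptor classes offer public functions that are more or less useless, because they provide no information when called by a client. Take a "normal" message as an example: the generated code does not contain the source comment information, so the GetSourceLocation method in all descriptor classes (e.g. Descriptor and FieldDescriptor) is completely useless. From a logical point of view, separate DescriptorLite and FieldDescriptorLite instances should be provided when dealing with messages, and Descriptor and FieldDescriptor when dealing with information from a FileDescriptorSet (whose source is usually a .proto file). The [...]Lite classes would then be the parent classes of the "normal" classes. The argument that protoc will probably never include the source comments underlines my point.

By "connect" I mean an API function that updates the descriptor information in the message with the descriptor information from the .desc file (which, if I understand correctly, is always a superset of the descriptor information provided by the message).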

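【Comments on the question】:

Could this be related to the Protocol Buffers compiler protoc? I just stumbled upon the protoc arguments -o and --include_source_info. Is it necessary to create a FileDescriptorSet in order to retrieve the comments?

Tags: c++ c++11 protocol-buffers descriptor proto3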

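【Solution 1】:

It sounds like you have basically figured it out.

You are digging into APIs within the protocol compiler that were not really designed for public consumption. It gets complicated because no one has written a helper layer to simplify things, since few people use these features.

I am not sure what you mean by Message being a "God class". Message is just the abstract interface for a protobuf instance. A Descriptor describes the type of a protobuf instance. Message::GetDescriptor() returns the type of the message, but beyond that there is not much direct connection between these APIs...

Is there a way to include the source information without using a FileDescriptorSet?

The comments are intentionally stripped from the descriptors embedded in the generated code, so you need to run the parser separately, generate a descriptor set, and use it dynamically.

Is it possible to "connect"/set a FileDescriptorSet with a concrete Message class/instance, since that would simplify things a lot?

Do you mean that you want Message::GetDescriptor() to return a descriptor that includes the comment data from the source file? That would require embedding the comment data in the generated code, which would be trivial for protoc to do (it currently strips the comments intentionally, so it would just need to not do that) but potentially bloated and dangerous (it could leak secrets of people who build closed-source binaries with protobufs).

【Comments】:

Thanks for your comment and the explanation! I have updated my question again for clarification. I will add a working example later; maybe someone can review it.
@FlorianWolters Thanks for the clarification. I disagree with the suggestion, though: source locations are an obscure, rarely used part of the descriptor interface. It does not make sense to add complexity to the class hierarchy specifically to distinguish descriptors with and without source information. Also, some Message subclasses (such as DynamicMessage) can actually have source information in their descriptors. (Of course, I no longer work on protobufs, so this is not my call anyway.)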