【Question title】: How to print a USB string descriptor on cout/cerr in C++?
【Posted】: 2016-10-17 03:19:17
【Question】:

I have a USB string descriptor in a uint8_t array. For example:

0000:12 03 34 00 45 00 36 00 31 00 42 00 43 00 30 00 ..4.E.6.1.B.C.0.
0010:30 00                                           0.

(The first two bytes are the length and the descriptor type; the remaining bytes are uint16_t characters.)
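
To make that layout concrete, here is a minimal sketch that splits a raw descriptor into its length byte, type byte, and 16-bit code units. The names `StringDescriptor` and `parse_string_descriptor` are hypothetical, and the code assumes the code units are stored low byte first, as the hexdump above suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not from the original post): the decoded pieces
// of a USB string descriptor.
struct StringDescriptor {
    std::uint8_t length;               // first byte: total size in bytes, header included
    std::uint8_t type;                 // second byte: descriptor type (0x03 in the dump)
    std::vector<std::uint16_t> units;  // 16-bit code units in host byte order
};

// Split a raw descriptor into its header and 16-bit code units.
// Returns false if the buffer is too short or the length byte is inconsistent.
bool parse_string_descriptor(const std::uint8_t *buf, std::size_t size,
                             StringDescriptor &out)
{
    if (size < 2 || buf[0] < 2 || buf[0] > size || (buf[0] - 2) % 2 != 0)
        return false;
    out.length = buf[0];
    out.type = buf[1];
    out.units.clear();
    for (std::size_t i = 2; i + 1 < out.length; i += 2) {
        // Assumption: low byte first. Assembling each unit by hand keeps the
        // result independent of the host's byte order.
        out.units.push_back(
            static_cast<std::uint16_t>(buf[i] | (buf[i + 1] << 8)));
    }
    return true;
}
```

For the 18-byte descriptor in the hexdump, this would yield a length of 0x12, a type of 0x03, and eight code units starting with 0x0034.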

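I'd like to print it on the terminal with as little fuss as possible, preferably without disturbing all my other printing (such as cout << "Hello, world" << endl;).

In particular, I'd like to do: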

cout << "Serial number is: " << some_cast_or_constructor( buf + 2, len - 2 ) << endl;

and, for the string descriptor above, get the following on the terminal:

Serial number is: 4E61BC00

Is this possible, or do I have to delve into Unicode arcana?

[Edited to add:]

Following @PaulMcKenzie's suggestion, I tried this program:

#include <iostream>
#include <fstream>
#include <exception>
#include <string>
#include <locale>

int
main( int argc, char **argv )
{
    char    buf[] = { 34, 00, 45, 00, 36, 00, 31, 00, 42, 00, 43, 00, 30, 00, 30, 00 };

    std::wcout << "Hello" << std::wstring( (const wchar_t *)buf, sizeof(buf) ) << std::endl;

    return 0;
}

Output:

user:/tmp$ g++ foo.cc
user:/tmp$ ./a.out 
Hello??????????
user:/tmp$

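【Comments】:

Use std::wcout, not std::cout.
Do you know what encoding the uint16_t values use? UTF-16, for example?
I'm not sure... This is USB code that I wrote, but the descriptor is defined in assembly language as .string16 "abcd". The hexdump is exactly what I have in the memory buffer. I tried std::wcout (per @PaulMcKenzie), but I get a bunch of ? marks.
Works for Visual Studio 2015
No luck on Linux (Debian), gcc-4.9.2. On MacOSX, I get Hello[nothing]. I guess it's time to do some digging. (I suspect this may well be a terminal issue.)

Tags: c++ unicode character-encoding stdio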

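【Solution 1】:

I spotted two errors in your source code:

1- In your raw USB data (at the top) the values are hexadecimal, whereas the values in your buf[] are decimal. It should be written: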

    char    buf[] = { 0x34, 0x00, 0x45, 0x00, 0x36, 0x00, 0x31, 0x00, 0x42,
                      0x00, 0x43, 0x00, 0x30, 0x00, 0x30, 0x00 };

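2- In your print statement, the length is sizeof(buf), but that counts 'char' elements (1 byte each), not 'wchar_t' elements (2 bytes each on Windows). It should be written: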

std::wcout << "Hello" << std::wstring( (const wchar_t *)buf, (sizeof(buf) >> 1) ) << std::endl;

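With these changes, the code gives the expected result on a Windows PC... Make sure that no big-endian/little-endian conversion is needed before the data is handled as 'wchar_t' on your machine.

Can you check sizeof(wchar_t) under Linux? The post 'Difference and conversions between wchar_t for Linux and for Windows' suggests that wchar_t is a 32-bit value there.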
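
One way to avoid depending on sizeof(wchar_t) is to widen each 16-bit code unit individually instead of casting the byte buffer to `const wchar_t *`. The following is only a sketch: `widen_utf16le` is a hypothetical name, the input is assumed to be little-endian, and every character is assumed to lie in the Basic Multilingual Plane (no surrogate pairs).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: build a std::wstring from little-endian 16-bit code
// units by widening each unit separately, so the result is the same whether
// wchar_t is 2 bytes or 4 bytes wide.
// Assumption: no surrogate pairs appear in the input.
std::wstring widen_utf16le(const std::uint8_t *bytes, std::size_t byte_count)
{
    std::wstring result;
    result.reserve(byte_count / 2);
    for (std::size_t i = 0; i + 1 < byte_count; i += 2) {
        // Combine low byte and high byte into one 16-bit code unit.
        const std::uint16_t unit =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        result.push_back(static_cast<wchar_t>(unit));
    }
    return result;
}
```

With the corrected buf[] declared as an array of uint8_t, this could be used as `std::wcout << L"Hello" << widen_utf16le(buf, sizeof buf) << std::endl;`. Whether non-ASCII characters then display correctly still depends on the locale and the terminal.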

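【Comments】:

Oops... decimal instead of hexadecimal was a silly mistake! My cut-and-paste from the hexdump wasn't quite right. However, even with your corrections it doesn't work on g++/Linux (I also tried swapping the byte order by moving the 0 byte from the end of the array to the beginning). I think I'll have to learn more about multi-byte characters and I/O.
Ha! I've just started looking into this, and the first thing I did was print sizeof(wchar_t). It's 4, so that's my first problem. USB uses UNICODE (per USB-2.0 sec. 9.6.7), but all I really know is that every example I've seen uses .string16. I guess it's time to learn how UNICODE really works!
From the GCC/libstdc++ documentation on character set conversions (gcc.gnu.org/onlinedocs/libstdc++/manual/…):
The simple implementation detail of wchar_t's size seems to repeatedly confound people. Many systems use a two-byte, unsigned integral type to represent wide characters, and use an internal encoding of Unicode or UCS2. (See AIX, Microsoft NT, Java, others.) Other systems use a four-byte, unsigned integral type to represent wide characters, and use an internal encoding of UCS4. (GNU/Linux systems using glibc, in particular.) The C programming language (and thus C++) does not specify a specific size for the type wchar_t. Thus, portable C++ code cannot assume a byte size (or endianness) either.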

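【Solution 2】:

If you ended up here because you're struggling with Unicode, wide characters and the like on Linux, the quickest way forward I found is to use libiconv. The <codecvt> header that you'll read about in the C++ documentation is not implemented in older GNU libstdc++ releases such as the one shipped with gcc-4.9 (still common as of October 2016); it was added in GCC 5.

Here is a quick sample program that demonstrates libiconv: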

#include <iostream>
#include <locale>
#include <cstdint>
#include <iconv.h>
#include <string.h>

int
main( int, char ** )
{
    const char       a[] = "ABC";
    const wchar_t    b[] = L"ABC";
    const char       c[] = u8"ABC";
    const char16_t   d[] = u"ABCDEF";
    const char32_t   e[] = U"ABC";
    iconv_t          utf16_to_utf32 = iconv_open( "UTF-32", "UTF-16" );
    wchar_t          wcbuf[32];
    char            *inp = (char *)d;
    size_t           inl = sizeof(d);
    char            *outp = (char *)wcbuf;
    size_t           outl = sizeof(wcbuf);

    iconv( utf16_to_utf32, &inp, &inl, &outp, &outl );

    std::wcout << "sizeof(a) = " << sizeof(a) << ' ' << a << std::endl
               << "sizeof(b) = " << sizeof(b) << ' ' << b << std::endl
               << "sizeof(c) = " << sizeof(c) << ' ' << c << std::endl
               << "sizeof(d) = " << sizeof(d) << ' ' << d << std::endl
               << "sizeof(e) = " << sizeof(e) << ' ' << e << std::endl
               << "Converted char16_t to UTF-32: " << std::wstring( wcbuf, (wchar_t *)outp - wcbuf ) << std::endl;

    iconv_close( utf16_to_utf32 );

    return 0;
}

Resulting output:

user@debian:~/code/unicode$ ./wchar 
sizeof(a) = 4 ABC
sizeof(b) = 16 ABC
sizeof(c) = 4 ABC
sizeof(d) = 14 0x7ffefdae5a40
sizeof(e) = 16 0x7ffefdae5a30
Converted char16_t to UTF-32: ABCDEF
user@debian:~/code/unicode$

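Note that std::wcout does not print char16_t or char32_t arrays correctly: it prints the address of the array instead of its contents. However, you can use iconv to convert UTF-16 (which is apparently what you get from u"STRING") to UTF-32 (which is apparently compatible with wchar_t on late-model Linux systems).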
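
If pulling in iconv is not an option, a hand-rolled conversion to UTF-8 is another possibility, since a UTF-8 std::string can go straight to std::cout without touching std::wcout. The sketch below is not the approach used in this answer; `append_utf8` and `utf16le_to_utf8` are hypothetical names, the input is assumed to be little-endian UTF-16, and the terminal is assumed to expect UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
static void append_utf8(std::string &out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert little-endian UTF-16 bytes to a UTF-8 string.
// Unpaired surrogates are replaced with U+FFFD; a trailing odd byte is ignored.
std::string utf16le_to_utf8(const std::uint8_t *bytes, std::size_t byte_count)
{
    std::string out;
    for (std::size_t i = 0; i + 1 < byte_count; i += 2) {
        std::uint32_t cp = bytes[i] | (bytes[i + 1] << 8);
        if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
            std::uint32_t low = 0;
            if (i + 3 < byte_count)
                low = bytes[i + 2] | (bytes[i + 3] << 8);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 2;                              // consume the low surrogate too
            } else {
                cp = 0xFFFD;                         // high surrogate without a partner
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   // stray low surrogate
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Applied to the descriptor from the question, this would allow something like `std::cout << "Serial number is: " << utf16le_to_utf8(buf + 2, len - 2) << std::endl;`, leaving all other std::cout output unaffected.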

【Comments】: