Answers: Java's getBytes() equivalent in Python

【Question title】: Java's getBytes() equivalent in Python
【Posted】: 2015-01-08 09:27:20
【Question description】:

I am new to Python. I have a Java method that accepts a string, converts it to a byte array, and returns the byte array. The method looks like this:

private static byte[] convert(String str) {
    // getBytes() with no argument encodes using the platform's default charset
    byte[] byteArray = str.getBytes();
    return byteArray;
}

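convert("sr_shah") produces a byte array like this: 115 114 95 115 104 97 104. Using Charset.defaultCharset(), I found that my machine's default charset is windows-1252.

Now I need to write an exact equivalent of the method above in Python. The problem I am facing is converting the string to a byte array: I cannot find an equivalent of Java's getBytes() in Python. I searched the internet and got a lot of help from earlier Stack Overflow posts about converting a string to a byte array, but unfortunately none of them worked for me.

The methods I tried are bytearray(), bytes(), and str.encode(), with encodings such as windows-1252, utf_16, utf_8, utf_16_le, utf_16_be, and iso-8859-1. Unfortunately, none of them gave the result I expected (that is, the same byte array I get from Java's getBytes()). I don't understand what I am doing wrong. This is what I tried in Python: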

>>> bytearray('sr_shah','windows-1252')
bytearray(b'sr_shah')
>>> bytearray('sr_shah','utf_8')
bytearray(b'sr_shah')
>>> bytearray('sr_ahah','utf_16')
bytearray(b'sr_ahah')
>>> bytearray('sr_shah','utf_16_le')
bytearray(b'sr_shah')
>>> name = 'sr_shah'
>>> name.encode('windows-1252')
'sr_shah'
>>> name.encode('utf_8')
'sr_shah'
>>> name.encode('latin_1')
'sr_shah'
>>> name.encode('iso-8859-1')
'sr_shah'
>>> name.encode('utf-8')
'sr_shah'
>>> name.encode('utf-16')
'\xff\xfes\x00r\x00_\x00s\x00h\x00a\x00h\x00'
>>> name.encode('utf-16-le')
's\x00r\x00_\x00s\x00h\x00a\x00h\x00'
>>>

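Please help me do the conversion correctly.

【Comments】:

[B@1ce59895 is not an encoded string. It looks more like a representation of the encoded string's address.
The problem is that you are not looking at the byte arrays themselves, only at their string representations. Java's byte[].toString() implementation returns the not very meaningful "[B@" + Integer.toHexString(hashCode()), and it is unclear why you would want to reproduce that output. Python appears to produce a string representation that shows the byte array's contents the same way as the original string, for ASCII-based characters and encodings. If you want to compare the byte values of the arrays, you have to print each byte as a number (in both languages).
Are you sure [B@1ce59895 does not refer to an address? I think it means a byte array at address 0x1ce59895.
Sorry, I had only posted the address. I have just edited the question.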

Tags: java python string bytearray data-conversion

【Solution 1】:

You can do it like this:

s = 'sr_shah'  # named s rather than str, which would shadow the built-in type
b = [ord(ch) for ch in s]  # numeric value of each character
print(b)

**Output**

[115, 114, 95, 115, 104, 97, 104]

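The built-in ord() function is about as close as you will get to the getBytes() behavior you want, although it works on a single character, so you have to build the array yourself.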
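One caveat may be worth spelling out. The sketch below assumes Python 3, where str holds Unicode text, and uses a sample string that is not from the question. In that setting ord() returns a code point, which matches the encoded byte value only when the character maps to the same number in the target charset (as plain ASCII does). For other characters the result depends on the encoding, so encoding explicitly is likely the closer match to getBytes():

```python
# Assumes Python 3. The sample text is illustrative only.
text = "\u20ac5"  # euro sign followed by the digit 5

code_points = [ord(ch) for ch in text]            # Unicode code points
cp1252_bytes = list(text.encode("windows-1252"))  # bytes in windows-1252
utf8_bytes = list(text.encode("utf-8"))           # bytes in UTF-8

print(code_points)   # [8364, 53]
print(cp1252_bytes)  # [128, 53]
print(utf8_bytes)    # [226, 130, 172, 53]
```

For the all-ASCII string in the question, all three lists would be identical, which is why the ord() approach gives the expected numbers there.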

【Comments】:

【Solution 2】:

The bytearray you created in Python already contains the bytes you want. To see their decimal representation, print the bytes one by one:

>>> for x in bytearray('sr_shah','windows-1252'): print(x)
...
115
114
95
115
104
97
104
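Putting this together, a Python counterpart of the Java convert() method might look like the following. This is only a sketch: the encoding default is taken from the charset the asker reported for their machine, and the signed parameter is a hypothetical addition to mimic Java's signed byte type (range -128 to 127), which only matters for byte values above 127.

```python
def convert(text, encoding="windows-1252", signed=False):
    """Encode text and return its byte values as a list of ints.

    encoding: assumed default, matching the charset reported in the question.
    signed:   hypothetical flag; if True, map values above 127 into
              Java's signed byte range (-128..127).
    """
    data = text.encode(encoding)
    values = list(bytearray(data))  # iterating a bytearray yields ints
    if signed:
        values = [v - 256 if v > 127 else v for v in values]
    return values


print(convert("sr_shah"))  # [115, 114, 95, 115, 104, 97, 104]
```

Passing the encoding explicitly avoids depending on a platform default, which is the part of getBytes() that tends to vary between machines.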

【Comments】:

I don't understand why System.out.println(secret.getBytes()); and for x in bytearray(secret,'windows-1252'): print(x) produce different output, with secret='secret'. Can anyone help me understand?
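The difference is most likely the one described in the question comments. Printing a byte[] directly in Java shows the "[B@..." form (type plus hash code) rather than the contents, whereas the Python loop prints each byte value. On the Java side, java.util.Arrays.toString(secret.getBytes()) should print the values instead. Python has a similar, if less confusing, distinction between printing a bytes object and printing its values, sketched here assuming Python 3:

```python
# Assumes Python 3; 'secret' is the value given in the comment above.
secret = "secret"
data = secret.encode("windows-1252")

print(data)        # b'secret'  (printable bytes are shown as ASCII characters)
print(list(data))  # [115, 101, 99, 114, 101, 116]
```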