【Posted】: 2016-10-04 16:10:36
【Question】:
I am trying to read the rockyou wordlist and write every word of 8 or more characters to a new file.
Here is the code:
def main():
    with open("rockyou.txt", encoding="utf8") as in_file, open('rockout.txt', 'w') as out_file:
        for line in in_file:
            if len(line.rstrip()) < 8:
                continue
            print(line, file=out_file, end='')
    print("done")

if __name__ == '__main__':
    main()
Some of the words are not valid UTF-8.
Traceback (most recent call last):
  File "wpa_rock.py", line 10, in <module>
    main()
  File "wpa_rock.py", line 6, in main
    print(line, file = out_file, end = '')
  File "C:\Python\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0e45' in position 0: character maps to <undefined>
Update
def main():
    with open("rockyou.txt", encoding="utf8") as in_file, open('rockout.txt', 'w', encoding="utf8") as out_file:
        for line in in_file:
            if len(line.rstrip()) < 8:
                continue
            out_file.write(line)
    print("done")

if __name__ == '__main__':
    main()
Traceback (most recent call last):
  File "wpa_rock.py", line 10, in <module>
    main()
  File "wpa_rock.py", line 3, in main
    for line in in_file:
  File "C:\Python\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 933: invalid continuation byte
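The two tracebacks point at different steps. The first is raised while writing, because the output file was opened with the Windows default cp1252 codec. The second is raised while reading, because byte 0xf1 at that position does not start a valid UTF-8 sequence. A minimal sketch of one way to get past both, assuming the undecodable bytes should be carried through unchanged rather than dropped; the function name `filter_long_words` and the `min_len` parameter are hypothetical and not from the original post:

```python
def filter_long_words(in_path, out_path, min_len=8):
    """Copy lines whose stripped length is >= min_len; return how many were kept."""
    # errors="surrogateescape" turns each undecodable byte into a lone
    # surrogate code point on read and back into the same byte on write,
    # so lines that are not valid UTF-8 survive the round trip.
    with open(in_path, encoding="utf-8", errors="surrogateescape") as in_file, \
         open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out_file:
        kept = 0
        for line in in_file:
            if len(line.rstrip()) >= min_len:
                out_file.write(line)
                kept += 1
    return kept
```

Calling `filter_long_words("rockyou.txt", "rockout.txt")` would mirror the script above. Note that each escaped byte counts as one character toward the length check. Passing `errors="ignore"` on the input file instead would silently drop the offending bytes, which may or may not be acceptable for a wordlist.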
【Comments】:
-
This is a typo. It should be utf-8, not utf8.
-
I'm not sure it is. Using either one results in the same error.
-
You must have an invalid character at that position. You should show the file you are trying to read.
-
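@Arpan: No, it isn't. 'utf8' and 'utf-8' both work; one is an alias of the other.
-
@MarkEvans Have you tried adding encoding="utf8" when opening the output file? I don't have a Windows machine at hand, so I can't check.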
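On the alias point raised in the comments, the spelling of the encoding name can be ruled out as a cause with a quick check. This is a minimal sketch using only the standard `codecs` module; the variable name `canonical` is just for illustration:

```python
import codecs

# codecs.lookup() resolves an encoding name to a CodecInfo object;
# different spellings of the same codec share one canonical name.
canonical = {name: codecs.lookup(name).name
             for name in ("utf8", "utf-8", "UTF-8", "utf_8")}

for spelling, resolved in canonical.items():
    print(spelling, "->", resolved)
```

If every spelling resolves to the same name, switching between them cannot change the error.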
Tags: python python-3.x unicode character-encoding