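【Posted】: 2020-11-26 21:25:10
【Question】:
I'm trying to convert an Excel file containing Polish characters (e.g. "ęśążćółń") to plain letters ("esazcoln"). I've already managed to convert the xlsx file to txt, and then I run: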
f = open("PATH_TO_TXT_FILE")
r = f.read()
r.upper()
new_word = ""
for char in r:
    if char == "Ą":
        new_word += "A"
    elif char == "Ć":
        new_word += "C"
    elif char == "Ę":
        new_word += "E"
    elif char == "Ł":
        new_word += "L"
    elif char == "Ó":
        new_word += "O"
    elif char == "Ż" "Ź":
        new_word += "Z"
    elif char == "Ź":
        new_word += "Z"
    elif char == "Ś":
        new_word += "S"
    else:
        new_word += char
encoded_bytes = r.encode('utf-8', "replace")
decoded = encoded_bytes.decode("cp1252", "replace")
print(decoded)
The file contains: asdżółć
Output: asdÅ¼Ã³Å‚Ä‡
Expected: asdzolc
Can anyone help me?
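【Comments】:
- You mean r = r.upper(). But this is a very crude approach; search for existing solutions that use Unicode normalization.
- The desired string asdzolc should be in the new_word variable... You can simplify the accent removal, see this answer. Then only the characters Ł and ł need to be replaced explicitly with L and l...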
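The two comments above point at the same approach, which can be sketched roughly as follows. This is a minimal illustration, not the linked answer itself: the function name strip_polish_diacritics and the EXPLICIT table are made up for this example, and it assumes the input is Polish text, where Ł/ł are the only letters that Unicode does not decompose into a base letter plus a combining mark.

```python
import unicodedata

# Ł and ł have no canonical decomposition, so they are mapped by hand.
# (Hypothetical helper table; assumes the text is Polish.)
EXPLICIT = str.maketrans({"Ł": "L", "ł": "l"})


def strip_polish_diacritics(text: str) -> str:
    """Return text with Polish diacritics replaced by plain letters."""
    # NFD splits e.g. "ż" into "z" followed by a combining dot above.
    decomposed = unicodedata.normalize("NFD", text.translate(EXPLICIT))
    # Drop the combining marks and keep every other character unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_polish_diacritics("asdżółć"))  # asdzolc
```

Case is preserved, so no upper() call is needed. The garbled output in the question most likely comes from encoding the text as UTF-8 and decoding it as cp1252 before printing; printing the converted string directly avoids that. When reading the file, passing encoding="utf-8" to open() is probably also needed, assuming the txt file was saved as UTF-8.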
Tags: python-3.x string utf-8 char cp1252