python3 custom encoding mik-bulgarian: answers

【Question title】: python3 custom encoding mik-bulgarian
【Posted】: 2020-12-04 13:36:11
【Question description】:

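I am trying to decode a file in the MIK-BULGARIAN encoding (https://en.wikipedia.org/wiki/MIK_(character_set)) using Python 3.8. The encoding is identical to ASCII, except that bytes 128-191 are Cyrillic letters. The file contains both Latin and Cyrillic letters. My current solution works, but it is slow on large files. Can you give me some advice on how to speed it up? (I know this is a lumberjack's approach, and I'm open to suggestions.)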

def opener(filename):
    # Read the whole file as raw bytes, then decode it with translate().
    with open(filename, "rb") as f:
        filetext = f.read()
    return translate(filetext)

mikdict = {
    128: "А",
    129: "Б",
    130: "В",
    131: "Г",
    132: "Д",
    ....
    188: "ь",
    189: "э",
    190: "ю",
    191: "я"
}
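def translate(textbytes):
    goodText = ""
    for txtbyte in textbytes:
        if (txtbyte >= 128) and (txtbyte <= 191):
            # Cyrillic range: look the letter up in mikdict
            letter = str(mikdict.get(txtbyte))
        else:
            # Everything else is taken as the code point with the same number
            letter = chr(txtbyte)
        goodText = goodText + letter
    return goodText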

【Question comments】:

See stackoverflow.com/questions/38777818/…
Does this answer your question? str.translate gives TypeError - Translate takes one argument (2 given), worked in Python 2
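
The str.translate approach that the linked question points to could be applied to this problem roughly as follows. This is a minimal sketch, not the asker's code: the names MIK_TABLE and decode_mik are made up for illustration, and it assumes the elided mikdict entries follow Unicode order (128 -> U+0410 "А" through 191 -> U+044F "я"), which is consistent with the entries shown in the question.

```python
# Hypothetical table: byte value -> Unicode code point.
# Assumption: bytes 128-191 map to U+0410..U+044F in order.
MIK_TABLE = {b: 0x0410 + (b - 128) for b in range(128, 192)}


def decode_mik(data: bytes) -> str:
    # latin-1 maps every byte 0-255 to the code point with the same number,
    # which is equivalent to calling chr() on each byte.
    # str.translate then swaps code points 128-191 for the Cyrillic letters.
    return data.decode("latin-1").translate(MIK_TABLE)


if __name__ == "__main__":
    print(decode_mik(bytes([72, 105, 32, 128, 129, 191])))
```

Like the loop in the question, this leaves bytes 192-255 as the code points with the same numbers. Both steps run in C inside the interpreter, so it avoids the per-byte Python loop and the repeated string concatenation.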

Tags: python character-encoding

【Solution 1】:

Apparently the right answer is to use map() and a lambda, since it seems to be more efficient than my original snippet.

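import mik  # module holding the mikdict table from the question

def translate(data):
    # ASCII bytes pass through unchanged, bytes 128-191 become UTF-8 encoded
    # Cyrillic letters, and any other byte is dropped.
    newChars = map(lambda x: bytes([x]) if (x < 128) else bytes(mik.mikdict.get(x), "utf-8") if (x <= 191) and (x >= 128) else b"", data)
    res = b''.join(newChars).decode("utf-8")
    return res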
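
Since the title asks for a custom encoding, another option may be to register the table as a codec so that bytes.decode() can use it directly. The following is only a sketch under stated assumptions: the codec name "mik" and the helper names are hypothetical, bytes 128-191 are assumed to map to U+0410..U+044F in order (consistent with the entries shown in the question), and bytes 192-255 are left undefined because the question does not describe them.

```python
import codecs

# Assumed layout: ASCII, then 64 Cyrillic letters, then 64 undefined slots.
# "\ufffe" is the marker the charmap codec treats as "no mapping".
_DECODING_TABLE = (
    "".join(chr(b) for b in range(128))
    + "".join(chr(0x0410 + i) for i in range(64))
    + "\ufffe" * 64
)
_ENCODING_TABLE = codecs.charmap_build(_DECODING_TABLE)


def _mik_decode(data, errors="strict"):
    # Returns a (text, bytes_consumed) tuple, as the codec API expects.
    return codecs.charmap_decode(data, errors, _DECODING_TABLE)


def _mik_encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, _ENCODING_TABLE)


def _search(name):
    # Hypothetical codec name; any other name is left to other search functions.
    if name == "mik":
        return codecs.CodecInfo(encode=_mik_encode, decode=_mik_decode, name="mik")
    return None


codecs.register(_search)

if __name__ == "__main__":
    print(bytes([72, 105, 32, 128, 191]).decode("mik"))
```

With errors="ignore", undefined bytes are dropped, which matches the behaviour of the map() version above; with the default errors="strict" they raise UnicodeDecodeError instead. Only the stateless encode/decode pair is provided here, so this sketch targets bytes.decode() and str.encode() rather than open(..., encoding="mik").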

【Comments】: