Directory names are not changed in for loop

【Question title】: Directory names are not changed in for loop
【Posted】: 2020-09-27 16:43:52
【Question】:

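I am trying to recursively rename all subdirectories and files in a directory, specifically to get rid of German umlauts and replace them with their "safe" counterparts (i.e. replacing "ü" with "ue").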

For the renaming, I found the following code:

def remove_umlaut(string):
    """
    Removes umlauts from strings and replaces them with the letter+e convention
    :param string: string to remove umlauts from
    :return: unumlauted string
    """
    u = 'ü'.encode()
    U = 'Ü'.encode()
    a = 'ä'.encode()
    A = 'Ä'.encode()
    o = 'ö'.encode()
    O = 'Ö'.encode()
    string = string.encode()
    string = string.replace(u, b'ue')
    string = string.replace(U, b'Ue')
    string = string.replace(a, b'ae')
    string = string.replace(A, b'Ae')
    string = string.replace(o, b'oe')
    string = string.replace(O, b'Oe')
    string = string.decode('utf-8')
    return string

It works when tested on its own. My recursive renaming function looks like this:

def renameInvalid(root):
    for f in os.listdir():
        old = f 
        f = remove_umlaut(f)
        if old != f:                              
            os.rename(old,f)                
            print("renamed " + old + " to " + f )
        if os.path.isdir(f):
            os.chdir(f)
            renameInvalid(".")
            os.chdir("..")

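When I test this in the interpreter, the problem seems to be that the strings cannot be changed while iterating over os.listdir(). Neither the function above nor a regex has any effect.

I tested this on both Mac and Windows.

What is going wrong?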

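【Comments】:

What output do you get if you run it in the interpreter? What did your debugging attempts show?
All strings stay unchanged. The print statement in renameInvalid always prints two identical strings. The same happens when I perform the steps in the interpreter. I also inserted print statements into the remove_umlaut function, and the string never changes at any point.
What exactly do you mean by "tested on its own" as opposed to "in the interpreter"?
Oh, I see how that is confusing. In the interpreter, if I use remove_umlaut on any string I type in, it works. But when I try to iterate over os.listdir() in the interpreter and apply the function inside that for loop, it does nothing.
Print the repr() of the strings, not the strings themselves, to see exactly which characters they contain.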
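
One caveat on the repr() suggestion: in Python 3, repr() leaves printable non-ASCII characters as they are, so two different spellings of ü can still look identical on screen. A minimal sketch using the built-in ascii() and unicodedata.name, which do make the difference visible (the helper name describe_chars is made up for illustration):

```python
import unicodedata

def describe_chars(name):
    """Return (escaped form, Unicode name) for each code point in name."""
    return [(ascii(ch), unicodedata.name(ch, "UNKNOWN")) for ch in name]

precomposed = "\u00fc"     # ü as a single code point
decomposed = "u\u0308"     # u followed by COMBINING DIAERESIS

print(ascii(precomposed))  # '\xfc'
print(ascii(decomposed))   # 'u\u0308'
print(describe_chars(decomposed))
```

Applying ascii() to each entry of os.listdir() in the same way should show which of the two forms the file system hands back.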

Tags: python replace diacritics listdir

【Solution 1】:

Try this:

from pathlib import Path

def remove_umlaut(string_with_umlaut: str) -> str:
    """
    Removes umlauts from strings and replaces them with the letter+e convention
    :param string_with_umlaut: string to remove umlauts from
    :return: unumlauted string
    """
    umlaut_alternatives = {
         'ü': 'ue',
         'Ü': 'Ue',
         'ä': 'ae',
         #... etc,
    }
    string_without_umlaut = string_with_umlaut
    for umlaut, no_umlaut in umlaut_alternatives.items():
        string_without_umlaut = string_without_umlaut.replace(umlaut, no_umlaut)
    return string_without_umlaut

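def rename_invalid(root: str):
    # Collect all paths first and handle the deepest ones first, so renaming a
    # directory cannot invalidate the paths of entries still waiting inside it
    paths = sorted(Path(root).glob("**/*"), key=lambda p: len(p.parts), reverse=True)
    for file in paths:
        file_new = remove_umlaut(file.name)
        if file_new != file.name:
            print(f"Rename {file.name} to {file_new}")
            file.rename(file.parent / file_new)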

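【Comments】:

Does this work for you? On my side, the same behavior occurs. If I insert print(string_without_umlaut) right before the return statement, it prints a bunch of unchanged names. I am starting to think that my problem has nothing to do with encoding and that something else is wrong.
Note that for me it only fails when called inside the for loop.
Regarding the folder structure, I changed file.rename(file_new) to file.rename(file.parent / file_new). As for your other problem, I suggest copying an umlaut directly from the output of your print statement and pasting it into the dictionary, to test whether it then works for that character. Maybe these are several characters that look identical but have different Unicode values?
For example, perhaps the ü is built with fileformat.info/info/unicode/char/0308/index.htm instead of fileformat.info/info/unicode/char/fc/index.htm?
For example, with the combining diaeresis (u followed by U+0308), 'ü'.encode() gives me b'u\xcc\x88', whereas the precomposed 'ü'.encode() (U+00FC) gives me b'\xc3\xbc'.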

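【Solution 2】:

Thanks to Dan's help in the comments, I figured it out.

Letters with umlauts can be represented in two ways in Unicode (and therefore in UTF-8): precomposed and decomposed. The former represents them as a single character, while the latter represents them as the "normal" base letter followed by the combining diaeresis character ¨ (U+0308).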
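
The two forms can be converted into each other with the standard library's unicodedata.normalize. A short sketch, with the code points written as escapes to keep them unambiguous, showing that the strings compare unequal until they are normalized to the same form:

```python
import unicodedata

precomposed = "\u00fc"   # ü, one code point (NFC form)
decomposed = "u\u0308"   # u + combining diaeresis (NFD form)

print(precomposed == decomposed)     # False
print(precomposed.encode("utf-8"))   # b'\xc3\xbc'
print(decomposed.encode("utf-8"))    # b'u\xcc\x88'

# After normalizing to a common form, the two spellings are equal
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```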

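My script above works for the precomposed form, but when the names are read from a directory on a Mac, they come back in decomposed form. If you run into the same problem, here is the same code as above (even though it could probably be tidier), made compatible with decomposed umlauts: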

def remove_umlaut(name):
    """
    Removes umlauts from names and replaces them with the letter+e convention
    :param name: name to remove umlauts from
    :return: unumlauted name
    """
    u = b'u\xcc\x88'
    U = b'U\xcc\x88'
    a = b'a\xcc\x88'
    A = b'A\xcc\x88'
    o = b'o\xcc\x88'
    O = b'O\xcc\x88'
    ampersand = b'&'
    name = name.encode('utf8')
    name = name.replace(u, b'ue')
    name = name.replace(U, b'Ue')
    name = name.replace(a, b'ae')
    name = name.replace(A, b'Ae')
    name = name.replace(o, b'oe')
    name = name.replace(O, b'Oe')
    name = name.replace(ampersand, b'und')
    name = name.decode('utf8')
    return name

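import os

def renameInvalid(root):
    for f in os.listdir(root):
        # Build full paths so the function also works when root is not the cwd
        old = os.path.join(root, f)
        new = os.path.join(root, remove_umlaut(f))
        if old != new:
            os.rename(old, new)
            print("renamed " + old + " to " + new)
        if os.path.isdir(new):
            # Recurse with the full path instead of changing the working directory
            renameInvalid(new)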
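
A possible variant, not taken from the answer itself: normalizing each name to NFC with unicodedata.normalize first lets a single replacement table cover both the precomposed and the decomposed spelling. The sketch below also walks the tree bottom-up with os.walk(topdown=False), so a directory is renamed only after its contents have been handled. The names UMLAUT_MAP, remove_umlaut_any_form and rename_tree are made up for this illustration, and the ampersand replacement from the answer is left out.

```python
import os
import unicodedata

# Precomposed umlauts and their replacements (letter+e convention)
UMLAUT_MAP = {
    "\u00fc": "ue", "\u00dc": "Ue",   # ü, Ü
    "\u00e4": "ae", "\u00c4": "Ae",   # ä, Ä
    "\u00f6": "oe", "\u00d6": "Oe",   # ö, Ö
}

def remove_umlaut_any_form(name):
    """Replace umlauts whether they are precomposed or decomposed."""
    # NFC turns 'u' + U+0308 into the single code point U+00FC
    name = unicodedata.normalize("NFC", name)
    for umlaut, replacement in UMLAUT_MAP.items():
        name = name.replace(umlaut, replacement)
    return name

def rename_tree(root):
    """Rename files and directories below root, deepest entries first."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for old_name in dirnames + filenames:
            new_name = remove_umlaut_any_form(old_name)
            if new_name != old_name:
                os.rename(os.path.join(dirpath, old_name),
                          os.path.join(dirpath, new_name))
                print("renamed " + old_name + " to " + new_name)
```

One side effect to be aware of: because the whole name is normalized, other decomposed accents (for example e followed by a combining acute) are recomposed as well, even though they are not in the table.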

【Comments】: