[Question title]: Moving files after comparing filenames and recreating source directories
[Posted]: 2018-12-05 21:47:20
[Question]:

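I am learning shell scripting and trying to stay as POSIX-compliant as possible while keeping the codebase readable. The goal is to read a list of files from directory A, find their matches in directory B, and recreate part of directory B's parent path in directory C, into which the files from directory A should be moved. The matched/moved files should then be deleted from directory B, and any directories of the matched directory B files that end up empty should be removed.

All files in directory A will always be unique to each other and will always have one or more matches in directory B and never a match in directory C, although the subdirectories in directory C may already exist to match directory B. All files matched in directory B should be deleted after the matches are moved from directory A to directory C. The extensions change as the files are processed individually, but the filenames will match exactly. Filenames may contain spaces and periods. Filenames are not always the same length. The output and archive directories have two levels of subdirectories.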

Here is what I have so far. I am stuck on writing the for loop to do the dirty work. I am trying not to go beyond find, printf, awk, grep, for, and if.

#!/bin/sh
execHome="intendedMachine"
baseDir="/home/library/projects"
folderNew="output"
folderOld="working"
folderArchive="archive"
workingTypes=("jpg", "svg", "bmp", "tiff", "psd")

$folderNew="$baseDir/$folderNew"
$folderOld="$baseDir/$folderOld"
folderArchive="$baseDir/$folderArchive"

if [ "$(uname -n)" = "$execHome" ]
then

  count=$(find $folderNew -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)

  printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"

  find $folderNew -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'

else
  printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
fi

Example:

Directory A

/home/library/projects/output/projectOne 1.a.png
/home/library/projects/output/projectOne 1.b.png
/home/library/projects/output/projectOne 1.c.png
/home/library/projects/output/projectThree 3.m.png
/home/library/projects/output/projectThree 3.o.png
/home/library/projects/output/projectFour 4.t.png
/home/library/projects/output/projectFour 4.u.png

Directory B

/home/library/projects/working/House/2018 01/projectOne 1.a.jpg
/home/library/projects/working/House/2018 01/projectOne 1.a.svg
/home/library/projects/working/House/2018 01/projectOne 1.b.jpg
/home/library/projects/working/House/2018 01/projectOne 1.b.svg
/home/library/projects/working/House/2018 01/projectOne 1.c.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.svg
/home/library/projects/working/House/2018 02/projectTwo 2.h.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.h.svg
/home/library/projects/working/House/2018 02/projectTwo 2.i.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.m.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.n.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.o.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.o.svg
/home/library/projects/working/Car/2018 04/projectFour 4.s.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.t.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.u.jpg

Directory C

/home/library/projects/archive/House/2018 01/projectOne 1.d.png
/home/library/projects/archive/House/2018 01/projectOne 1.e.png
/home/library/projects/archive/House/2018 01/projectOne 1.f.png
/home/library/projects/archive/Car/2018 03/projectThree 3.p.png
/home/library/projects/archive/Car/2018 03/projectThree 3.q.png
/home/library/projects/archive/Car/2018 03/projectThree 3.r.png
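
For experimenting without touching real data, the three listings above can be recreated as empty files in a scratch directory. This is only a sketch: `root`, `out`, `work`, and `arch` are hypothetical names, and `mktemp -d` is assumed to be available (it is common but not required by POSIX).

```shell
# Recreate the example layout under a throwaway directory.
# All variable names here are illustrative assumptions.
root=$(mktemp -d)
out="$root/output"; work="$root/working"; arch="$root/archive"

mkdir -p "$out" \
    "$work/House/2018 01" "$work/House/2018 02" \
    "$work/Car/2018 03" "$work/Car/2018 04" \
    "$arch/House/2018 01" "$arch/Car/2018 03"

# Directory A: finished files waiting to be archived
for n in "projectOne 1.a" "projectOne 1.b" "projectOne 1.c" \
         "projectThree 3.m" "projectThree 3.o" \
         "projectFour 4.t" "projectFour 4.u"; do
    : > "$out/$n.png"
done

# Directory B: working files in two levels of subdirectories
for n in 1.a.jpg 1.a.svg 1.b.jpg 1.b.svg 1.c.jpg; do
    : > "$work/House/2018 01/projectOne $n"
done
for n in 2.g.jpg 2.g.svg 2.h.jpg 2.h.svg 2.i.jpg; do
    : > "$work/House/2018 02/projectTwo $n"
done
for n in 3.m.jpg 3.n.jpg 3.o.jpg 3.o.svg; do
    : > "$work/Car/2018 03/projectThree $n"
done
for n in 4.s.jpg 4.t.jpg 4.u.jpg; do
    : > "$work/Car/2018 04/projectFour $n"
done

# Directory C: files archived earlier
for n in 1.d 1.e 1.f; do : > "$arch/House/2018 01/projectOne $n.png"; done
for n in 3.p 3.q 3.r; do : > "$arch/Car/2018 03/projectThree $n.png"; done

printf 'test tree created under %s\n' "$root"
```

Pointing a copy of the script's `baseDir` at `$root` lets it run against this tree; `rm -rf "$root"` discards it afterwards.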

Desired result:

Directory A: its files have been moved to directory C

/home/library/projects/output/

Directory B: files matching directory A are deleted and empty folders are removed

/home/library/projects/working/House/2018 02/projectTwo 2.g.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.svg
/home/library/projects/working/House/2018 02/projectTwo 2.h.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.h.svg
/home/library/projects/working/House/2018 02/projectTwo 2.i.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.n.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.s.jpg

Directory C: contains both the old archive and the new output files

/home/library/projects/archive/House/2018 01/projectOne 1.a.png
/home/library/projects/archive/House/2018 01/projectOne 1.b.png
/home/library/projects/archive/House/2018 01/projectOne 1.c.png
/home/library/projects/archive/House/2018 01/projectOne 1.d.png
/home/library/projects/archive/House/2018 01/projectOne 1.e.png
/home/library/projects/archive/House/2018 01/projectOne 1.f.png
/home/library/projects/archive/Car/2018 03/projectThree 3.m.png
/home/library/projects/archive/Car/2018 03/projectThree 3.o.png
/home/library/projects/archive/Car/2018 03/projectThree 3.p.png
/home/library/projects/archive/Car/2018 03/projectThree 3.q.png
/home/library/projects/archive/Car/2018 03/projectThree 3.r.png
/home/library/projects/archive/Car/2018 04/projectFour 4.t.png
/home/library/projects/archive/Car/2018 04/projectFour 4.u.png

I ran the code on a bash 4.4.19 machine anyway to see how it works, but it did not behave as I expected. Here is the resulting output:

Found/processing 4 files in the /home/library/projects/output folder

./auto-archive.sh: line 34: hash["$proj"]: bad array subscript
parent of /home/library/projects/output/.temp/projectThree 3.m.png not found
parent of /home/library/projects/output/projectOne 1.a.png not found
parent of /home/library/projects/output/.temp/projectThree 3.0.png not found
parent of /home/library/projects/output/projectFour 4.t.png not found

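My apologies. I also failed to mention earlier that directory B should not be scanned recursively; in the real use case a recursive scan picks up other temporary files that are still being written and may not be ready to move. Also, for testing purposes, only the four files listed above are actually in directory A, not all of the files originally listed. In addition, after recreating the suggested test structure, your code appears to execute perfectly; it is the results with my actual file structure that do not match. I fear I may have left out some key element when describing my actual file structure and naming convention. I am reviewing the differences now. Sorry for the delay, and your accuracy is certainly impressive. It feels like we are close, but this definitely needs to run on an earlier version of bash.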
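
One way to keep `find` from descending into subdirectories such as `.temp`, without relying on the `-maxdepth` extension, is to prune everything below the starting point. The sketch below uses a hypothetical scratch directory (`dir`) in place of the folder being scanned, and assumes the directory path contains no glob characters, since `-path` takes a pattern.

```shell
# Hypothetical scratch directory standing in for the scanned folder.
dir=$(mktemp -d)
mkdir -p "$dir/.temp"
: > "$dir/projectOne 1.a.png"
: > "$dir/.temp/projectThree 3.m.png"   # still being written; should be skipped

# The "! -path" test is false only for the starting point, so find enters it.
# Every entry one level down is pruned (never descended into) and then
# filtered by -type f, which leaves regular files at the top level only.
top_level=$(find "$dir" ! -path "$dir" -prune -type f)

printf '%s\n' "$top_level"
```

Only `projectOne 1.a.png` is listed; the file under `.temp` is never visited.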

[Question discussion]:

    Tags: bash awk find printf posix


    [Answer 1]:

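    The task breaks down into three steps:

    1. Create a map that associates each filename (project name) with the name of its parent directory in C. This is done as a preparation phase by analyzing the pathnames in B. We will use an associative array, so the bash version must be 4.2 or later.

    2. Iterate over the files in A, compose the pathname to store in C using the map created in the first step, then remove the corresponding files in B.

    3. As a clean-up phase, remove the empty directories in B (if any).

    Then how about: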

    #!/bin/bash
    
    execHome="intendedMachine"
    baseDir="/home/library/projects"
    folderNew="output"
    folderOld="working"
    folderArchive="archive"
    workingTypes=("jpg" "svg" "bmp" "tiff" "psd")
    declare -A hash
    
    folderNew="$baseDir/$folderNew"
    folderOld="$baseDir/$folderOld"
    folderArchive="$baseDir/$folderArchive"
    
    if [ "$(uname -n)" != "$execHome" ]; then
        printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
        exit
    fi
    
    count=$(find "$folderNew" -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)
    printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"
    
    # determine parent directory name for each project name and create a map for them
    while IFS=  read -r -d $'\0' f; do 
        proj="${f##*/}"         # remove dirname
        proj="${proj%.*}"               # remove extension
        parent="${f##*$baseDir/}"       # remove pathname until $baseDir
        parent="${parent#*/}"   # strip pathname one-level deeper
        parent="${parent%/*}"   # remove filename
        # now we're mapping "projectOne 1.a" => "House/2018 01" e.g.
    #   echo "$proj" "=>" "$parent"     # just for debugging
        hash["$proj"]="$parent"
    done < <(find "$folderOld" -type f -print0) # directory B
    
    # iterate over files in A; move to archive directory C and remove files in B
    while IFS=  read -r -d $'\0' f; do
        proj="${f##*/}"
        proj="${proj%.*}"
        parent="${hash[$proj]}"
        if [[ "$parent" = "" ]]; then
            echo "parent of $f not found"   # may not occur but just in case ..
        else
            # move from A to C
            destdir="$folderArchive/$parent"
            mkdir -p -- "$destdir"
            mv -- "$f" "$destdir"
    
            # remove relevant file(s) in B
            for ext in "${workingTypes[@]}"; do
                oldfile="$folderOld/$parent/$proj.${ext}"
                [ -f "$oldfile" ] && rm -f -- "$oldfile"
            done
        fi
    done < <(find "$folderNew" -type f -print0) # directory A
    
    # clean-up: remove empty dirs in B
    find "$folderOld" -type d -empty -print0 | xargs -r -0 rmdir --
    

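    Explanation:

    • You do not need commas to separate the elements of an array.
    • You should not put $ in front of the variable name on the left-hand side of an assignment.
    • The while IFS= ... done < <(find ...) syntax is the idiom for looping over the output of find.
    • The ${parameter#word} family of syntax is parameter expansion, used here to extract substrings from the path.
    • The associative array hash maps each project name, e.g. "projectOne 1.a", to its parent directory name, e.g. "House/2018 01".
    • The -- in some commands guards against filenames that may start with -. (This protection may look paranoid...)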
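
To make the parameter expansion steps concrete, here is a small worked example that applies the same expansions as the script to one of the sample paths from directory B. It works in plain sh as well as bash; the value of `f` is taken from the question's listing.

```shell
baseDir="/home/library/projects"
f="/home/library/projects/working/House/2018 01/projectOne 1.a.jpg"

proj="${f##*/}"            # drop everything through the last "/": "projectOne 1.a.jpg"
proj="${proj%.*}"          # drop the last "." and what follows:   "projectOne 1.a"

parent="${f##*$baseDir/}"  # drop the base directory: "working/House/2018 01/projectOne 1.a.jpg"
parent="${parent#*/}"      # drop the first component: "House/2018 01/projectOne 1.a.jpg"
parent="${parent%/*}"      # drop the filename:        "House/2018 01"

printf '%s => %s\n' "$proj" "$parent"
```

This prints `projectOne 1.a => House/2018 01`. Because `%.*` removes only the shortest matching suffix, the periods inside the project name survive.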

    If your bash version is older than 4.2, please let me know. Then we will need to find an alternative.
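
A script can also check the interpreter version itself before relying on associative arrays. The helper below is a hypothetical sketch (`version_at_least` is not part of the answer); it parses a dotted version string with parameter expansion only, so it also runs under plain sh, where `BASH_VERSION` is unset and the check simply fails. The threshold 4.2 is the minimum stated in this answer.

```shell
# version_at_least VERSION MAJOR MINOR
# Succeeds when a dotted version string such as "4.4.19(1)-release"
# is at least MAJOR.MINOR. Hypothetical helper for illustration.
version_at_least() {
    v_major=${1%%.*}             # text before the first dot
    v_rest=${1#*.}               # text after the first dot
    v_minor=${v_rest%%[!0-9]*}   # leading digits of the remainder
    [ "$v_major" -gt "$2" ] ||
        { [ "$v_major" -eq "$2" ] && [ "$v_minor" -ge "$3" ]; }
}

if version_at_least "${BASH_VERSION:-0.0}" 4 2; then
    echo "associative arrays can be used"
else
    echo "fall back to an approach without associative arrays"
fi
```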

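    Edit
    Here is a POSIX-compatible version as an alternative:
    (Obviously, the script does not work if a filename contains a newline or the escape character \x1b.)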

    #!/bin/sh
    
    execHome="intendedMachine"
    baseDir="/home/library/projects"
    folderNew="output"
    folderOld="working"
    folderArchive="archive"
    workingTypes="jpg
    svg
    bmp
    tiff
    psd"
    
    folderNew="$baseDir/$folderNew"
    folderOld="$baseDir/$folderOld"
    folderArchive="$baseDir/$folderArchive"
    nl="
    "                   # set to newline character
    esc=$(printf '\033')             # set to escape character (printf is portable; echo -ne is not)
    #esc=":"            # if \033 does not work well, try another character
    
    # substitute of reading a hash
    # it relies on the context that IFS is set to $nl
    read_lut() {
        local i
        local key
        local val
        local ret=""
        for i in $lut; do
            key="${i%${esc}*}"
            val="${i#*${esc}}"
            if [ "$key" = "$1" ]; then
                # loop until the end and use the last value
                ret="$val"
            fi
        done
        echo "$ret"
    }
    
    # substitute of writing to a hash
    write_lut() {
        lut=$(printf "%s\n%s%c%s" "$lut" "$1" "$esc" "$2")
    }
    
    if [ "$(uname -n)" != "$execHome" ]; then
        printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
        exit
    fi
    
    count=$(find "$folderNew" -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)
    printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"
    
    # determine parent directory name for each project name and create a map for them
    ifs_bak="$IFS"
    IFS="$nl"
    for f in $(find "$folderOld" -type f); do
        proj="${f##*/}"         # remove dirname
        proj="${proj%.*}"               # remove extension
        parent="${f##*$baseDir/}"       # remove pathname until $baseDir
        parent="${parent#*/}"   # strip pathname one-level deeper
        parent="${parent%/*}"   # remove filename
        # now we're mapping "projectOne 1.a" => "House/2018 01" e.g.
    #   echo "$proj" "=>" "$parent"     # just for debugging
        write_lut "$proj" "$parent"
    done
    
    # iterate over files in A; move to archive directory C and remove files in B
    for f in $(find "$folderNew" -type f); do
        proj="${f##*/}"
        proj="${proj%.*}"
        parent=$(read_lut "$proj")
        if [ "$parent" = "" ]; then
            echo "parent of $f not found"   # may not occur but just in case ..
        else
            # move from A to C
            destdir="$folderArchive/$parent"
            mkdir -p -- "$destdir"
            mv -- "$f" "$destdir"
    
            # remove relevant file(s) in B
            for ext in $workingTypes; do
                oldfile="$folderOld/$parent/$proj.${ext}"
                [ -f "$oldfile" ] && rm -f -- "$oldfile"
            done
        fi
    done
    
    # clean-up: remove empty dirs in B
    find "$folderOld" -type d -empty -print0 | xargs -r -0 rmdir --
    
    # restore IFS
    IFS="$ifs_bak"
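
The clean-up line relies on `-empty`, `-print0`, and `xargs -r -0`, which not every find/xargs implementation provides. One possible alternative uses only `-depth`, `-path`, and `-exec`. The sketch below runs against a hypothetical scratch tree (`work` and its contents are assumptions for illustration, not the real directory B), and it assumes the path contains no glob characters, since `-path` takes a pattern.

```shell
# Hypothetical scratch tree standing in for directory B.
work=$(mktemp -d)
mkdir -p "$work/Car/2018 03" "$work/House/2018 02"
: > "$work/House/2018 02/projectTwo 2.g.jpg"   # keeps this branch alive

# -depth visits children before their parents, so a parent that only held
# empty directories is itself empty by the time rmdir reaches it.
# rmdir refuses to remove non-empty directories; those errors are discarded.
# The "! -path" test keeps the top-level directory itself out of the candidates.
find "$work" -depth -type d ! -path "$work" -exec rmdir {} \; 2>/dev/null

find "$work" -type d
```

Note one behavioural difference from the original line: this variant also removes a parent such as `Car` once its last empty subdirectory is gone, which may or may not be wanted.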
    

    [Discussion]:

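    • Thank you for the prompt and thoughtful reply. In practice this runs on an older bash, version 3.2.57. That is why I mentioned POSIX compliance; I had assumed it would lead to a solution that does not depend on a particular bash version. I very much appreciate your recommendation and look forward to learning more from you.
    • Sorry, I may have missed your requirement. I have now updated my answer with a POSIX-compliant version. POSIX sh lacks some important bash features such as arrays and process substitution, so we need to find alternatives. IMHO, bash may not be well suited to a complex task like this because of its many limitations and pitfalls, let alone POSIX sh.
    • Thank you again. Nicely done. So far I am seeing similar results. I can see the correct mapping taking place (from the commented-out debug line), but I still get the message that the parent is not found in the next loop. I will debug more thoroughly tonight.
    • Sorry for the inconvenience. I have tested on another platform and reproduced an error similar to the one you are facing. Although I have not found the root cause yet, the error can be fixed by setting the esc variable (the field separator) to another character such as ":". (That character must not appear in the filenames/pathnames of the directory tree.) I will keep investigating what makes the difference. BR.
    • No inconvenience at all. You have been very helpful, and I will study the working version and the iterations as reference material as I learn more. Now that the test script executes as planned, it moves on to the production files. Wish me luck. I will report back with the results.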
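
The workaround discussed in this thread, a lookup table whose separator is a printable character, can be sketched on its own as follows. This is a simplified variant rather than the answer's exact code: it keeps the names `read_lut` and `write_lut` but reads the table line by line with a here-document instead of changing `IFS`, and it assumes that neither `:` nor a newline ever appears in a project or directory name.

```shell
sep=":"        # assumed never to occur in file or directory names
nl='
'
lut=""         # the table: one "key<sep>value" entry per line

# append an entry to the table
write_lut() {
    lut="${lut}${1}${sep}${2}${nl}"
}

# print the value stored for key $1 (the last entry wins); empty if absent
read_lut() {
    _ret=""
    while IFS= read -r _line; do
        if [ "${_line%%"$sep"*}" = "$1" ]; then
            _ret=${_line#*"$sep"}
        fi
    done <<EOF
$lut
EOF
    printf '%s\n' "$_ret"
}

write_lut "projectOne 1.a" "House/2018 01"
write_lut "projectThree 3.m" "Car/2018 03"

parent=$(read_lut "projectThree 3.m")
printf '%s\n' "$parent"
```

The final line prints `Car/2018 03`. Each lookup scans the whole table, so this is only reasonable for a modest number of files.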