【Question title】: Awk to create directory then subdirectory with zip in it
【Posted】: 2019-03-07 11:37:55
【Question】:

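The awk below creates a subdirectory inside a directory (the directory name is always the last line of a block in file1; blocks are separated by an empty line) if the number on line 2 of a file2 entry (always the first 6 digits, in the format xx-xxxx) is found in $2 of file1. This is the current awk output.

If there is a match and the subdirectory is created in the directory, the corresponding https line (line 1) in file2 is always the link to the zip file to download. I can't seem to get that link downloaded into the subfolder and the .zip extracted there. The download code runs and downloads the zip, but it has to be entered manually in the terminal. I apologize for the long post; I wanted to include all the details needed to solve this.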

file1

xxx_006 19-0000_xxx-yyy-aaa
xxx_007 19-0001_zzz-bbb-ccc
FolderName_001_001

yyyy_0287 19-0v02-xxx
yyyy_0289 19-0v31-xxxx
yyyy_0293 19-0v05-xxxx
FolderName_002_002

file2

https://xx.yy.zz/path/to/file.zip
19-0v05-xxx_000_001
 cc112233
https://xx.yy.zz/path/to/download/file.zip
19-0v31-xxx-001-000
bb4456784
https://xx.yy.zz/path/to/file.zip
19-0v02-xxx_000_001
aaa331232

awk edit

cmd_fmt='mkdir -p "%s/%s"
# run the awk command
awk -v cmd_fmt="$cmd_fmt" '
# create an associative array (key/value pairs) based on the file1
NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } 

# retrieve the first 7-char of each line in file2 as the key to test against the above hash
{ k = substr($0, 1, 7) }

# if find k, then print
k in a { print a[k] "\t" $0 "\t" l }
# save prev line to 'l' which is supposed to be the URL
{ l = $0  } 
' RS= file1 RS='\n' file2 | while IFS=$'\t' read -r base_dir sub_dir link; 
do
echo "download [$link] to '$base_dir/$sub_dir'"
# bash command lines to make sub-folders and download files
# create the format text used in sprintf() to run the desired shell commands
cd "%s/%s" && curl -O -v -k -X GET %s -H "Content-Type:application/x-www-form-urlencoded" -H "Authorization:xxxx" && { filename="%s"; unzip "${filename##*/}"; }'
done

Desired awk output

FolderName_002_002 --- directory
    19-0v02-xxx_000_001  --- sub folder
    https://xx.yy.zz/path/to/file.zip  --- zip and extracted downloaded to sub-folder
    19-0v05-xxx_000_001  --- sub-folder
    https://xx.yy.zz/path/to/file.zip  --- zip and extracted downloaded to sub-folder
    19-0v31-xxx-001-000  --- sub-folder
    https://xx.yy.zz/path/to/file.zip  --- zip and extracted downloaded to sub-folder

【Question discussion】:

    Tags: awk


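    【Solution 1】:

    I believe your question is related to this one: Bash loop to make directory, if numerical id found in file

    You can run all the commands in a single awk system() call; you just need to organize them properly, for example: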

    # create the format text used in sprintf() to run the desired shell commands
    cmd_fmt='mkdir -p "%s/%s" && cd "%s/%s" && curl -O -v -k -X GET %s -H "Content-Type:application/x-www-form-urlencoded" -H "Authorization:xxx" && { filename="%s"; unzip "${filename##*/}" && rm -f "${filename##*/}"; }'
    
    # run the awk command
    awk -v cmd_fmt="$cmd_fmt" '
        # create an associative array (key/value pairs) based on the file1
        NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } 
    
        # retrieve the first 7-char of each line in file2 as the key to test against the above hash
        { k = substr($0, 1, 7) }
    
        # if find k, then run the system command    
        k in a { cmd = sprintf(cmd_fmt, a[k], $0, a[k], $0, l, l); print(cmd) }
    
        # save prev line to 'l' which is supposed to be the URL
        { l = $0  } 
    ' RS= file1 RS='\n' file2
    

    Change print to system to execute the command.
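Before switching from print to system, it can help to dry-run the generated commands on a small sample. The sketch below reuses the awk logic from above on an excerpt of the question's sample data. It assumes a trimmed cmd_fmt that contains only the mkdir step, a temporary working directory, and a helper named build_cmds; all three are illustrative and not part of the original answer.

```shell
# Sketch: print the generated commands first, then execute them with system().
# Only mkdir is generated here; the curl/unzip part is deliberately left out.
work=$(mktemp -d) && cd "$work" || exit 1

cat > file1 <<'EOF'
yyyy_0287 19-0v02-xxx
yyyy_0289 19-0v31-xxxx
yyyy_0293 19-0v05-xxxx
FolderName_002_002
EOF

cat > file2 <<'EOF'
https://xx.yy.zz/path/to/file.zip
19-0v05-xxx_000_001
cc112233
https://xx.yy.zz/path/to/file.zip
19-0v02-xxx_000_001
aaa331232
EOF

cmd_fmt='mkdir -p "%s/%s"'

# Hypothetical helper: $1 is "print" (dry run) or "system" (execute).
build_cmds() {
    awk -v cmd_fmt="$cmd_fmt" -v mode="$1" '
        # file1 is read in paragraph mode: map each 7-char id to the folder name
        NR==FNR { for (i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next }
        { k = substr($0, 1, 7) }
        k in a {
            cmd = sprintf(cmd_fmt, a[k], $0)
            if (mode == "system") system(cmd); else print cmd
        }
    ' RS= file1 RS='\n' file2
}

build_cmds print     # dry run: shows one mkdir command per matching id
build_cmds system    # real run: creates the directories
```

Once the printed commands look right, the same switch can be applied to the full cmd_fmt that includes the download step.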

    Note: the unzip and rm commands above might not work if the filename contains URL-encoded characters.

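    Update based on your awk edit:

    You can also print the needed information from the awk line and then process it in bash; there is no need to do everything in awk (you can also remove the line that defines cmd_fmt in the awk edit section):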

    awk '
        # create an associative array (key/value pairs) based on the file1
        NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } 
    
        # retrieve the first 7-char of each line in file2 as the key to test against the above hash
        { k = substr($0, 1, 7) }
    
        # if find k, then print
        k in a { print a[k] "\t" $0 "\t" l }
    
        # save prev line to 'l' which is supposed to be the URL
        { l = $0  } 
    
    ' RS= file1 RS='\n' file2 | while IFS=$'\t' read -r base_dir sub_dir link; do
        echo "download [$link] to '$base_dir/$sub_dir'"
        # bash command lines to make sub-folders and download files
        # skip this entry if the sub-folder cannot be created or entered
        mkdir -p "$base_dir/$sub_dir" && cd "$base_dir/$sub_dir" || continue
    
        if curl -O -v -k -X GET "$link" -H "Content-Type:application/x-www-form-urlencoded" -H "Authorization:xxxx" >/dev/null 2>&1; then
            echo "  + processing $link"
            # remove query_string from the link, since it might contain '/'
            filename="${link%\?*}"
            # remove path from filename and run `unzip`
            unzip "${filename##*/}" 
        else
            echo "  + error downloading: $link"
        fi
    
        # return to the base directory if it's a relative path
        # if all are absolute paths, then just comment out the following line
        cd ../..
    done
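
As an alternative to the trailing `cd ../..`, each iteration can run inside a subshell, so the loop never has to find its way back to the starting directory. The following is only a sketch under stated assumptions: `touch` is a stand-in for the real curl call, the single input record is copied from the question's sample data, and the tab is built with printf instead of `$'\t'` so the snippet also runs in a plain POSIX shell.

```shell
# Sketch: do the per-entry work in a subshell so the directory change stays local.
root=$(mktemp -d) && cd "$root" || exit 1
tab=$(printf '\t')    # literal tab, used as the field separator for read

printf '%s\t%s\t%s\n' \
    'FolderName_002_002' '19-0v05-xxx_000_001' 'https://xx.yy.zz/path/to/file.zip' |
while IFS="$tab" read -r base_dir sub_dir link; do
    (
        # a failed mkdir or cd ends only this subshell, not the whole loop
        mkdir -p "$base_dir/$sub_dir" && cd "$base_dir/$sub_dir" || exit 1
        touch "${link##*/}"    # placeholder for: curl -O ... "$link"
    ) || echo "  + error processing: $link"
done
```

Because the `cd` happens inside the parentheses, it works the same way for relative and absolute base directories.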
    

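    Note: I did not test the curl line and do not know what the filenames for the different links might be. "${link%\?*}" removes a trailing query string from the link, and "${filename##*/}" then removes everything up to and including the last '/', which leaves only the filename. In practice, the name of the file that the curl command saves may differ, so you will have to check and adjust this on your end.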
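
The two parameter expansions can be tried on their own, without downloading anything. The link below is hypothetical: its query string is invented purely to show why the query string is stripped before the path.

```shell
# Sketch: derive the local file name from a download link.
# The query string "?token=a/b" is made up for illustration.
link='https://xx.yy.zz/path/to/file.zip?token=a/b'

no_query="${link%\?*}"        # drop the shortest trailing "?..." (the query string)
filename="${no_query##*/}"    # drop the longest leading ".../" (the path)
echo "$filename"              # prints: file.zip

# Stripping the path first goes wrong when the query string contains '/':
naive="${link##*/}"
echo "$naive"                 # prints: b
```

If the saved name still turns out to be unpredictable, passing an explicit output name to curl with `-o` and unzipping that name would avoid the guesswork.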

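    【Discussion】:

    • Thank you very much :).
    • The zip archive $filename extracts to the same name, so I need to rename $filename to tmp and unzip tmp, but I can't seem to do that without breaking the command. Thank you :). && { filename="%s"; unzip "${filename##*/}" && rm -f "${filename##*/}"; }'
    • @cm0728, you're welcome :). I'm not sure about your actual data files, but post-processing the actual files under bash seems much easier. The awk system() function may not invoke bash, so bash parameter expansion will not work.
    • I can see the output in the terminal, but no directories are created. I added cd "%s/%s" && curl under # bash download. Thank you :).
    • Adding curl had no effect. Is there anything else? Thank you :).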