【Question title】: Parse name value pairs in csv line using shell script
【Posted】: 2013-01-06 07:56:00
【Question】:

My input looks like the following 2 lines:

TASK1,6,INITIAL,2013-01-15 19:20:40,PREPARING,2013-01-15 19:21:12,SCHEDULED,2013-01-15 19:21:13,TRANSLATING,2013-01-15 19:21:13,LOADING,2013-01-15 19:36:37,COMPLETE,2013-01-15 19:36:42
TASK2,5,INITIAL,2013-01-15 19:20:44,PREPARING,2013-01-15 19:21:13,SCHEDULED,2013-01-15 19:21:14,TRANSLATING,2013-01-15 19:36:37,TERMINAL,2013-01-15 20:28:10

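I need to loop through a file of lines like these and compute several time differences for each line. I'm fine with the calculations, but I'm having a hard time figuring out how to parse this "variable-length" string of name/value pairs.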

Basically, the number after the task ID is a count of statuses, followed by those statuses and the times at which they occurred.
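
To make that layout concrete, here is a minimal sketch (not part of the original post) that splits one line on commas and checks that the declared count matches the number of status/timestamp pairs that follow. The function name `check_status_count` is hypothetical, and the sketch assumes bash and that no field contains an embedded comma.

```shell
#!/usr/bin/env bash

# check_status_count LINE  (hypothetical helper, for illustration only)
# Prints "<task_id> <declared> <actual>" and succeeds only when the declared
# count equals the number of complete status/timestamp pairs on the line.
check_status_count() {
    local task_id declared rest
    local -a fields

    # The first two fields are the task ID and the count; everything after
    # them lands in $rest, commas included.
    IFS=, read -r task_id declared rest <<< "$1"

    # Split the remainder into alternating status names and timestamps.
    IFS=, read -ra fields <<< "$rest"

    local actual=$(( ${#fields[@]} / 2 ))
    echo "$task_id $declared $actual"

    # Fail on a count mismatch or on a dangling status without a timestamp.
    [[ $declared -eq $actual ]] && (( ${#fields[@]} % 2 == 0 ))
}

check_status_count 'TASK2,5,INITIAL,2013-01-15 19:20:44,PREPARING,2013-01-15 19:21:13,SCHEDULED,2013-01-15 19:21:14,TRANSLATING,2013-01-15 19:36:37,TERMINAL,2013-01-15 20:28:10'
```

Because the count only says how many pairs follow, the parser still has to look at each status name to know which variable a timestamp belongs to.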

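What I'd like to do is take one of these lines and end up with something like the following, with the values assigned to their respective variables (using the first line as an example):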

$TASK_ID=TASK1
$STATUS_COUNT=6
$INITIAL=2013-01-15 19:20:40
$PREPARING=2013-01-15 19:21:12
$SCHEDULED=2013-01-15 19:21:13
$TRANSLATING=2013-01-15 19:21:13
$LOADING=2013-01-15 19:36:37
$COMPLETE=2013-01-15 19:36:42
$TERMINAL=<NULL>

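To complicate matters further, if a task is submitted more than once, the next round of statuses is simply appended to the first set, which means I may end up with an input line like this: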

TASK1,11,INITIAL,2013-01-15 19:20:40,PREPARING,2013-01-15 19:21:12,SCHEDULED,2013-01-15 19:21:13,TRANSLATING,2013-01-15 19:21:13,LOADING,2013-01-15 19:36:37,COMPLETE,2013-01-15 19:36:42,INITIAL,2013-01-15 20:20:40,PREPARING,2013-01-15 20:21:12,SCHEDULED,2013-01-15 20:21:13,TRANSLATING,2013-01-15 20:21:13,TERMINAL,2013-01-15 20:36:42

In that case, I'd like my output to be:

$TASK_ID=TASK1
$STATUS_COUNT=11
$INITIAL=2013-01-15 20:20:40
$PREPARING=2013-01-15 20:21:12
$SCHEDULED=2013-01-15 20:21:13
$TRANSLATING=2013-01-15 20:21:13
$LOADING=<NULL>
$COMPLETE=<NULL>
$TERMINAL=2013-01-15 20:36:42

I'm stumped. Can anyone help?

Thanks in advance.

【Question comments】:

    Tags: parsing shell csv sed awk


    【Solution 1】:
    #!/bin/bash
    
    # Splitting on commas, read the task ID and status count followed by all of the statuses,
    # which we'll parse later.
    while IFS=, read -r TASK_ID STATUS_COUNT STATUSES; do
    (
        # Subtly, but importantly, we put the loop body inside parentheses so each loop
        # iteration runs in a sub-shell. This ensures that the $LOADING, $COMPLETE, etc.
        # variables we set don't leak into future iterations.
    
        echo "TASK_ID      = $TASK_ID"
        echo "STATUS_COUNT = $STATUS_COUNT"
    
        # Convert the comma-separated string $STATUSES into an array using `read -a'.
        IFS=, read -ra STATUSES <<< "$STATUSES"
    
        # Assign the statuses to named variables. A side benefit of this is that only the
        # last value of each status type is used.
        for ((i = 0; i < ${#STATUSES[@]}; i += 2)); do
            declare "${STATUSES[$i]}=${STATUSES[$((i+1))]}"
        done
    
        # Print each of the statuses, or <NULL> if that stage wasn't listed.
        echo "INITIAL      = ${INITIAL:-<NULL>}"
        echo "PREPARING    = ${PREPARING:-<NULL>}"
        echo "SCHEDULED    = ${SCHEDULED:-<NULL>}"
        echo "TRANSLATING  = ${TRANSLATING:-<NULL>}"
        echo "LOADING      = ${LOADING:-<NULL>}"
        echo "COMPLETE     = ${COMPLETE:-<NULL>}"
        echo "TERMINAL     = ${TERMINAL:-<NULL>}"
    
        echo
    )
    done
    

    Output:

    $ ./tasks < tasks.txt
    TASK_ID      = TASK1
    STATUS_COUNT = 6
    INITIAL      = 2013-01-15 19:20:40
    PREPARING    = 2013-01-15 19:21:12
    SCHEDULED    = 2013-01-15 19:21:13
    TRANSLATING  = 2013-01-15 19:21:13
    LOADING      = 2013-01-15 19:36:37
    COMPLETE     = 2013-01-15 19:36:42
    TERMINAL     = <NULL>
    
    TASK_ID      = TASK2
    STATUS_COUNT = 5
    INITIAL      = 2013-01-15 19:20:44
    PREPARING    = 2013-01-15 19:21:13
    SCHEDULED    = 2013-01-15 19:21:14
    TRANSLATING  = 2013-01-15 19:36:37
    LOADING      = <NULL>
    COMPLETE     = <NULL>
    TERMINAL     = 2013-01-15 20:28:10
    
    TASK_ID      = TASK1
    STATUS_COUNT = 11
    INITIAL      = 2013-01-15 20:20:40
    PREPARING    = 2013-01-15 20:21:12
    SCHEDULED    = 2013-01-15 20:21:13
    TRANSLATING  = 2013-01-15 20:21:13
    LOADING      = 2013-01-15 19:36:37
    COMPLETE     = 2013-01-15 19:36:42
    TERMINAL     = 2013-01-15 20:36:42
    

    (Edit added by glenn jackman to address the new requirement)

    events=(INITIAL PREPARING SCHEDULED TRANSLATING LOADING COMPLETE TERMINAL)
    
    while IFS=, read -r TASK_ID STATUS_COUNT rest; do
        IFS=, read -ra STATUSES <<< "$rest"
    
        for (( i=0; i < ${#STATUSES[@]}; i+=2 )); do
            # if this is the initial event, reset all statuses
            if [[ ${STATUSES[i]} == ${events[0]} ]]; then
                for event in "${events[@]}"; do
                    declare "$event="
                done
            fi
            declare "${STATUSES[i]}=${STATUSES[i+1]}"
        done
        for var in TASK_ID STATUS_COUNT "${events[@]}"; do
            printf '$%s = %s\n' "$var" "${!var:-<NULL>}"
        done
    
    done
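
A variation on the same idea, offered as a sketch rather than as part of either answer: keeping the statuses in an associative array avoids creating shell variables whose names come from the input data, and resetting a run becomes a single assignment. This assumes bash 4 or newer; the function name `parse_task_line` and the array name `status` are illustrative only.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays (local -A).

events=(INITIAL PREPARING SCHEDULED TRANSLATING LOADING COMPLETE TERMINAL)

# parse_task_line LINE  (hypothetical helper, for illustration only)
# Prints "NAME = VALUE" for the task ID, the count, and each known event,
# keeping only statuses from the most recent run (a run starts at INITIAL).
parse_task_line() {
    local task_id status_count rest i event
    local -a fields
    local -A status=()

    IFS=, read -r task_id status_count rest <<< "$1"
    IFS=, read -ra fields <<< "$rest"

    for (( i = 0; i + 1 < ${#fields[@]}; i += 2 )); do
        # INITIAL marks a new run, so forget everything seen so far.
        if [[ ${fields[i]} == "${events[0]}" ]]; then
            status=()
        fi
        status[${fields[i]}]=${fields[i+1]}
    done

    printf '%s = %s\n' TASK_ID "$task_id" STATUS_COUNT "$status_count"
    for event in "${events[@]}"; do
        printf '%s = %s\n' "$event" "${status[$event]:-<NULL>}"
    done
}

parse_task_line 'TASK2,5,INITIAL,2013-01-15 19:20:44,PREPARING,2013-01-15 19:21:13,SCHEDULED,2013-01-15 19:21:14,TRANSLATING,2013-01-15 19:36:37,TERMINAL,2013-01-15 20:28:10'
```

To process a whole file, the function could be called from a loop such as `while IFS= read -r line; do parse_task_line "$line"; echo; done < tasks.txt`. Since the array is local to the function, nothing leaks from one line to the next, so no sub-shell is needed.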
    

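    【Comments】:

    • @glennjackman I believe I learned about read -a from you recently, so thanks again for that gem!
    • The only problem here is that in the last case (a task that ran more than once), I'd like LOADING and COMPLETE to be NULL, because they occurred in the first run (set of statuses) but not in the second. INITIAL always starts a new run, if that helps to separate the runs.
    • Using some date math, I was able to work around this by adding a few statements that NULL out the timestamps.
    • @user2001998, I added an update to John's answer to handle this new requirement. John, hope you don't mind.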
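
For the date math itself (the per-task time differences the question is ultimately after), one possible sketch follows. It is not taken from the thread. It assumes GNU `date`, whose `-d` option parses timestamps like the ones in the input; the function name `elapsed_seconds` is hypothetical, and timestamps are interpreted as UTC so that daylight-saving shifts cannot skew the difference.

```shell
#!/usr/bin/env bash

# elapsed_seconds START END  (hypothetical helper; relies on GNU date's -d option)
# Prints the number of seconds from START to END, where both arguments
# look like "YYYY-MM-DD HH:MM:SS".
elapsed_seconds() {
    local start end
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# Time from INITIAL to COMPLETE for the first sample line (16 min 2 s).
elapsed_seconds '2013-01-15 19:20:40' '2013-01-15 19:36:42'   # prints 962
```

On systems without GNU `date` (for example BSD or macOS), the `-d` option behaves differently, so the conversion step would need to be adapted.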