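Your task isn't difficult. Bash, together with the date utility, handles date manipulation well. What you need to do is sort the original list, then read each line of the sorted file, compare its date/time to the previous date/time and, using a counter, add an offset of counter * 4 min to each duplicate time, then write the new date/time to your output file. There are many ways to handle the time adjustment. The easiest is to convert the date/time string to seconds since the epoch, add the offset to the duplicate time, and convert the result back to the desired date/time format.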
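As a minimal sketch of that round trip (assuming GNU date, whose -d option accepts both a date string and @seconds; shift_time is a hypothetical helper name used only for illustration):

```bash
#!/bin/bash
# Sketch: shift a "YYYY-MM-DD HH:MM:SS" string by a number of seconds.
# shift_time is a hypothetical helper; date -d is a GNU date feature.
shift_time() {
    local stamp=$1 offset=$2 tse
    tse=$(date -d "$stamp" +%s) || return 1    # string -> seconds since epoch
    date -d "@$((tse + offset))" +'%F %T'      # seconds -> formatted string
}

shift_time "2014-10-31 09:15:51" 240    # first duplicate:  +4 min
shift_time "2014-10-31 09:15:51" 480    # second duplicate: +8 min
```

The two calls correspond to the adjusted 09:15:51 entries in the sample data further down.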
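The script below shows one way to do this. Several of the operations could be combined, but I kept the offset calculations separate to make it more readable. The script takes the input file as its first argument (I default it to dat/env4.dat for my testing; set it to whatever you like). It then sorts the input into a temp file, reads the temp file, applies the time adjustment to duplicates, writes the output to inputfile.out, and removes the temp file before exiting. Let me know if you have any questions: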
#!/bin/bash

ifn="${1:-dat/env4.dat}"    # set input filename (ifn) and validate

[ -r "$ifn" ] || {
    printf "\n Error: input file not readable. Usage: %s [<filename> (dat/env4.dat)]\n\n" "${0//*\//}" >&2
    exit 1
}

## initialize variables
tfn="/tmp/${ifn//*\//}.tmp"     # set temp filename (tfn)
ofn="${ifn}.out"                # set output filename (ofn)
:> "$ofn"                       # truncate output file
pdate=0                         # initialize prior date
cnt=0                           # counter variable
tos=240                         # time offset in seconds (4 min.)
tse=0                           # time since epoch in seconds

sort "$ifn" > "$tfn"            # sort input file into temp file & validate

[ -r "$tfn" ] || {
    printf "\n Error: sort failed to produce a tmp file or tmp file not readable\n\n" >&2
    exit 1
}

## read temp file into index/idate and add 4 min to each successive duplicate
while read -r index idate || [ -n "$idate" ]; do
    if [ "$pdate" = "$idate" ]; then
        tse=$(date -d "$idate" +%s)     # get time since epoch for idate
        cnt=$((cnt+1))                  # increase counter
        nos=$((cnt*tos))                # set new time offset (not Nitrous Oxide)
        ntm=$((tse+nos))                # set new time including offset
        # write new time to output
        printf "%s\t%s\n" "$index" "$(date -d "@${ntm}" +"%F %T" )" >> "$ofn"
    else
        cnt=0; nos=0                    # reset counter and new time offset
        # write output unchanged
        printf "%s\t%s\n" "$index" "$idate" >> "$ofn"
    fi
    pdate="$idate"                      # save current date/time as prior date/time
done <"$tfn"

[ -r "$tfn" ] && rm "$tfn"      # remove temp file
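As a variation on the temp-file handling above, here is a sketch that swaps the fixed /tmp name for mktemp and leaves cleanup to a trap. This is not what the script does; sort_to_tmp and the env4.XXXXXX template are hypothetical names chosen for illustration.

```bash
#!/bin/bash
# Sketch: sort a file into a uniquely named temp file and print its path.
# sort_to_tmp is a hypothetical helper, not part of the original script.
sort_to_tmp() {
    local ifn=$1 tfn
    tfn=$(mktemp "${TMPDIR:-/tmp}/env4.XXXXXX") || return 1
    if ! sort "$ifn" > "$tfn"; then
        rm -f "$tfn"                # do not leave a partial temp file behind
        return 1
    fi
    printf '%s\n' "$tfn"
}

# Possible use inside the script (ifn as set earlier):
#   tfn=$(sort_to_tmp "$ifn") || exit 1
#   trap 'rm -f "$tfn"' EXIT       # temp file is removed however the script exits
```

The unique name avoids collisions when two runs process files with the same basename, and the trap covers exits that would skip the final rm.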
Input file:
$ cat dat/env4.dat
1414743351 2014-11-01 09:00:00
1414743351 2014-10-31 09:15:51
1414743351 2014-10-30 23:00:00
1414743351 2014-10-31 09:15:51
1414743351 2014-10-30 23:00:00
1414743351 2014-10-31 10:25:00
1414743351 2014-10-31 09:15:51
1414743351 2014-11-01 10:25:00
Output file:
$ cat dat/env4.dat.out
1414743351 2014-10-30 23:00:00
1414743351 2014-10-30 23:04:00
1414743351 2014-10-31 09:15:51
1414743351 2014-10-31 09:19:51
1414743351 2014-10-31 09:23:51
1414743351 2014-10-31 10:25:00
1414743351 2014-11-01 09:00:00
1414743351 2014-11-01 10:25:00
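Note: if you want to flip the duplicates so that the larger offset times come first, you should be able to do that by operating on the output file. Doing it inside the offset while loop would overcomplicate the logic for this question. If you do want to include the additional code in the offset while loop, the basic approach is to store the prior date and any matching dates in an array, then offset the array date/time values and write them out in reverse order. Unset the array each time a new date/time is encountered.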
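A rough sketch of that array approach, assuming GNU date and already sorted "index date time" lines on stdin. The function names reverse_offsets and flush_group are hypothetical, and the sketch buffers the already adjusted lines rather than the raw dates, which is one of several ways to arrange it:

```bash
#!/bin/bash
# Sketch: write each run of duplicate date/times with the largest offset first.
# reverse_offsets and flush_group are hypothetical names; requires GNU date.
tos=240                                  # offset per duplicate, in seconds
group=()                                 # adjusted lines for the current date/time

flush_group() {                          # print the buffered run in reverse, then clear it
    local i
    for ((i = ${#group[@]} - 1; i >= 0; i--)); do
        printf '%s\n' "${group[i]}"
    done
    group=()
}

reverse_offsets() {
    local index idate pdate=0 cnt=0 tse ndt
    while read -r index idate || [ -n "$idate" ]; do
        if [ "$idate" = "$pdate" ]; then
            cnt=$((cnt + 1))             # another duplicate of the same date/time
        else
            flush_group                  # new date/time: emit the previous run
            cnt=0
        fi
        tse=$(date -d "$idate" +%s)
        ndt=$(date -d "@$((tse + cnt * tos))" +'%F %T')
        group+=("$index"$'\t'"$ndt")     # buffer the adjusted line
        pdate="$idate"
    done
    flush_group                          # emit the final run
}
```

It could be used as sort dat/env4.dat | reverse_offsets > dat/env4.dat.out, replacing the while loop in the script.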
Addendum: including the e-mail and adjustment fields
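If you are interested in adding an e-mail to the output, plus a time adjustment between the date portion and the time portion of the new date field, you can do so relatively easily: add the e-mail at the beginning, split the new string returned by date into its date part and time part, and insert 00:0n:00 between the two in the output. It makes no difference whether you use printf or echo. printf is more flexible, but echo has its advantages at times.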
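The split relies on two parameter expansions. A minimal sketch, where insert_adj is a hypothetical helper and the input is assumed to be a single "YYYY-MM-DD HH:MM:SS" string:

```bash
#!/bin/bash
# Sketch: insert an adjustment field between the date and time parts.
# insert_adj is a hypothetical helper; it expects "YYYY-MM-DD HH:MM:SS".
insert_adj() {
    local ndt=$1 adj=$2
    local nd1=${ndt% *}      # remove shortest trailing " *" -> date part
    local nd2=${ndt#* }      # remove shortest leading "* "  -> time part
    printf '%s %s %s\n' "$nd1" "$adj" "$nd2"
}

insert_adj "2014-10-31 09:19:51" "00:04:00"    # prints: 2014-10-31 00:04:00 09:19:51
```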
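Note: in the code below, I form 00:0n:00 (where n is 4, 8, etc.), which assumes only 2 duplicates. If there are 3 or more, the adjusted time is greater than 8 minutes and you will have to change the logic to form 00:nn:00 (e.g. 12, 16, 20, ... for the 3rd, 4th, 5th, ... duplicate).
Let me know if you have further questions.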
## beginning part of script unchanged

# tse=0                         # time since epoch in seconds
email="mi@email.com"            # email to output
adjtm=4                         # simple value to provide adjustment in 00:04:00, etc.

sort "$ifn" > "$tfn"            # sort input file into temp file & validate

[ -r "$tfn" ] || {
    printf "\n Error: sort failed to produce a tmp file or tmp file not readable\n\n" >&2
    exit 1
}

## read temp file into index/idate and add 4 min to each successive duplicate
while read -r index idate || [ -n "$idate" ]; do
    if [ "$pdate" = "$idate" ]; then
        tse=$(date -d "$idate" +%s)             # get time since epoch for idate
        cnt=$((cnt+1))                          # increase counter
        adj=$((cnt*adjtm))                      # compute 4, 8, ... for 00:0n:00 output
        nos=$((cnt*tos))                        # set new time offset (not Nitrous Oxide)
        ntm=$((tse+nos))                        # set new time including offset
        ndt="$(date -d "@${ntm}" +"%F %T" )"    # new date/time value
        nd1=${ndt% *}                           # date portion (first field) of ndt
        nd2=${ndt#* }                           # time portion (second field) of ndt
        ncmb="$nd1 00:0${adj}:00 $nd2"          # new combined "date 00:0n:00 time" string
        # write new time to output
        printf "%s\t%s\t%s\n" "$email" "$index" "$ncmb" >> "$ofn"
    else
        cnt=0; nos=0                            # reset counter and new time offset
        nd1=${idate% *}                         # date portion (first field) of idate
        nd2=${idate#* }                         # time portion (second field) of idate
        ncmb="$nd1 00:00:00 $nd2"               # new combined "date 00:00:00 time" string (no adj)
        # write original date/time with a zero adjustment
        printf "%s\t%s\t%s\n" "$email" "$index" "$ncmb" >> "$ofn"
    fi
    pdate="$idate"                              # save current date as prior date
done <"$tfn"

[ -r "$tfn" ] && rm "$tfn"      # remove temp file
Output file (same input):
$ bash env4-2.sh
mi@email.com 1414743351 2014-10-30 00:00:00 23:00:00
mi@email.com 1414743351 2014-10-30 00:04:00 23:04:00
mi@email.com 1414743351 2014-10-31 00:00:00 09:15:51
mi@email.com 1414743351 2014-10-31 00:04:00 09:19:51
mi@email.com 1414743351 2014-10-31 00:08:00 09:23:51
mi@email.com 1414743351 2014-10-31 00:00:00 10:25:00
mi@email.com 1414743351 2014-11-01 00:00:00 09:00:00
mi@email.com 1414743351 2014-11-01 00:00:00 10:25:00
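To lift the two-duplicate limit mentioned in the note, one option is to build the adjustment field from the offset in seconds with zero padding instead of hard-coding the 00:0 prefix. This is only a sketch; fmt_adj is a hypothetical helper that is not part of the script above.

```bash
#!/bin/bash
# Sketch: format an offset in seconds as HH:MM:SS with zero padding, so that
# 12, 16, 20, ... minutes come out as 00:12:00 rather than 00:012:00.
# fmt_adj is a hypothetical helper, not part of the script above.
fmt_adj() {
    local s=$1
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

fmt_adj 240     # 00:04:00
fmt_adj 720     # 00:12:00
fmt_adj 3840    # 01:04:00
```

Since nos already holds cnt*tos seconds in the loop, the combined string could then be built as ncmb="$nd1 $(fmt_adj "$nos") $nd2".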