【Question title】: Shell: script to group strings by substring
【Posted】: 2011-05-24 09:06:52
【Question】:

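I have a program that outputs a log file of over 500k lines (sorry, changing that is not an option).

I am trying to group the lines of the log file by a substring within each line (and then sort those groups).

For example, I have lines like the following: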

SELECT something WHERE TIM BETWEEN '*' AND '*' AND something;

What I want to group on is TIM BETWEEN '*' AND '*', where the * values match between lines, for example:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

would be grouped like this in the output:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

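Ideally each group would also be sorted on the whole string, so that lines where "something" is similar end up next to each other.

I have been trying to put together a shell script that produces the output I want from the log file, but without any success!

Edit: I should also mention that "something" can be more than one word, for example: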

SELECT blah1, blah2 or SELECT blah1, blah2, blah3

【Question discussion】:

    Tags: regex shell scripting string-matching


    【Solution 1】:

    You should be able to do this with sort:

    sort -o outputfile -k2,2 -k6,6 -k8,8 inputfile
    

    Here -k2,2 selects the "something" column, -k6,6 the first date column, and -k8,8 the last date column.

    (PS: untested)

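    【Discussion】:

    • Thanks for the answer, Kristofer, but I can't rely on the number of columns, or on the TIM BETWEEN '' AND '' block being at the same position in every line. I have edited the original question to reflect this.
    • You can set the "delimiter" to something other than whitespace to define where a column ends. That way you could probably do a multi-step sort, changing the delimiter between each sort (if a word can be used as the delimiter). -t changes the delimiter.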
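To illustrate the `-t` suggestion: `sort -t` accepts a single character, so a word such as BETWEEN cannot serve as the delimiter directly, but the single quote can. The following is only a minimal sketch. It assumes the two dates are the first single-quoted values on every line, and the function name `group_by_quoted_dates` is made up for illustration. `LC_ALL=C` is also an assumption, used here to get a predictable byte-wise order.

```shell
# Hypothetical helper (not from the original thread): group log lines by
# the two quoted dates, no matter how many words precede them.
# Assumption: the dates are the first two single-quoted values on each line.
group_by_quoted_dates() {
    # With "'" as the delimiter, field 2 is the first date and field 4
    # the second. Lines with equal keys fall back to sort's whole-line
    # comparison, so similar "something" parts end up next to each other.
    LC_ALL=C sort -t"'" -k2,2 -k4,4 "$@"
}
```

It could be called as `group_by_quoted_dates inputfile > outputfile`. If the SELECT list itself may contain quoted strings, the field numbers would no longer line up and this approach would not fit.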
    【Solution 2】:

    You have to pre-filter your data and convert it into something that sort can work with.

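    # Replace BETWEEN and the first AND with "|", sort on the two date fields, then restore the words
    awk '{sub(/BETWEEN/, "|"); sub(/AND/, "|"); print}' logFile \
    | sort -t"|" -k2,2 -k3,3 \
    | sed 's/|/BETWEEN/;s/|/AND/'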
    

    Output:

    SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
    SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
    SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
    SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
    

    I hope this helps.
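A variation on the same pre-filter idea, for logs where BETWEEN or AND may also appear elsewhere in the line: instead of substituting words, prepend the matched `TIM BETWEEN '...' AND '...'` text as a sort key, sort, and strip the key again (a decorate-sort-undecorate pattern, not part of the answer above). This is a sketch under assumptions: the function name `group_by_tim_between` is hypothetical, the key is taken to contain no tab characters, and `LC_ALL=C` is used only to make the order byte-wise and reproducible.

```shell
# Hypothetical helper (not from the original thread): decorate each line
# with its TIM BETWEEN clause, sort on that key, then remove the key.
group_by_tim_between() {
    tab=$(printf '\t')
    # 1. Emit "<key><TAB><original line>". Lines without a TIM BETWEEN
    #    clause get an empty key and therefore sort first.
    awk -v q="'" '
        BEGIN { re = "TIM BETWEEN " q "[^" q "]*" q " AND " q "[^" q "]*" q }
        {
            key = match($0, re) ? substr($0, RSTART, RLENGTH) : ""
            print key "\t" $0
        }
    ' "$@" |
    # 2. Sort by the key, then by the rest of the line within each group.
    LC_ALL=C sort -t"$tab" -k1,1 -k2 |
    # 3. Drop the key column, leaving the original lines.
    cut -f2-
}
```

For example, `group_by_tim_between logFile > grouped.log` would write the grouped lines to a new file. Because the key is located by pattern rather than by column number, the clause may sit anywhere in the line and "something" may span several words.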

    【Discussion】:
