【Posted】:2011-05-24 09:06:52
【Question】:
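I have a program that writes a log file of more than 500k lines (sorry, changing that isn't an option).
I'm trying to group the lines of the log file by a substring they contain, and then sort those groups.
For example, I have lines like this: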
SELECT something WHERE TIM BETWEEN '*' AND '*' AND something;
What I want to group on is TIM BETWEEN '*' AND '*', where lines belong together when both * values match, e.g.:
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
would be grouped in the output like this:
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
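Can each group also be sorted by the whole line, so that lines whose "something" parts are similar end up next to each other?
I've been trying to put together a shell script that outputs what I want from the log file, but without any success!
Edit: I should also mention that "something" can be multiple words, e.g.: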
SELECT blah1, blah2 or SELECT blah1, blah2, blah3
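One possible approach with standard tools is decorate-sort-undecorate: copy the TIM clause to the front of each line, sort, then strip it off again. This is a minimal sketch, assuming the quoted values never contain a single quote and that lines without a TIM clause can be dropped; the function name group_tim and the file name app.log are hypothetical. Because only the TIM clause is extracted as the key, a multi-word SELECT list such as SELECT blah1, blah2, blah3 does not change which group a line lands in; it only affects the order inside the group.

```shell
#!/bin/sh
# Hypothetical helper: group log lines by their "TIM BETWEEN '...' AND '...'"
# clause and sort each group by the whole line. Reads stdin, writes stdout.
# Assumption: the quoted values contain no single quotes.
group_tim() {
    tab=$(printf '\t')
    # 1. Decorate: prefix each matching line with its TIM clause and a tab.
    #    Lines without a TIM clause are dropped (sed -n ... p).
    # 2. Sort bytewise (LC_ALL=C) so equal keys become adjacent and lines
    #    within a group are ordered by their full text.
    # 3. Undecorate: drop the key field, keeping the original line.
    sed -n "s/.*\(TIM BETWEEN '[^']*' AND '[^']*'\).*/\1${tab}&/p" |
        LC_ALL=C sort |
        cut -f2-
}

# Example (hypothetical file names): group_tim < app.log > grouped.log
```

A single sort over the decorated lines is enough because all strings sharing the same key prefix form one contiguous run in lexicographic order, so the same pass both groups the lines and orders them within each group.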
【Comments】:
Tags: regex shell scripting string-matching