【Question Title】: Formatting the output with the sed command
【Posted】: 2024-01-17 23:47:01
【Question】:

I need your help. I have text like this:

2016.04.10 19:24:00,044 +0300 basdahsdjashd asjd ashdjkl [{"socialSecurityNumber":"68888410106514","socialSecurityNumberCountryCode":"EE"}]
2016.04.07 14:29:09,126 +0300 jsjdgdbcgf jjsgftr kksgcxdw2 [{"socialSecurityNumber":"00299288282224","socialSecurityNumberCountryCode":"EE"}]
2016.04.05 22:01:32,005 +0300 jafhaljdhf afs ljhsdhfl adf tng-customer-id=9303801442
2016.04.05 20:44:51,003 +0300 pppcndhfgus23 ofkgjg jdghhfye uksd tng-customer-id=2875223046

The output I need is the first and second columns, followed by either the socialSecurityNumber field or the tng-customer-id field:

2016.04.10 19:24:00,044 "socialSecurityNumber":"68888410106514"
2016.04.07 14:29:09,126 "socialSecurityNumber":"00299288282224"
2016.04.05 22:01:32,005 tng-customer-id=9303801442
2016.04.05 20:44:51,003 tng-customer-id=2875223046

So the question is: can this be solved with a single sed command? I need an OR (alternation) here.

If I handle the two cases separately and first match the socialSecurityNumber, I get this:

wsslogfetcher ~/temp/log_parser$ sed 's/\([^+]*\).*\("socialSecurityNumber"[^,]*\).*/\1 \2/' testfile.txt
2016.04.10 19:24:00,044  "socialSecurityNumber":"68888410106514"
2016.04.07 14:29:09,126  "socialSecurityNumber":"00299288282224"
2016.04.05 22:01:32,005 +0300 jafhaljdhf afs ljhsdhfl adf tng-customer-id=9303801442
2016.04.05 20:44:51,003 +0300 pppcndhfgus23 ofkgjg jdghhfye uksd tng-customer-id=2875223046

Second, matching tng-customer-id, I get this:

wsslogfetcher ~/temp/log_parser$ sed 's/\([^+]*\).*\(tng-customer-id[^ ]*\).*/\1 \2/' testfile.txt
2016.04.10 19:24:00,044 +0300 basdahsdjashd asjd ashdjkl [{"socialSecurityNumber":"68888410106514","socialSecurityNumberCountryCode":"EE"}]
2016.04.07 14:29:09,126 +0300 jsjdgdbcgf jjsgftr kksgcxdw2 [{"socialSecurityNumber":"00299288282224","socialSecurityNumberCountryCode":"EE"}]
2016.04.05 22:01:32,005  tng-customer-id=9303801442
2016.04.05 20:44:51,003  tng-customer-id=2875223046

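As you can see, in the first example the last two lines do not contain socialSecurityNumber, so the substitution does not apply and sed prints them unchanged. The same happens to the first two lines in the second example.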

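When I try to combine both patterns in one sed command with the OR operator, I get this output, which is wrong: the first two lines keep everything after the socialSecurityNumber field: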

wsslogfetcher ~/temp/log_parser$ sed 's/\([^+]*\).*\(\("socialSecurityNumber"[^,]*\).*\|\(tng-customer-id=[^ ]*\).*\)/\1 \2/' testfile.txt
2016.04.10 19:24:00,044  "socialSecurityNumber":"68888410106514","socialSecurityNumberCountryCode":"EE"}]
2016.04.07 14:29:09,126  "socialSecurityNumber":"00299288282224","socialSecurityNumberCountryCode":"EE"}]
2016.04.05 22:01:32,005  tng-customer-id=9303801442
2016.04.05 20:44:51,003  tng-customer-id=2875223046

So... what am I doing wrong?

【Question Discussion】:

    Tags: linux bash sed


    【Solution 1】:

    Use this sed command:

    sed 's/^\([^ ]*\) \([^ ]*\).*\("socialSecurityNumber":"[^"]*"\|tng-customer-id=[^ ]*\).*$/\1 \2 \3/g' file
    

    Test:

    $ sed 's/^\([^ ]*\) \([^ ]*\).*\("socialSecurityNumber":"[^"]*"\|tng-customer-id=[^ ]*\).*$/\1 \2 \3/g' a
    2016.04.10 19:24:00,044 "socialSecurityNumber":"68888410106514"
    2016.04.07 14:29:09,126 "socialSecurityNumber":"00299288282224"
    2016.04.05 22:01:32,005 tng-customer-id=9303801442
    2016.04.05 20:44:51,003 tng-customer-id=2875223046
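
The same substitution can also be sketched with extended regular expressions, which some readers find easier to follow because groups and alternation need no backslashes. This is only a sketch under a few assumptions: the local sed accepts -E (GNU sed and BSD sed both do), the sample variable and its third line are made up for illustration, and the -n option with the p flag is an addition to the answer's command that drops lines containing neither field.

```shell
# Illustrative sample: two lines from the question plus one hypothetical
# line that contains neither field.
sample='2016.04.10 19:24:00,044 +0300 basdahsdjashd asjd ashdjkl [{"socialSecurityNumber":"68888410106514","socialSecurityNumberCountryCode":"EE"}]
2016.04.05 22:01:32,005 +0300 jafhaljdhf afs ljhsdhfl adf tng-customer-id=9303801442
a line with neither field'

# -E: extended regex, so ( ) and | are written without backslashes.
# -n with the p flag: print only lines where the substitution succeeded.
out=$(printf '%s\n' "$sample" |
  sed -nE 's/^([^ ]*) ([^ ]*).*("socialSecurityNumber":"[^"]*"|tng-customer-id=[^ ]*).*$/\1 \2 \3/p')

printf '%s\n' "$out"
# 2016.04.10 19:24:00,044 "socialSecurityNumber":"68888410106514"
# 2016.04.05 22:01:32,005 tng-customer-id=9303801442
```

Without -n and p, the third line would be printed unchanged, which is the behaviour the question describes for non-matching lines.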
    

    Based on your own command:

    sed 's/\([^+]*\).*\(\("socialSecurityNumber"[^,]*\)\|\(tng-customer-id=[^ ]*\)\).*/\1 \2/'
    

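    In your version, the .* at the end of each alternative sat inside the outer group, so \2 also captured the rest of the line. I removed the .* from each inner group and put a single .* after the outer group, so the unwanted trailing text is matched but no longer captured.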
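
To see the difference, the two commands can be run on one sample line with the replacement changed to print only \2 between angle brackets. This is a small sketch, not part of the original answer: the variable names line, greedy and fixed are illustrative, and the \| alternation in a basic regular expression assumes GNU sed, as used in the question.

```shell
# One sample line from the question.
line='2016.04.10 19:24:00,044 +0300 basdahsdjashd asjd ashdjkl [{"socialSecurityNumber":"68888410106514","socialSecurityNumberCountryCode":"EE"}]'

# Original attempt: each alternative ends in .*, inside the outer group,
# so \2 swallows everything up to the end of the line.
greedy=$(printf '%s\n' "$line" |
  sed 's/\([^+]*\).*\(\("socialSecurityNumber"[^,]*\).*\|\(tng-customer-id=[^ ]*\).*\)/<\2>/')

# Corrected form: the .* follows the outer group, so \2 holds only the field.
fixed=$(printf '%s\n' "$line" |
  sed 's/\([^+]*\).*\(\("socialSecurityNumber"[^,]*\)\|\(tng-customer-id=[^ ]*\)\).*/<\2>/')

printf '%s\n' "$greedy" "$fixed"
# <"socialSecurityNumber":"68888410106514","socialSecurityNumberCountryCode":"EE"}]>
# <"socialSecurityNumber":"68888410106514">
```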

    【Discussion】:

    • Wow, thank you very much! I just modified your solution and ended up with: wsslogfetcher ~/temp/log_parser$ sed 's/\([^+]*\).*\("socialSecurityNumber":"[^,]*\|tng-customer-id=[^ ]*\).*/\1 \2/' testfile.txt So... that works too.