【Question title】: How to search/replace a bunch of text files in unix (osx)
【Posted】: 2011-09-23 02:45:35
【Question】:

I have a regular expression that I tested successfully on http://regexpal.com/:

^(\".+?\"),\d.+?,"X",-99,-99,-99,-99,-99,-99,-99,(\d*),(\d*)

My test data looks like this:

"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
"AB101AF",10,"X",-99,-99,-99,-99,-99,-99,-99,394181,806429,179,"S00","SN9","00","QA","MH","X"
"AB101AG",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
"AB101AH",10,"X",-99,-99,-99,-99,-99,-99,-99,394371,806359,179,"S00","SN9","00","QA","MH","X"
"AB101AJ",10,"X",-99,-99,-99,-99,-99,-99,-99,394171,806398,179,"S00","SN9","00","QA","MH","X"
"AB101AL",10,"X",-99,-99,-99,-99,-99,-99,-99,394331,806530,179,"S00","SN9","00","QA","MH","X"

I want to replace it on each line with \1,\2,\3, so that, for example, line 1 would give

"AB101AA",394251,806376

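How can I run this regex search and replace on all the csv files in a folder on osx? I tried using sed, but it complains about a syntax error (also, I'm not sure whether it supports this regex). Also, will the ^ (start of line) and $ (end of line) anchors work line by line, or will they match the start and end of the file?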

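Update: there are some great responses using cut, awk, etc. to get specific fields out of the csv, but I have since learned that I need to take the numbers from that list and split each of them into 2 sub-values, so my example output from above needs to look like: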

"AB101AA",3,94251,8,06376

As far as I know, I need to use a regex for this.

【Comments on the question】:

  • For the second question: in Javascript you have to explicitly enable the behavior in which ^ and $ match at the start and end of LINES; it is not automatic: regular-expressions.info/anchors.html
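
For sed specifically, the anchors work line by line, because sed reads its input one line at a time and applies the command to each line in turn. A quick check with made-up two-line input (the `anchored` variable name is only for illustration):

```shell
# sed applies the s command to each input line separately, so ^ and $
# match at the start and end of every line, not of the whole file.
anchored=$(printf 'first\nsecond\n' | sed -e 's/^/[/' -e 's/$/]/')
printf '%s\n' "$anchored"
# prints:
# [first]
# [second]
```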

Tags: regex macos unix sed replace


【Solution 1】:
for file in *csv; do
    cp "$file" "${file}.bak" && \
    awk -F "," 'BEGIN {OFS=","} {print $1,$11,$12}' "${file}.bak" > "$file"
done

Or

sed -i.bak 's/^\("[^"]\+"\),[0-9]\+,"X",-99,-99,-99,-99,-99,-99,-99,\([0-9]\+\),\([0-9]\+\).*/\1,\2,\3/' FILE(S)

For example:

$ sed 's/^\("[^"]\+"\),[0-9]\+,"X",-99,-99,-99,-99,-99,-99,-99,\([0-9]\+\),\([0-9]\+\).*/\1,\2,\3/' <<EOF
"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
"AB101AF",10,"X",-99,-99,-99,-99,-99,-99,-99,394181,806429,179,"S00","SN9","00","QA","MH","X"
"AB101AG",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
"AB101AH",10,"X",-99,-99,-99,-99,-99,-99,-99,394371,806359,179,"S00","SN9","00","QA","MH","X"
"AB101AJ",10,"X",-99,-99,-99,-99,-99,-99,-99,394171,806398,179,"S00","SN9","00","QA","MH","X"
"AB101AL",10,"X",-99,-99,-99,-99,-99,-99,-99,394331,806530,179,"S00","SN9","00","QA","MH","X"
EOF
"AB101AA",394251,806376
"AB101AF",394181,806429
"AB101AG",394251,806376
"AB101AH",394371,806359
"AB101AJ",394171,806398
"AB101AL",394331,806530
$

HTH
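
`\+` is a GNU extension to basic regular expressions, and BSD sed (the one on OS X) may not honor it, which would explain the unchanged output reported in the comments below. A sketch that should behave the same on BSD and GNU sed switches to extended regular expressions with `-E`, where `(`, `)` and `+` need no backslashes. The second command also splits each number into its first digit and the remaining digits, as the update to the question asks; that the split falls after the first digit is inferred from the sample output `"AB101AA",3,94251,8,06376`. The variable names are only for illustration.

```shell
line='"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"'

# Extended regex (-E): groups are plain ( ), "one or more" is plain +.
# (-99,){7} stands for the seven -99 fields of the sample data; it is
# group 2, so the two numbers end up in groups 3 and 4.
plain=$(printf '%s\n' "$line" |
  sed -E 's/^("[^"]+"),[0-9]+,"X",(-99,){7}([0-9]+),([0-9]+).*/\1,\3,\4/')

# Same idea, but each number is captured as first digit + remaining digits.
split=$(printf '%s\n' "$line" |
  sed -E 's/^("[^"]+"),[0-9]+,"X",(-99,){7}([0-9])([0-9]*),([0-9])([0-9]*).*/\1,\3,\4,\5,\6/')

printf '%s\n%s\n' "$plain" "$split"
# prints:
# "AB101AA",394251,806376
# "AB101AA",3,94251,8,06376
```

To edit the files in place while keeping backups, the same expressions can be used with `-i.bak` and `*.csv` instead of the pipe.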

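【Comments】:

  • Thanks. I hadn't wrapped my sed command in quotes, which was my initial mistake. Tried again using the command sed -i.bak 's/^(\".+?\"),\d.+?,"X",-99,-99,-99,-99,-99,-99,-99,(\d*),(\d*)/\1,\2,\3/' dh.csv, which gives sed: 1: "s/^(\".+?\"),\d.+?,"X", ...": \1 not defined in the RE
  • You didn't use my regex, which starts with 's/^\("[^ , and you used (at least in your comment) s/^(\".+? . In sed regexes you have to escape the ( ) pair for back-references to work.
  • Ah, you're right, I didn't ;) I tried it, but the output is the same. I assume that's a fault in my regex, although I don't know why. Did you find that it worked for you on the test data I posted?
  • I made a mistake too (forgot that sed can't handle \d for [0-9]), but I have updated the solution. So now it works for me.
  • Ah! That doesn't work for me! Running exactly what you ran gives me output identical to the input. I'm on osx with the BSD version of sed, and it looks like it works somewhat differently.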
【Solution 2】:

You want to extract fields 1, 11 and 12? awk and cut really shine at tasks like this! E.g.

awk -F, -v OFS=, '{print $1, $11, $12}' input

Using cut:

cut -d, -f1,11,12 input 

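Using perl: -a turns on autosplit mode, in which perl automatically splits each input line on whitespace into the @F array. -F is used together with -a to choose the delimiter on which lines are split.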

perl -F, -lane 'printf "%s, %d, %d\n", $F[0], $F[10], $F[11]' input 

...and finally, a pure bash solution

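#!/bin/bash
# Set IFS only for read, so each line is split on commas into ARRAY
# without changing word splitting for the rest of the script.
while IFS=, read -ra ARRAY
do
    echo "${ARRAY[0]}, ${ARRAY[10]}, ${ARRAY[11]}"
done < input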

【Comments】:

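  • Wow, thanks, I didn't know you could do these things with awk and cut. I've just found out that I need to split up my number fields, so I think I need to stick with regex, but this is great info.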
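
Splitting the number fields does not strictly require a regular expression: awk's `substr` can cut each field by position. A minimal sketch, assuming the split always falls after the first digit (as in the sample output in the question); the `split_field` helper and the scratch file are made up for this example.

```shell
# Sample row written to a scratch file so the snippet is self-contained.
input=$(mktemp "${TMPDIR:-/tmp}/csvsplit.XXXXXX")
printf '%s\n' '"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00"' > "$input"

result=$(awk -F, -v OFS=, '
  # Return "first digit,remaining digits" for one numeric field.
  function split_field(s) { return substr(s, 1, 1) OFS substr(s, 2) }
  { print $1, split_field($11), split_field($12) }
' "$input")

printf '%s\n' "$result"
# prints: "AB101AA",3,94251,8,06376
rm -f "$input"
```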
【Solution 3】:
cd folder
for file in $(find . -type f -name '*.csv')
do
    echo "$file"
    # Use only the base name so that files in subdirectories get a valid /tmp path
    tmp="/tmp/$(basename "$file").$$"
    awk -F"," '{printf("%s,%s,%s\n", $1, $11, $12)}' "$file" > "$tmp"
    #awk -F"," '/^"[^"]+",[0-9]+,"X",-99,-99,-99,-99,-99,-99,-99,[0-9]+,[0-9]+/ {printf("%s,%s,%s\n", $1, $11, $12)}' "$file" > "$tmp"
    #mv "$tmp" "$file"
done

Comment out the first awk and uncomment the second awk if you need the regular expression. Uncomment the final mv once you have tested it.
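
The `for file in $(find ...)` loop splits the file list on whitespace, so file names containing spaces would break it. A hedged variant lets find hand each file name to a small shell snippet as an argument and writes the temporary file next to the original. The `demo` directory and its contents are invented purely so the example can run.

```shell
# Build a throwaway directory with one CSV whose name contains a space.
demo=$(mktemp -d "${TMPDIR:-/tmp}/csvdemo.XXXXXX")
printf '%s\n' '"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179' > "$demo/my data.csv"

# find passes the file names as arguments, so no word splitting occurs.
find "$demo" -type f -name '*.csv' -exec sh -c '
  for file in "$@"; do
    tmp="$file.tmp.$$"
    # Replace the original only if awk succeeded.
    awk -F, -v OFS=, "{print \$1, \$11, \$12}" "$file" > "$tmp" && mv "$tmp" "$file"
  done
' sh {} +

converted=$(cat "$demo/my data.csv")
printf '%s\n' "$converted"
# prints: "AB101AA",394251,806376
rm -rf "$demo"
```

Because this overwrites the files, it is worth running it on a copy of the folder first.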

【Comments】:
