[Question title]: Insert line below text range with sed
[Posted]: 2016-06-13 06:18:33
[Question]:

I have a text in which certain sections are clearly delimited by four-space indentation:

PERCHANCE he for whom this bell tolls may be so ill, as that he knows not it
tolls for him; and perchance I may think myself so much better than I am, as
that they who are about me, and see my state, may have caused it to toll for me,
and I know not that. 

    The church is Catholic, universal, so are all her actions; all that she does
    belongs to all. When she baptizes a child, that action concerns me; for that
    child is thereby connected to that body which is my head too, and ingrafted into
    that body whereof I am a member.

And when she buries a man, that action concerns me: all mankind is of one
author, and is one volume; when one man dies, one chapter is not torn out of the
book, but translated into a better language; and every chapter must be so
translated; God employs several translators; some pieces are translated by age,
some by sickness, some by war, some by justice; but God's hand is in every
translation, and his hand shall bind up all our scattered leaves again for that
library where every book shall lie open to one another.

    As therefore the bell that rings to a sermon calls not upon the preacher only,
    but upon the congregation to come, so this bell calls us all; but how much more
    me, who am brought so near the door by this sickness.

There was a contention as far as a suit (in which both piety and dignity,
religion and estimation, were mingled), which of the religious orders should
ring to prayers first in the morning; and it was determined, that they should
ring first that rose earliest.

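I would like each indented block to be immediately preceded by START QUOTE and immediately followed by END QUOTE. I have been playing with sed for fifteen minutes and still can't get it quite right. This is my best effort so far: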

#!/usr/bin/sed -Ef
/^$/ {
N
    /\n    / {
    P
    s/^\n//
    i\
    START QUOTE
    }
}

/^    / {
N
    /\n$/ {
    s/\n$/&END QUOTE/
    G
    }
}

Running ./parse.sed <script.txt, I get the following output:

PERCHANCE he for whom this bell tolls may be so ill, as that he knows not it
tolls for him; and perchance I may think myself so much better than I am, as
that they who are about me, and see my state, may have caused it to toll for me,
and I know not that. 

START QUOTE
    The church is Catholic, universal, so are all her actions; all that she does
    belongs to all. When she baptizes a child, that action concerns me; for that
    child is thereby connected to that body which is my head too, and ingrafted into
    that body whereof I am a member.

And when she buries a man, that action concerns me: all mankind is of one
author, and is one volume; when one man dies, one chapter is not torn out of the
book, but translated into a better language; and every chapter must be so
translated; God employs several translators; some pieces are translated by age,
some by sickness, some by war, some by justice; but God's hand is in every
translation, and his hand shall bind up all our scattered leaves again for that
library where every book shall lie open to one another.

START QUOTE
    As therefore the bell that rings to a sermon calls not upon the preacher only,
    but upon the congregation to come, so this bell calls us all; but how much more
    me, who am brought so near the door by this sickness.
END QUOTE

There was a contention as far as a suit (in which both piety and dignity,
religion and estimation, were mingled), which of the religious orders should
ring to prayers first in the morning; and it was determined, that they should
ring first that rose earliest.

Note the missing END QUOTE on the first quote block. I think what is happening here is that the second command in the script:

/^    / {
N
    /\n$/ {
    s/\n$/&END QUOTE/
    G
    }
}

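correctly finds the boundary at the end of a block only when the current line is the last line of the quote block. Sometimes, though, it is off by one: the boundary is swallowed by two separate N commands and so is never recognized. What is the right way to do this with sed?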

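[Comments]:

    Tags: macos sed


    [Solution 1]:

    Using sed

    When looking for the end of a quote, the original script reads lines in pairs. As a result, the end of a quote is found only when the quote contains an odd number of lines. The solution is to read the whole quote at once and then append END QUOTE to its end: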

    #!/usr/bin/sed -Ef
    /^$/ {
    N
        /\n    / {
        P
        s/^\n//
        i\
        START QUOTE
        }
    }
    
    /^    / {
        :a;N;/\n$/!ba
        s/$/END QUOTE\n/
    }
    

    The key change here is :a;N;/\n$/!ba, which keeps reading lines until it finds an empty one.

    [The above was tested under GNU sed. BSD (OSX) sed often differs slightly.]
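The loop can be tried on a tiny input. The following is a minimal sketch, not the answer's exact script: the sample text and the `sample` and `result` variable names are made up for illustration, and the replacement uses a backslash followed by a real newline instead of `\n`, on the assumption that this form is accepted by more sed implementations. It also assumes every quote block is followed by an empty line.

```shell
# Hypothetical sample: one two-line (even-length) quote block between plain lines.
sample=$(printf '%s\n' 'intro' '' '    q1' '    q2' '' 'outro')

# On a line starting with four spaces, keep appending lines with N until the
# pattern space ends in an empty line, then append END QUOTE after the block.
result=$(printf '%s\n' "$sample" | sed '/^    /{
:a
N
/\n$/!ba
s/$/END QUOTE\
/
}')

printf '%s\n' "$result"
```

The block here has an even number of lines, which is the case the pairwise version missed. The loop still places END QUOTE directly after the last quoted line.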

    Using awk

    sed can do anything, but tasks with complicated logic are usually easier in awk. For your problem, try:

    awk '/^    / && q{print;next} q{print "END QUOTE"; q=0} /^    /{print "START QUOTE"; q=1} 1' file
    

    With your input, for example:

    $ awk '/^    / && q{print;next} q{print "END QUOTE"; q=0} /^    /{print "START QUOTE"; q=1} 1' file
    PERCHANCE he for whom this bell tolls may be so ill, as that he knows not it
    tolls for him; and perchance I may think myself so much better than I am, as
    that they who are about me, and see my state, may have caused it to toll for me,
    and I know not that. 
    
    START QUOTE
        The church is Catholic, universal, so are all her actions; all that she does
        belongs to all. When she baptizes a child, that action concerns me; for that
        child is thereby connected to that body which is my head too, and ingrafted into
        that body whereof I am a member.
    END QUOTE
    
    And when she buries a man, that action concerns me: all mankind is of one
    author, and is one volume; when one man dies, one chapter is not torn out of the
    book, but translated into a better language; and every chapter must be so
    translated; God employs several translators; some pieces are translated by age,
    some by sickness, some by war, some by justice; but God's hand is in every
    translation, and his hand shall bind up all our scattered leaves again for that
    library where every book shall lie open to one another.
    
    START QUOTE
        As therefore the bell that rings to a sermon calls not upon the preacher only,
        but upon the congregation to come, so this bell calls us all; but how much more
        me, who am brought so near the door by this sickness.
    END QUOTE
    
    There was a contention as far as a suit (in which both piety and dignity,
    religion and estimation, were mingled), which of the religious orders should
    ring to prayers first in the morning; and it was determined, that they should
    ring first that rose earliest.
    

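    How it works

    This script uses a single variable, q, which is 1 while we are inside a quote and 0 otherwise.

    • /^    / && q{print;next}

      If q is true and the line starts with four spaces, print the line, skip the remaining commands, and move on to the next line.

    • q{print "END QUOTE"; q=0}

      If we get here while q is true, the line does not start with four spaces. That means the quote has just ended, so we print END QUOTE and reset q to false (0).

    • /^    /{print "START QUOTE"; q=1}

      If we get here and the line starts with four spaces, a quote has just begun. We print START QUOTE and set q to true (1).

    • 1

      This is awk's cryptic shorthand for printing the line.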
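One case the four rules above do not cover is input that ends while still inside a quote: no later line arrives to trigger the END QUOTE rule. The sketch below adds an `END` rule for that case. The `mark_quotes` function name and the sample input are hypothetical, and the `END` rule is an addition, not part of the original answer.

```shell
# Same state machine as above, wrapped in a (hypothetical) shell function,
# plus an END rule that closes a quote still open at end of input.
mark_quotes() {
  awk '
    /^    / && q { print; next }                # still inside a quote
    q            { print "END QUOTE"; q = 0 }   # first non-indented line after a quote
    /^    /      { print "START QUOTE"; q = 1 } # first indented line of a quote
    1                                           # print the current line
    END          { if (q) print "END QUOTE" }   # input ended inside a quote
  '
}

# Hypothetical sample whose last line is part of a quote block.
result=$(printf '%s\n' 'intro' '' '    q1' '    q2' | mark_quotes)
printf '%s\n' "$result"
```

Without the `END` rule, the same input would produce START QUOTE with no matching END QUOTE.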

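    [Comments]:

    • Neat. I deliberately started by learning sed, and I plan to learn awk next. This does look very simple (for some definition of simple, anyway). That said, any ideas on how to do this properly in sed? You seem to be implying that I might want to branch, which sed of course supports too.
    • @RileyAvron I have added a sed solution (and yes, it involves a loop). The main difference between sed and awk is that awk supports variables and arithmetic, and sed does not. awk commands are also easier to read, because awk supports if-then-else statements and for loops. Both sed and awk are excellent tools in their own domains. It is just that complex logic is simpler in awk.
    • @RileyAvron - here is what you need to know about sed to use it effectively 99% of the time: s/old_regexp/new_string/. Now go read the book Effective Awk Programming, 4th Edition, by Arnold Robbins. Don't waste time learning a bunch of sed constructs that became obsolete in the mid-1970s when awk was invented.
    • Thanks @EdMorton! I had been planning to read O'Reilly's sed & awk book. Do you recommend it too, or should I skip it in favor of your suggestion?
    • That book is very old and outdated, and lacks many useful modern awk features. You don't need a book to learn the parts of sed you should be using (s, g, and p with -n), and the awk book I mentioned is the best and most up to date for learning awk.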
    [Solution 2]:

    Try this:

    #!/usr/bin/sed -f
    /^    / {
        H
        d
      }
    /^$/ {
      x
      s/^\n    /START QUOTE&/
      /    /s/$/\nEND QUOTE\n/
    }
    

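    Lines starting with four spaces are appended to the hold space and deleted from the pattern space.

    When the next empty line /^$/ is found, x exchanges the contents of the hold space and the pattern space. We then add START QUOTE and END QUOTE at the beginning and end of the block.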
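The hold-space approach can be traced on a small input. This is a sketch under assumptions: the sample text and variable names are invented, and the replacement uses a backslash followed by a real newline in place of `\n`, on the assumption that this is more portable (the question is tagged macos). A quote block at the very end of the input with no empty line after it would stay in the hold space and never be printed, so the sketch assumes each block is followed by an empty line.

```shell
# Hypothetical sample: a two-line quote block surrounded by empty lines.
sample=$(printf '%s\n' 'intro' '' '    q1' '    q2' '' 'outro')

result=$(printf '%s\n' "$sample" | sed '/^    /{
H
d
}
/^$/{
x
s/^\n    /START QUOTE&/
/    /s/$/\
END QUOTE\
/
}')

printf '%s\n' "$result"
```

At the empty line after the block, x brings the collected text (a newline followed by the indented lines) into the pattern space and leaves the empty line in the hold space. That is why the first substitution anchors on `^\n    `.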


      [Solution 3]:

      sed is for simple substitutions on individual lines, and that is all. For anything else you should be using awk:

      $ cat tst.awk
      !inBlock && /^    / { print "START QUOTE"; inBlock=1 }
      inBlock && !/^    / { print "END QUOTE"; inBlock=0 }
      { print }
      


      $ awk -f tst.awk file
      PERCHANCE he for whom this bell tolls may be so ill, as that he knows not it
      tolls for him; and perchance I may think myself so much better than I am, as
      that they who are about me, and see my state, may have caused it to toll for me,
      and I know not that.
      
      START QUOTE
          The church is Catholic, universal, so are all her actions; all that she does
          belongs to all. When she baptizes a child, that action concerns me; for that
          child is thereby connected to that body which is my head too, and ingrafted into
          that body whereof I am a member.
      END QUOTE
      
      And when she buries a man, that action concerns me: all mankind is of one
      author, and is one volume; when one man dies, one chapter is not torn out of the
      book, but translated into a better language; and every chapter must be so
      translated; God employs several translators; some pieces are translated by age,
      some by sickness, some by war, some by justice; but God's hand is in every
      translation, and his hand shall bind up all our scattered leaves again for that
      library where every book shall lie open to one another.
      
      START QUOTE
          As therefore the bell that rings to a sermon calls not upon the preacher only,
          but upon the congregation to come, so this bell calls us all; but how much more
          me, who am brought so near the door by this sickness.
      END QUOTE
      
      There was a contention as far as a suit (in which both piety and dignity,
      religion and estimation, were mingled), which of the religious orders should
      ring to prayers first in the morning; and it was determined, that they should
      ring first that rose earliest.
      


        [Solution 4]:

        This might work for you (GNU sed):

        sed -r 'N;/^\n\s{4}\S/s//\nSTART QUOTE&/;/^\s{4}\S.*\n$/s//&END QUOTE\n/;t;P;D' file
        

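        Process the file in a running window of two lines (N ...P;D). When a required pair matches, prepend or append the required literal and skip the rest of the script (see t, which branches to the end of the script when a substitution has succeeded). Otherwise P;D slides the window forward by one line.

        Another approach: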

        sed '/^    /{s/^/START QUOTE\n/;:a;n;/^    /ba;s/^/END QUOTE\n/}'  file
        
