Can I use grep and regex to find certain content in a file and write that content to a new file?

【Question title】: Can I use grep and regex to find certain content in a file and write the content into a new file?
【Posted】: 2016-02-09 18:42:36
【Question】:

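I want to use a regular expression to extract some content from an HTML file and write that content to a new HTML file. The sample HTML looks like this: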

<html>
<script src='.....'>
</script>
<style>
...
</style>
<div class='header-outer'>
<div class='header-title'>
<div class='post-content'>
<noscript>
<p>content we want</p>
</noscript>
</div>
</div></div>
<div class='footer'>
</div>
</html>

Can I use grep to select the content between <div class='post-content'> and </div> and write it to a new HTML file? The new HTML would look like this:

<div class='post-content'>
<noscript>
<p>content we want</p>
</noscript>
</div>

I did some research on Stack Overflow and found some code that might help with my problem, such as:

grep -L -Z -r "<div class='post-content'>.*?<\/noscript><\/dive>" .| xargs -0 -I{} mv {} DIR?

Is this correct? If so, what does the xargs part mean? Thank you, I look forward to your reply!

【Question comments】:

With GNU grep: grep -Poz "(?s)<div class='post-content'>.*</div>" file.xml > new.html
Hi Cyrus, I tried yours, but somehow it didn't work for me. Thanks anyway!

Tags: html grep xargs

【Solution 1】:

You can use GNU sed for this:

sed -n "/<div class='post-content'>/,/<\/div>/p" file.html > output.html

-n suppresses sed's default printing of every line
p prints only the lines inside the matched range

【Discussion】:
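
To see the range address in action, here is a minimal sketch that recreates the sample HTML from the question in a temporary directory and runs the same command on it. The `mktemp` working directory and the heredoc are assumptions added purely for illustration. Note that the range stops at the first line containing `</div>` after the opening tag, so this relies on the post content not containing nested `<div>` blocks.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample page from the question.
cat > file.html <<'EOF'
<html>
<script src='.....'>
</script>
<style>
...
</style>
<div class='header-outer'>
<div class='header-title'>
<div class='post-content'>
<noscript>
<p>content we want</p>
</noscript>
</div>
</div></div>
<div class='footer'>
</div>
</html>
EOF

# -n turns off automatic printing; /start/,/end/p prints only the lines
# from the first match of "start" through the next match of "end".
sed -n "/<div class='post-content'>/,/<\/div>/p" file.html > output.html

cat output.html
```

On this input, output.html should hold exactly the five lines shown as the desired result in the question.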

If you decide to append everything to a single output file, the command can be sed -n "/<div class='post-content'>/,/<\/div>/p" *.html >> output.html
:) No problem. See here: stackoverflow.com/questions/30003570/…
sed is not the right tool for that requirement -> use Python or similar
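That is hard to do with sed. There is another tool called awk; I think if you repost this question with the added requirement (a bunch of files, with the right spacing of 2 lines between them) and tag it awk, you will get a complete answer within 10 minutes.
Thanks! Will give it a try!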
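
For the multi-file case raised in this discussion, one possible sketch is to loop over the files and append each extracted block to a single output file. The `pages/` and `out/` directories, the file names `a.html`, `b.html`, and `combined.html`, and the two blank lines written after each block are all assumptions for illustration; the comments above only hint at the spacing requirement.

```shell
#!/bin/sh
# Work in a throwaway directory; all paths below are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir pages out

# Two tiny input pages standing in for the real files.
printf "%s\n" "<html>" "<div class='post-content'>" "<p>first</p>" "</div>" "</html>" > pages/a.html
printf "%s\n" "<html>" "<div class='post-content'>" "<p>second</p>" "</div>" "</html>" > pages/b.html

# Start from an empty output file, kept outside pages/ so the glob
# below never picks it up as an input.
: > out/combined.html

for f in pages/*.html; do
    # Append the extracted block from this file.
    sed -n "/<div class='post-content'>/,/<\/div>/p" "$f" >> out/combined.html
    # Assumed separator: two blank lines after each block.
    printf '\n\n' >> out/combined.html
done

cat out/combined.html
```

Running sed once per file also keeps an unterminated range in one file from spilling into the next, since sed otherwise treats several input files as one continuous stream.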