How to search for a pattern using grep and exclude another pattern

【Question title】: How to search for a pattern using grep and exclude another pattern
【Posted】: 2023-04-01 18:59:01
【Question】:

I have been looking through several other answers, but I could not find what I wanted.

I have a large file containing URLs, and I am looking for the URLs that contain the pattern tt. Of course, every line contains http, so if I run

grep tt myfile | wc -l

I get every line of the file. How can I find the lines that match tt without counting the tt inside http?

I tried --exclude and it did not work. I think exclude only applies to paths, right?

I could use sed to replace http with something else and then grep normally, but how elegant is that? There must be another way...
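
For reference, the sed workaround described above could look like the following minimal sketch. The two sample lines are hypothetical stand-ins for myfile, and #### is only a placeholder that is assumed not to occur in the data.

```shell
# Hypothetical sample input standing in for myfile.
sample=$(printf '%s\n' \
  'http://example.com/tt.html' \
  'http://example.com/index.html')

# Mask every http so that its tt cannot match, then count the lines that still contain tt.
count=$(printf '%s\n' "$sample" | sed 's/http/####/g' | grep -c tt)
echo "$count"
```

Note that this also masks an http that appears inside a path (for example http.html), so such lines would not be counted.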

【Question comments】:

I think I will replace every http with #### and then use grep.

Tags: regex bash sed grep

【Solution 1】:

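You can use the -P switch (a GNU grep option) so that grep interprets the pattern as a Perl-compatible regular expression. You can then use lookaround assertions to match a tt that is not preceded by h and not followed by p:// or ps://.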

grep -iP '(?<!h)tt(?!ps?://)' myfile | wc -l
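
As a rough illustration of how the two lookarounds behave, here is a small sketch on made-up input. The sample URLs are assumptions, not taken from the question; -c is used instead of piping to wc -l, which gives the same line count.

```shell
# Hypothetical sample input; only the last two lines contain a tt outside the scheme.
sample=$(printf '%s\n' \
  'http://example.com/index.html' \
  'https://example.com/attic.html' \
  'http://example.com/tt.html')

# -c counts matching lines, -i ignores case, -P enables Perl-compatible regexes (GNU grep).
count=$(printf '%s\n' "$sample" | grep -ciP '(?<!h)tt(?!ps?://)')
echo "$count"
```

The first line is skipped because its only tt sits inside http, where the lookbehind (?<!h) rejects it.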

【Comments】:

Exactly, that was my problem.

【Solution 2】:

Given the following test file:

some text http://example.com/redirect?http://some/test.html             #not wanted
some text http://example.com/notete.html                                #not wanted
some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

The command:

grep -P 'http://\S*tt(?!p:)' file

prints:

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted

Explanation:

  http://                  'http://'
----------------------------------------------------------------------
  \S*                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (0 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  tt                       'tt'
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    p:                       'p:'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------

and

grep -cP 'http://\S*tt(?!p:)' file

will count the matching lines.
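
To check the count without creating a file, the test data can be fed through a pipe. This is only a sketch: the lines are the ones from the test file above with the trailing "#wanted" notes dropped, and the variable names are illustrative.

```shell
# The eight test lines from above, without the trailing notes.
sample=$(printf '%s\n' \
  'some text http://example.com/redirect?http://some/test.html' \
  'some text http://example.com/notete.html' \
  'some text http://example.com/redirect?http://some/anyttany.html' \
  'some text http://example.com/http.html' \
  'some text http://example.com/tt.html' \
  'some text http://example.com/somett.html' \
  'some text http://example.com/somettsome.html' \
  'some text /example.com/somettsome.html')

# Count lines where a tt not followed by "p:" appears somewhere after http://.
count=$(printf '%s\n' "$sample" | grep -cP 'http://\S*tt(?!p:)')
echo "$count"
```

The path-only line is not counted here because the pattern requires a leading http://.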

If the leading http:// is optional,

grep -P '(http://)?\S*tt(?!p:)' file

does the same job and, for the same input, prints:

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

To capture the URLs (and paths):

grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*' file

prints:

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
/example.com/somettsome.html
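
In this pattern, \K resets the start of the reported match, so with -o everything matched by the lazy .*? before it is dropped from the output. A small sketch on three of the test lines (the variable name urls is illustrative):

```shell
# Three lines taken from the test file above.
sample=$(printf '%s\n' \
  'some text http://example.com/somett.html   #wanted' \
  'some text /example.com/somettsome.html     #wanted (path only)' \
  'some text http://example.com/notete.html   #not wanted')

# -o prints only the matched part; \K discards "some text " from that part.
urls=$(printf '%s\n' "$sample" | grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*')
printf '%s\n' "$urls"
```

The trailing \S* extends the match to the end of the URL, which is why the full URL or path is printed rather than just the part up to tt.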

To capture only the http:// URLs:

grep -oP '.*?\Khttp://\S*tt(?!p:)\S*' file

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html

【Comments】:

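Simple and effective!
Won't this fail if the line does not start with http?
@AmalMurali The OP said: "Of course, every line contains http."
@jm666: Yes, but that is not the same as "Of course, every line starts with http".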

【Solution 3】:

You can use awk like this:

cat file
http://example.com
http://google.com
my.tt.com
t.foo.bar
http://foobar.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/notete.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html

awk -F"http:" '$NF~/tt/' file
my.tt.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
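
The awk command splits each line on the string http: and tests only the last field ($NF), so the tt inside http is never part of the text being checked. A minimal sketch on made-up input (the a.example and b.example hosts are hypothetical):

```shell
# Hypothetical sample input; the last line has tt only in its first URL.
sample=$(printf '%s\n' \
  'http://example.com/tt.html' \
  'http://google.com' \
  'my.tt.com' \
  'http://a.example/tt?http://b.example/x')

# Show how each line is split: number of fields, then the last field.
printf '%s\n' "$sample" | awk -F"http:" '{ print NF ": " $NF }'

# Keep only the lines whose last field contains tt.
matches=$(printf '%s\n' "$sample" | awk -F"http:" '$NF ~ /tt/')
printf '%s\n' "$matches"
```

Because only the text after the last http: is tested, the fourth line is not printed even though its first URL contains tt.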

【Comments】:

【Solution 4】:

egrep -c 'http://[^ ?]*tt' YourFile

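-c is for counting the matching lines.
egrep (you can also use grep -E) handles the extended regular expression pattern, which keeps the http part out of the tt search.
The character class excludes spaces and the special URL character ? (a suggestion from Jotne and the follow-up comments) so that a tt is not picked up from a possible second URL on the same line.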
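
To see what the [^ ?] class buys, the following sketch compares it with a looser .* on three made-up lines. The sample data and the variable names loose and strict are assumptions for illustration.

```shell
# Hypothetical sample input: only the first line has tt inside its own URL.
sample=$(printf '%s\n' \
  'http://example.com/somett.html' \
  'http://example.com/redirect?http://some/test.html' \
  'http://example.com/page.html see http://other.example/')

# Loose version: .* runs on into the tt of a second http:// on the same line.
loose=$(printf '%s\n' "$sample" | grep -Ec 'http://.*tt')

# Version from the answer: the match stops at a space or a ?.
strict=$(printf '%s\n' "$sample" | grep -Ec 'http://[^ ?]*tt')

echo "loose=$loose strict=$strict"
```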

【Comments】:

This assumes the line does start with http. If so, it is fine.
@Jotne The OP said: "Of course, every line contains http."
@jm666: The OP did not say that it begins with http.
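This does not assume that the line starts with http://, only that the line contains http://. In fact, it assumes that tt comes after http://, so it fails if tt appears before it on the line (is that even possible here, and should it be handled in this setup, given that the goal is to get the URL regardless of what surrounds it?). So it works as long as there is at least one URL that starts with http:// and has tt inside it. I just adapted it to exclude the space between http and tt (to avoid picking up the tt from a following http://).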

【Solution 5】:

You can use grep -v to exclude lines that match a pattern, like this:

grep tt myfile | grep -v http | wc -l

This first selects the lines containing "tt", then excludes the lines containing "http", and then counts what is left.
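
Keep in mind that grep -v filters whole lines, not individual matches. A minimal sketch on hypothetical input shows the effect; with a file like the one in the question, where every line contains http, the result would presumably be 0.

```shell
# Hypothetical sample input: one http URL, one bare host name, one line without tt.
sample=$(printf '%s\n' \
  'http://example.com/tt.html' \
  'my.tt.com' \
  'ftp://example.com/file')

# Keep lines with tt, drop every line that contains http anywhere, count the rest.
count=$(printf '%s\n' "$sample" | grep tt | grep -v http | wc -l)
echo "$count"
```

Only my.tt.com survives both filters, because the first line is dropped as a whole even though it has a tt outside the scheme.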

【Comments】: