[Question title]: Filter the access logs on: User-agent contains "Googlebot", Referer contains "google"
[Posted]: 2021-07-14 17:08:39
[Question]:

I want to filter the access logs on:

  • User-Agent contains "Googlebot"
  • Referer contains "google"

I use this varnishlog command:

varnishlog -q "ReqHeader ~ 'User-Agent.*Googlebot'"

This is my output:

*   << Request  >> 564834158
-   Begin          req 564834144 rxreq
-   Timestamp      Start: 1626180326.557796 0.000000 0.000000
-   Timestamp      Req: 1626180326.557796 0.000000 0.000000
-   ReqStart       xx.xxx.xx.xxx 45253
-   ReqMethod      GET
-   ReqURL         /xx/yy/xxx-yyy
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: www.yyyyyy.com
-   ReqHeader      AMP-Cache-Transform: google;v="1..7"
-   ReqHeader      Connection: keep-alive
-   ReqHeader      Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   ReqHeader      From: googlebot(at)googlebot.com
-   ReqHeader      User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
-   ReqHeader      Accept-Encoding: gzip, deflate, br
-   ReqHeader      If-Modified-Since: Mon, 12 Jul 2021 10:56:33 GMT
-   ReqHeader      X-Forwarded-Proto: https
-   ReqHeader      X-Forwarded-For: xx.xxx.xx.xxx
-   VCL_call       RECV
-   ReqUnset       Accept-Encoding: gzip, deflate, br
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      X-Fos-Original-Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   ReqUnset       Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   ReqHeader      accept: application/vnd.fos.user-context-hash
-   ReqHeader      X-Fos-Original-Url: /xx/yy/xxx-yyy
-   ReqURL         /userContext.php
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            561267092 34.193790 10.000000 0.000000
-   VCL_call       HIT
-   VCL_return     deliver
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Date: Tue, 13 Jul 2021 12:44:00 GMT
-   RespHeader     Server: Apache
-   RespHeader     Access-Control-Allow-Origin: https://www.yyyyyy.com
-   RespHeader     Access-Control-Allow-Credentials: true
-   RespHeader     Expires: Thu, 19 Nov 1981 08:52:00 GMT
-   RespHeader     Pragma: no-cache
-   RespHeader     X-User-Context-Hash: dbd07ab4746551895276bf2342469e1dc3b0f86ca18d1bab2dec6b82c9698a8c
-   RespHeader     Cache-Control: max-age=120, s-max-age=120
-   RespHeader     Vary: Cookie
-   RespHeader     Set-Cookie: mainMenuId=1002323; expires=Thu, 12-Aug-2021 12:44:00 GMT; Max-Age=2592000; path=/; domain=.xxxx.yyyy
-   RespHeader     Set-Cookie: PHPSESSID=qhn0l4m4f18g4g3cer3d8e5pg3; path=/; domain=.xxxx.yyyy; HttpOnly
-   RespHeader     Set-Cookie: connexion_id=1898574699; expires=Sun, 09-Jan-2022 12:44:00 GMT; Max-Age=15552000; path=/; domain=.xxxx.yyyy
-   RespHeader     Set-Cookie: connexion_id=1898574699; expires=Sun, 09-Jan-2022 12:44:00 GMT; Max-Age=15552000; path=/; domain=.xxxx.yyyy
-   RespHeader     Set-Cookie: connexion_id=1898574699; expires=Sun, 09-Jan-2022 12:44:00 GMT; Max-Age=15552000; path=/; domain=.xxxx.yyyy
-   RespHeader     Set-Cookie: membre_statut_id=60; expires=Mon, 11-Oct-2021 12:44:00 GMT; Max-Age=7776000; path=/; domain=.xxxx.yyyy
-   RespHeader     Set-Cookie: statutSolde=pub; expires=Thu, 12-Aug-2021 12:44:00 GMT; Max-Age=2592000; path=/; domain=.xxxx.yyyy
-   RespHeader     X-Content-Type-Options: nosniff
-   RespHeader     Content-Type: application/vnd.fos.user-context-hash
-   RespHeader     X-Varnish: 564834158 561267092
-   RespHeader     Age: 85
-   RespHeader     Via: 1.1 varnish (Varnish/5.2)
-   VCL_call       DELIVER
-   RespHeader     X-Cache: HIT
-   RespHeader     X-Cache-Hits: 261
-   ReqHeader      X-User-Context-Hash: dbd07ab4746551895276bf2342469e1dc3b0f86ca18d1bab2dec6b82c9698a8c
-   VCL_return     restart
-   Timestamp      Process: 1626180326.558006 0.000209 0.000209
-   Timestamp      Restart: 1626180326.558009 0.000213 0.000003
-   Link           req 564834159 restart
-   End

*   << Request  >> 562977018
-   Begin          req 562977017 restart
-   Timestamp      Start: 1626180326.055452 0.000223 0.000000
-   ReqStart       xx.xxx.xx.xxx 47603
-   ReqMethod      GET
-   ReqURL         /userContext.php
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: www.yyyyyy.com
-   ReqHeader      Connection: keep-alive
-   ReqHeader      From: googlebot(at)googlebot.com
-   ReqHeader      User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
-   ReqHeader      X-Forwarded-Proto: https
-   ReqHeader      X-Forwarded-For: xx.xxx.xx.xxx
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      X-Fos-Original-Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   ReqHeader      accept: application/vnd.fos.user-context-hash
-   ReqHeader      X-Fos-Original-Url: /xx/yy/xxx-yyy
-   ReqHeader      X-User-Context-Hash: dbd07ab4746551895276bf2342469e1dc3b0f86ca18d1bab2dec6b82c9698a8c
-   VCL_call       RECV
-   ReqUnset       Accept-Encoding: gzip
-   ReqHeader      Accept-Encoding: gzip
-   ReqURL         /be/fr/isseymiyake-sac-seau-lucent-rose-femme-4850263
-   ReqUnset       X-Fos-Original-Url: /xx/yy/xxx-yyy
-   ReqUnset       accept: application/vnd.fos.user-context-hash
-   ReqHeader      accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   ReqUnset       X-Fos-Original-Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 562977019 fetch
-   Timestamp      Fetch: 1626180328.191066 2.135837 2.135614
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Date: Tue, 13 Jul 2021 12:45:26 GMT
-   RespHeader     Access-Control-Allow-Origin: https://www.yyyyyy.com
-   RespHeader     Access-Control-Allow-Credentials: true
-   RespHeader     isVarnish: 1
-   RespHeader     Vary: X-User-Context-Hash,Accept-Encoding
-   RespHeader     templateName: FICHE_PRODUIT_TPL_ID
-   RespHeader     Content-Encoding: gzip
-   RespHeader     X-Content-Type-Options: nosniff
-   RespHeader     Content-Length: 41620
-   RespHeader     Content-Type: text/html; charset=ISO-8859-1
-   RespHeader     Cache-Control: max-age=4
-   RespHeader     X-Varnish: 562977018
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish (Varnish/5.2)
-   VCL_call       DELIVER
-   RespHeader     X-Cache: MISS
-   RespUnset      Vary: X-User-Context-Hash,Accept-Encoding
-   RespHeader     Vary: ,Accept-Encoding
-   RespUnset      Vary: ,Accept-Encoding
-   RespHeader     Vary: Accept-Encoding
-   VCL_return     deliver
-   Timestamp      Process: 1626180328.191101 2.135873 0.000036
-   RespHeader     Accept-Ranges: bytes
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1626180328.192502 2.137274 0.001400
-   ReqAcct        407 0 407 499 41620 42119
-   End

*   << Request  >> 564834159
-   Begin          req 564834158 restart
-   Timestamp      Start: 1626180326.558009 0.000213 0.000000
-   ReqStart       xx.xxx.xx.xxx 45253
-   ReqMethod      GET
-   ReqURL         /userContext.php
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: www.yyyyyy.com
-   ReqHeader      AMP-Cache-Transform: google;v="1..7"
-   ReqHeader      Connection: keep-alive
-   ReqHeader      From: googlebot(at)googlebot.com
-   ReqHeader      User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
-   ReqHeader      If-Modified-Since: Mon, 12 Jul 2021 10:56:33 GMT
-   ReqHeader      X-Forwarded-Proto: https
-   ReqHeader      X-Forwarded-For: xx.xxx.xx.xxx
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      X-Fos-Original-Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   ReqHeader      accept: application/vnd.fos.user-context-hash
-   ReqHeader      X-Fos-Original-Url: /xx/yy/xxx-yyy
-   ReqHeader      X-User-Context-Hash: dbd07ab4746551895276bf2342469e1dc3b0f86ca18d1bab2dec6b82c9698a8c
-   VCL_call       RECV
-   ReqUnset       Accept-Encoding: gzip
-   ReqHeader      Accept-Encoding: gzip
-   ReqURL         /xx/yy/xxx-yyy
-   ReqUnset       X-Fos-Original-Url: /xx/yy/xxx-yyy
-   ReqUnset       accept: application/vnd.fos.user-context-hash
-   ReqHeader      accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   ReqUnset       X-Fos-Original-Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 564834160 fetch
-   Timestamp      Fetch: 1626180328.405520 1.847724 1.847511
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Date: Tue, 13 Jul 2021 12:45:26 GMT
-   RespHeader     Access-Control-Allow-Origin: https://www.yyyyyy.com
-   RespHeader     Access-Control-Allow-Credentials: true
-   RespHeader     isVarnish: 1
-   RespHeader     Vary: X-User-Context-Hash,Accept-Encoding
-   RespHeader     templateName: CARROUSEL_MARQUE_TPL_ID
-   RespHeader     Content-Encoding: gzip
-   RespHeader     X-Content-Type-Options: nosniff
-   RespHeader     Content-Type: text/html; charset=ISO-8859-1
-   RespHeader     Cache-Control: max-age=4
-   RespHeader     X-Varnish: 564834159
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish (Varnish/5.2)
-   VCL_call       DELIVER
-   RespHeader     X-Cache: MISS
-   RespUnset      Vary: X-User-Context-Hash,Accept-Encoding
-   RespHeader     Vary: ,Accept-Encoding
-   RespUnset      Vary: ,Accept-Encoding
-   RespHeader     Vary: Accept-Encoding
-   VCL_return     deliver
-   Timestamp      Process: 1626180328.405548 1.847751 0.000027
-   RespHeader     Accept-Ranges: bytes
-   RespHeader     Transfer-Encoding: chunked
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1626180328.409407 1.851611 0.003860
-   ReqAcct        586 0 586 507 63033 63540
-   End

I would like the data in this format:

  • Date/time or timestamp
  • Hostname
  • Client IP
  • Referer
  • Path
  • User-Agent
  • Status code
  • Bytes
  • Load time (in milliseconds)
  • Scheme (http or https)

[Discussion]:

    Tags: varnish


    [Answer 1]:

    varnishlog

    If you want to use varnishlog, this is the command you need:

    varnishlog -c -g request -I Timestamp:Start -I ReqHeader:Host \
        -i ReqStart -I ReqHeader:Referer -i ReqUrl -I ReqHeader:User-Agent \
        -i RespStatus -i ReqAcct -I Timestamp:Resp -I ReqHeader:X-Forwarded-Proto \
        -q "ReqHeader:User-Agent ~ 'Googlebot' and ReqHeader:Referer ~ 'google'"
    

    Here is some sample output from this command:

    *   << Request  >> 46
    -   Timestamp      Start: 1626253862.684183 0.000000 0.000000
    -   ReqStart       172.17.0.1 58432 a0
    -   ReqURL         /
    -   ReqHeader      Host: localhost
    -   ReqHeader      X-Forwarded-Proto: https
    -   ReqHeader      Referer: google
    -   ReqHeader      User-Agent: Googlebot
    -   RespStatus     200
    -   Timestamp      Resp: 1626253862.684398 0.000215 0.000139
    -   ReqAcct        114 0 114 326 612 938
    

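    See https://varnish-cache.org/docs/trunk/reference/vsl.html for the meaning of each VSL field. For more information about VSL queries, see https://varnish-cache.org/docs/trunk/reference/vsl-query.html

    The output is multi-line and not easy to parse. On the other hand, it is very detailed. As always, it is a trade-off.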
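
If the multi-line output has to be fed to a line-oriented tool anyway, one option is to flatten each transaction group yourself. The following is only a minimal sketch, not part of the original answer: `flatten_vsl` is a hypothetical helper name, and it assumes the default varnishlog text layout in which a top-level group starts with a single `*` and every record line starts with `-` followed by the tag.

```shell
# Hypothetical helper: join the records of each top-level varnishlog
# transaction group into one tab-separated line of Tag=value pairs.
flatten_vsl() {
    awk '
        # A line starting with a single "*" opens a new top-level group.
        /^\*[ \t]/ { if (rec != "") print rec; rec = ""; next }
        # Record lines look like: "-   Tag   value".
        /^-+[ \t]/ {
            tag = $2
            val = $0
            # Drop the leading dashes, the tag and the padding around it.
            sub(/^-+[ \t]+[^ \t]+[ \t]*/, "", val)
            rec = (rec == "" ? "" : rec "\t") tag "=" val
        }
        # Emit the last group when the input ends.
        END { if (rec != "") print rec }
    '
}

# Demo on a shortened copy of the sample output shown above.
printf '%s\n' \
    '*   << Request  >> 46' \
    '-   ReqURL         /' \
    '-   ReqHeader      Host: localhost' \
    '-   RespStatus     200' | flatten_vsl
```

In practice the function would sit at the end of a pipe that starts with the varnishlog command shown earlier. Records from nested groups (restarts, for example) are appended to the enclosing top-level group in this sketch.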

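    varnishncsa

    You can also use varnishncsa, which produces Apache-style (NCSA) log lines. The output is single-line and less verbose.

    The following command can be used to retrieve the information you need: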

    varnishncsa -c -g request \
         -q "ReqHeader:User-Agent ~ 'Googlebot' and ReqHeader:Referer ~ 'google'" \
         -F "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %T"
    

    The format used here is an extension of the standard format, which is:

    %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i"
    

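    I only added %T to show the time taken to serve the request. It is expressed in whole seconds and will be 0 most of the time. If you want microsecond precision, you can use %{VSL:Timestamp:Resp[2]}x, which is still expressed in seconds (for example 0.001057).

    Here is some sample output from this command: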

    172.17.0.1 - - [14/Jul/2021:09:10:23 +0000] "GET http://localhost/ HTTP/1.1" 200 612 "google" "Googlebot" 0
    

    However, if you want exactly the fields you listed, in that order, you end up with the following varnishncsa command:

    varnishncsa -c -g request \
         -q "ReqHeader:User-Agent ~ 'Googlebot' and ReqHeader:Referer ~ 'google'" \
         -F "%t %{Host}i %h %{Referer}i %U %{User-agent}i %s %b  %{VSL:Timestamp:Resp[2]}x %{X-Forwarded-Proto}i"
    

    FYI: I am retrieving the scheme from the X-Forwarded-Proto request header.

    Here is some sample output:

    [14/Jul/2021:09:09:19 +0000] localhost 172.17.0.1 google / Googlebot 200 612  0.001057 https
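
The question asks for the load time in milliseconds, while %{VSL:Timestamp:Resp[2]}x prints seconds. A small post-processing step can convert it. This is a sketch under assumptions rather than part of the original answer: `to_ms` is a hypothetical helper name, and it relies on the field order of the -F format above, where the elapsed time is the second-to-last whitespace-separated field and the scheme is the last one.

```shell
# Hypothetical helper: rewrite the second-to-last field (elapsed seconds)
# as milliseconds with three decimals. Assumes the line ends in
# "<seconds> <scheme>", as produced by the -F format above.
to_ms() {
    awk '{ $(NF-1) = sprintf("%.3f", $(NF-1) * 1000); print }'
}

# Demo on the sample line shown above.
printf '%s\n' '[14/Jul/2021:09:09:19 +0000] localhost 172.17.0.1 google / Googlebot 200 612  0.001057 https' | to_ms
```

Because awk rebuilds the line when a field is assigned, runs of spaces are collapsed to single spaces in the result. The User-Agent can itself contain spaces, so counting fields from the end of the line is more reliable here than counting from the start.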
    

    For more information, see https://varnish-cache.org/docs/trunk/reference/varnishncsa.html#display-varnish-logs-in-apache-ncsa-combined-log-format
