【Question title】: web.py: Get the original web request
【Posted】: 2013-04-16 15:00:46
【Question】:

I need to access, from within web.py, the raw HTTP request that the browser sends to the server.

For example, this is the request Chromium sends when I browse to a page:

$ nc -l 8081
GET / HTTP/1.1
Host: 127.0.0.1:8081
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22
Accept-Encoding: gzip,deflate,sdch
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

I tried to get it from web.ctx.env, but that is a dictionary (whereas I would prefer the raw text of the request) and it is mixed with other data:

SERVER_SOFTWARE: CherryPy/3.2.0 Server
SCRIPT_NAME: 
ACTUAL_SERVER_PROTOCOL: HTTP/1.1
REQUEST_METHOD: GET
PATH_INFO: /
SERVER_PROTOCOL: HTTP/1.1
QUERY_STRING: 
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_USER_AGENT: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22
HTTP_CONNECTION: keep-alive
REMOTE_PORT: 55409
SERVER_NAME: localhost
REMOTE_ADDR: 127.0.0.1
wsgi.url_scheme: http
SERVER_PORT: 8081
wsgi.input: <web.wsgiserver.KnownLengthRFile object at 0x940b16c>
HTTP_HOST: 127.0.0.1:8081
wsgi.multithread: True
REQUEST_URI: /
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
wsgi.version: (1, 0)
wsgi.run_once: False
wsgi.errors: <open file '<stderr>', mode 'w' at 0xb73010d0>
wsgi.multiprocess: False
HTTP_ACCEPT_LANGUAGE: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
HTTP_ACCEPT_ENCODING: gzip,deflate,sdch
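The `HTTP_*` entries in this dump are the request headers themselves, renamed following the CGI convention that PEP 333 adopts: the header name is uppercased, hyphens become underscores, and `HTTP_` is prepended, except that `Content-Type` and `Content-Length` become plain `CONTENT_TYPE` and `CONTENT_LENGTH`. The remaining keys (`SERVER_*`, `REMOTE_*`, the `wsgi.*` entries) describe the server and the connection rather than the headers. A minimal sketch of that renaming, applied to three header lines from the `nc` capture above (`environ_key` is a hypothetical helper, not part of web.py):

```python
def environ_key(header_name):
    """Map a raw HTTP header name to its CGI/WSGI environ key."""
    key = header_name.strip().upper().replace("-", "_")
    # Content-Type and Content-Length are passed without the HTTP_ prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


raw_headers = """Host: 127.0.0.1:8081
Connection: keep-alive
Accept-Encoding: gzip,deflate,sdch"""

for line in raw_headers.splitlines():
    # Split on the first colon only, so "127.0.0.1:8081" stays intact.
    name, _, value = line.partition(":")
    print(environ_key(name) + ": " + value.strip())
```

This prints `HTTP_HOST: 127.0.0.1:8081`, `HTTP_CONNECTION: keep-alive` and `HTTP_ACCEPT_ENCODING: gzip,deflate,sdch`, the same lines that appear in the dump.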

This is the code I used to produce the output above:

#!/usr/bin/env python

import web

urls = ('(.*)', 'urlhandler')

class urlhandler:
  def GET(self, url):
    txt = ""
    for k, v in web.ctx.env.items():
      txt += ": ".join([k, str(v)]) + "\n"
    return txt

if __name__ == '__main__':
  app = web.application(urls, globals())
  app.run()

Should I strip the unwanted data out of this dictionary, or is there a direct way to get the raw request?

【Discussion】:

    Tags: python http-headers web.py


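    【Solution 1】:

    Following Andrey's suggestion, I ended up with this code. It tries to reconstruct the web request; it may not be the best way to get it, but it is the only one I have found so far.

    This program displays the web request for the requested page (it works for both GET and POST requests):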

    #!/usr/bin/env python
    
    import web
    from urllib import quote
    
    urls = ('(.*)', 'urlhandler')
    
    def adaptHeader(txt):
      """Input: string, header name as it is in web.ctx.env
      Output: string, header name according to http protocol.
      e.g. "HTTP_CACHE_CONTROL" => "Cache-Control"
      """
      txt = txt.replace('HTTP_', '')
      return '-'.join((t[0] + t[1:].lower() for t in txt.split('_')))
    
    def rawRequest(env):
      """Reconstruct and return the web request based on web.ctx.env"""
    
      # url reconstruction
      # see http://www.python.org/dev/peps/pep-0333/#url-reconstruction
      url = env['wsgi.url_scheme']+'://' # http/https
      url += env.get('HTTP_HOST') or (env['SERVER_NAME']+':'+env['SERVER_PORT']) # host + port
      url += quote(env.get('SCRIPT_NAME', ''))
      url += quote(env.get('PATH_INFO', ''))
      url += ('?' + env['QUERY_STRING']) if env.get('QUERY_STRING') else '' # GET querystring
    
      # get/post request
      req = ' '.join((env['REQUEST_METHOD'], url, env['SERVER_PROTOCOL'])) + '\n'
    
      # headers
      for k, v in env.items():
        if k.startswith('HTTP') or k in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
          req += adaptHeader(k) + ': ' + str(v) + '\n'
    
      # post data (CONTENT_LENGTH is usually missing or empty for GET requests)
      try:
        req += '\n' + env['wsgi.input'].read(int(env['CONTENT_LENGTH']))
      except (KeyError, ValueError):
        pass
    
      return req
    
    class urlhandler:
      def GET(self, url):
        return rawRequest(web.ctx.env)
      def POST(self, url):
        return rawRequest(web.ctx.env)
    
    if __name__ == '__main__':
      app = web.application(urls, globals())
      app.run()
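For reference, a Python 3 sketch of the same idea (the code above targets Python 2), exercised here with a hand-built environ instead of a live web.py server; the names `raw_request` and `header_name` are hypothetical. Differences from the answer's code: it emits the origin-form request target (`/path?query`) seen in the `nc` capture instead of an absolute URL, prefers `REQUEST_URI` when the server provides it (CherryPy does, as the dump in the question shows, but it is not part of the WSGI spec), uses CRLF line endings as on the wire, and only catches the errors a missing or malformed `CONTENT_LENGTH` can raise. It still cannot recover header order or the original capitalization, because `environ` is a dict with normalized keys.

```python
import io
from urllib.parse import quote


def header_name(key):
    """'HTTP_CACHE_CONTROL' -> 'Cache-Control', 'CONTENT_TYPE' -> 'Content-Type'."""
    if key.startswith("HTTP_"):
        key = key[len("HTTP_"):]
    return "-".join(part.capitalize() for part in key.split("_"))


def raw_request(env):
    """Approximate the raw HTTP request text from a WSGI environ dict."""
    # REQUEST_URI is a server extension; fall back to the WSGI-standard keys.
    target = env.get("REQUEST_URI")
    if not target:
        target = quote(env.get("SCRIPT_NAME", "") + env.get("PATH_INFO", ""))
        if env.get("QUERY_STRING"):
            target += "?" + env["QUERY_STRING"]
    lines = [" ".join((env["REQUEST_METHOD"], target, env["SERVER_PROTOCOL"]))]

    # The original header order is lost, so sort the keys for stable output.
    for key in sorted(env):
        if key.startswith("HTTP_") or key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            if env[key] != "":  # servers may set CONTENT_* to an empty string
                lines.append(header_name(key) + ": " + str(env[key]))

    # Read the body only when a positive Content-Length was sent.
    try:
        length = int(env.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    body = env["wsgi.input"].read(length) if length > 0 else b""
    # Latin-1 maps every byte to exactly one character, so nothing is lost.
    return "\r\n".join(lines) + "\r\n\r\n" + body.decode("latin-1")


if __name__ == "__main__":
    demo_env = {
        "REQUEST_METHOD": "POST",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "PATH_INFO": "/form",
        "HTTP_HOST": "127.0.0.1:8081",
        "CONTENT_LENGTH": "7",
        "wsgi.input": io.BytesIO(b"name=me"),
    }
    print(raw_request(demo_env))
```

Inside a web.py handler this would be called as `return raw_request(web.ctx.env)`. If something has already consumed `wsgi.input` (for example a call to `web.data()` on a POST), the body will most likely come back empty, since the stream can only be read once.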
    

    【Discussion】:

    • It seems you can compute the request URL more simply: web.ctx.home + web.ctx.fullpath. See webpy.org/cookbook/ctx
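Outside web.py, the URL part of `rawRequest` is also available in the standard library: `wsgiref.util.request_uri(environ, include_query=True)` implements the PEP 333 reconstruction algorithm, including dropping the default port when it has to fall back to `SERVER_NAME`/`SERVER_PORT`. A minimal sketch with a hand-built environ (the values are illustrative, modeled on the dump in the question):

```python
from wsgiref.util import request_uri

env = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "127.0.0.1:8081",
    "SERVER_NAME": "localhost",
    "SERVER_PORT": "8081",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/page",
    "QUERY_STRING": "x=1",
}

print(request_uri(env))                       # http://127.0.0.1:8081/page?x=1
print(request_uri(env, include_query=False))  # http://127.0.0.1:8081/page
```

Because `HTTP_HOST` is present it is used as-is; without it, the function builds the host from `SERVER_NAME` and appends `SERVER_PORT` only when it differs from 80 (or 443 for https).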
    【Solution 2】:

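    Looking at what you have, you can filter web.ctx.env by the keys that start with "HTTP_". That is easier than getting and parsing the raw request headers.

    You can check the WSGI spec here: http://www.python.org/dev/peps/pep-0333/#environ-variables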

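    HTTP_ Variables: Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_"). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.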
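A minimal sketch of that filtering (the function name `request_headers` is hypothetical): it keeps the `HTTP_*` keys plus `CONTENT_TYPE` and `CONTENT_LENGTH`, which PEP 333 lists separately without the prefix, skips server and `wsgi.*` bookkeeping, and turns the keys back into header-style names. The original spelling cannot be recovered from the environ, so a header sent as `DNT` comes back as `Dnt`.

```python
def request_headers(env):
    """Return the client's request headers from a WSGI environ as a dict."""
    headers = {}
    for key, value in env.items():
        if key.startswith("HTTP_"):
            name = key[len("HTTP_"):]
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH") and value:
            name = key  # no HTTP_ prefix for these two; skip them when empty
        else:
            continue  # e.g. SERVER_PORT, REMOTE_ADDR, wsgi.input
        # "ACCEPT_LANGUAGE" -> "Accept-Language"
        headers["-".join(part.capitalize() for part in name.split("_"))] = value
    return headers
```

In a web.py handler this would be `request_headers(web.ctx.env)`.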

    【Discussion】:

    • Thanks, maybe this is the only way to get it. Anyway, I don't need to parse the request, I just need it.