【Question title】: Python Mechanize Browser: HTTP Error 460
【Posted】: 2014-04-09 20:05:30
【Question】:

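I'm trying to log in to a website using the mechanize browser and I'm getting an HTTP 460 error, which appears to be a made-up error, so I don't know what to do about it. Here's the code: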

import cookielib  # Python 2 module; renamed http.cookiejar in Python 3
import mechanize

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

br.open("https://foo.com/login")
br.select_form(nr=1)

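# The form and control names change on every page load, so select controls by index
br[br.form.controls[2].name] = "login@gmail.com"
br[br.form.controls[3].name] = "mypassword"
# Set the method on the form; assigning br.method only adds an attribute to the Browser
br.form.method = "POST"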

response = br.submit()

This is the error I get with mechanize debug messages turned on:

>>> response = br.submit()
send: 'POST /login/signin.logincomponent_0.signinform HTTP/1.1\r\nAccept-Encodin
g: identity\r\nContent-Length: 599\r\nConnection: close\r\nUser-Agent: Mozilla/5
.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 F
irefox/3.0.1\r\nHost: myaccount.foo.com\r\nCookie: DC=origin1; STUB_SESS=fil
ler%7E%5E%7E0%7Cguid%7E%5E%7E30909BA355883C551B421713700871E5%7E%5E%7E04%2F09%2F
2014; TLTHID=33A41894C02B10C01D1CF554572C7A31; TLTSID=FFDDD892C02A10C01C0BF55457
2C7A31; STUB_SESSION=filler%7E%5E%7E0%7Cstub_sid%7E%5E%7E0%7E%5E%7E04%2F09%2F201
4; JSESSIONID=B6E04AC06D5885942E299F67EE421640\r\nReferer: https://myaccount.foo
.com/login/Signin?\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\
ntLGGfIeGONQt=H4sIAAAAAAAAAJWQvUoDURCFx0AQEmwEa1ES7G4sTKNVCgUhkeBqLbN3Z9cr98%2B5
N25sfBSfQPISKex8Bx%2FA1spCsxo7w9p%2BzDnfYZ7eoFl2YDdRhVX2ULtCWemMd5ZsvNoXFSCDSgeG
vuNCoEd5TSKipxD5vi%2BkY9IqFSkGEoP0C6KMJ4p01kkoTnz3ct5%2B3Xr%2BaMDaENrS2chOn6GhCJ
vDG7zDnkZb9JLIyhZHUx%2BhVVmPF9ba2wb%2F3TZmJymEZJIaFYJydj7LDvL3x5cGwNSXe9Bd6fUYQu
k4C7fwABBho6LjH1o7vkg3yx3Y%2Fus6GqZc%2FWrWL0bnlJ9mNSLf1Sv%2Bx2TIpMSGlu2tJRpRvWDl
%2BATZ7YRMRAIAAA%3D%3D&GkhkHrHNkEGO=N&NgSEvMJNtPPU=login%40gmail.com&tl
BhliqPEpQP=mypassword&NTFAoHFKrewo=184f4acf-1300-4e65-a81d-3092301d87c213970777534
13&signIn=signIn&shs8q2kGs88H=1979975621'
reply: 'HTTP/1.1 460 Unknown\r\n'
header: Content-Type: text/html
header: Content-Length: 0
header: Content-Language: en
header: Cache-Control: no-cache
header: Cache-Control: no-store
header: Cache-Control: must-revalidate
header: Cache-Control: max-age=0
header: Cache-Control: s-maxage=0
header: Cache-Control: private
header: Expires: Wed, 09 Apr 2014 21:09:40 GMT
header: Pragma: no-cache
header: Date: Wed, 09 Apr 2014 21:09:40 GMT
header: Connection: close
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 541, in submit
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 203, in open
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 255, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 460: Unknown
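
For readers who want to reproduce this kind of trace: the question does not show how debugging was switched on, so the following is only a sketch of one common way to do it. It assumes `br` is a `mechanize.Browser`; the helper name `enable_mechanize_debug` is made up for illustration.

```python
import logging
import sys


def enable_mechanize_debug(br):
    """Turn on mechanize's debug output for the given Browser (hypothetical helper)."""
    # Redirect and response debugging is reported through the "mechanize" logger.
    logger = logging.getLogger("mechanize")
    logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.INFO)

    # Prints the raw "send:", "reply:" and "header:" lines for each request.
    br.set_debug_http(True)
    # Logs information about HTTP redirects and refreshes.
    br.set_debug_redirects(True)
    # Logs the response bodies that mechanize receives.
    br.set_debug_responses(True)
    return br
```

Calling `enable_mechanize_debug(br)` before `br.open(...)` should produce output similar in shape to the trace above.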

Any idea what could be causing the 460 error? I also tried using the submit button on the page:

response = br.form.click(br.form.controls[6].name)

But I don't think I'm using br.click() correctly here?
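
On the `click()` question: in mechanize, `form.click()` only builds a request object and does not send it, so the line above leaves a request, not a response, in `response`. A minimal sketch of the two usual ways to submit through a specific button follows; the function names are hypothetical, and `br` is assumed to be a `mechanize.Browser` with a form already selected. This fixes the usage but is not expected to cure the 460 by itself.

```python
def submit_with_button(br, button_index):
    """Submit the selected form via the control at button_index (hypothetical helper)."""
    button_name = br.form.controls[button_index].name
    # click() returns a request describing the submission; nothing is sent yet.
    request = br.form.click(name=button_name)
    # Opening the request is what actually performs the HTTP call.
    return br.open(request)


def submit_with_button_short(br, button_index):
    """Same idea in one step: Browser.submit() takes the same arguments as click()."""
    button_name = br.form.controls[button_index].name
    return br.submit(name=button_name)
```

For example, `response = submit_with_button_short(br, 6)` corresponds to the attempt shown above.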

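【Comments】:

  • Can you ask whoever maintains the site you are trying to scrape? As you rightly point out, the problem is that 460 is not a publicly defined status code. It falls in the 4xx "client error" family, which means (assuming the developers chose it for that reason) that the application doesn't like your request. Beyond that, it seems only the people who developed it can answer.
  • Scarily, I actually recognized the site just from what was posted. (You're right, Selenium is a good choice.)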
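
The first comment's point about the 4xx family can be checked quickly. The sketch below uses Python 3's standard `http.HTTPStatus` (the question itself is Python 2, so this is a side illustration); `describe_status` and `STATUS_FAMILIES` are names invented here.

```python
from http import HTTPStatus  # Python 3.5+

# The first digit of a status code identifies its family.
STATUS_FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return (family, standard reason phrase or None) for an HTTP status code."""
    family = STATUS_FAMILIES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one the standard library knows about.
        phrase = None
    return family, phrase


print(describe_status(404))  # ('client error', 'Not Found')
print(describe_status(460))  # ('client error', None)
```

A `None` phrase for 460 matches the server's own "460 Unknown" reply line: the code is application-specific, so only its family carries any general meaning.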

Tags: python screen-scraping mechanize


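【Solution 1】:

I determined that the site uses JavaScript as part of its login authentication mechanism, so Mechanize could not properly emulate a browser. I switched to Selenium, which can handle JavaScript, and it let me log in successfully.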
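
The answer does not include its Selenium code, so the following is only a rough sketch of what such a login might look like. Everything specific is an assumption: the helper name `log_in`, the choice of the second form on the page (mirroring `select_form(nr=1)` in the question), and the CSS selectors used to find the email and password inputs by type, since the field names are said to change on every load.

```python
# These strings are the values of selenium's By.TAG_NAME and By.CSS_SELECTOR;
# with selenium imported, passing By.TAG_NAME / By.CSS_SELECTOR is the usual spelling.
TAG_NAME = "tag name"
CSS_SELECTOR = "css selector"


def log_in(driver, login_url, email, password, form_index=1):
    """Fill in a login form by position and input type, then submit it (hypothetical)."""
    driver.get(login_url)

    # Pick the form by position, because its name is not stable.
    form = driver.find_elements(TAG_NAME, "form")[form_index]

    # Assumed selectors: locate the inputs by type rather than by name.
    email_box = form.find_element(
        CSS_SELECTOR, "input[type='text'], input[type='email']"
    )
    password_box = form.find_element(CSS_SELECTOR, "input[type='password']")

    email_box.send_keys(email)
    password_box.send_keys(password)
    # Submitting through an element inside the form submits that form.
    password_box.submit()

    return driver.current_url
```

With Selenium installed, a call such as `log_in(webdriver.Firefox(), "https://foo.com/login", "login@gmail.com", "mypassword")` would drive a real browser, which runs the page's JavaScript before the form is submitted. A real page may also need explicit waits for elements to appear.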
