【Question title】: Error when assigning variable to urlopen(my_url)
【Posted】: 2019-09-22 15:51:48
【Question】:

I am doing web scraping with BeautifulSoup and urlopen. When I try to open a URL with urlopen, I get a long error traceback. What am I doing wrong?

I am using Python 3.7.2, and I have bs4 and urlopen installed.

import bs4
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
my_url = 'https://ebgames.com.au/search?q=Skyrim'
var = urlopen(my_url)

ERROR:

Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    urlopen (my_url)
  File "C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

The expected result is no error.

【Comments】:

  • On Python 3.4 and later, use the requests library instead: import requests, then requests.get(url)

Tags: python beautifulsoup urlopen


【Solution 1】:

Try setting a browser User-Agent as a request header on your request:

import bs4
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
my_url = 'https://ebgames.com.au/search?q=Skyrim'
# Send a browser-like User-Agent instead of urllib's default (Python-urllib/3.x)
req = Request(my_url, headers={'User-Agent': 'Mozilla/5.0'})
var = urlopen(req).read()

For more information, see this answer.
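
As a rough sketch of how the approach in this answer could be wrapped up for reuse, the snippet below builds the request with a browser-like User-Agent and catches HTTPError so that a 403 is reported instead of raising a traceback. The helper names build_request and fetch_html, the DEFAULT_HEADERS constant, and the 10-second timeout are assumptions made for illustration; they are not part of the original answer, and whether a given site accepts this User-Agent is not guaranteed.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: the same generic browser-like User-Agent used in the answer.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url, headers=None):
    """Return a Request carrying browser-like headers (hypothetical helper)."""
    merged = dict(DEFAULT_HEADERS)
    if headers:
        merged.update(headers)  # caller-supplied headers override the defaults
    return Request(url, headers=merged)


def fetch_html(url, timeout=10):
    """Fetch a page and return its decoded text, or None on an HTTP error."""
    req = build_request(url)
    try:
        with urlopen(req, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        # err.code holds the HTTP status (403 in the question's traceback).
        print(f"Request failed: HTTP {err.code} {err.reason}")
        return None
```

With this in place, something like html = fetch_html('https://ebgames.com.au/search?q=Skyrim') followed by soup(html, 'html.parser') (when html is not None) would hand the page to BeautifulSoup. Returning None on failure is one design choice among several; re-raising the error may be preferable in a larger scraper.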

【Discussion】:
