【Question Title】:How to fix HTTP error in Python 3 using urlopen with urllib
【Posted】:2020-07-04 19:36:53
【Question】:

I added a User-Agent to the request headers. Below are my code and the error.

from urllib.request import Request, urlopen
import json
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}

domain=Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/",data=bytes(json.dumps(headers), encoding="utf-8"))
response =urlopen(domain)

I also tried a different version; note the change in how the domain variable is built.

from urllib.request import Request, urlopen
import json
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}

domain=Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/",headers)
response =urlopen(domain)

Neither version works. Error:

line 9, in <module>
    response =urlopen(domain)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

【Question Discussion】:

    Tags: python-3.x web-scraping urllib


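    【Solution 1】:

    Use .add_header() to set a proper User-Agent. In both attempts from the question, the headers dictionary ends up in the data parameter of Request (the request body), so no User-Agent header is actually sent.

    For example: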

    from urllib.request import Request, urlopen
    
    domain=Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/")
    domain.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0')
    response =urlopen(domain)
    
    print(response.read())
    

    Prints:

    b'<!DOCTYPE html>\r\n<html lang="en-US" prefix="og: http://ogp.me/ns#">\r\n<head itemscope="itemscope" itemtype="http://schema.org/WebSite">\r\n\t<meta charset="UTF-8" />
    
    ... and so on.
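
As a complement to .add_header(), the same header can also be passed through the headers keyword argument of Request. The sketch below is only an illustration: the URL is a placeholder (not the site from the question), and URL, HEADERS, and the fetch helper are hypothetical names introduced here. It builds two Request objects without touching the network to show where the dictionary ends up in each case, and it wraps urlopen so that an HTTPError such as the 403 is reported rather than raised.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Placeholder URL and User-Agent string, chosen for illustration only.
URL = "http://example.com/"
USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) "
    "Gecko/20100101 Firefox/78.0"
)
HEADERS = {"User-Agent": USER_AGENT}

# The second positional parameter of Request is `data`, so here the
# dictionary becomes the request body and no header is set at all.
wrong = Request(URL, HEADERS)

# Passing the dictionary by keyword stores it as request headers.
right = Request(URL, headers=HEADERS)

# Inspect both objects; nothing is sent over the network at this point.
print(wrong.get_method(), wrong.header_items())
print(right.get_method(), right.header_items())


def fetch(request):
    """Return the response body as bytes, or None on an HTTP error status."""
    try:
        with urlopen(request) as response:
            return response.read()
    except HTTPError as err:
        # err.code holds the status (for example 403), err.reason the phrase.
        print("Server refused the request:", err.code, err.reason)
        return None
```

Calling fetch(right) would then perform the actual request. Note that Request normalizes header names with str.capitalize(), so the stored key is 'User-agent'. A server may still answer 403 for reasons unrelated to the User-Agent, which is why the helper reports the status code instead of assuming success.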
    

    【Discussion】:
