【Question title】: urllib.error.HTTPError: HTTP Error 404: Not Found when using request.urlopen()
【Posted】: 2020-07-05 02:55:20
【Question】:

I'm following a tutorial, but I get an error when calling request.urlopen(url). I have checked the URL (https://www.wsj.com/market-data/quotes/PH/XPHS/JFC/historical-prices/download?MOD_VIEW=page&num_rows=150&range_days=150&startDate=06/01/2020&endDate=07/05/2020) and it opens fine.

Here is my code:

from urllib import request
import datetime

def download_stock_from_day_until_today(stock_code, start_date):
    current_day = datetime.date.today()
    formatted_current_day = current_day.strftime("%m/%d/%Y")  # formats today's date for the URL
    # formatted url
    url = "https://www.wsj.com/market-data/quotes/PH/XPHS/" + stock_code + "/historical-prices/download?MOD_VIEW=page&num_rows=150&range_days=150&startDate=" + start_date + "&endDate=" + formatted_current_day

    print(url)

    response = request.urlopen(url)  # requests the CSV file
    csv_str = response.read().decode("utf-8")  # bytes -> str (assumes UTF-8); str(bytes) would keep the b'...' wrapper
    lines = csv_str.splitlines()
    dest_url = r'asd.csv'
    with open(dest_url, "w") as fx:  # closes the file even if writing fails
        for line in lines:
            fx.write(line + "\n")

download_stock_from_day_until_today("JFC", "06/01/2020")
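The code above builds the query string by string concatenation. As an illustrative sketch only, the same URL can be assembled with urllib.parse.urlencode; build_download_url is a hypothetical helper name, and the parameter values are simply copied from the question's URL. Passing safe="/" keeps the slashes in the MM/DD/YYYY dates unescaped so the result matches the URL the question checked; whether wsj.com would also accept %2F-escaped dates is untested. This tidies the URL construction but does not by itself fix the 404.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical helper: rebuilds the question's WSJ download URL, letting
# urlencode() assemble the query string instead of manual concatenation.
BASE = "https://www.wsj.com/market-data/quotes/PH/XPHS/{code}/historical-prices/download"


def build_download_url(stock_code, start_date, end_date=None):
    """Return the CSV download URL; dates are MM/DD/YYYY strings."""
    if end_date is None:
        # Same default as the question: today's date in MM/DD/YYYY form
        end_date = date.today().strftime("%m/%d/%Y")
    params = {
        "MOD_VIEW": "page",
        "num_rows": 150,
        "range_days": 150,
        "startDate": start_date,
        "endDate": end_date,
    }
    # safe="/" leaves the slashes in the dates unescaped, matching the original URL
    return BASE.format(code=stock_code) + "?" + urlencode(params, safe="/")


print(build_download_url("JFC", "06/01/2020", "07/05/2020"))
```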

The error I get in the console is:

Traceback (most recent call last):
  File "C:/Users/Lathrix/PycharmProject/StockExcelDownloader/main.py", line 23, in <module>
    download_stock_from_day_until_today("JFC", "06/01/2020")
  File "C:/Users/Lathrix/PycharmProject/StockExcelDownloader/main.py", line 12, in download_stock_from_day_until_today
    response = request.urlopen(url) #requests the csv file
  File "C:\Users\Lathrix\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Lathrix\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\Lathrix\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\Lathrix\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\Lathrix\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\Lathrix\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
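The traceback shows the 404 being raised from http_error_default: urlopen turns any non-2xx status that no handler deals with (redirects are followed) into an urllib.error.HTTPError. A minimal sketch for inspecting such failures instead of crashing is to catch HTTPError, which carries code and reason. So that it runs offline, the example uses a local stand-in server (PickyHandler, a hypothetical name) that reproduces the symptom by answering 404 to urllib's default User-Agent; that rule is an assumption for demonstration, not how wsj.com is actually implemented.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import error, request


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: 404 for urllib's default User-Agent, CSV otherwise."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if ua.startswith("Python-urllib"):
            self.send_error(404, "Not Found")
            return
        body = b"Date,Open,High,Low,Close\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch_status(url_or_request):
    """Return (status, reason) instead of letting HTTPError propagate."""
    try:
        with request.urlopen(url_or_request, timeout=5) as resp:
            return resp.status, resp.reason
    except error.HTTPError as e:
        # HTTPError doubles as a response object: code, reason and headers are available
        return e.code, e.reason


server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/download" % server.server_port

results = []
try:
    results.append(fetch_status(url))  # default urllib User-Agent
    results.append(fetch_status(request.Request(url, headers={"User-Agent": "Mozilla/5.0"})))
finally:
    server.shutdown()
    server.server_close()
print(results)
```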

【Discussion】:

Tags: python-3.x downloading-website-files


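【Answer 1】:

It looks like wsj.com rejects urllib's default User-Agent header (urllib identifies itself as Python-urllib/3.x unless told otherwise). With the line

    response = request.urlopen(request.Request(url, headers={'User-Agent': 'Mozilla/5.0'}))

your code works fine.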
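Applying the answer's one-line fix to the whole download step might look like the sketch below. download_csv is a hypothetical name; it sends an explicit User-Agent through request.Request and decodes the response bytes (assumed to be UTF-8) rather than calling str() on them. The local EchoUAHandler server is only a stand-in so the example runs without network access: it returns a one-line CSV echoing the User-Agent it received, which shows that the header actually reaches the server.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


def download_csv(url, dest_path, user_agent="Mozilla/5.0"):
    """Download url to dest_path, sending an explicit User-Agent header."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    with request.urlopen(req, timeout=10) as response:
        text = response.read().decode("utf-8")  # assumes a UTF-8 (or ASCII) CSV
    with open(dest_path, "w", newline="") as fx:  # newline="" keeps line endings as received
        fx.write(text)
    return text


class EchoUAHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: replies with a CSV containing the received User-Agent."""

    def do_GET(self):
        body = ("user_agent\n" + self.headers.get("User-Agent", "") + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence request logging
        pass


server = HTTPServer(("127.0.0.1", 0), EchoUAHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    csv_text = download_csv("http://127.0.0.1:%d/download" % server.server_port, "asd.csv")
finally:
    server.shutdown()
    server.server_close()
print(csv_text)
```

Against the real site you would pass the URL printed by the question's code instead of the local one; whether Mozilla/5.0 keeps being accepted depends on wsj.com's current filtering.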

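【Comments】:

• How did you figure out that the site rejects it? Is there a workaround?
• @lathrix I ran your code and saw that it printed the correct URL. Then I tried urlopen on just www.wsj.com and also got a 404 error. It works from a browser, so I suspected the User-Agent header. The workaround is in my answer: just replace your line 12 with the line I provided.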
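The diagnosis in the comment can be checked directly: when no User-Agent is set, the opener built by urllib adds one of the form Python-urllib/<major>.<minor>. The second half of the sketch is an alternative to the answer's per-request header, assuming you want every urlopen() call in the process to send the browser-like value: build an opener, override its addheaders, and register it with install_opener().

```python
from urllib import request

# What urllib sends when you don't set a User-Agent yourself
default_opener = request.build_opener()
print(default_opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.8')] on Python 3.8

# Process-wide alternative: after install_opener(), plain urlopen(url) calls
# use this opener, which adds the header to requests that don't set one.
opener = request.build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0")]
request.install_opener(opener)
```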