【Question Title】: Using Pandas to read Excel from a URL
【Posted】: 2020-06-04 12:26:46
【Question】:

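I am working on a personal project to analyze COVID-19 data. Currently, I am downloading the Excel sheet provided by ourworldindata.org, available at this URL -> https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx

However, when I run the command below in pandas, I get a list of errors. What could be the root cause?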

url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx'
df = pd.read_excel(url, sheet_name='Sheet1')

Error

    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
        io = ExcelFile(io, engine=engine)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 824, in __init__
        self._reader = self._engines[engine](self._io)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 21, in __init__
        super().__init__(filepath_or_buffer)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 351, in __init__
        self.book = self.load_workbook(filepath_or_buffer)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 34, in load_workbook
        return open_workbook(file_contents=data)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
        ragged_rows=ragged_rows,
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
        biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1278, in getbof
        bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1272, in bof_error
        raise XLRDError('Unsupported format, or corrupt file: ' + msg)
    xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n<!D'

Note: if I download the file to my computer, pandas can read it.

【Comments】:

  • You have to do it with requests.get().content
  • Download the raw file: url=url.replace('blob','raw')
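
The URL rewrite suggested in the second comment can be sketched as a small helper. This is a minimal sketch, assuming the link follows GitHub's `github.com/<user>/<repo>/blob/<branch>/<path>` layout and that neither the user nor the repository is itself named `blob`; `to_raw_url` is a hypothetical name used for illustration. It swaps only the `/blob/` path segment, so a file name that merely contains the letters "blob" is left alone.

```python
def to_raw_url(url: str) -> str:
    """Turn a GitHub 'blob' page URL into the matching 'raw' file URL.

    Hypothetical helper; assumes the
    github.com/<user>/<repo>/blob/<branch>/<path> layout.
    """
    # Replace only the first '/blob/' path segment, not every 'blob' substring.
    return url.replace('/blob/', '/raw/', 1)


url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx'
raw_url = to_raw_url(url)
print(raw_url)
# https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.xlsx
```

The resulting `raw_url` is what would then be passed to `pd.read_excel` in place of the original page link.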

Tags: python pandas


【Solution 1】:

Try the link to the raw Excel file:

import pandas as pd
url='https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx?raw=true'
df=pd.read_excel(url, sheet_name='Sheet1')
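
The traceback in the question hints at why the raw link matters: xlrd found `b'\n\n\n\n\n<!D'` at the start of the data, which looks like the opening of an HTML page (`<!DOCTYPE ...`) rather than a workbook. Below is a minimal sketch of a check on the first downloaded bytes, assuming the usual file signatures (`PK\x03\x04` for zip-based .xlsx, `\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1` for legacy .xls). `sniff_format` is a hypothetical helper, not a pandas or xlrd function, and the HTML test is only a heuristic.

```python
def sniff_format(data: bytes) -> str:
    """Guess what a downloaded payload is from its first bytes (hypothetical helper)."""
    if data.startswith(b'PK\x03\x04'):
        return 'xlsx'  # .xlsx files are zip archives
    if data.startswith(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'):
        return 'xls'   # legacy OLE2 compound file
    # Heuristic: markup that opens with '<!' or '<html' after leading whitespace
    head = data.lstrip()[:16].lower()
    if head.startswith((b'<!', b'<html')):
        return 'html'
    return 'unknown'


# The eight bytes reported in the question's traceback
print(sniff_format(b'\n\n\n\n\n<!D'))  # html
```

Running a check like this on `requests.get(url).content[:8]` before parsing would show whether the server returned the file itself or a web page about the file.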

【Comments】:

    【Solution 2】:

    You can do this with requests

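    import io

    import pandas as pd
    import requests

    # Use the raw-file URL; the /blob/ URL returns GitHub's HTML page, not the workbook
    url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx?raw=true'

    # Download the file as bytes and fail early on an HTTP error
    response = requests.get(url)
    response.raise_for_status()

    # An .xlsx file is binary, so wrap the bytes in BytesIO and parse with read_excel
    df = pd.read_excel(io.BytesIO(response.content), sheet_name='Sheet1')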
    

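    I do it this way to avoid using a local drive or Google Drive, and to save connection time.

    【Comments】:

    • Thanks for your reply. This raised some errors, but the approach suggested by @luigigi worked well.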