【发布时间】:2020-06-04 12:26:46
【问题描述】:
我正在开展一个分析 COVID19 数据的个人项目。目前,我正在下载 ourworldindata.org 提供的 excel 表格,可在此 url -> https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx
但是,当我尝试在 pandas(如下)中执行命令时,我得到一个错误列表。根本原因可能是什么?
url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx'
df = pd.read_excel(url, sheet_name='Sheet1')
错误
Traceback (most recent call last): File "<input>", line 1, in <module> File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 824, in __init__
self._reader = self._engines[engine](self._io) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 21, in __init__
super().__init__(filepath_or_buffer) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 351, in __init__
self.book = self.load_workbook(filepath_or_buffer) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 34, in load_workbook
return open_workbook(file_contents=data) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows, File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1278, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8]) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1272, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg) xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n<!D'
如果我下载到我的电脑上,请不要让熊猫可以读取它
【问题讨论】:
-
你必须用 requests.get().content 来做
-
下载原始文件,
url=url.replace('blob','raw')