【Posted】:2020-10-26 19:42:20
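【Question】:
pandas.read_html returns only the table data present in the HTML page before any scrolling. Rows that load as the page is scrolled are therefore missing from the returned list of DataFrames. The full data only comes back if I follow these steps:
- Scroll to the bottom
- Wait for the content to load
- If no more content loads, return
- Go to step 1
My code: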
import pandas as pd
url = 'https://finance.yahoo.com/quote/GOOG/history?period1=1566844200&period2=1598466600&interval=1d&filter=history&frequency=1d'
dfs = pd.read_html(url)
print(dfs[0])
Actual result:
Date Open High Low Close* Adj Close** Volume
0 Aug 26, 2020 1608.00 1659.22 1603.60 1652.38 1652.38 3993400
1 Aug 25, 2020 1582.07 1611.62 1582.07 1608.22 1608.22 2247100
2 Aug 24, 2020 1593.98 1614.17 1580.57 1588.20 1588.20 1409900
3 Aug 21, 2020 1577.03 1597.72 1568.01 1580.42 1580.42 1446500
4 Aug 20, 2020 1543.45 1585.87 1538.20 1581.75 1581.75 1706900
... ... ... ... ... ... ... ...
96 Apr 09, 2020 1224.08 1225.57 1196.73 1211.45 1211.45 2175400
97 Apr 08, 2020 1206.50 1219.07 1188.16 1210.28 1210.28 1975100
98 Apr 07, 2020 1221.00 1225.00 1182.23 1186.51 1186.51 2387300
99 Apr 06, 2020 1138.00 1194.66 1130.94 1186.92 1186.92 2664700
100 *CPA *CPA *CPA *CPA *CPA *CPA *CPA
[101 rows × 7 columns]
Expected result:
Date Open High Low Close* Adj Close** Volume
0 Aug 26, 2020 1608.00 1659.22 1603.60 1652.38 1652.38 3993400
1 Aug 25, 2020 1582.07 1611.62 1582.07 1608.22 1608.22 2247100
2 Aug 24, 2020 1593.98 1614.17 1580.57 1588.20 1588.20 1409900
3 Aug 21, 2020 1577.03 1597.72 1568.01 1580.42 1580.42 1446500
4 Aug 20, 2020 1543.45 1585.87 1538.20 1581.75 1581.75 1706900
... ... ... ... ... ... ... ...
249 Apr 30, 2019 1224.08 1225.57 1196.73 1211.45 1211.45 2175400
250 Apr 29, 2019 1206.50 1219.07 1188.16 1210.28 1210.28 1975100
251 Apr 27, 2019 1221.00 1225.00 1182.23 1186.51 1186.51 2387300
252 Aug 26, 2019 1138.00 1194.66 1130.94 1186.92 1186.92 2664700
253 *CPA *CPA *CPA *CPA *CPA *CPA *CPA
[253 rows × 7 columns]
【Comments】:
- The following should solve your problem: stackoverflow.com/questions/39218742/…
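For reference, the scroll loop from the question could be sketched roughly as below. This is only a sketch under assumptions, not a confirmed fix: it assumes Selenium with Chrome is available, and that the page height (`document.body.scrollHeight`) grows when more rows load, which has not been verified for this Yahoo Finance page. The names `scroll_until_stable` and `fetch_tables`, and the `pause` and `max_rounds` parameters, are made up for illustration.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Step 1: scroll to the bottom.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Step 2: wait for the content to load (fixed pause, an assumption).
        time.sleep(pause)
        # Step 3: if the height did not change, nothing new loaded, so return.
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        # Step 4: go back to step 1 with the updated height.
        last_height = new_height
    return max_rounds  # safety cap so the loop cannot run forever


def fetch_tables(url):
    """Open `url` in a browser, scroll to the end, and parse its tables."""
    # Imported here so the scroll helper can be used without these packages.
    from io import StringIO

    import pandas as pd
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get(url)
        scroll_until_stable(driver)
        # Parse the rendered HTML rather than the URL, so scrolled-in rows count.
        return pd.read_html(StringIO(driver.page_source))
    finally:
        driver.quit()
```

With this in place, `dfs = fetch_tables(url)` would stand in for `pd.read_html(url)` in the question's code. The key difference is that pandas parses the HTML rendered by the browser after scrolling, instead of fetching the initial page itself.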
Tags: python python-3.x pandas dataframe