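【Posted】: 2021-12-01 21:23:01
【Question】:
I can't work out how to index the sub-headers of a table that I want to scrape and write to a CSV file. I need the columns labeled ResidualMaturity and Last, but I can only select by the table's top-level headers, not by the sub-headers. I tried df[('Yield', 'Last')], but that returns only that one column, not both.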
import pandas as pd
import requests
url = 'http://www.worldgovernmentbonds.com/country/japan/'
r = requests.get(url)
df_list = pd.read_html(r.text, flavor='html5lib')
df = df_list[4]
yc = df[["ResidualMaturity", "Yield"]]
print(yc)
Current output
ResidualMaturity Yield
ResidualMaturity Last Chg 1M Chg 6M
0 1 month -0.114% +9.0 bp +7.4 bp
1 3 months -0.109% 0.0 bp -1.9 bp
2 6 months -0.119% -0.3 bp -1.9 bp
3 9 months -0.119% +10.0 bp +9.9 bp
4 1 year -0.125% -0.7 bp +0.9 bp
5 2 years -0.121% +0.9 bp +1.3 bp
6 3 years -0.113% +2.2 bp +2.7 bp
7 4 years -0.094% +2.6 bp +2.1 bp
8 5 years -0.082% +2.3 bp +1.8 bp
9 6 years -0.056% +3.4 bp +0.4 bp
10 7 years -0.029% +5.1 bp -0.4 bp
11 8 years 0.007% +5.6 bp -0.7 bp
12 9 years 0.052% +5.6 bp -1.3 bp
13 10 years 0.087% +4.7 bp -1.2 bp
14 15 years 0.288% +4.3 bp -2.4 bp
15 20 years 0.460% +3.7 bp -1.5 bp
16 30 years 0.689% +3.5 bp +1.6 bp
17 40 years 0.757% +3.5 bp +7.3 bp
Desired output
ResidualMaturity Last
0 1 month -0.114%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.119%
4 1 year -0.125%
5 2 years -0.121%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.082%
9 6 years -0.056%
10 7 years -0.029%
11 8 years 0.007%
12 9 years 0.052%
13 10 years 0.087%
14 15 years 0.288%
15 20 years 0.460%
16 30 years 0.689%
17 40 years 0.757%
【Discussion】:
Tags: python python-3.x pandas csv web-scraping
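One possible approach, sketched on a small stand-in frame rather than the live page: when the columns are a two-level MultiIndex, indexing with a list of (top, sub) tuples should select several sub-columns at once. The stand-in below copies two rows from the output in the question and assumes that the sub-header under ResidualMaturity is also named ResidualMaturity, as that output suggests. The real table returned by read_html may carry extra columns or different labels, so check df.columns first.

```python
import pandas as pd

# Stand-in for df_list[4]: a two-level header built by hand (assumption:
# the live table has the same labels as the output printed in the question).
columns = pd.MultiIndex.from_tuples([
    ("ResidualMaturity", "ResidualMaturity"),
    ("Yield", "Last"),
    ("Yield", "Chg 1M"),
    ("Yield", "Chg 6M"),
])
df = pd.DataFrame(
    [
        ["1 month", "-0.114%", "+9.0 bp", "+7.4 bp"],
        ["3 months", "-0.109%", "0.0 bp", "-1.9 bp"],
    ],
    columns=columns,
)

# A list of (top, sub) tuples picks out both columns in one step.
yc = df[[("ResidualMaturity", "ResidualMaturity"), ("Yield", "Last")]].copy()

# Keep only the sub-header level so the CSV has a single header row.
yc.columns = yc.columns.get_level_values(1)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = yc.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument to to_csv would write the result to disk instead of returning a string.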