【Posted】:2018-07-17 12:46:34
【Question】:
I want to use BeautifulSoup to extract all the tables from the HTML file given below and write them to a CSV file.
The HTML looks like this:
<h4>Site Name : Aria</h4>
<table style="width: 100%">
<tbody><tr>
<th style="width: 25%"><strong>Dn Name:</strong></th>
<td style="width: 25%"><strong>Aria</strong></td>
<th style="width: 25%"><strong>WL:</strong></th>
<td style="width: 25%"><strong> Meters (m)</strong></td>
</tr>
<tr>
<th><strong>River Name:</strong></th>
<td><strong>Ben</strong></td>
<th><strong>DL:</strong></th>
<td><strong> Meters (m)</strong></td>
</tr>
<tr>
<th><strong>Basin Name:</strong></th>
<td><strong>GAN<strong></strong></strong></td>
<th><strong>HFL:</strong></th>
<td><strong>49.4 Meters (m)<strong></strong></strong></td>
</tr>
<tr>
<th><strong>Div Name:</strong></th>
<td><a target="_blank" href="http://imd.gov.in/" onclick="window.open(this.href, this.target, 'width=1000, height=600, toolbar=no, resizable=no'); return false;">LGD-I</a></td>
<th><strong>HFL date:</strong></th>
<td>14-08-2017</td>
</tr>
</tbody></table>
<p> </p>
<table>
<tbody><tr>
<th colspan="3" style="text-align: center;"><strong>PRESENT WL</strong></th>
</tr>
<tr>
<td class="" style="width:33%; height:18px;">Date: 17-07-2018 12:00</td>
<td class="" style="width:33%;">Value: 45.43 Meters (m)</td>
<td class="" style="width:33%;">Trend: Steady</td>
</tr>
<tr>
<th colspan="3" style="text-align: center;"><strong>CUMULATIVE DAILY RF</strong></th>
</tr>
<tr>
<td style="width:33%; height:18px;">Date: 17-07-2018 08:30</td>
<td style="width:33%;">Value: 0.0 Milimiters (mm)</td>
<td style="width:33%;"></td>
</tr>
</tbody></table>
<p> </p>
<table style="width: 100%">
<tbody><tr>
<th colspan="4" style="text-align: center;"><strong>NO FORECAST</strong></th>
</tr>
</tbody></table>
</div>
I have managed to extract the text from all three tables, but I cannot write it out in the required format.
My code:
import csv
import datetime
import os

from bs4 import BeautifulSoup

now = datetime.datetime.now()
date = now.strftime("%d-%m-%Y")
os.chdir(r'D:\shared')
# response comes from code that is not shown here
soup = BeautifulSoup(response.text, "html5lib")
tables = soup.find_all("tr")
test = []
for table in tables:
    test.append(table.get_text())
filename = 'Water' + '-' + str(date) + '.csv'
out = open(filename, mode='ab')
writer = csv.writer(out)
# data is not defined anywhere in the code shown
writer.writerow(data)
out.close()
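In the output CSV, the first table is written to the first column, the second table to the second column, and the third table to the third column.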
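A side-by-side layout like this is what `csv.writer.writerow` produces when it is handed a whole list: the list becomes one row and each element lands in its own column. The minimal sketch below illustrates the difference, using a few hypothetical sample strings and in-memory buffers instead of a real file; it is not taken from the original script.

```python
import csv
import io

# Hypothetical sample values, standing in for the extracted text
items = ["Dn Name: Aria", "WL: Meters (m)", "River Name: Ben"]

# writerow() treats the whole list as ONE row, so each item becomes a column
one_row = io.StringIO()
csv.writer(one_row).writerow(items)

# Writing one single-element row per item puts each item on its own line
one_per_line = io.StringIO()
csv.writer(one_per_line).writerows([item] for item in items)

print(one_row.getvalue())
print(one_per_line.getvalue())
```

If one item per line is the goal, the second form (one single-element row per item) is probably the one to use.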
I want the data in the following format:
Site Name: Aria
Dn Name: Aria
WL: Meters (m)
River Name: Ben
DL: Meters (m)
Basin Name: GAN
HFL: 49.4 Meters (m)
Div Name: LGD-I
HFL date: 14-08-2017
PRESENT WL
Date: 17-07-2018 12:00
Value: 45.43 Meters (m)
Trend: Steady
CUMULATIVE DAILY RF
Date: 17-07-2018 08:30
Value: 0.0 Milimiters (mm)
NO FORECAST
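One possible way to get close to this layout is sketched below. It rests on two assumptions: every label cell ends with a colon and is immediately followed by its value cell, and the output file should hold one item per row. The `extract_lines` helper, the shortened `html` sample and the `Water-demo.csv` file name are hypothetical, and the built-in `html.parser` backend is used here instead of `html5lib` only to keep the sketch free of an extra dependency.

```python
import csv

from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (an assumption for this demo;
# in the real script this would be response.text)
html = """
<h4>Site Name : Aria</h4>
<table>
<tbody><tr>
<th><strong>Dn Name:</strong></th>
<td><strong>Aria</strong></td>
<th><strong>WL:</strong></th>
<td><strong> Meters (m)</strong></td>
</tr>
<tr>
<th><strong>Div Name:</strong></th>
<td><a href="http://imd.gov.in/">LGD-I</a></td>
<th><strong>HFL date:</strong></th>
<td>14-08-2017</td>
</tr>
</tbody></table>
<table>
<tbody><tr>
<th colspan="3"><strong>PRESENT WL</strong></th>
</tr>
<tr>
<td>Date: 17-07-2018 12:00</td>
<td>Value: 45.43 Meters (m)</td>
<td>Trend: Steady</td>
</tr>
</tbody></table>
"""


def extract_lines(html_text):
    """Return the heading and table contents as a flat list of strings."""
    soup = BeautifulSoup(html_text, "html.parser")
    lines = []
    heading = soup.find("h4")
    if heading is not None:
        # Collapse runs of whitespace into single spaces
        lines.append(" ".join(heading.get_text().split()))
    for row in soup.find_all("tr"):
        cells = [" ".join(c.get_text().split())
                 for c in row.find_all(["th", "td"])]
        i = 0
        while i < len(cells):
            text = cells[i]
            if text.endswith(":") and i + 1 < len(cells):
                # Label cell: join it with the value cell that follows
                lines.append((text + " " + cells[i + 1]).strip())
                i += 2
            else:
                # Standalone cell: keep it unless it is empty
                if text:
                    lines.append(text)
                i += 1
    return lines


lines = extract_lines(html)

# Python 3 style: text mode with newline="" as the csv module recommends
with open("Water-demo.csv", "w", newline="", encoding="utf-8") as out:
    csv.writer(out).writerows([line] for line in lines)
```

The heading text is kept as it appears in the HTML ("Site Name : Aria", with a space before the colon), so it would need a small extra cleanup step to match the format above exactly.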
【Comments】:
- What is the structure of data, the variable you are writing with the csv writer?
- The data is the HTML shown above...
Tags: python beautifulsoup