【发布时间】:2019-10-28 07:37:30
【问题描述】:
我有一个公司内部网页,其中列出了长长的列表中的各种数据,我想将这些数据转换为 CSV 文件以供查看。数据格式为:
*CUSTOMER_1*
Email Link Category_Text Phone_Numbers
Email Link Category_Text Phone_Numbers
*Customer_2*
Email Link Category_Text Phone_Numbers
Email Link Category_Text Phone_Numbers
用 HTML 编码的样子
<table id="responsibility">
<tr class="customer">
<td colspan="6">
<strong>CUSTOMER 1</strong>
</td>
</tr>
<tr id="tr_1" title="Role_Name1">
<td><a href="email@company.com1">Name_1</a></td>
<td>Category_Text</td>
<td>Phone_Numbers</td>
<td></td>
</tr>
<tr id="tr_2" title="Role_Name2">
<td><a href="email@company.com2">Name_2</a></td>
<td>Category_Text</td>
<td>Phone_Numbers</td>
<td></td>
</tr>
<tr class="customer">
<td colspan="6">
<strong>CUSTOMER 2</strong>
</td>
</tr>
<tr id="tr_1" title="Role_Name1">
<td><a href="email@company.com3">Name_3</a></td>
<td>Category_Text</td>
<td>Phone_Numbers</td>
<td></td>
</tr>
<tr id="tr_2" title="Role_Name2">
<td><a href="email@company.com2">Name_2</a></td>
<td>Category_Text</td>
<td>Phone_Numbers</td>
<td></td>
</tr>
</table>
我想最终得到一个包含这种方式信息的 file.csv
CUSTOMER1,Role_Name1,Name_1,Email_1,Category_Text,Phone_Numbers
CUSTOMER1,Role_Name2,Name_2,Email_2,Category_Text,Phone_Numbers
CUSTOMER2,Role_Name1,Name_3,Email_3,Category_Text,Phone_Numbers
CUSTOMER2,Role_Name1,Name_2,Email_2,Category_Text,Phone_Numbers
现在我可以获得所有客户名称的列表或所有文本的列表,但我无法弄清楚如何迭代每个客户,然后迭代每个客户的每一行
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("source.html"), "html.parser")
with open("output.csv",'w') as file:
responsibility=soup.find('table',{'id':'responsibility'})
line=responsibility.tr
for i in responsibility:
print(line)
line=responsibility.tr.next_sibling
我希望这会打印文档中的每个标签,但它只打印第一个标签,从不循环到下一个标签。
【问题讨论】:
标签: python web-scraping html-table beautifulsoup