【Posted】: 2019-11-22 02:54:55
【Question】:
I'm a Python beginner learning through small projects, and I'm currently learning web scraping with BeautifulSoup. The HTML of the page looks like this:
<div class="BrandList">
  <div><b>Brand Name: </b>ONCOTRON INJ</div>
  <div><b>Manufacture Name: </b>SUN PHARMA</div>
  <div><b>Compositions:
  </b>
  Mitoxantrone 2mg/ml injection,
  </div>
I need to parse this information and store it in a CSV file with three columns: brand name, manufacturer name, and compositions.
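A minimal sketch of the three-column output using the standard-library `csv` module, assuming each `BrandList` block has already been turned into a dict keyed by its bold label (the key names below simply mirror the labels in the snippet; `FIELDNAMES` and `write_rows` are hypothetical names). `csv.DictWriter` quotes any field containing a comma, which matters here: the composition text ends with a comma, and a hand-built `a + "," + b` line would split such a value into extra columns.

```python
import csv
import io

# Column order for the three-column CSV (labels copied from the HTML snippet).
FIELDNAMES = ["Brand Name", "Manufacture Name", "Compositions"]

def write_rows(rows, fileobj):
    """Write a list of dicts (one per BrandList block) as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# Example using the values from the snippet, written to an in-memory buffer.
rows = [{"Brand Name": "ONCOTRON INJ",
         "Manufacture Name": "SUN PHARMA",
         "Compositions": "Mitoxantrone 2mg/ml injection"}]
buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())
```

For a real file, open it with `open("Sunp.csv", "w", newline="", encoding="utf-8")`; the `csv` documentation recommends `newline=""` so the writer controls line endings.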
When I run my code, I can only extract the brand name, but I also want the rest of the text in each div.
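`field.b.text` returns only the text inside the `<b>` tag, i.e. the label such as `Brand Name: `. The value is the text node that follows `</b>`, which BeautifulSoup exposes as the `<b>` tag's `next_sibling`. A minimal sketch on the snippet above, using the built-in `html.parser` instead of `lxml` and assuming the outer `BrandList` div closes right after the compositions (the excerpt stops before that point):

```python
from bs4 import BeautifulSoup, NavigableString

# Snippet from the question; the final </div> closing the outer BrandList
# block is an assumption, since the excerpt does not show it.
html = """<div class="BrandList"> <div><b>Brand Name: </b>ONCOTRON INJ</div>
<div><b>Manufacture Name: </b>SUN PHARMA</div> <div><b>Compositions:
</b>
Mitoxantrone 2mg/ml injection,
</div></div>"""

soup = BeautifulSoup(html, "html.parser")
block = soup.find("div", class_="BrandList")

record = {}
for field in block.find_all("div"):                  # one inner <div> per field
    label = field.b.get_text(strip=True).rstrip(":")  # "Brand Name: " -> "Brand Name"
    value = field.b.next_sibling                      # node right after </b>
    if isinstance(value, NavigableString):
        record[label] = value.strip().rstrip(",")     # drop whitespace and trailing comma
    else:
        record[label] = ""                            # no plain text after the label
print(record)
```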
import requests
from bs4 import BeautifulSoup

data = requests.get('http://www.inpharmation.in/Search/BrandList?Type=Manufacturer&ProductID=79').text
soup = BeautifulSoup(data, 'lxml')
brand = soup.find('div', attrs={'id': 'maincontent'})
out_filename = "Sunp.csv"
headers = "brand,Compositions \n"
f = open(out_filename, "w")
f.write(headers)
for BrandList in brand.findAll('div', attrs={'class': 'BrandList'}):
    BrandList['Name'] = Brand_Name.b.text
    BrandList['Compositions'] = Compositions.b.text
    print("brand: " + brand + "\n")
    print("Compositions: " + Compositions + "\n")
    f.write(brand + "," + Compositions + "\n")
f.close()
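In the loop above, `Brand_Name` and `Compositions` are never assigned before they are read, `BrandList['Name'] = ...` writes an HTML attribute onto the tag instead of reading anything from it, and `brand` is the whole `#maincontent` tag rather than a string. A sketch of a corrected pipeline, assuming the live page repeats the snippet's structure (one inner `<div>` per field, label in `<b>`) inside `<div id="maincontent">`; `parse_brand_lists`, `to_csv` and `FIELDS` are hypothetical names. It takes the HTML as a string so it can be tried offline, and reads each value from the div's direct text children, which excludes the `<b>` label.

```python
import csv
import io
from bs4 import BeautifulSoup

FIELDS = ["Brand Name", "Manufacture Name", "Compositions"]

def parse_brand_lists(html):
    """Return one dict per <div class="BrandList"> inside <div id="maincontent">."""
    soup = BeautifulSoup(html, "html.parser")
    # Fall back to the whole document if the container is missing.
    container = soup.find("div", id="maincontent") or soup
    rows = []
    for block in container.find_all("div", class_="BrandList"):
        row = dict.fromkeys(FIELDS, "")
        for field in block.find_all("div"):
            if field.b is None:
                continue  # skip inner divs without a bold label
            label = field.b.get_text(strip=True).rstrip(":")
            # Direct text children of the div, i.e. everything except the <b> label.
            parts = field.find_all(string=True, recursive=False)
            value = " ".join(s.strip() for s in parts if s.strip())
            if label in row:
                row[label] = value.rstrip(",")
        rows.append(row)
    return rows

def to_csv(rows, fileobj):
    """Write the parsed rows as a three-column CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Offline demo on the question's snippet wrapped in the assumed container.
sample = """<div id="maincontent">
<div class="BrandList"> <div><b>Brand Name: </b>ONCOTRON INJ</div>
<div><b>Manufacture Name: </b>SUN PHARMA</div> <div><b>Compositions:
</b>
Mitoxantrone 2mg/ml injection,
</div></div>
</div>"""

rows = parse_brand_lists(sample)
buf = io.StringIO()
to_csv(rows, buf)
print(buf.getvalue())
```

Against the real site, the same helpers would be fed `requests.get(url).text` and written with `with open("Sunp.csv", "w", newline="", encoding="utf-8") as f: to_csv(parse_brand_lists(html), f)`. Labels that do not exactly match an entry in `FIELDS` are ignored, so check the live page's label text first.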
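I expected the output to include the brand name, the compositions, and the manufacturer name, but I only get the brand name.
【Discussion】:
Tags: python web-scraping beautifulsoup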