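[Posted]: 2018-10-21 12:10:00
[Question]:
My goal is to extract the "Founded" and "Products" information from the infobox of the Wikipedia page of Microsoft. I am using Python 3 and tried the following code, which I found online, but it does not work: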
# importing modules
import requests
from lxml import etree
# manually storing desired URL
url='https://en.wikipedia.org/wiki/Microsoft'
# fetching its url through requests module
req = requests.get(url)
store = etree.fromstring(req.text)
# trying to get the 'Founded' portion of above
# URL's info box of Wikipedia's page
output = store.xpath('//table[@class="infoboxvcard"]/tr[th/text()="Founded"]/td/i')
# printing the text portion
print output[0].text
#Expected result:
Founded:April 4, 1975; 43 years ago in Albuquerque, New Mexico, U.S.
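A few likely reasons the snippet fails, offered as a sketch rather than a verified diagnosis of the live page: `etree.fromstring` expects well-formed XML and tends to choke on real-world HTML; the class in the XPath is written as `infoboxvcard`, while the infobox table most likely carries two separate classes (`infobox vcard`); rows may sit inside a `<tbody>`, so `table/tr` can miss them; the trailing `/i` only matches if the cell contains an `<i>` child; and `print output[0].text` is Python 2 syntax. The helper `infobox_field` and the `SAMPLE` markup below are hypothetical, made up for illustration. The sample is a simplified stand-in for the infobox, not the actual Wikipedia HTML.

```python
from lxml import html

# Simplified stand-in for the page markup (an assumption, not the real HTML).
SAMPLE = """
<html><body>
<table class="infobox vcard">
  <tbody>
    <tr><th>Founded</th><td>April 4, 1975 in Albuquerque, New Mexico, U.S.</td></tr>
    <tr><th>Products</th><td><ul><li>Windows</li><li>Office</li></ul></td></tr>
  </tbody>
</table>
</body></html>
"""


def infobox_field(page_html, label):
    """Return the text of the infobox row whose header equals `label`, or None."""
    # lxml.html tolerates non-XML markup, unlike etree.fromstring.
    tree = html.fromstring(page_html)
    cells = tree.xpath(
        # Match "infobox" as one class among several, and use //tr so an
        # intermediate <tbody> does not matter.
        '//table[contains(concat(" ", normalize-space(@class), " "), " infobox ")]'
        '//tr[normalize-space(th)=$label]/td',
        label=label,
    )
    if not cells:
        return None
    # Join all text nodes in the cell and collapse runs of whitespace.
    return " ".join(" ".join(cells[0].itertext()).split())


print(infobox_field(SAMPLE, "Founded"))
print(infobox_field(SAMPLE, "Products"))
```

Against the live page, the same function could be called as `infobox_field(requests.get(url).text, "Founded")`. The real cells contain extra markup (footnotes, hidden spans), so the returned text will probably need further cleanup.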
[Comments]:
- You could use the Wikidata API instead of scraping.
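A rough sketch of what the Wikidata route could look like, using the `wbgetclaims` action of the Wikidata API. The identifiers here are assumptions to verify on wikidata.org before relying on them: `Q2283` for Microsoft, `P571` for the inception date, and `P1056` for products. The helper names and the User-Agent string are hypothetical. The standard library's `urllib` is used only to keep the example dependency-free; `requests` would work just as well.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_claims_url(entity_id, property_id):
    """Build a wbgetclaims URL for one property of one entity."""
    params = {
        "action": "wbgetclaims",
        "entity": entity_id,
        "property": property_id,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def claim_values(response, property_id):
    """Pull the raw datavalue of each claim out of a wbgetclaims response."""
    values = []
    for claim in response.get("claims", {}).get(property_id, []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue is not None:  # "no value" / "unknown value" snaks lack it
            values.append(datavalue["value"])
    return values


def fetch_claim_values(entity_id, property_id):
    """Query the live API (needs network access)."""
    request = urllib.request.Request(
        build_claims_url(entity_id, property_id),
        # Placeholder User-Agent; replace with something identifying your script.
        headers={"User-Agent": "infobox-example/0.1"},
    )
    with urllib.request.urlopen(request) as reply:
        return claim_values(json.load(reply), property_id)


if __name__ == "__main__":
    # Assumed identifiers: Q2283 = Microsoft, P571 = inception.
    print(fetch_claim_values("Q2283", "P571"))
```

Date properties come back as dictionaries with a `time` field, while item-valued properties such as products come back as entity IDs whose labels need a second lookup.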
Tags: python web-scraping extract wikipedia