【发布时间】:2022-01-18 22:27:24
【问题描述】:
我是 Python 新手。以下问题:
-
我有一个要从中抓取数据的网址列表。我不知道我的代码有什么问题,我无法从所有 url 中检索结果。该代码仅抓取第一个 url 而不是其余的。如何成功抓取列表中所有 url 中的数据(标题、信息、描述、应用程序)?
-
如果问题 1 有效,如何将数据转换为 CSV 文件?
代码如下:
import requests
from bs4 import BeautifulSoup
from bs4 import BeautifulSoup
import lxml
import pandas as pd
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
urlList = ["url1","url2","url3"...lists of urls.......]
for url in urlList:
try:
html = urlopen(url)
except HTTPError as e:
print(e)
except URLError:
print("error")
else:
soup = BeautifulSoup(html.read(),"html5lib")
# Scraping
def getTitle():
for title in soup.find('h2', class_="xx").text:
print(title)
def getInfo():
for info in soup.find('ul', class_="j-k-i").text:
print(info)
def getDescription():
for description in soup.find('div', class_="b-d").text:
print(description)
def getApplication():
for application in soup.find('div', class_="g-b bm-b-30").text:
print(application)
for soups in soup():
getTitle()
getInfo()
getDescription()
getApplication()
【问题讨论】:
标签: python web-scraping beautifulsoup