Posted: 2021-12-06 23:40:33
Question:
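I'm trying to scrape values from tables across several static web pages. The pages contain conjugation data for Korean verbs: https://koreanverb.app/
My Python script uses Beautiful Soup. The goal is to get all of the conjugations for several input URLs and write the data to a CSV file.
On each page, the conjugations are stored in a table inside an element with the class "table-responsive", one per table row with the class "conjugation-row". There are multiple "conjugation-row" rows on each page, but my script somehow grabs only the first one.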
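One possible cause, assuming the page splits its conjugations across more than one div with the class "table-responsive" (the question does not confirm this): soup.find("div", class_="table-responsive") returns only the first matching div, so find_all() on that div never sees rows in the later ones. The sketch below uses small hypothetical HTML, not the real page markup, to compare searching inside the first div with searching the whole document.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two "table-responsive" divs, each holding conjugation rows.
html = """
<div class="table-responsive"><table>
  <tr class="conjugation-row"><td class="conjugation-name">name_a</td><td>form_a</td></tr>
</table></div>
<div class="table-responsive"><table>
  <tr class="conjugation-row"><td class="conjugation-name">name_b</td><td>form_b</td></tr>
  <tr class="conjugation-row"><td class="conjugation-name">name_c</td><td>form_c</td></tr>
</table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching div, so only that div's rows are visible.
first_div = soup.find("div", class_="table-responsive")
rows_in_first_div = first_div.find_all("tr", class_="conjugation-row")

# Searching from the soup itself finds every matching row in the document.
all_rows = soup.find_all("tr", class_="conjugation-row")

print(len(rows_in_first_div), len(all_rows))  # 1 3
```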
Why doesn't the for loop grab every row with the class "conjugation-row"? I'd appreciate a solution that grabs all of the tr elements with that class. I tried using job_elements = results.find("tr", class_="conjugation-row"), but I get the following error:
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
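The error appears because find_all() returns a ResultSet, which is a list subclass, and a list has no .find() method. Below is a minimal sketch of two ways around it, again on hypothetical markup: loop over the ResultSet and call find_all() on each element, or use select() with a CSS selector that matches the rows inside every such div in one call.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="table-responsive"><table>
  <tr class="conjugation-row"><td class="conjugation-name">name_a</td><td>form_a</td></tr>
</table></div>
<div class="table-responsive"><table>
  <tr class="conjugation-row"><td class="conjugation-name">name_b</td><td>form_b</td></tr>
</table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of tags), so calling .find() on it fails.
divs = soup.find_all("div", class_="table-responsive")

# Option 1: iterate over the ResultSet and search inside each div.
rows_loop = []
for div in divs:
    rows_loop.extend(div.find_all("tr", class_="conjugation-row"))

# Option 2: a CSS selector matches the rows inside every such div at once.
rows_select = soup.select("div.table-responsive tr.conjugation-row")

for row in rows_select:
    name_cell = row.find("td", class_="conjugation-name")
    form_cell = name_cell.find_next_sibling("td")
    print(name_cell.get_text(strip=True), form_cell.get_text(strip=True))
```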
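Also, when I scrape the data and write it to the CSV file, the data ends up in separate rows as expected, but with blank cells: the rows for the second URL are placed in the index after all of the rows for the first URL. See example output here: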
Here is the code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv

# create csv file
outfile = open("scrape.csv", "w", newline='')
writer = csv.writer(outfile)

## define first URL to grab conjugation names
url1 = 'https://koreanverb.app/?search=%ED%95%98%EB%8B%A4'

# define dataframe columns
df = pd.DataFrame(columns=['conjugation name'])

# get URL content
response = requests.get(url1)
soup = BeautifulSoup(response.content, 'html.parser')

# get table with all verb conjugations
results = soup.find("div", class_="table-responsive")

##### GET CONJUGATIONS AND APPEND TO CSV
# define URLs
urls = ['https://koreanverb.app/?search=%ED%95%98%EB%8B%A4',
        'https://koreanverb.app/?search=%EB%A8%B9%EB%8B%A4',
        'https://koreanverb.app/?search=%EB%A7%88%EC%8B%9C%EB%8B%A4']

# loop to get data
for url in urls:
    response = requests.get(url)
    soup2 = BeautifulSoup(response.content, 'html.parser')
    # get table with all verb conjugations
    results2 = soup2.find("div", class_="table-responsive")
    # get dictionary form of verb/adjective
    verb_results = soup2.find('dl', class_='dl-horizontal')
    verb_title = verb_results.find('dd')
    verb_title_text = verb_title.text
    job_elements = results2.find_all("tr", class_="conjugation-row")
    for job_element in job_elements:
        conjugation_name = job_element.find("td", class_="conjugation-name")
        conjugation_korean = conjugation_name.find_next_sibling("td")
        conjugation_name_text = conjugation_name.text
        conjugation_korean_text = conjugation_korean.text
        data_column = pd.DataFrame({'conjugation name': [conjugation_name_text],
                                    verb_title_text: [conjugation_korean_text],
                                    })
        #data_column = pd.DataFrame({verb_title_text: [conjugation_korean_text]})
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, data_column], ignore_index=True)

# save to csv
df.to_csv('scrape.csv')
outfile.close()
print('Verb Conjugations Collected and Appended to CSV, one per column')
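In the script above, each row becomes a one-row DataFrame whose second column is named after the current verb, so stacking them vertically puts every verb's rows below the previous verb's and fills the other verbs' columns with NaN, which shows up as blank CSV cells. One way to get one row per conjugation name and one column per verb is to collect a dict per verb and build the DataFrame once at the end. The rows below are hypothetical placeholders, and the sketch assumes every verb page lists the same conjugation names.

```python
import pandas as pd

# Hypothetical scraped rows: (verb, conjugation name, conjugated form).
# These are placeholders, not real data from koreanverb.app.
rows = [
    ("verb1", "name_a", "v1_a"),
    ("verb1", "name_b", "v1_b"),
    ("verb2", "name_a", "v2_a"),
    ("verb2", "name_b", "v2_b"),
]

# What the question's loop does: one single-row frame per row, stacked vertically.
stacked = pd.concat(
    [pd.DataFrame({"conjugation name": [name], verb: [form]}) for verb, name, form in rows],
    ignore_index=True,
)
print(stacked)  # verb2's rows sit below verb1's, with NaN in the other verb's column

# Alternative: collect {conjugation name: form} per verb, then build the frame once.
by_verb = {}
names = []  # conjugation names in first-seen order
for verb, name, form in rows:
    by_verb.setdefault(verb, {})[name] = form
    if name not in names:
        names.append(name)

wide = pd.DataFrame(by_verb, index=names)  # one column per verb, one row per name
wide.index.name = "conjugation name"
print(wide)

csv_text = wide.to_csv()  # pass a path such as "scrape.csv" to write a file instead
```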
Comments:
- You can use find_all(), which returns a list, instead of find(), and then write a for loop to iterate over the results and get the data.
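Following that suggestion, here is a sketch that calls find_all() on the whole page rather than on the first div.table-responsive, then writes one column per verb. parse_conjugations and scrape_to_csv are hypothetical helper names; the selectors are the ones from the question and have not been checked against the live site. The sketch also assumes conjugation names are unique within each page, since a repeated name would overwrite the earlier entry in the dict.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def parse_conjugations(html):
    """Return (dictionary form, {conjugation name: conjugated form}) for one page.

    Uses the selectors from the question (dl.dl-horizontal, tr.conjugation-row,
    td.conjugation-name); they are assumptions about the page markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    verb = soup.find("dl", class_="dl-horizontal").find("dd").get_text(strip=True)
    forms = {}
    # Search the whole document, not just the first div.table-responsive.
    for row in soup.find_all("tr", class_="conjugation-row"):
        name_cell = row.find("td", class_="conjugation-name")
        form_cell = name_cell.find_next_sibling("td")
        forms[name_cell.get_text(strip=True)] = form_cell.get_text(strip=True)
    return verb, forms


def scrape_to_csv(urls, path="scrape.csv"):
    """Fetch each URL and write one column per verb, one row per conjugation name."""
    by_verb = {}
    names = []  # conjugation names in first-seen order
    for url in urls:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        verb, forms = parse_conjugations(response.content)
        by_verb[verb] = forms
        for name in forms:
            if name not in names:
                names.append(name)
    df = pd.DataFrame(by_verb, index=names)
    df.index.name = "conjugation name"
    df.to_csv(path)
    return df
```

With the urls list from the question, calling scrape_to_csv(urls) would fetch the three pages and write scrape.csv; no separate csv.writer or open() call is needed because DataFrame.to_csv handles the file.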
Tags: python csv beautifulsoup