Web Scraping with two tables in a Website using Beautiful Soup: Answers

【Question title】: Web Scraping with two tables in a Website using Beautiful Soup
【Posted】: 2021-06-20 10:14:03
【Question】:

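Task: Use Beautiful Soup to extract the table containing Tesla Quarterly Revenue and store it in a DataFrame named tesla_revenue. The DataFrame should have Date and Revenue columns. Make sure to remove the commas and dollar signs from the Revenue column.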
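If the scraped Revenue strings are already in a DataFrame, the dollar signs and commas can also be stripped in one vectorized pass with pandas. A minimal sketch, assuming the column holds strings like "$1,234"; the values below are hypothetical placeholders, not real Tesla figures, and the numeric conversion is an optional extra step beyond what the task asks:

```python
import pandas as pd

# Hypothetical placeholder values, NOT real Tesla revenue figures
tesla_revenue = pd.DataFrame(
    {"Date": ["2021-03-31", "2020-12-31"], "Revenue": ["$1,234", "$5,678"]}
)

# [$,] is a regex character class: remove every "$" and "," in each string
tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace(r"[$,]", "", regex=True)

# Optional: convert to numbers so the column can be summed or plotted
tesla_revenue["Revenue"] = pd.to_numeric(tesla_revenue["Revenue"])
print(tesla_revenue)
```

If some cells are empty on the real page, `pd.to_numeric(..., errors="coerce")` turns them into NaN instead of raising.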

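I am using the following code for this:

import pandas as pd

tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])

# soup is the BeautifulSoup object for the page
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text.replace("$", "").replace(",", "")
    # DataFrame.append was removed in pandas 2.0; this line needs pandas < 2.0
    tesla_revenue = tesla_revenue.append({"Date": date, "Revenue": revenue}, ignore_index=True)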
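On pandas 2.0 and later, `DataFrame.append` no longer exists, so a loop like the one above fails with an AttributeError. A minimal sketch of the same loop that collects plain dicts and builds the DataFrame once at the end; the HTML string is a hypothetical stand-in for one revenue table, not the real page markup:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for a single revenue table (placeholder values)
html = """
<table><tbody>
  <tr><td>2020</td><td>$1,234</td></tr>
  <tr><td>2019</td><td>$5,678</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

records = []
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text.replace("$", "").replace(",", "")
    # Collect rows as dicts instead of growing the DataFrame inside the loop
    records.append({"Date": date, "Revenue": revenue})

# Build the DataFrame once, after the loop
tesla_revenue = pd.DataFrame(records, columns=["Date", "Revenue"])
print(tesla_revenue)
```

Building once from a list is also cheaper than re-creating the DataFrame on every iteration.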

The code only works for the first table, "Tesla Annual Revenue"; it does not retrieve the "Tesla Quarterly Revenue" table.
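This happens because `soup.find("tbody")` returns only the first matching element, which belongs to the annual table when it appears first on the page, while `find_all` returns every match in document order. A minimal sketch on a hypothetical two-table HTML snippet (modeled loosely on the layout described in the question, not copied from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical two-table page; values are placeholders
html = """
<table class="historical_data_table table">
  <thead><tr><th>Tesla Annual Revenue</th></tr></thead>
  <tbody><tr><td>2020</td><td>$1,234</td></tr></tbody>
</table>
<table class="historical_data_table table">
  <thead><tr><th>Tesla Quarterly Revenue</th></tr></thead>
  <tbody><tr><td>2020-12-31</td><td>$567</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match, so this is always the annual table's body
first_tbody = soup.find("tbody")

# find_all() returns all matches in document order; index 1 is the second table
all_tbodies = soup.find_all("tbody")
quarterly_tbody = all_tbodies[1]

print(first_tbody.find("td").text)
print(quarterly_tbody.find("td").text)
```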

Given URL: https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue?cm_mmc=Email_Newsletter-


Tags: python web-scraping beautifulsoup html-table

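【Solution 1】:

Try this. Both revenue tables share the class "historical_data_table table", so find_all returns them in page order and index 1 is the quarterly table: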

import pandas as pd

# soup is the BeautifulSoup object for the page.
# Both revenue tables share this class: index 0 is annual, index 1 is quarterly.
tables = soup.find_all("table", attrs={"class": "historical_data_table table"})
quarterly_table = tables[1]

rows = quarterly_table.find_all("tr")
body_rows = rows[1:]  # skip the header row

all_rows = []
for tr in body_rows:
    # Strip "$" and "," from every cell so Revenue holds plain digits
    cells = [td.text.replace("$", "").replace(",", "") for td in tr.find_all("td")]
    all_rows.append(cells)

tesla_revenue = pd.DataFrame(data=all_rows, columns=["Date", "Revenue"])
tesla_revenue
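Indexing with [1] relies on the quarterly table staying second on the page. A sketch that instead picks the table by its header text, assuming the header cell contains "Tesla Quarterly Revenue" (taken from the table name in the question, not verified against the live page); `find_table_by_header` is a hypothetical helper and the HTML is a placeholder stand-in:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in markup with placeholder values
html = """
<table class="historical_data_table table">
  <thead><tr><th>Tesla Annual Revenue</th></tr></thead>
  <tbody><tr><td>2020</td><td>$1,234</td></tr></tbody>
</table>
<table class="historical_data_table table">
  <thead><tr><th>Tesla Quarterly Revenue</th></tr></thead>
  <tbody>
    <tr><td>2020-12-31</td><td>$567</td></tr>
    <tr><td>2020-09-30</td><td>$1,890</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def find_table_by_header(soup, title):
    """Hypothetical helper: first <table> whose first <th> contains `title`."""
    for table in soup.find_all("table"):
        th = table.find("th")
        if th is not None and title in th.get_text():
            return table
    return None


table = find_table_by_header(soup, "Tesla Quarterly Revenue")

rows = []
for tr in table.find("tbody").find_all("tr"):
    # Clean each cell: trim whitespace, then drop "$" and ","
    rows.append([td.get_text(strip=True).replace("$", "").replace(",", "")
                 for td in tr.find_all("td")])

tesla_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])
print(tesla_revenue)
```

Reading only the tbody rows also avoids having to skip the header row by position.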
