【Question Title】: BeautifulSoup: Merge tables and export to .csv
【Posted】: 2020-09-05 19:07:00
【Question Description】:

I have been trying to download data from different URLs and then save it to a CSV file.

The idea is to extract the annual/quarterly data from: https://www.marketwatch.com/investing/stock/MMM/financials/

Annual:

https://www.marketwatch.com/investing/stock/MMM/financials/cash-flow

Quarterly:

https://www.marketwatch.com/investing/stock/MMM/financials/cash-flow/quarter

using the following code:

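    import requests
    import pandas as pd

    # Annual cash-flow pages for two companies (AAPL and MMM).
    urls = ['https://www.marketwatch.com/investing/stock/AAPL/financials/cash-flow',
            'https://www.marketwatch.com/investing/stock/MMM/financials/cash-flow']


    def main(urls):
        with requests.Session() as req:
            goal = []
            for url in urls:
                r = req.get(url)
                # Pick the table that contains the dividends row, then keep
                # only its first row and first three columns.
                df = pd.read_html(
                    r.content, match="Cash Dividends Paid - Total")[0].iloc[[0], 0:3]
                goal.append(df)
            # Stack the per-company rows into a single DataFrame.
            new = pd.concat(goal)
            print(new)


    main(urls)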

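Output:

I can extract the required information (in the example, 2015 and 2016 for 2 companies), but only for one set (quarterly or annual).

I want to merge the Annual and Quarter tables.

To do this, I came up with this code: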

import requests
import pandas as pd
from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv

html = urlopen('https://www.marketwatch.com/investing/stock/MMM/financials/')
soup = BeautifulSoup(html, 'html.parser')

ids = ['cash-flow','cash-flow/quarter']


with open("news.csv", "w", newline="", encoding='utf-8') as f_news:
    csv_news = csv.writer(f_news)
    csv_news.writerow(["A"])

    for id in ids:
        a = soup.find("Cash Dividends Paid - Total", id=id)
        csv_news.writerow([a.text])

But I get the following error:

【Question Comments】:

    Tags: python python-3.x web-scraping beautifulsoup export-to-csv


    【Solution 1】:

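    BeautifulSoup elements expose their text through the get_text() method (.text is a shorthand property for the same call, and both fail if find() returned None instead of an element):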

      csv_news.writerow([a.get_text()])
    

    https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
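
As a minimal sketch of the get_text() call, here it is applied to an inline HTML fragment while guarding against find() returning None. The fragment, its figures, the rowTitle class, and the cell_text helper are made up for illustration and are not taken from the MarketWatch page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for one row of a financials table
# (the class name and the figures are invented for this example).
html = """
<table>
  <tr><td class="rowTitle">Cash Dividends Paid - Total</td><td>(10)</td><td>(11)</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def cell_text(soup, tag, **attrs):
    """Return the stripped text of the first matching element, or None if nothing matches."""
    element = soup.find(tag, **attrs)
    if element is None:
        # find() returns None when no element matches, so calling
        # .get_text() or .text on the result would raise AttributeError.
        return None
    return element.get_text(strip=True)


found = cell_text(soup, "td", class_="rowTitle")
missing = cell_text(soup, "td", id="cash-flow")  # no element has this id
print(found, missing)
```

Checking for None before reading the text turns a crash into a value the caller can handle, for example by skipping the row when writing the CSV.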

    【Comments】:

      【Solution 2】:

      This means your soup.find() did not find the element you wanted, so a is None.

      Why do you need the id? I looked at the annual page on May 19. It does not use id.
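
One way to look the row up by its label instead of an id, and to write the annual and quarterly values into a single CSV, might look like the sketch below. The two HTML fragments, their figures, and the helper names extract_row, collect, and write_rows are hypothetical, and the real MarketWatch markup may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

LABEL = "Cash Dividends Paid - Total"

# Hypothetical stand-ins for the annual and quarterly pages (made-up figures).
PAGES = {
    "annual": """
        <table>
          <tr><th>Item</th><th>2015</th><th>2016</th></tr>
          <tr><td>Cash Dividends Paid - Total</td><td>(10)</td><td>(11)</td></tr>
        </table>""",
    "quarterly": """
        <table>
          <tr><th>Item</th><th>Q1</th><th>Q2</th></tr>
          <tr><td>Cash Dividends Paid - Total</td><td>(2)</td><td>(3)</td></tr>
        </table>""",
}


def extract_row(html, label):
    """Return (headers, values) for the row whose first cell equals label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the cell's text rather than on an id attribute.
    cell = soup.find("td", string=lambda s: s is not None and s.strip() == label)
    if cell is None:
        return None
    row = cell.find_parent("tr")
    table = row.find_parent("table")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    values = [td.get_text(strip=True) for td in row.find_all("td")]
    return headers, values


def collect(pages, label):
    """Build [frequency, period, value] rows from every page that has the label."""
    rows = []
    for frequency, html in pages.items():
        result = extract_row(html, label)
        if result is None:
            continue  # label not found on this page
        headers, values = result
        for period, value in zip(headers[1:], values[1:]):
            rows.append([frequency, period, value])
    return rows


def write_rows(rows, fileobj, label):
    """Write a header line followed by the collected rows as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["frequency", "period", label])
    writer.writerows(rows)


rows = collect(PAGES, LABEL)
buffer = io.StringIO(newline="")
write_rows(rows, buffer, LABEL)
csv_text = buffer.getvalue()
print(csv_text)
```

With live pages, the values of PAGES would be the response text fetched for each URL (for instance with requests), and the StringIO buffer would be replaced by a file opened with newline="" as in the question.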

      【Comments】:
