【Title】Exporting to .csv from python with BeautifulSoup
【Posted】2021-03-23 08:13:15
【Question】

I'm new to this and can't seem to get the export to work correctly.

import csv
from bs4 import BeautifulSoup

# select document
with open('scrape1.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

# create/name csv
with open('speechengine_report.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['computer', 'usagedata']) 

# tell bs4 to only look at x tags with a class of y
for licensedata in soup.find_all('div', class_='licensedata'):

    # scrape pc id
    computer = licensedata.p.b.text
    print(computer)

    # scrape usage stats for each id
    for usagedata in licensedata.find_all('td'):

        # minutes = usagedata.table.tbody
        print(usagedata.text)

    # blank line
    print()

    # writer.writerow([computer, usagedata])

    
csv_file.close()

【Comments】

Tags: python beautifulsoup export-to-csv


【Solution 1】

The rest of your code that writes data to the csv file should be inside the `with` block. Also, you don't need `csv_file.close()`, because the `with` statement closes the file for you. Try the code below, and read up on file handling in Python.

    import csv
    from bs4 import BeautifulSoup

    # select document
    with open('scrape1.html') as html_file:
        soup = BeautifulSoup(html_file, 'lxml')

    # create/name csv
    with open('speechengine_report.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(['computer', 'usagedata'])

        # tell bs4 to only look at div tags with a class of licensedata
        for licensedata in soup.find_all('div', class_='licensedata'):

            # scrape pc id
            computer = licensedata.p.b.text
            print(computer)

            # scrape usage stats for each id, one csv row per value
            for usagedata in licensedata.find_all('td'):
                print(usagedata.text)
                writer.writerow([computer, usagedata.text])

            # blank line
            print()
    
    
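To see the whole pipeline end to end without needing `scrape1.html` on disk, here is a minimal, self-contained sketch. The sample HTML is invented to mirror the structure the question implies (each `licensedata` div holds the computer id in `<p><b>` and usage values in `<td>` cells), and it writes to an in-memory buffer instead of a file:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure described in the question.
html = """
<div class="licensedata">
  <p><b>PC-001</b></p>
  <table><tbody><tr><td>12</td><td>34</td></tr></tbody></table>
</div>
<div class="licensedata">
  <p><b>PC-002</b></p>
  <table><tbody><tr><td>56</td></tr></tbody></table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Write into an in-memory buffer; swap in
# open('speechengine_report.csv', 'w', newline='') to produce a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['computer', 'usagedata'])

for licensedata in soup.find_all('div', class_='licensedata'):
    # pc id lives in the first <p><b> of the div
    computer = licensedata.p.b.text
    for usagedata in licensedata.find_all('td'):
        # one row per usage value, keyed by the computer id
        writer.writerow([computer, usagedata.text])

print(buffer.getvalue())
```

Writing one row per `<td>` (rather than one per div) keeps the csv flat, which is usually easier to analyse afterwards.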

    【Comments】
