Answers to: Problem saving an HTML table to Excel using Python

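【Question title】: Problem saving HTML table into Excel using Python
【Posted】: 2021-05-18 00:25:37
【Question description】:

This is my first time using Python. I am experimenting with scraping and piecing together code I found online, and I am currently stuck on saving the output to an Excel file.

First, I need to read an email from Outlook and extract the data in it. The data is in table form, meaning the author copied it from Excel and pasted it as a table, so the best approach I found was to convert the email to an HTML file.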

import win32com.client
import xlsxwriter
import pandas as pd
import requests
from bs4 import BeautifulSoup

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
'''message = messages.GetLast()
body_content = message.Body
subject = message.Subject
categories = message.Categories
print(body_content)
print(subject)
print(categories)'''
string = "Monthly PPM Report"
for message in messages:
    if string in message.Subject:
        print(message.HTMLBody)
        # Save the HTML body; each matching message overwrites the same file
        with open("filename.html", "w", encoding="utf-8") as html_file:
            html_file.write(message.HTMLBody)

With the code above, I managed to save the email as an HTML file. The next step is to find the table by targeting the div class.

rfile  = open('filename.html')
rsoup  = BeautifulSoup(rfile)
nodes1  = rsoup.find('div',{'class':'MsoNormalTable'})

When I print it, I get the table I need, but when I try to save it as an Excel file with nodes1.to_excel('test.xlsx'), I get this error.

nodes1.to_excel('test.xlsx') AttributeError: 'NoneType' object has no attribute 'to_excel'

Any suggestions on what step I am missing?

【Question discussion】:

Tags: python excel pandas

【Solution 1】:

To use the pandas to_excel() method, you first need a pandas DataFrame.

Assuming nodes1 is a dictionary object:

data_frame = pd.DataFrame(data=nodes1)
# The file name needs an Excel extension so pandas can pick a writer engine
data_frame.to_excel('label_name.xlsx')
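
One thing worth checking first: the traceback in the question mentions 'NoneType', which suggests that find() matched nothing and nodes1 is None rather than a table. A likely cause, assuming the email was composed in Outlook or Word, is that MsoNormalTable is a class on the table element itself rather than on a div. The sample HTML below is made up for illustration and is not taken from the asker's email:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like Word/Outlook-generated HTML
sample_html = (
    '<div class="WordSection1">'
    '<table class="MsoNormalTable"><tr><td>1</td></tr></table>'
    '</div>'
)
soup = BeautifulSoup(sample_html, "html.parser")

# No <div> carries this class, so find() returns None
as_div = soup.find("div", {"class": "MsoNormalTable"})

# Searching for a <table> with the same class returns the element
as_table = soup.find("table", {"class": "MsoNormalTable"})

print(as_div)
print(as_table.name)
```

Even when the lookup succeeds, the result is a BeautifulSoup element and not a DataFrame, so it still has to be converted before to_excel() can be used.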

【Discussion】:

Thanks. The problem is that nodes1 is not a dictionary, so I need to convert it before passing it in.
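
If nodes1 is a BeautifulSoup table element, one way to do that conversion is to collect the cell text row by row. This is only a minimal sketch, assuming the first row holds the column headers, every row has the same number of cells, and there are no nested tables; table_to_dataframe and the sample HTML are hypothetical names and data used for illustration:

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(table):
    """Build a DataFrame from a BeautifulSoup <table> element."""
    rows = []
    for tr in table.find_all("tr"):
        # Accept both header cells (<th>) and data cells (<td>)
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    if not rows:
        return pd.DataFrame()
    # Assumption: the first row contains the column names
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# Hypothetical sample data, not taken from the original email
sample_html = """
<table class="MsoNormalTable">
  <tr><td>Item</td><td>Count</td></tr>
  <tr><td>A</td><td>3</td></tr>
  <tr><td>B</td><td>5</td></tr>
</table>
"""
soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", {"class": "MsoNormalTable"})
df = table_to_dataframe(table)
print(df)
```

All values come out as strings with this approach, so numeric columns may need a conversion such as pd.to_numeric before saving.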

【Solution 2】:

You can use the pandas function read_html to read the table:

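from io import StringIO

import pandas as pd

# Read the file with the same encoding it was written with
with open('filename.html', encoding='utf-8') as rfile:
    html = rfile.read()

# The read_html calls below are alternatives; use the one that fits.
# StringIO is used because newer pandas versions deprecate raw HTML strings.

# all tables in document
tab_list = pd.read_html(StringIO(html))
# tables with header
tab_list = pd.read_html(StringIO(html), header=0)
# table with attributes
tab_list = pd.read_html(StringIO(html), attrs={'class':'xxx', 'id':'xxx', 'align':"center", 'cellspacing':"1", 'cellpadding':"4", 'border':"0"})

# your nodes1 from BeautifulSoup
tab_list = pd.read_html(StringIO(str(nodes1)))

# save first table
tab_list[0].to_excel('test.xlsx')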
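
Putting this together for the case in the question, a minimal sketch might select the table by its class and use the first row as the header. The class name MsoNormalTable comes from the question; read_report_table and the sample HTML are hypothetical and only stand in for the saved email:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the contents of filename.html
sample_html = """
<html><body>
<p>Monthly PPM Report</p>
<table class="MsoNormalTable">
  <tr><td>Site</td><td>PPM</td></tr>
  <tr><td>North</td><td>12</td></tr>
  <tr><td>South</td><td>7</td></tr>
</table>
</body></html>
"""


def read_report_table(html):
    """Return the first table with class MsoNormalTable as a DataFrame."""
    # read_html raises ValueError if no matching table is found
    tables = pd.read_html(
        StringIO(html),
        attrs={"class": "MsoNormalTable"},
        header=0,  # treat the first row as column names
    )
    return tables[0]


report = read_report_table(sample_html)
print(report)
```

Calling report.to_excel('test.xlsx', index=False) would then write the table without the row index. Note that read_html needs an HTML parser such as lxml installed, and to_excel needs an Excel writer such as openpyxl.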

【Discussion】: