【Question title】: How to concat multiple spreadsheets in Excel workbooks into pandas dataframe?
【Posted】: 2023-12-14 06:17:01
【Question】:

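I have multiple folders and subfolders containing Excel workbooks with multiple tabs. How can I concatenate all of that information into a single pandas DataFrame?

Here is my code so far: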

from pathlib import Path
import os
import pandas as pd
import glob

p = Path(r'C:\Users\user1\Downloads\key_folder')

globbed_files = p.glob('**/**/*.xlsx')

df = []

for file in globbed_files:
    frame = pd.read_excel(file, sheet_name = None, ignore_index=True)
    frame['File Path'] = os.path.basename(file)
    df.append(frame)

# df = pd.concat([d.values() for d in df], axis = 0, ignore_index=True)

df = pd.concat(df, axis=0, ignore_index = True)

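This produces the following error: cannot concatenate object of type "<class 'collections.OrderedDict'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

When I run pd.DataFrame(df), I see that each Excel spreadsheet tab becomes a separate column. The cells contain the data and headers as text, forming one very long string.

Any help is appreciated! Thanks!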

【Comments】:

    Tags: python excel pandas dataframe glob


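    【Solution 1】:

    Here is the final code. Each sheet is read into its own DataFrame and appended to a list before everything is concatenated: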

        from pathlib import Path
        import pandas as pd

        p = Path('path here')

        # '**/*.xlsx' matches workbooks in p and in every subfolder
        globbed_files = p.glob('**/*.xlsx')

        list_dfs = []

        for file in globbed_files:
            # sheet_name=None returns a dict of {sheet name: DataFrame}
            sheets = pd.read_excel(file, sheet_name=None)
            for sheet_name, df in sheets.items():
                df['Sheet Name'] = sheet_name
                list_dfs.append(df)

        # concat needs a flat list of DataFrames, not the dicts themselves
        dfs = pd.concat(list_dfs, axis=0, ignore_index=True)

        dfs.to_excel('merged spreadsheet.xlsx')
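
    The question also recorded which file each row came from. A minimal sketch of the same idea wrapped in a function is shown here; `combine_workbooks` and its `reader` parameter are hypothetical names introduced for illustration, not part of pandas. It assumes every sheet can be stacked row-wise, and it relies on `pd.read_excel(..., sheet_name=None)` returning a dict of sheet name to DataFrame.

```python
from pathlib import Path

import pandas as pd


def combine_workbooks(root, reader=pd.read_excel):
    """Stack every sheet of every .xlsx file under root into one DataFrame.

    `combine_workbooks` and `reader` are hypothetical names for this sketch;
    `reader` exists only so that the loading step can be swapped out.
    """
    frames = []
    # rglob('*.xlsx') walks root and all of its subfolders
    for file in sorted(Path(root).rglob('*.xlsx')):
        # sheet_name=None asks for all sheets as {sheet name: DataFrame}
        sheets = reader(file, sheet_name=None)
        for sheet_name, df in sheets.items():
            # assign() returns a copy with the two bookkeeping columns added
            frames.append(df.assign(**{'File Path': file.name,
                                       'Sheet Name': sheet_name}))
    if not frames:
        # pd.concat raises ValueError when given an empty list
        return pd.DataFrame()
    return pd.concat(frames, axis=0, ignore_index=True)
```

    It could be called as `merged = combine_workbooks(r'C:\Users\user1\Downloads\key_folder')`, using the folder from the question. Reading `.xlsx` files this way requires the openpyxl package, which pandas uses as its engine for that format.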
    

    【Comments】: