Does pandas ExcelFile parse all sheets on initialization? (and can it be avoided) - Answers

【Question title】: Does pandas ExcelFile parse all sheets on initialization? (and can it be avoided)
【Posted】: 2020-06-04 08:31:24
【Question description】:

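I have a series of large (and badly formatted) Excel spreadsheets that I'm trying to process with pandas. Each Excel file contains 50-60 sheets, and I'm only interested in a subset of the sheets in each file.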

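I tried reading the whole spreadsheet in as a pd.ExcelFile object so that I could use the sheet_names attribute to parse specific sheets (I don't know each sheet's name ahead of time). This works, but it seems abnormally slow (close to a minute for each ~30 MB Excel file).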

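I can only assume this is because every sheet is parsed when the pd.ExcelFile object is initialized (...I may be wrong?). If so, is there a way to prevent this behavior? I really just want to get the sheet names and then parse specific sheets from there.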

Thanks in advance!

【Question comments】:

Maybe this thread can help you: stackoverflow.com/questions/12250024/…

Tags: python excel pandas

【Solution 1】:

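Excel sheets often carry a lot of formatting, all of which has to be read and interpreted when the file is opened. Can you parse out the specific sheets you need? Do you know them beforehand? If so, you can split the Excel files (each with multiple sheets) into separate single-sheet files and focus only on those. Try the code below and see if it helps you get where you need to be.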

import os
import xlrd
from xlutils.copy import copy

path = 'C:\\path_to_Multiple_Excel_Files\\'
targetdir = 'C:\\path_to_out_files\\'  # where you want your new files

os.makedirs(targetdir, exist_ok=True)  # creates the output directory if it is missing

for root, dirs, files in os.walk(path, topdown=False):  # all the files you want to split
    xlsfiles = [f for f in files]  # can add selection condition here

    # process the files of each directory while `root` still points at it
    for f in xlsfiles:
        wb = xlrd.open_workbook(os.path.join(root, f), on_demand=True)
        # splitext drops only the extension; str.strip(".xls") would also eat
        # leading/trailing '.', 'x', 'l', 's' characters from the file name
        base = os.path.splitext(f)[0]
        for name in wb.sheet_names():  # cycles through each sheet in each workbook
            newwb = copy(wb)  # makes a temp copy of that book
            # brute force, but strips away all other sheets apart from the sheet being looked at
            newwb._Workbook__worksheets = [ws for ws in newwb._Workbook__worksheets if ws.name == name]
            # saves each sheet as the original file name plus the sheet name
            newwb.save(os.path.join(targetdir, base + name + '.xls'))

【Comments】:

【Solution 2】:

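As far as I know, pandas uses xlrd or a similar engine to open and parse Excel files. xlrd is the default engine. When you open an Excel file with xlrd, it loads all sheets by default, so pandas presumably does the same. You may be better off opening the Excel file with xlrd directly, setting the on_demand kwarg to True, and then defining the df after pulling in the data using xlrd.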
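A rough sketch of that idea, not a tested recipe for your files: open the workbook with on_demand=True, read the sheet names, and build a DataFrame only for the sheets you want. The helper names (sheet_to_frame, load_selected_sheets) and the wanted predicate are made up for illustration, and the sketch assumes the first row of each sheet holds the column headers. As far as I know, on_demand only takes effect for .xls workbooks, and xlrd 2.0+ no longer reads .xlsx at all.

```python
import pandas as pd


def sheet_to_frame(sheet):
    """Build a DataFrame from an xlrd-style sheet (uses .nrows and .row_values).

    Assumption: the first row holds the column headers.
    """
    rows = [sheet.row_values(i) for i in range(sheet.nrows)]
    if not rows:
        return pd.DataFrame()
    return pd.DataFrame(rows[1:], columns=rows[0])


def load_selected_sheets(path, wanted):
    """Return {sheet name: DataFrame} for the sheets where wanted(name) is true."""
    import xlrd  # third-party; imported here so sheet_to_frame works without it

    # With on_demand=True the sheets are not parsed when the workbook is opened.
    book = xlrd.open_workbook(path, on_demand=True)
    try:
        frames = {}
        for name in book.sheet_names():  # names are available before any sheet is loaded
            if wanted(name):
                sheet = book.sheet_by_name(name)  # parses only this sheet
                frames[name] = sheet_to_frame(sheet)
                book.unload_sheet(name)  # drop the parsed sheet to free memory
        return frames
    finally:
        book.release_resources()
```

Something like load_selected_sheets("report.xls", lambda name: name.startswith("Data")) would then return only the matching sheets; the file name and the prefix are placeholders.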

【Comments】:

Thanks - per @aleksandr-iurkin's comment above, please see the related question here: stackoverflow.com/questions/12250024/…