将 pandas 数据框导出到 xlsx：处理 python 3.9 上的 openpyxl 问题答案

【问题标题】：Export pandas dataframe to xlsx: dealing with the openpyxl issue on python 3.9将 pandas 数据框导出到 xlsx：处理 python 3.9 上的 openpyxl 问题
【发布时间】：2021-03-12 20:59:10
【问题描述】：

使用最新的包版本：openpyxl: 3.0.6 |熊猫：1.2.3 |python：3.9

在将上面的软件包更新到报告的最新版本之前，下面的功能运行良好。

现在它引发错误：“zipfile.BadZipFile: File is not a zip file”。

这样的功能真的很有用，如果知道它是否可以修复以使其正常工作会很高兴。

下面的函数可以直接运行，只需将“pathExport”替换为你的导出目录进行测试。

    def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           truncate_sheet=False, 
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.
    
        Parameters:
          filename : File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          truncate_sheet : truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
          to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                            [can be dictionary]
    
        Returns: None
    
        (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
        """
        from openpyxl import load_workbook
    
        # ignore [engine] parameter if it was passed
        if 'engine' in to_excel_kwargs:
            to_excel_kwargs.pop('engine')
    
        writer = pd.ExcelWriter(filename, engine='openpyxl')
    
        # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
        try:
            FileNotFoundError
        except NameError:
            FileNotFoundError = IOError
    
    
        try:
            # try to open an existing workbook
            writer.book = load_workbook(filename)
            
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
    
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
            
            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            pass
    
        if startrow is None:
            startrow = 0
    
        # write out the new sheet
        df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    
        # save the workbook
        writer.save()

pathExport = r"F:\PYTHON\NB-Suite_python39\MNE\outputData\df.xlsx"
df1 = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue']
                    })

append_df_to_excel(pathExport, df1, sheet_name="DF1", index=False, startcol=0, startrow=0)

【问题讨论】：

这看起来像 #66471466 的副本，我认为这是因为 Pandas 用空文件覆盖了现有工作簿。处理后只需为文件使用不同的名称即可。
@CharlieClark 请问我应该在哪里更改函数中的名称？所以我可以试一试，如果可行，请让用户在这里，提前感谢
而不仅仅是 filename 使用 source_filename 与 openpyxl 和 target_filename 与 Pandas。请注意，两者肯定是不同的。
@CharlieClark 谢谢。我尽了最大努力，但我不确定我应该在哪里进行这些更改。你能用已经修复的代码回复吗？所以我可以对其进行测试并确认它是否有效，我相信它会对所有其他将在这里阅读的用户有用，谢谢
快速更新：刚刚注意到“leeprevost”在帖子#20219254 上提出了完全相同的问题，就在@MaxU 的回答下，但仍然没有得到回答

标签： pandas openpyxl xlsx python-3.9

【解决方案1】：

好的，我能够复制问题。它与pandas 相关。在 pandas 1.1.5 之前一切正常在 pandas 1.2.0 中，他们做了some changes

在你实例化pd.ExcelWriter的时候

writer = pd.ExcelWriter(filename, engine='openpyxl')`

它会创建大小为 0 字节的空文件并覆盖现有文件，然后在尝试加载它时会出错。它与openpyxl 无关，因为使用最新版本的openpyxl 它适用于pandas 1.1.5。

解决办法——指定mode='a'，把上面那行改成

writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

或者 - 查看 this 或 this 解决方案，它会在实例化 pd.ExcelWriter 之前加载文件。

编辑：我在 cmets 中被告知，如果文件不存在，mode='a' 会引发 FileNotFoundError。虽然在这种情况下它不会创建文件是出乎意料的，但解决方案是在现有的 try 块内创建 writer 并在 except 部分中创建模式为 w 的 writer：

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           truncate_sheet=False, 
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.
     
        Parameters:
          filename : File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          truncate_sheet : truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
          to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                            [can be dictionary]
     
        Returns: None
     
        (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
        """
        from openpyxl import load_workbook
     
        # ignore [engine] parameter if it was passed
        if 'engine' in to_excel_kwargs:
            to_excel_kwargs.pop('engine')
     
        
     
        # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
        try:
            FileNotFoundError
        except NameError:
            FileNotFoundError = IOError
     
     
        try:
            writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
            # try to open an existing workbook
            writer.book = load_workbook(filename)
             
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
     
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
             
            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            writer = pd.ExcelWriter(filename, engine='openpyxl')
     
        if startrow is None:
            startrow = 0
     
        # write out the new sheet
        df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
     
        # save the workbook
        writer.save()

【讨论】：

感谢@buran。您的解决方案仅在您已经有 df.xlsx 文件的情况下才有效，但如果还没有文件，则会引发错误，因为它无法附加到不存在的文件。但是，您的想法帮助我想到了解决该问题的解决方法，我将其集成到函数中，如下所示： try: f = open(filename) # Do something with the file except IOError: # print("File not accessible") wb = Workbook() ws = wb.active ws.title = sheet_name wb.save(filename) 非常感谢
@Baggio，在这种情况下，只需将行移动到现有 try 块内，并在 except 块中创建一个带有 mode='w' 的 writer（即默认值）。我相应地编辑了我的答案。

【解决方案2】：

解决方法如下：

import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
    truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True . It resize all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''


    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)


    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now It resize all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN,  if you want that NaN are displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start to write from specific column, oterwise will start from col = 0

    *******************



    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    try:
        f = open(filename)
        # Do something with the file
    except IOError:
        # print("File not accessible")
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')


    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)


    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            for letter in range(1,ws.max_column):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

示例用法：

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue'],
                    'NaNcheck': [float('NaN'), 1, float('NaN')],
                    })

# EDIT YOUR PATH FOR THE EXPORT 
filename = r"C:\DataScience\df.xlsx" 

# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCEL FILE
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN

【讨论】：