How to delete duplicate columns in multiple Excel sheets of one workbook? Answers

【Question title】: How to delete duplicate columns in multiple Excel sheets of one workbook?
【Posted】: 2019-05-15 22:15:53
【Question】:

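I have multiple sheets in one Excel workbook, and each sheet contains duplicate columns. I need to remove the duplicates and keep only the original columns.

I know how to remove the duplicates within a single sheet:

    import pandas as pd
    df_sheet_map = pd.read_excel("H:/SLM_Final/SLM Indicator template Main to clean.xlsx", sheet_name=None)
    # Transpose so columns become rows, drop duplicate rows, then transpose back
    result2 = df_sheet_map['> Acute Hospital Bed SLM'].T.drop_duplicates().T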
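To see why the transpose trick removes duplicate columns, here is a small self-contained sketch on a made-up DataFrame (the column names and values are illustrative only). `.T.drop_duplicates().T` turns each column into a row, drops repeated rows while keeping the first, and turns them back; only cell values are compared, not the column labels. One side effect is that transposing a frame that mixes numbers and text yields `object` columns. Selecting with `df.loc[:, ~df.T.duplicated()]` is an alternative that keeps each surviving column's original dtype.

```python
import pandas as pd

# Hypothetical sheet: "B" repeats the values of "A" under another header,
# and "D" repeats the values of "C".
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [1, 2, 3],
    "C": ["x", "y", "z"],
    "D": ["x", "y", "z"],
})

# Approach from the question: transpose, drop duplicate rows, transpose back.
# Every column of the result is dtype object, because df.T mixes ints and strings.
via_transpose = df.T.drop_duplicates().T

# Alternative: flag duplicated columns on the transposed view, then select the
# surviving columns from the original frame, so each column keeps its dtype.
keep_mask = ~df.T.duplicated()
via_mask = df.loc[:, keep_mask]

print(list(via_transpose.columns))  # ['A', 'C']
print(list(via_mask.columns))       # ['A', 'C']
print(via_transpose["A"].dtype, via_mask["A"].dtype)
```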

    import os
    import pandas as pd

    dfList = []
    path = 'J:/TestDup'
    newpath = 'J:/TestDup/Test2'

    for fn in os.listdir(path):
        file = os.path.join(path, fn)
        if os.path.isfile(file):
            # Open the Excel file
            xlsx_file = pd.ExcelFile(file)
            # Show the sheet names (only 'Sheet1' is read below)
            print(xlsx_file.sheet_names)
            # Load 'Sheet1' as a DataFrame without treating any row as a header
            df = xlsx_file.parse('Sheet1', header=None)
            # Drop the first two rows
            data = df[2:]
            # Save this file's DataFrame on its own
            data.to_excel(os.path.join(newpath, fn))
            dfList.append(data)

    # Stack the data from all files and save it as one master workbook
    appended_data = pd.concat(dfList)
    appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))

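The code above works. However, I need it to loop over all sheets, not just 'Sheet1'. Also, it currently drops the first two rows (df[2:]); I need to change that so it drops the duplicate columns instead.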
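The two changes asked for here (read every sheet instead of only 'Sheet1', and drop duplicate columns instead of the first two rows) could look roughly like the pandas sketch below. It assumes every `.xlsx` file in the folder should be processed, and that a column counts as a duplicate only when all of its cells, including the header row (the sheets are read with `header=None` as in the question), match an earlier column. `drop_duplicate_columns`, `dedupe_all_sheets` and `process_folder` are hypothetical helper names, and writing `.xlsx` assumes an Excel engine such as openpyxl is installed.

```python
import os

import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first of any columns whose cell values are identical."""
    return df.loc[:, ~df.T.duplicated()]


def dedupe_all_sheets(sheets: dict) -> dict:
    """Apply drop_duplicate_columns to every DataFrame in a {sheet name: DataFrame} dict."""
    return {name: drop_duplicate_columns(df) for name, df in sheets.items()}


def process_folder(path: str, newpath: str) -> None:
    """Save a copy of every .xlsx workbook in `path` to `newpath`, with
    duplicate columns removed from every sheet (sheet names are kept)."""
    os.makedirs(newpath, exist_ok=True)
    for fn in os.listdir(path):
        file = os.path.join(path, fn)
        if not (os.path.isfile(file) and fn.lower().endswith(".xlsx")):
            continue
        # sheet_name=None reads every sheet; header=None keeps the header row as data
        sheets = pd.read_excel(file, sheet_name=None, header=None)
        cleaned = dedupe_all_sheets(sheets)
        with pd.ExcelWriter(os.path.join(newpath, fn)) as writer:
            for name, df in cleaned.items():
                df.to_excel(writer, sheet_name=name, header=False, index=False)
```

With the folders from the question this would be called as `process_folder('J:/TestDup', 'J:/TestDup/Test2')`. The `master_data.xlsx` concatenation is left out; it can be re-added by collecting the cleaned DataFrames in a list, as the original loop does.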

【Discussion】:

My attempt:

    import pandas as pd
    df_sheet_map = pd.read_excel("H:/SLM_Final/SLM Indicator template Main to clean.xlsx", sheet_name=None)
    result1 = df_sheet_map['SLM By DHB'].T.drop_duplicates().T
    result2 = df_sheet_map['> Acute Hospital Bed SLM'].T.drop_duplicates().T

and then save. I have more than 100 sheets that need the same treatment; please help. – Mazin

Hi, my answer below works. However, any improvements would be appreciated. I'm looking for a function that removes the duplicates in Python instead of using VB.

Tags: python excel python-3.x

【Solution 1】:

Transpose all sheets in the workbook, delete the duplicates, then transpose back to the original shape and save all sheets.

Transpose all sheets in the workbook file:

    import pyexcel  # reading .xlsx also requires the pyexcel-xlsx plugin to be installed

    book = pyexcel.get_book(file_name="H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
    # Swap rows and columns on every sheet, so duplicate columns become duplicate rows
    for sheet in book:
        sheet.transpose()
    # Overwrites the original workbook in place
    book.save_as("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")

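Run the Excel VBA macro from Python (this drives Excel through COM, so it needs Windows, Excel and pywin32):

    import time
    import win32com.client as win32

    xl = win32.Dispatch('Excel.Application')
    xl.Visible = 0
    # Open the macro-enabled workbook that contains the deleteDuplicate macro
    macro_wb = xl.Workbooks.Open('H:/SLM_Final/DeleteDup.xlsm')
    xl.Run("deleteDuplicate")
    time.sleep(30)
    # Close the macro workbook without saving (False = SaveChanges), then quit Excel
    macro_wb.Close(False)
    xl.Quit()
    # Give Excel time to exit and release the file before it is reopened
    time.sleep(30)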

The VBA macro to add to the macro-enabled workbook (DeleteDup.xlsm):

    Sub deleteDuplicate()
        Dim wkbk1 As Workbook
        Dim w As Long

        Set wkbk1 = Workbooks.Open("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")

        With wkbk1
            For w = 1 To .Worksheets.Count
                ' The sheets were transposed, so each original column is now a row.
                ' Remove rows whose values in columns 3 and 4 repeat an earlier row;
                ' Header:=xlYes treats the first row as a header.
                .Worksheets(w).UsedRange.RemoveDuplicates Columns:=Array(3, 4), Header:=xlYes
            Next w
        End With

        wkbk1.Save
        wkbk1.Close
    End Sub
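Note that `Columns:=Array(3, 4)` makes `RemoveDuplicates` compare only the 3rd and 4th columns of each transposed sheet, i.e. only rows 3 and 4 of the original layout, and `Header:=xlYes` leaves the first row out of the comparison. The pandas sketch below reproduces that rule next to a whole-row comparison on a made-up table, to show that two original columns which agree only in rows 3 and 4 would still be merged. The data and the helper name `remove_duplicates_transposed` are illustrative assumptions, not part of the answer.

```python
import pandas as pd

# Made-up sheet in its ORIGINAL orientation (before the first transpose).
# Column c0 holds row labels and becomes the header row after transposing.
original = pd.DataFrame({
    "c0": ["Indicator", "x", "y", "z", "w"],
    "c1": ["2017", 5, 10, 20, 7],
    "c2": ["2018", 6, 10, 20, 8],  # matches c1 only in rows 3 and 4
    "c3": ["2017", 5, 10, 20, 7],  # exact copy of c1
})


def remove_duplicates_transposed(df, key_positions=None):
    """Transpose, drop duplicate rows below the first (header) row, transpose back.

    key_positions are 0-based column positions of the transposed data to compare;
    None compares whole rows. (2, 3) corresponds to Columns:=Array(3, 4).
    """
    transposed = df.T
    header, body = transposed.iloc[:1], transposed.iloc[1:]
    keys = body if key_positions is None else body.iloc[:, list(key_positions)]
    return pd.concat([header, body[~keys.duplicated()]]).T


vba_like = remove_duplicates_transposed(original, key_positions=(2, 3))
full_match = remove_duplicates_transposed(original)

print(list(vba_like.columns))    # ['c0', 'c1']
print(list(full_match.columns))  # ['c0', 'c1', 'c2']
```

If whole columns should be compared, the VBA equivalent would need every column index of the transposed range in the `Columns` array rather than just 3 and 4.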

Transpose the sheets back to the original shape:

    import pyexcel  # reading .xlsx also requires the pyexcel-xlsx plugin to be installed

    book = pyexcel.get_book(file_name="H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
    # Transposing a second time restores the original row/column layout
    for sheet in book:
        sheet.transpose()
    book.save_as("H:/SLM_Final/SLM Indicator template Main to clean.xlsx")
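The question's discussion asks for a Python-only variant without VB. A minimal sketch that replaces the Excel/VBA step and both transposes with plain Python on pyexcel's list-of-lists data is shown below. It compares whole columns (not only the 3rd and 4th values) and keeps the first occurrence of each. The helper names are hypothetical, the rows are assumed to be rectangular, and assigning to `sheet.array` to write the cleaned rows back is an assumption about pyexcel's `Sheet` API that should be checked against the installed pyexcel version.

```python
def dedupe_columns_2d(rows):
    """Return a copy of a rectangular 2-D list with repeated columns removed.

    zip(*rows) transposes the data so each column becomes a tuple; the first
    occurrence of each tuple is kept, and zip(*kept) transposes back.
    """
    seen = set()
    kept = []
    for column in zip(*rows):
        if column not in seen:
            seen.add(column)
            kept.append(column)
    return [list(row) for row in zip(*kept)]


def dedupe_workbook(in_path, out_path):
    """Hypothetical pyexcel-only replacement for the transpose/VBA/transpose steps."""
    import pyexcel  # imported here so dedupe_columns_2d works without pyexcel

    book = pyexcel.get_book(file_name=in_path)
    for sheet in book:
        # ASSUMPTION: Sheet.array can be both read and assigned in your pyexcel version
        sheet.array = dedupe_columns_2d(sheet.array)
    book.save_as(out_path)
```

For example, `dedupe_columns_2d([["h1", "h2", "h1"], [1, 2, 1], [3, 4, 3]])` returns `[["h1", "h2"], [1, 2], [3, 4]]`. Writing to a separate `out_path` instead of overwriting the input keeps the original workbook available if the result looks wrong.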

I hope this helps.

【Discussion】: