【Question title】: How to sort column names, that are string formatted dates, by datetime order
【Posted】: 2020-10-05 16:38:53
【Question description】:

I have a dataframe with a datetime column named Month and two other columns:

import pandas as pd

data = [['Canada',10, '2020-09-01'], ['Canada',20, '2020-10-01'], ['Canada',30, '2020-12-01'], ['Canada',40, '2021-01-01'],
            ['Europe',30, '2020-09-01'], ['Europe',20, '2020-10-01'], ['Europe',10, '2020-12-01'], ['Europe',40, '2021-01-01'],
            ['US',40, '2020-09-01'], ['US',10, '2020-10-01'], ['US',20, '2020-12-01'], ['US',30, '2021-01-01']]

df = pd.DataFrame(data,columns=['Region','sales', 'Month'])
# Month is given as strings above, so convert it to datetime first
df['Month'] = pd.to_datetime(df['Month'])

Next, I convert the 'Month' column to a string with a specific format:

df['Month'] = df['Month'].dt.strftime('%b-%Y')

Now I pivot the dataframe and export it to Excel:

df = pd.pivot_table(df, values = 'sales', index=["Region"], columns = "Month").reset_index()

df.to_excel(writer, sheet_name='sales', index=False, startrow=4, header=False)

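Because the 'Month' column is a string, the dates are sorted alphabetically when I write the dataframe to Excel. I want the dates sorted by their datetime value.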
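
The difference between the two orderings can be shown with a small sketch. The `labels` list is a hypothetical stand-in for the pivoted column names, and parsing `%b` assumes English month abbreviations (the default locale):

```python
import pandas as pd

# Column labels as they look after strftime('%b-%Y')
labels = ['Sep-2020', 'Oct-2020', 'Dec-2020', 'Jan-2021']

# Plain string sorting compares characters, so 'D' < 'J' < 'O' < 'S'
alphabetical = sorted(labels)

# Sorting with a parsing key compares the underlying dates instead
chronological = sorted(labels, key=lambda s: pd.to_datetime(s, format='%b-%Y'))

print(alphabetical)   # ['Dec-2020', 'Jan-2021', 'Oct-2020', 'Sep-2020']
print(chronological)  # ['Sep-2020', 'Oct-2020', 'Dec-2020', 'Jan-2021']
```

A list like `chronological` could then be used to reorder an already pivoted frame, for example `df[['Region'] + chronological]`.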

I tried converting the 'Month' column back to datetime before pivoting, but in that case the dates are not in the right format after exporting to Excel:

df['Month'] = pd.to_datetime(df['Month'], format='%b-%Y')

I even tried ExcelWriter formatting, but it did not seem to work well.

The output I get:

Desired output:

df['Month'] = pd.to_datetime(df['Month'])
df = pd.pivot_table(df, values = 'sales', index=["Region"], columns = "Month").reset_index()
df = df.append(pd.Series(df.sum(),name='System'))\
    .assign(Total=df.sum(1))

# extract the datetime component of the multilevel column names
dates = [v for v in df.columns[1:-1]]  # skip 'Region' (first) and 'Total' (last), which are not dates

# reformat dates to the desired string format
dates_str = [v.strftime('%b-%Y') for v in dates]

# create a dict
updates = dict(zip(dates, dates_str))

# rename the columns, which will stay in the current, correct order
df = df.rename(columns=updates)

df.to_excel(writer, sheet_name='sales', index=False, startrow=4, header=False)

【Question discussion】:

    Tags: python pandas sorting


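    【Solution 1】:
    • This assumes the 'Month' column is of datetime dtype, as produced by pd.to_datetime.
      • Run df.Month = pd.to_datetime(df.Month) before pivoting.
    • Pivot the dataframe first, while 'Month' is still datetime, and only then rename the columns to strings.
      • Use .rename, because assigning column names through slicing and assignment raises a TypeError (e.g. df.columns[:1] = 3).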
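
The point about `.rename` can be illustrated with a minimal sketch. The one-row frame below is a hypothetical example, not the data from the question:

```python
import pandas as pd

# Tiny frame with one string label and one Timestamp label
df = pd.DataFrame({'Region': ['Canada'], pd.Timestamp('2020-09-01'): [10]})

# pandas Index objects are immutable, so slice assignment fails
try:
    df.columns[1:] = ['Sep-2020']
except TypeError as exc:
    print(f'TypeError: {exc}')

# rename maps old labels to new ones and leaves the column order untouched
df = df.rename(columns={pd.Timestamp('2020-09-01'): 'Sep-2020'})
print(list(df.columns))  # ['Region', 'Sep-2020']
```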
    import pandas as pd
    
    # sample dataframe
    data = {'Region': ['Canada', 'Canada', 'Canada', 'Canada', 'Europe', 'Europe', 'Europe', 'Europe', 'US', 'US', 'US', 'US'],
            'sales': [10, 20, 30, 40, 30, 20, 10, 40, 40, 10, 20, 30],
            'Month': [pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00'), pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00'), pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00')]}
    df = pd.DataFrame(data)
    
    # pivot the dataframe before changing Month to a string
    dfp = pd.pivot_table(df, values='sales', index=["Region"], columns="Month").reset_index() 
    
    # add Total column
    dfp = dfp.append(pd.Series(dfp.sum(), name='System')).assign(Total=dfp.sum(1))
    
    # extract the datetime column names
    dates = [v for v in dfp.columns[1:-1]]
    
    # reformat dates to the desired string format
    dates_str = [v.strftime('%b-%Y') for v in dates]
    
    # create a dict
    updates = dict(zip(dates, dates_str))
    
    # rename the columns, which will stay in the current, correct order
    dfp.rename(columns=updates, inplace=True)
    
    # display(dfp)
    Month           Region  Sep-2020  Oct-2020  Dec-2020  Jan-2021  Total
    0               Canada        10        20        30        40  100.0
    1               Europe        30        20        10        40  100.0
    2                   US        40        10        20        30  100.0
    System  CanadaEuropeUS        80        50        60       110    NaN
    
    # save to Excel, without the Total column; remove .iloc[:, :-1] to keep the Total column
    dfp.iloc[:, :-1].to_excel('test.xlsx', sheet_name='sales', index=False, header=True)
    

    Excel view
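
The code above relies on `DataFrame.append`, which was removed in pandas 2.0. A possible variant for newer pandas versions is sketched below. It is not the answerer's code: `aggfunc='sum'`, keeping `Region` as the index, and filling in the `Total` cell of the `System` row are all assumptions made for this illustration.

```python
import pandas as pd

data = {
    'Region': ['Canada'] * 4 + ['Europe'] * 4 + ['US'] * 4,
    'sales': [10, 20, 30, 40, 30, 20, 10, 40, 40, 10, 20, 30],
    'Month': pd.to_datetime(['2020-09-01', '2020-10-01', '2020-12-01', '2021-01-01'] * 3),
}
df = pd.DataFrame(data)

# Pivot while Month is still datetime so the columns come out in date order
dfp = pd.pivot_table(df, values='sales', index='Region', columns='Month', aggfunc='sum')

# Format the DatetimeIndex column labels; the order is unchanged
dfp.columns = dfp.columns.strftime('%b-%Y')

# Row totals per region, then a 'System' row holding the column totals
dfp['Total'] = dfp.sum(axis=1)
dfp.loc['System'] = dfp.sum(axis=0)

print(dfp)
# dfp.to_excel('test.xlsx', sheet_name='sales')  # requires an Excel engine such as openpyxl
```

Because `Region` stays in the index here, `to_excel` would be called without `index=False` so that the region names are written to the sheet.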

    【Discussion】:

    • My bad! I should have mentioned the Totals column. It works perfectly now. Thank you for being so helpful!! :)