根据 Pandas 中的常用列将一个数据帧拆分为多个子数据帧答案

【问题标题】：Split one dataframe to multiple sub-dataframes based on common columns in Pandas根据 Pandas 中的常用列将一个数据帧拆分为多个子数据帧
【发布时间】：2021-02-07 09:29:46
【问题描述】：

假设数据框df如下：

            ISIN  Coupon  Issue Date Maturity Date  Jan 18th  Jan 19th  \
0   CA135087A610  0.0150   7/30/2012      6/1/2023    103.13    103.14   
1   CA135087F254  0.0075  10/19/2015      3/1/2021    100.08    100.08   
2   CA135087F585  0.0075   4/11/2016      9/1/2021    100.40    100.40   
3   CA135087G328  0.0050  10/11/2016      3/1/2022    100.44    100.43   
4   CA135087H490  0.0175   10/6/2017      3/1/2023    103.29    130.29   
5   CA135087ZU15  0.0275    8/2/2011      6/1/2022    103.61    103.60   
6   CA135087J546  0.0225   10/5/2018      3/1/2024    106.21    106.21   
7   CA135087J967  0.0150    4/5/2019      9/1/2024    104.27    104.28   
8   CA135087K528  0.0125  10/11/2019      3/1/2025    103.60    103.60   
9   CA135087K940  0.0050    4/3/2020      9/1/2025    100.33    100.32   
10  CA135087L518  0.0025   10/9/2020      3/1/2026     98.78     98.77   

    Jan 20th  
0     103.10  
1     100.08  
2     100.39  
3     100.41  
4     103.25  
5     103.57  
6     106.17  
7     104.25  
8     103.59  
9     100.30  
10     98.76

我需要根据公共列 ISIN、Coupon、Issue Date、Maturity Date 将其拆分为 3 个子数据帧。

common_columns = ['ISIN', 'Coupon', 'Issue Date', 'Maturity Date']
df1.columns = common_columns + 'Jan 18th'
df2.columns = common_columns + 'Jan 19th'
df3.columns = common_columns + 'Jan 20th'

然后将它们保存到一个 Excel 文件中，其中 sheet_name 为 Jan 18th、Jan 19th 和 Jan 20th。请注意，实际数据中还有其他日期列Jan 21th、Jan 22th等。

保存数据框：

writer = pd.ExcelWriter('data.xlsx')
df1.to_excel(writer, 'Jan 18th')
df2.to_excel(writer, 'Jan 19th')
df3.to_excel(writer, 'Jan 20th')
writer.save()

我如何在 Python 中做到这一点？谢谢。

更新：

第一张表的ISIN 被2021-01-18 替换，1 月18 日被Close Price 替换，第二张表的ISIN 被2021-01-19 替换，1 月19 日被Close Price 替换，第三张表的ISIN 被2021-01-20 替换, 1 月 20 日 Close Price.

【问题讨论】：

标签： python-3.x pandas dataframe

【解决方案1】：

让我们使用index.difference 通过从数据框中删除comm_columns 来获取日期类列，然后对于每个日期类列使用DataFrame.to_excel 和pd.ExcelWriter 来创建工作表：

comm_cols = ['ISIN', 'Coupon', 'Issue Date', 'Maturity Date']
date_cols = df.columns.difference(comm_cols)

with pd.ExcelWriter('data.xlsx') as writer:
    for col in date_cols:
        sub_df = df[comm_cols + [col]]
        sub_df.to_excel(writer, sheet_name=col)
        
    writer.save()

更新：到rename 列ISIN 同时保存工作表使用：

comm_cols = ['ISIN', 'Coupon', 'Issue Date', 'Maturity Date']
isin_cols = ['2021-01-18', '2021-02-18', '2021-02-19']
date_cols = df.columns.difference(comm_cols)

with pd.ExcelWriter('data.xlsx') as writer:
    for col, isin_col in zip(date_cols, isin_cols):
        sub_df = df[comm_cols + [col]].rename(columns={'ISIN': isin_col})
        sub_df.to_excel(writer, sheet_name=col)
        
    writer.save()

【讨论】：

谢谢，在保存到 excel 之前，是否可以使用 '2021-01-18', 'Coupon', 'Issue Date', 'Maturity Date', 'Close Price' 为 df1、'2021-01-19', 'Coupon', 'Issue Date', 'Maturity Date', 'Close Price' 为 df2 等修改每个子数据框的列？
这意味着我需要将ISIN 重命名为2021-01-18, etc，并将Jan 18th 重命名为Close Price 为df1。
太好了，你能添加代码吗？
@ahbon 您能否提供要映射到 ISIN 列的新名称列表？
@ahbon 根据您的要求更新了答案。