【Question title】:pandas group by value & create new data frames?
【Posted】:2026-02-19 08:15:02
【Question description】:

I have the following sample data frame:

id      shcool_id   time_created
710     1045152     2019-07-26 15:10:26
5141    6853654     2020-10-07 11:32:30
2278    3460257     2019-11-01 17:31:11
3877    2186089     2020-02-14 14:53:43
3877    1841367     2020-02-14 14:53:43
2019    3266938     2019-11-01 12:40:35
4910    1608407     2020-09-21 15:47:40
3926    4480633     2020-02-14 16:07:04
3447    5416477     2020-01-17 13:13:36

I want to group this data frame by id so that I end up with several data frames, for example:

df1=id      shcool_id   time_created
      710     1045152     2019-07-26 15:10:26

df2=id      shcool_id   time_created
     5141    6853654     2020-10-07 11:32:30

df3=id      shcool_id   time_created
     2278    3460257     2019-11-01 17:31:11

df4=id      shcool_id   time_created
     3877    2186089     2020-02-14 14:53:43
     3877    1841367     2020-02-14 14:53:43

df5=id      shcool_id   time_created
    2019    3266938     2019-11-01 12:40:35

df6=id      shcool_id   time_created
    4910    1608407     2020-09-21 15:47:40

df7=id      shcool_id   time_created
    3926    4480633     2020-02-14 16:07:04

df8=id      shcool_id   time_created
     3447    5416477     2020-01-17 13:13:36

df9=id      shcool_id   time_created
    1935    2788320     2019-10-31 14:10:46

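I don't know in advance how many unique ids there are, so I'm wondering whether there is a way to handle this.

Sorry if this has been asked before. I searched, but maybe I didn't search for the right phrase ¯_(ツ)_/¯

Thank you very much!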

【Question comments】:

Tags: python pandas dataframe group-by uniqueidentifier


【Solution 1】:

If you want the data frames to be available as global variables, you have to assign them into globals():

>>> for i, (_, v) in enumerate(df.groupby('id'), start=1):
...     globals()[f'df{i}'] = v

# Now all the new dfs will be available globally

>>> df1
    id  shcool_id         time_created
0  710    1045152  2019-07-26 15:10:26

But it is better to create a dict:

>>> database = {f'df{i}': v for i, (_, v) in enumerate(df.groupby('id'), start=1)}

>>> database['df1']
    id  shcool_id         time_created
0  710    1045152  2019-07-26 15:10:26
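The dict approach above can be tried end to end with the sketch below. It assumes the question's sample rows are typed in by hand; the name `frames` and the `sort=False` argument are additions that do not appear in the answer. By default `groupby` sorts the group keys, so the numbering would follow ascending id. Passing `sort=False` keeps the order of first appearance, which seems to be the numbering the question uses.

```python
import pandas as pd

# Sample rows copied from the question (the column name 'shcool_id' is kept as posted).
df = pd.DataFrame({
    "id": [710, 5141, 2278, 3877, 3877, 2019, 4910, 3926, 3447],
    "shcool_id": [1045152, 6853654, 3460257, 2186089, 1841367,
                  3266938, 1608407, 4480633, 5416477],
    "time_created": [
        "2019-07-26 15:10:26", "2020-10-07 11:32:30", "2019-11-01 17:31:11",
        "2020-02-14 14:53:43", "2020-02-14 14:53:43", "2019-11-01 12:40:35",
        "2020-09-21 15:47:40", "2020-02-14 16:07:04", "2020-01-17 13:13:36",
    ],
})

# sort=False (an assumption about the desired numbering) keeps ids in the
# order they first appear instead of ascending order.
frames = {
    f"df{i}": group
    for i, (_, group) in enumerate(df.groupby("id", sort=False), start=1)
}

print(len(frames))      # number of distinct ids found
print(frames["df4"])    # the two rows that share id 3877
```

A dict keeps the pieces in one place, so they can be looped over with `frames.items()` no matter how many ids turn up.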

If you want to be able to access the data frames by their group key (the id value):

>>> database = dict(list(df.groupby('id')))

>>> database[710]

    id  shcool_id         time_created
0  710    1045152  2019-07-26 15:10:26
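As a hedged illustration of keying by id, the sketch below uses only the first five rows of the question's sample. The names `grouped`, `by_id` and `rows_3877` are made up for this example. It also shows `get_group`, which returns a single group without materialising every sub-frame.

```python
import pandas as pd

# First five rows of the question's sample (time_created omitted for brevity).
df = pd.DataFrame({
    "id": [710, 5141, 2278, 3877, 3877],
    "shcool_id": [1045152, 6853654, 3460257, 2186089, 1841367],
})

grouped = df.groupby("id")

# One sub-frame per distinct id; the number of ids never has to be known up front.
by_id = {key: group for key, group in grouped}

for key, group in by_id.items():
    print(key, len(group))

# get_group fetches one group directly from the GroupBy object.
rows_3877 = grouped.get_group(3877)
print(rows_3877)
```

If only a few ids are ever looked up, keeping the `GroupBy` object and calling `get_group` may be enough, and the dict can be skipped.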

【Comments】:

    【Solution 2】:

    Here df is your original data frame. df_list will hold a list of all the data frames obtained by splitting on id.

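    df_list = []
    # unique() returns the ids in order of first appearance
    uniq_ids = df['id'].unique()
    for uid in uniq_ids:  # 'uid' avoids shadowing the built-in id()
        # the boolean mask selects every row belonging to this id
        new_df = df[df['id'] == uid]
        df_list.append(new_df)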
    

    Sample output

    df_list[2]
         id    shcool_id    time_created
    2   2278    3460257     2019-11-01 17:31:11
    
    df_list[3]
         id     shcool_id   time_created
    3   3877    2186089     2020-02-14 14:53:43
    4   3877    1841367     2020-02-14 14:53:43
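The same list can probably be built in one pass with `groupby`, sketched below on the first five rows of the question's sample. This is an alternative to the loop above, not the answer's own method. `sort=False` is used so the list order matches `unique()`, that is, order of first appearance.

```python
import pandas as pd

# First five rows of the question's sample (time_created omitted for brevity).
df = pd.DataFrame({
    "id": [710, 5141, 2278, 3877, 3877],
    "shcool_id": [1045152, 6853654, 3460257, 2186089, 1841367],
})

# Each iteration yields (key, sub-frame); only the sub-frame is kept.
# sort=False preserves the order in which ids first appear in df.
df_list = [group for _, group in df.groupby("id", sort=False)]

print(df_list[2])   # the row with id 2278
print(df_list[3])   # both rows with id 3877, original index labels kept
```

The loop with a boolean mask scans the whole `id` column once per unique id, so the `groupby` version may scale better when there are many ids.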
    

    【Comments】: