【Question title】: Group Pandas data by hour of the day
【Posted】: 2018-11-29 15:25:07
【Question】:

I generate random dates and values with the following code:

import pandas as pd
import numpy as np

time = pd.date_range('1/1/2000', periods=2000, freq='5min')

series = pd.Series(np.random.randint(100, size=2000), index=time)

The output looks like this:

2000-01-01 00:00:00    40
2000-01-01 00:05:00    13
2000-01-01 00:10:00    99
2000-01-01 00:15:00    72
2000-01-01 00:20:00     4
2000-01-01 00:25:00    36
2000-01-01 00:30:00    24
2000-01-01 00:35:00    20
2000-01-01 00:40:00    83
2000-01-01 00:45:00    44

I then group this data by the hour of the index and aggregate it with the mean, which gives:

0     50.380952
1     49.380952
2     49.904762
3     53.273810
4     47.178571
5     46.095238
6     49.047619
7     44.297619
8     53.119048
9     48.261905
10    45.166667
11    54.214286
12    50.714286
13    56.130952
14    50.916667
15    42.428571
16    46.880952
17    56.892857
18    54.071429
19    47.607143
20    50.940476
21    50.511905
22    44.550000
23    50.250000

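But what if I want to group all the data by the hour of the index without taking the mean, so that I get every value associated with each hour? How should I do that?

Thanks in advance.

Regards,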

【Comments】:

    Tags: python pandas pandas-groupby


    【Solution 1】:

    To aggregate by hour, use:

    np.random.seed(456)
    time = pd.date_range('1/1/2000', periods=2000, freq='5min')
    series = pd.Series(np.random.randint(100, size=2000), index=time)
    
    s = series.groupby(series.index.hour).mean()
    print(s)
    0     49.392857
    1     52.523810
    2     53.047619
    3     49.083333
    4     49.785714
    5     49.071429
    6     52.476190
    7     47.821429
    8     52.190476
    9     50.000000
    10    49.035714
    11    52.988095
    12    52.785714
    13    52.023810
    14    46.964286
    15    52.095238
    16    51.047619
    17    52.166667
    18    48.357143
    19    51.416667
    20    45.214286
    21    46.130952
    22    49.750000
    23    48.527778
    dtype: float64
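
Because the grouping key is only the hour component (0 to 23), readings from the same hour on different days land in the same group. A minimal sketch on a tiny hand-made series (the name `toy` and its values are illustrative, not from the question) makes this visible:

```python
import pandas as pd

# Six readings taken every 12 hours, so hours 0 and 12 each occur on three different days.
idx = pd.date_range("2000-01-01", periods=6, freq="12h")
toy = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# The grouping key is the hour of each timestamp, not the full timestamp.
hourly_mean = toy.groupby(toy.index.hour).mean()
print(hourly_mean)
# hour 0  -> mean of 1, 3, 5 = 3.0
# hour 12 -> mean of 2, 4, 6 = 4.0
```

If the days should stay separate, grouping by both `index.date` and `index.hour` would be one way to do that.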
    

    If you need a MultiIndex with the hour as the first level instead:

    # Replaces the index in place with two levels: (hour, original timestamp)
    series.index = [series.index.hour, series.index]
    print(series)
    0   2000-01-01 00:00:00    27
        2000-01-01 00:05:00    43
        2000-01-01 00:10:00    89
        2000-01-01 00:15:00    42
        2000-01-01 00:20:00    28
        2000-01-01 00:25:00    79
        2000-01-01 00:30:00    60
        2000-01-01 00:35:00    45
        2000-01-01 00:40:00    37
        2000-01-01 00:45:00    92
        2000-01-01 00:50:00    39
        2000-01-01 00:55:00    81
    1   2000-01-01 01:00:00    11
        2000-01-01 01:05:00    77
        2000-01-01 01:10:00    69
        2000-01-01 01:15:00    98
    

    ...

    You can then select by hour:

    print(series.loc[0])
    2000-01-01 00:00:00    27
    2000-01-01 00:05:00    43
    2000-01-01 00:10:00    89
    2000-01-01 00:15:00    42
    2000-01-01 00:20:00    28
    2000-01-01 00:25:00    79
    2000-01-01 00:30:00    60
    2000-01-01 00:35:00    45
    2000-01-01 00:40:00    37
    2000-01-01 00:45:00    92
    2000-01-01 00:50:00    39
    2000-01-01 00:55:00    81
    2000-01-02 00:00:00    82
    2000-01-02 00:05:00    69
    2000-01-02 00:10:00    99
    2000-01-02 00:15:00    17
    2000-01-02 00:20:00    59
    ...
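
If replacing the index is not desirable, the same rows can probably be reached without touching it. The sketch below rebuilds the series from the answer and shows two options, a boolean mask and `get_group`; the variable names `hour0_mask` and `hour0_group` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range("1/1/2000", periods=2000, freq="5min")
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Boolean mask: keep rows whose timestamp falls in hour 0 on any day.
hour0_mask = series[series.index.hour == 0]

# The same rows, taken from the groupby object.
hour0_group = series.groupby(series.index.hour).get_group(0)

print(len(hour0_mask))  # 7 days x 12 five-minute readings per hour
```

Both variants keep the original DatetimeIndex, so later calls that rely on `series.index.hour` continue to work.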
    

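    If you need the means while keeping the original DatetimeIndex (run this on the original series, before its index was replaced by the MultiIndex above):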

    s1 = series.groupby(series.index.hour).transform('mean')
    print(s1)
    2000-01-01 00:00:00    49.392857
    2000-01-01 00:05:00    49.392857
    2000-01-01 00:10:00    49.392857
    2000-01-01 00:15:00    49.392857
    2000-01-01 00:20:00    49.392857
    2000-01-01 00:25:00    49.392857
    2000-01-01 00:30:00    49.392857
    2000-01-01 00:35:00    49.392857
    2000-01-01 00:40:00    49.392857
    2000-01-01 00:45:00    49.392857
    2000-01-01 00:50:00    49.392857
    2000-01-01 00:55:00    49.392857
    2000-01-01 01:00:00    52.523810
    2000-01-01 01:05:00    52.523810
    2000-01-01 01:10:00    52.523810
    2000-01-01 01:15:00    52.523810
    2000-01-01 01:20:00    52.523810
    2000-01-01 01:25:00    52.523810
    2000-01-01 01:30:00    52.523810
    ...
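
Since `transform` returns a result aligned with the original index, it combines directly with the original values. As a small sketch on a hand-made series (the names `toy`, `hour_mean` and `deviation` are illustrative), the deviation of each reading from its hourly mean could be computed like this:

```python
import pandas as pd

# Six readings taken every 12 hours: hours 0 and 12, three days each.
idx = pd.date_range("2000-01-01", periods=6, freq="12h")
toy = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# One mean per row, broadcast back onto the original timestamps.
hour_mean = toy.groupby(toy.index.hour).transform("mean")

# Same index on both sides, so the subtraction lines up row by row.
deviation = toy - hour_mean
print(deviation)
```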
    

    Edit:

    To get a list of values for each hour, use:

    s = series.groupby(series.index.hour).apply(list)
    print(s)
    0     [27, 43, 89, 42, 28, 79, 60, 45, 37, 92, 39, 8...
    1     [11, 77, 69, 98, 78, 84, 34, 66, 4, 8, 85, 62,...
    2     [16, 41, 10, 72, 44, 35, 48, 51, 99, 53, 22, 3...
    3     [56, 22, 74, 85, 81, 6, 44, 44, 49, 43, 95, 11...
    4     [21, 90, 89, 76, 62, 20, 66, 50, 68, 79, 69, 4...
    5     [51, 85, 31, 58, 97, 10, 91, 25, 4, 11, 94, 28...
    6     [5, 71, 62, 57, 62, 87, 12, 41, 43, 47, 25, 15...
    7     [84, 17, 26, 32, 14, 76, 72, 35, 8, 60, 79, 27...
    8     [15, 30, 80, 53, 10, 97, 71, 83, 37, 44, 89, 1...
    9     [58, 20, 98, 77, 75, 26, 63, 26, 24, 62, 93, 6...
    10    [39, 61, 92, 43, 61, 73, 86, 64, 26, 0, 75, 11...
    11    [24, 13, 13, 54, 50, 38, 22, 46, 67, 15, 29, 4...
    12    [21, 56, 16, 63, 46, 79, 11, 85, 87, 18, 66, 9...
    13    [10, 89, 66, 80, 60, 2, 6, 19, 77, 81, 38, 48,...
    14    [17, 64, 90, 91, 71, 32, 77, 9, 76, 14, 9, 79,...
    15    [95, 75, 49, 34, 5, 31, 43, 68, 84, 48, 25, 69...
    16    [13, 68, 87, 96, 6, 83, 9, 5, 29, 93, 57, 92, ...
    17    [77, 6, 73, 41, 76, 93, 11, 50, 72, 84, 82, 53...
    18    [95, 11, 61, 56, 30, 24, 24, 9, 0, 65, 96, 82,...
    19    [31, 14, 98, 67, 7, 54, 29, 60, 77, 83, 45, 70...
    20    [4, 15, 37, 78, 79, 59, 63, 97, 14, 74, 33, 2,...
    21    [88, 69, 31, 20, 41, 10, 41, 6, 36, 27, 63, 49...
    22    [4, 90, 70, 66, 92, 46, 54, 47, 6, 54, 62, 80,...
    23    [27, 23, 21, 18, 29, 39, 77, 88, 21, 86, 7, 45...
    dtype: object
    

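    【Comments】:

    • I do need to sort the data by index hour, but I want to list all the values by index hour, like this: 0 48 0 23 0 49 1 12 1 36 1 99 2 84 2 56 2 47 3 63 3 42 ... Is there any way to do this?
    • @Yannick - Do you mean series.groupby(series.index.hour).apply(list)?
    • s = series.groupby(series.index.hour).agg(list)
    • @Yannick - If you need a dictionary in the end: series.groupby(series.index.hour).apply(list).to_dict()
    • Perfect! This is exactly what I was looking for. Thank you very much for your time.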
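
The flat "hour, value" listing described in the first comment can also be produced without building lists. This is only a sketch, not the approach the thread settled on: it re-indexes the values by hour and uses a stable sort so that readings within each hour stay in chronological order. The names `toy` and `flat` are illustrative.

```python
import pandas as pd

# Six readings taken every 12 hours: hours 0 and 12, three days each.
idx = pd.date_range("2000-01-01", periods=6, freq="12h")
toy = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Use the hour as the index, then sort by it; mergesort is stable,
# so values sharing an hour keep their original (chronological) order.
flat = pd.Series(toy.to_numpy(), index=toy.index.hour).sort_index(kind="mergesort")
print(flat)
```

The result has one row per original reading, with the hour repeated in the index.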