【Question title】: Get the average date from multiple dates - pandas
【Posted】: 2019-01-31 02:13:37
【Question】:

I have a DataFrame in which Date is a datetime column:

   Column   |       Date             
:-----------|----------------------:
    A       |   2018-08-05 17:06:01 
    A       |   2018-08-05 17:06:02 
    A       |   2018-08-05 17:06:03 
    B       |   2018-08-05 17:06:07 
    B       |   2018-08-05 17:06:09 
    B       |   2018-08-05 17:06:11 

The table I want back is:

   Column   |       Date            
:-----------|----------------------:
    A       |   2018-08-05 17:06:02 
    B       |   2018-08-05 17:06:09 

【Comments】:

  • I did the same thing with idxmin and idxmax to get the minimum and maximum, but I can't think of a way to get the average.

Tags: python pandas timestamp average


【Solution 1】:

Using your example.

Your data:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                   ['A', '2018-08-05 17:06:02'],
                   ['A', '2018-08-05 17:06:03'],
                   ['B', '2018-08-05 17:06:07'],
                   ['B', '2018-08-05 17:06:09'],
                   ['B', '2018-08-05 17:06:11']],
            columns = ['column', 'date'])

Solution:

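# View each timestamp as an int64 epoch value (nanoseconds for datetime64[ns]) so it can be averaged
df.date = pd.to_datetime(df.date).values.astype(np.int64)

# Average per group, then convert the numeric mean back to a datetime
df = pd.DataFrame(pd.to_datetime(df.groupby('column').mean().date))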

Output:

                      date
column                    
A      2018-08-05 17:06:02
B      2018-08-05 17:06:09

Hope this helps.

【Discussion】:
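
The same idea can be wrapped so that the original column is not overwritten. This is only a minimal sketch: the helper name `mean_datetime_by_group` and its parameters are made up for illustration, and it assumes the datetimes are timezone-naive and can be cast to nanosecond resolution.

```python
import pandas as pd


def mean_datetime_by_group(frame, key, col):
    """Hypothetical helper: per-group mean of a datetime column."""
    # Force nanosecond resolution so the integers below are epoch nanoseconds.
    stamps = pd.to_datetime(frame[col]).astype("datetime64[ns]")
    # Average the integer representation within each group (result is float).
    mean_ns = stamps.astype("int64").groupby(frame[key]).mean()
    # Round to whole nanoseconds and convert back to datetimes.
    return pd.to_datetime(mean_ns.round().astype("int64"))


df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                        ['A', '2018-08-05 17:06:02'],
                        ['A', '2018-08-05 17:06:03'],
                        ['B', '2018-08-05 17:06:07'],
                        ['B', '2018-08-05 17:06:09'],
                        ['B', '2018-08-05 17:06:11']],
                  columns=['column', 'date'])

result = mean_datetime_by_group(df, 'column', 'date')
print(result)
```

Because the helper works on a copy of the column, `df['date']` keeps its original values and only `result` holds the averages.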

    【Solution 2】:

    Prepare a sample DataFrame:

    import numpy as np
    import pandas as pd

    # Build the sample DataFrame
    date_var = "date"
    df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                            ['A', '2018-08-05 17:06:02'],
                            ['A', '2018-08-05 17:06:03'],
                            ['B', '2018-08-05 17:06:07'],
                            ['B', '2018-08-05 17:06:09'],
                            ['B', '2018-08-05 17:06:11']],
                      columns=['column', date_var])
    
    # Convert date-column to proper pandas Datetime-values/pd.Timestamps
    df[date_var] = pd.to_datetime(df[date_var])
    

    Extract the desired average timestamp value:

    # Extract the numeric value associated with each timestamp (epoch time in nanoseconds)
    # NOTE: this reads the .value attribute of each Timestamp in the column
    In:
    [tsp.value for tsp in df[date_var]]
    Out:
    [
        1533488761000000000, 1533488762000000000, 1533488763000000000,
        1533488767000000000, 1533488769000000000, 1533488771000000000
    ]
    
    # Use this to calculate the mean (here over all rows, not per group), then convert the result back to a timestamp
    In:
    pd.Timestamp(np.nanmean([tsp.value for tsp in df[date_var]]))
    Out:
    Timestamp('2018-08-05 17:06:05.500000')
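
The snippet above averages every row at once, whereas the question asks for one average per value of `column`. A minimal sketch of applying the same `.value` approach per group follows; the helper name `mean_timestamp` is made up for illustration, and missing timestamps are assumed to be dropped before averaging.

```python
import numpy as np
import pandas as pd

date_var = "date"
df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                        ['A', '2018-08-05 17:06:02'],
                        ['A', '2018-08-05 17:06:03'],
                        ['B', '2018-08-05 17:06:07'],
                        ['B', '2018-08-05 17:06:09'],
                        ['B', '2018-08-05 17:06:11']],
                  columns=['column', date_var])
df[date_var] = pd.to_datetime(df[date_var])


def mean_timestamp(stamps):
    """Hypothetical helper: average Timestamps via their epoch-nanosecond .value."""
    values = [tsp.value for tsp in stamps.dropna()]
    # np.mean returns a float; cast to int before rebuilding the Timestamp.
    return pd.Timestamp(int(np.mean(values)))


# Apply the helper to the date column of each group.
per_group = df.groupby('column')[date_var].apply(mean_timestamp)
print(per_group)
```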
    

    【Discussion】:

    • I was looking for a way to include a DateTime column in a groupby aggregation. Your solution works around the fact that pandas initially excludes them.
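
For the case described in this comment, one possible sketch is to pass a custom function for the datetime column alongside ordinary aggregations. The `value` column and the names `mean_timestamp`, `value_mean` and `date_mean` are assumptions added for illustration; they do not appear in the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "column": ["A", "A", "A", "B", "B", "B"],
    # Hypothetical numeric column, added only to show a mixed aggregation.
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "date": pd.to_datetime([
        "2018-08-05 17:06:01", "2018-08-05 17:06:02", "2018-08-05 17:06:03",
        "2018-08-05 17:06:07", "2018-08-05 17:06:09", "2018-08-05 17:06:11",
    ]),
})


def mean_timestamp(stamps):
    """Hypothetical helper: integer mean of epoch nanoseconds, floored to 1 ns."""
    ns = [tsp.value for tsp in stamps.dropna()]
    return pd.Timestamp(sum(ns) // len(ns))


# Named aggregation: a built-in mean for the numbers, the helper for the dates.
summary = df.groupby("column").agg(
    value_mean=("value", "mean"),
    date_mean=("date", mean_timestamp),
)
print(summary)
```

Using integer arithmetic inside the helper avoids any float rounding of the large nanosecond values, at the cost of flooring the result to a whole nanosecond.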