【Question Title】:Want to group data by date and status
【Posted】:2022-01-17 18:14:21
【Question】:

I have the following data:

data=[
{"name":"aa"..."created_at":"2022-01-17 07:38:26.403Z","status":"success"},
{"name":"bb"..."created_at":"2021-12-1 07:38:26.403Z","status":"failed"},
{"name":"kk"..."created_at":"2022-01-13 07:38:26.403Z","status":"success"},
{"name":"ll"..."created_at":"2021-12-17 07:38:26.403Z","status":"success"},
]

I want a report like this:

created_at,success

December,1
Jan,2    

I tried the following:

 from collections import defaultdict

 d = defaultdict(int)
 for i in data:
     if i["status"] == 'success':
         d[i["status"]] += 1

This only gives the total success count; it does not break the count down by month.

【Comments】:

    Tags: python


    【Solution 1】:

    You can use pd.to_datetime to convert the "created_at" column to datetime, filter the "success" rows, and then use groupby to aggregate a count per month.

    import pandas as pd
    df = pd.DataFrame(data)
    df['created_at'] = pd.to_datetime(df['created_at'])
    out = df.loc[df.status.eq('success')].groupby(df.created_at.dt.month).status.count().reset_index(name='success')
    

    Output:

       created_at  success
    0           1        2
    1          12        1
    
    # as a dict
    
    >>> out.set_index('created_at').to_dict()
    {'success': {1: 2, 12: 1}}
    

    You can then convert the month numbers to month names.
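
One way to do that conversion, sketched with the standard library only. This is an illustration rather than part of the original answer: the dictionary below is copied from the output above, and the names `counts_by_month_number` and `counts_by_month_name` are made up for the example.

```python
import calendar

# Month-number -> success-count mapping, copied from the output above.
counts_by_month_number = {1: 2, 12: 1}

# calendar.month_name is indexable by month number (1..12); index 0 is an
# empty string. The names follow the current locale (English by default).
counts_by_month_name = {
    calendar.month_name[number]: count
    for number, count in counts_by_month_number.items()
}

print(counts_by_month_name)
```

If the short form from the question ("Jan") is preferred, `calendar.month_abbr` can be indexed the same way.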

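    【Comments】:

    • @Rajarshi DAs I believe the accepted answer uses the pandas library for the analysis. In future, please tag your question appropriately to avoid any confusion. At the moment it is only tagged python.
    【Solution 2】:

    You first have to parse the dates with datetime.strptime, then group them by month with itertools.groupby.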

    >>> from datetime import datetime
    >>> import calendar
    >>>
    >>> from itertools import groupby
    >>>
    >>> data = [
    ...     {"name": "aa", "created_at": "2022-01-17 07:38:26.403Z", "status": "success"},
    ...     {"name": "bb", "created_at": "2021-12-1 07:38:26.403Z", "status": "failed"},
    ...     {"name": "kk", "created_at": "2022-01-13 07:38:26.403Z", "status": "success"},
    ...     {"name": "ll", "created_at": "2021-12-17 07:38:26.403Z", "status": "success"},
    ... ]
    >>>
    >>>
    >>> def custom_key_fun(row):
    ...     created_at_as_datetime = datetime.strptime(
    ...         row["created_at"], "%Y-%m-%d %H:%M:%S.%fZ"
    ...     )
    ...     month = calendar.month_abbr[created_at_as_datetime.month]
    ...     return month
    ...
    >>>
    >>> result = {
    ...     key: sum(1 for row in group if row["status"] == "success")
    ...     for key, group in groupby(sorted(data, key=custom_key_fun), key=custom_key_fun)
    ... }
    >>>
    >>> print(result)
    {'Dec': 1, 'Jan': 2}
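
As a possible alternative (not from the original answer), the same counts can be collected with `collections.Counter`, which does not need the data to be sorted first. The helper name `month_name` and the variable `success_by_month` are chosen here for illustration only. As in the answer above, rows from different years that share a month end up in the same bucket.

```python
from collections import Counter
from datetime import datetime

data = [
    {"name": "aa", "created_at": "2022-01-17 07:38:26.403Z", "status": "success"},
    {"name": "bb", "created_at": "2021-12-1 07:38:26.403Z", "status": "failed"},
    {"name": "kk", "created_at": "2022-01-13 07:38:26.403Z", "status": "success"},
    {"name": "ll", "created_at": "2021-12-17 07:38:26.403Z", "status": "success"},
]


def month_name(row):
    """Return the full month name of a row's created_at timestamp."""
    parsed = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S.%fZ")
    return parsed.strftime("%B")


# Count one occurrence per successful row, keyed by month name.
success_by_month = Counter(
    month_name(row) for row in data if row["status"] == "success"
)

print(dict(success_by_month))
```

Unlike the groupby version, a month that has no successful rows is simply absent from the result instead of appearing with a count of 0.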
    

    【Comments】:

      【Solution 3】:

      Load pandas

      import pandas as pd
      from datetime import datetime
      
      data = [{"name": "aa", "created_at": "2022-01-17 07:38:26.403Z", "status": "success"},
              {"name": "bb", "created_at": "2021-12-1 07:38:26.403Z", "status": "failed"},
              {"name": "kk", "created_at": "2022-01-13 07:38:26.403Z", "status": "success"},
              {"name": "ll", "created_at": "2021-12-17 07:38:26.403Z", "status": "success"},
              ]
      df = pd.DataFrame(data)
      df
          name    created_at  status
      0   aa  2022-01-17 07:38:26.403Z    success
      1   bb  2021-12-1 07:38:26.403Z     failed
      2   kk  2022-01-13 07:38:26.403Z    success
      3   ll  2021-12-17 07:38:26.403Z    success
      

      Build a new column

      def date_convert(x):
          x = x.split('.')[0]
          return datetime.strptime(x, "%Y-%m-%d %H:%M:%S").strftime("%B")
      
      
      df['month_name'] = df.created_at.apply(date_convert)
      df
          name    created_at  status  month_name
      0   aa  2022-01-17 07:38:26.403Z    success     January
      1   bb  2021-12-1 07:38:26.403Z     failed  December
      2   kk  2022-01-13 07:38:26.403Z    success     January
      3   ll  2021-12-17 07:38:26.403Z    success     December
      

      Count and build a new DataFrame

      group_month = df[df.status == "success"].groupby('month_name').groups
      data = [{"created_at": i, "success": len(j)} for i, j in group_month.items()]
      new_df = pd.DataFrame(data)
      new_df
          created_at  success
      0   December    1
      1   January     2
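
The counting step can probably be written more compactly with `groupby(...).size()`. The sketch below is a suggestion, not the answer author's code: it rebuilds only the `status` and `month_name` columns shown above so that it runs on its own, and the variable name `report` is made up for the example.

```python
import pandas as pd

# Same rows as above, reduced to the two columns the report needs.
df = pd.DataFrame(
    {
        "status": ["success", "failed", "success", "success"],
        "month_name": ["January", "December", "January", "December"],
    }
)

# size() counts the rows in each group; reset_index(name=...) turns the
# resulting Series into a two-column DataFrame.
report = (
    df[df["status"] == "success"]
    .groupby("month_name")
    .size()
    .reset_index(name="success")
    .rename(columns={"month_name": "created_at"})
)

print(report)
```

This avoids building the intermediate list of dictionaries by hand; groupby sorts the group keys, so the months come out in alphabetical order.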
      

      【Comments】:
