【Question title】: Generate dataframe based on specific condition and input dictionary - pandas
【Posted】: 2020-12-13 22:10:28
【Question】:

I have a dictionary as shown below.

d1 = { 'start_date' : '2020-10-01T20:00:00.000Z',
       'end_date'  : '2020-10-05T20:00:00.000Z',
       'n_days'    : 6,
       'type'      : 'linear',
       "coef": [0.1,0.1,0.1,0.1,0.1,0.1]    
     }

Taking the dictionary above as input to a function, I want to generate the df below as output.

Expected output:

Date                Day           function_type         function_value
2020-10-01          1             linear                (0.1*1)+0.1 = 0.2
2020-10-02          2             linear                (0.1*2)+0.1 = 0.3
2020-10-03          3             linear                (0.1*3)+0.1 = 0.4
2020-10-04          4             linear                (0.1*4)+0.1 = 0.5
2020-10-05          5             linear                (0.1*5)+0.1 = 0.6

Note:

type can be linear, constant, polynomial, or exponential.

a0, a1, a2, a3, a4, a5 = d1['coef']

If constant:
function_value = a0

If exponential:
function_value = e**(a0 + a1*T)

If polynomial:
function_value = a0 + a1*T + a2*(T**2) + a3*(T**3) + a4*(T**4) + a5*(T**5)

T: value of Day column
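
For reference, the formulas above can be written as a small scalar function. This is only a sketch: the name `function_value` and its signature are hypothetical, and the linear case is inferred from the expected output, `(0.1*T)+0.1`, read here as `a0 + a1*T`, since the notes do not spell it out.

```python
import math

def function_value(kind, coef, T):
    """Evaluate one of the four function types at day T (hypothetical helper)."""
    a0, a1, a2, a3, a4, a5 = coef
    if kind == "constant":
        return a0
    if kind == "linear":
        # Inferred from the expected output: (0.1*T)+0.1
        return a0 + a1 * T
    if kind == "exponential":
        return math.exp(a0 + a1 * T)
    if kind == "polynomial":
        return a0 + a1*T + a2*T**2 + a3*T**3 + a4*T**4 + a5*T**5
    raise ValueError(f"unknown type: {kind!r}")

coef = [0.1] * 6
for T in range(1, 6):
    # Rounded only to hide floating-point noise in the printout
    print(T, round(function_value("linear", coef, T), 10))
```

With the coefficients from d1, the linear case should reproduce the function_value column of the expected output for days 1 to 5.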

【Comments】:

    Tags: python-3.x pandas dataframe dictionary


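    【Solution 1】:

    Define a function funcValue that computes the function_value column from the input dictionary d and the day-number column T, according to the type given in the dictionary: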

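    import numpy as np

    def funcValue(d, T):
        # Unpack the six coefficients a0..a5
        a0, a1, a2, a3, a4, a5 = d['coef']
        # Value for each supported type; T may be a scalar or a pandas/NumPy array of day numbers
        func = {
            'constant': a0,
            'linear': a0 + a1*T,
            'polynomial': a0 + a1*T + a2*(T**2) + a3*(T**3) + a4*(T**4) + a5*(T**5),
            'exponential': np.exp(a0 + a1*T)
        }
    
        # Pick the entry that matches the requested type
        return func[d['type']]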
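
One thing to note about the dictionary in funcValue: all four expressions are evaluated before the lookup, even though only one is returned. A possible variant, sketched below on the assumption that deferring the work matters, stores lambdas and calls only the selected one. The name `func_value_lazy` is made up for this illustration; `numpy.polynomial.polynomial.polyval` evaluates a polynomial whose coefficients are ordered from a0 upward.

```python
import numpy as np
from numpy.polynomial.polynomial import polyval

def func_value_lazy(d, T):
    """Hypothetical variant of funcValue that evaluates only the requested type."""
    T = np.asarray(T, dtype=float)
    coef = d['coef']
    a0, a1 = coef[0], coef[1]
    funcs = {
        'constant': lambda: np.full_like(T, a0),
        'linear': lambda: a0 + a1 * T,
        'polynomial': lambda: polyval(T, coef),   # a0 + a1*T + ... + a5*T**5
        'exponential': lambda: np.exp(a0 + a1 * T),
    }
    if d['type'] not in funcs:
        raise ValueError(f"unknown type: {d['type']!r}")
    return funcs[d['type']]()

example = {'type': 'linear', 'coef': [0.1] * 6}
print(func_value_lazy(example, [1, 2, 3, 4, 5]))
```

Unlike the original, this sketch also returns an array for the constant case and raises a ValueError for an unrecognised type instead of a KeyError.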
    

    Then define a function getDF that builds the required dataframe from the information in the user-supplied dictionary d:

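    import pandas as pd

    def getDF(d):
        # Daily range from start to end; drop the timezone and truncate each timestamp to midnight
        date = pd.date_range(d['start_date'], d['end_date'], freq='D').tz_localize(None).floor('D')
        # Day numbers counted from 1
        days = (date - date[0]).days + 1
        return pd.DataFrame({'Date': date, 'Day': days, 'function_type': d['type'],
                             'function_value': funcValue(d, days)})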
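
To see what each step of the date_range line in getDF contributes, the intermediate values can be inspected separately. This is a small sketch using the start and end strings from d1; the variable names are for illustration only.

```python
import pandas as pd

start, end = '2020-10-01T20:00:00.000Z', '2020-10-05T20:00:00.000Z'

# The trailing 'Z' makes the range timezone-aware (UTC), one entry per day at 20:00.
aware = pd.date_range(start, end, freq='D')
# tz_localize(None) drops the timezone but keeps the wall-clock time.
naive = aware.tz_localize(None)
# floor('D') truncates each timestamp to midnight, leaving only the date part.
dates = naive.floor('D')
# Subtracting the first date gives a TimedeltaIndex; .days + 1 numbers the days from 1.
days = (dates - dates[0]).days + 1

print(aware.tz)
print(list(dates.strftime('%Y-%m-%d')))
print(list(days))
```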
    

    Result:

    print(getDF(d1))
    
            Date  Day function_type  function_value
    0 2020-10-01    1        linear             0.2
    1 2020-10-02    2        linear             0.3
    2 2020-10-03    3        linear             0.4
    3 2020-10-04    4        linear             0.5
    4 2020-10-05    5        linear             0.6
    

    【Comments】:

      【Solution 2】:

      Using timedelta and building a list of lists can help:

      from datetime import datetime, timedelta
      import math
      
      import pandas as pd
      
      d1 = { 'start_date' : '2020-10-01T20:00:00.000Z',
             'end_date'  : '2020-10-05T20:00:00.000Z',
             'n_days'    : 6,
             'type'      : 'linear',
             "coef":[0.1,0.1,0.1,0.1,0.1,0.1] 
           }
      
      def value(tp, d, cf):
          # tp: function type, d: day number, cf: full coefficient list [a0..a5]
          a0, a1, a2, a3, a4, a5 = cf
          if tp == "linear":
              val = a0 + a1*d
          elif tp == "exponential":
              val = math.exp(a0 + a1*d)
          elif tp == "constant":
              val = a0
          elif tp == "polynomial":
              val = a0 + a1*d + a2*d**2 + a3*d**3 + a4*d**4 + a5*d**5
          else:
              raise ValueError(f"unknown type: {tp}")
          return val
      
      start = datetime.strptime(d1["start_date"], '%Y-%m-%dT%H:%M:%S.%f%z')
      # end = datetime.strptime(d1["end_date"], '%Y-%m-%dT%H:%M:%S.%f%z')
      end = start + timedelta(days=d1["n_days"])
      
      # One row per day offset 0 .. n_days - 1
      df = [[start + timedelta(days=i),i,d1["type"],value(d1["type"],i,d1["coef"])] for i in range((end-start).days)]
      df = pd.DataFrame(df,columns = ["Date","Day","function_type","function_value"])
      

      Output:

          Date                        Day function_type   function_value
      0   2020-10-01 20:00:00+00:00   0   linear          0.1
      1   2020-10-02 20:00:00+00:00   1   linear          0.2
      2   2020-10-03 20:00:00+00:00   2   linear          0.3
      3   2020-10-04 20:00:00+00:00   3   linear          0.4
      4   2020-10-05 20:00:00+00:00   4   linear          0.5
      5   2020-10-06 20:00:00+00:00   5   linear          0.6
      

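      【Comments】:

      • We must always replace end_date with start_date + n_days.
      • d1 is user input, and the user may enter an incorrect start_date, end_date, or n_days.
      • I have edited the answer for this case. But if everything can be wrong, what is your reference?
      • If start_date = 0, it should not be considered.
      • Do you want this included in the answer as well?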
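
The comments above leave open which of start_date, end_date, and n_days should be trusted. One possible reading, sketched here purely as an assumption, takes "end_date = start_date + n_days" literally and reports whether the supplied end_date agrees with it. The helper name `resolve_end_date` is hypothetical, and whether the range should count its endpoints inclusively is not settled by the thread.

```python
import pandas as pd

def resolve_end_date(d):
    """Hypothetical helper: derive the end date from start_date and n_days."""
    start = pd.Timestamp(d['start_date'])
    derived_end = start + pd.Timedelta(days=d['n_days'])
    supplied_end = pd.Timestamp(d['end_date'])
    # Report a mismatch instead of silently trusting either value.
    return derived_end, bool(derived_end == supplied_end)

d1 = {'start_date': '2020-10-01T20:00:00.000Z',
      'end_date': '2020-10-05T20:00:00.000Z',
      'n_days': 6,
      'type': 'linear',
      'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}

end, consistent = resolve_end_date(d1)
print(end, consistent)
```

For the d1 shown in the question, the derived end date falls two days after the supplied end_date, so the two inputs are flagged as inconsistent.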