【Question title】: Split up rows in DataFrame (hourly) based on another column
【Posted】: 2020-11-06 05:49:21
【Question】:

I have a DataFrame of electric vehicle charging sessions with the following schema and content:

PlugInTime, PlugOutTime, DeliveredEnergy, ChargeDuration                    

 - 2019-01-02 05:53:00, 2019-01-02 14:38:00, 36.0, 05:57:00
 - 2019-01-02 07:53:00, 2019-01-02 16:38:00, 8.0, 03:12:00
 - ...

I would like to split the individual rows based on the ChargeDuration column so that, for example, the first row is "exploded" into:

 - 2019-01-02 05:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 06:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 07:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 08:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 09:53:00, 2019-01-02 14:38:00, 6.05, 05:57:00
 - 2019-01-02 10:53:00, 2019-01-02 14:38:00, 5.75, 05:57:00

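As you can see, I want ChargeDuration to determine how many chunks PlugInTime is split into, and I want DeliveredEnergy to be split accordingly by linear interpolation so that the parts add up to the original value.

Afterwards, I want to resample the DataFrame hourly in order to determine the hourly energy demand of all recorded charging sessions.

While researching before posting, I came across many threads about resampling in pandas, but they lack the splitting step based on another column (here ChargeDuration).

Any help is greatly appreciated.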

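【Question comments】:

  • I think you should get more rows in the expected output for the first row, right?
  • Yes, I want the original first row to be replaced by the 6 new rows I described. And of course I want the same for all remaining rows.
  • No, I mean the first row should yield 7, shouldn't it? You only mentioned 6.
  • Oh, now I see what you mean. Yes, my mistake. I will correct it. :)
  • Now that I think about it: the duration is 5 hours and 57 minutes, so the 6th hour has started (but not finished). That means there should be 6 intervals in which the corresponding energy load is observed (6.05 for each of the first 5 intervals, and the remainder for the started 6th hour). Or am I missing something? :)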
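
The interval arithmetic settled on in these comments can be sketched in plain Python. This is only an illustration of the calculation, assuming energy is delivered at a constant rate over ChargeDuration; `hourly_shares` is a hypothetical helper name that does not appear in the thread.

```python
import math

def hourly_shares(total_energy, duration_seconds):
    """Split total_energy into one share per started hour of the duration."""
    # 5 h 57 min means the 6th hour has started, so we need ceil() intervals.
    n_intervals = math.ceil(duration_seconds / 3600)
    shares = []
    for i in range(n_intervals):
        # Every interval covers a full hour except possibly the last one.
        seconds = min(3600, duration_seconds - i * 3600)
        shares.append(total_energy * seconds / duration_seconds)
    return shares

shares = hourly_shares(36.0, 5 * 3600 + 57 * 60)
print([round(s, 2) for s in shares])  # [6.05, 6.05, 6.05, 6.05, 6.05, 5.75]
```

Because each share is proportional to the seconds it covers, the shares add up to the original DeliveredEnergy apart from floating-point error.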

Tags: python pandas dataframe datetime


【Solution 1】:

It's not pretty, but it gets the job done. I'd be glad to see whether there is a more elegant solution to this problem.

import pandas as pd

# create df
df = pd.DataFrame({'PlugInTime': ['2019-01-02 05:53:00', '2019-01-02 07:53:00'], 'PlugOutTime': ['2019-01-02 14:38:00', '2019-01-02 16:38:00'], 'DeliveredEnergy': [36.0, 8.0], 'ChargeDuration': ['05:57:00', '03:12:00'] })

# set dtypes
df['PlugInTime'] = pd.to_datetime(df.PlugInTime)
df['PlugOutTime'] = pd.to_datetime(df.PlugOutTime)
df['ChargeDuration'] = pd.to_timedelta(df.ChargeDuration)

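# determine charge time in hours (total_seconds() also counts whole days,
# whereas .dt.seconds silently drops them for durations of 24 h or more)
df['hours'] = df.ChargeDuration.dt.total_seconds() / 3600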

# split the hours into a list
df['split'] = df.ChargeDuration.apply(lambda x: [1 for _ in range(int(x.total_seconds() // 3600))] + [(x.total_seconds() % 3600 / 3600 or None)])

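# explode the list (one row per started hour) and drop the empty remainder
# that appears when the duration is an exact number of hours
df = df.explode('split').dropna(subset=['split'])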

# calculate the energy per hour
df['DeliveredEnergy'] = df.DeliveredEnergy / df.hours * df.split

# update PlugInTime
df['PlugInTime'] = df.PlugInTime + pd.to_timedelta(df.PlugInTime.groupby(level=0).cumcount(), unit='h')

print(df)

           PlugInTime         PlugOutTime DeliveredEnergy ChargeDuration  hours split
0 2019-01-02 05:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 06:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 07:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 08:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 09:53:00 2019-01-02 14:38:00         6.05042       05:57:00   5.95     1
0 2019-01-02 10:53:00 2019-01-02 14:38:00          5.7479       05:57:00   5.95  0.95
1 2019-01-02 07:53:00 2019-01-02 16:38:00             2.5       03:12:00   3.20     1
1 2019-01-02 08:53:00 2019-01-02 16:38:00             2.5       03:12:00   3.20     1
1 2019-01-02 09:53:00 2019-01-02 16:38:00             2.5       03:12:00   3.20     1
1 2019-01-02 10:53:00 2019-01-02 16:38:00             0.5       03:12:00   3.20   0.2
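
The second step from the question, the hourly resample, is not shown in this answer. A minimal sketch of it follows, under one loud assumption: each chunk's energy is booked entirely to the clock hour in which the chunk starts. A chunk starting at 05:53 really spans 05:53 to 06:53, so this is an approximation. The `exploded` frame is a hand-typed stand-in that uses the rounded values from the question rather than the exact output above.

```python
import pandas as pd

# Hypothetical stand-in for the exploded frame (rounded energy values).
exploded = pd.DataFrame({
    'PlugInTime': pd.to_datetime([
        '2019-01-02 05:53:00', '2019-01-02 06:53:00', '2019-01-02 07:53:00',
        '2019-01-02 08:53:00', '2019-01-02 09:53:00', '2019-01-02 10:53:00',
        '2019-01-02 07:53:00', '2019-01-02 08:53:00', '2019-01-02 09:53:00',
        '2019-01-02 10:53:00',
    ]),
    'DeliveredEnergy': [6.05, 6.05, 6.05, 6.05, 6.05, 5.75, 2.5, 2.5, 2.5, 0.5],
})

# Sum the energy of all chunks whose start time falls into the same clock hour.
hourly = (
    exploded.set_index('PlugInTime')['DeliveredEnergy']
    .resample(pd.Timedelta(hours=1))
    .sum()
)
print(hourly)
```

If the demand has to be attributed exactly to clock hours, each chunk would instead need to be divided between the two hours it overlaps before summing.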

【Comments】:

    【Solution 2】:

    I tried to build a similar solution using dummy data:

    import pandas as pd
    import numpy as np
    
    tst = pd.DataFrame([{'id':1, 'start': 1, 'duration': 7, 'work_rate': 100}, {'id': 2, 'start': 1, 'duration': 4, 'work_rate': 35}])
    tst
    
        id  start   duration    work_rate
    0   1   1       7           100
    1   2   1       4           35
    
    def create_list_for_splits(_id, start, duration, work_rate):
        start_end = list(range(start, start + duration))
        work_rate_split = [round(work_rate/duration, 2)]*duration
        
        res_ls = []
        
        for s, w in zip(start_end, work_rate_split):
            res_ls.append({'id': _id, 'start': s, 'work_rate': w})
        
        return res_ls
    
    # Test the function
    
    create_list_for_splits(1, 1, 7, 100)
    
    [{'id': 1, 'start': 1, 'work_rate': 14.29},
     {'id': 1, 'start': 2, 'work_rate': 14.29},
     {'id': 1, 'start': 3, 'work_rate': 14.29},
     {'id': 1, 'start': 4, 'work_rate': 14.29},
     {'id': 1, 'start': 5, 'work_rate': 14.29},
     {'id': 1, 'start': 6, 'work_rate': 14.29},
     {'id': 1, 'start': 7, 'work_rate': 14.29}]
    
    tst['work_split'] = tst.apply(lambda x: create_list_for_splits(x['id'], x['start'], x['duration'], x['work_rate']), axis=1)
    
    pd.DataFrame([item for sublist in tst['work_split'].values for item in sublist])
    
        id  start   work_rate
    0   1   1       14.29
    1   1   2       14.29
    2   1   3       14.29
    3   1   4       14.29
    4   1   5       14.29
    5   1   6       14.29
    6   1   7       14.29
    7   2   1       8.75
    8   2   2       8.75
    9   2   3       8.75
    10  2   4       8.75
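
One caveat with the rounding above: 7 × 14.29 = 100.03, so the rounded shares no longer add up to the original `work_rate` of 100. Below is a minimal sketch of one way around that, assuming it is acceptable for the last chunk to absorb the rounding remainder. `split_evenly` is a hypothetical helper and not part of this answer.

```python
def split_evenly(total, parts, ndigits=2):
    """Split total into `parts` rounded shares that still sum to total."""
    share = round(total / parts, ndigits)
    # All chunks but the last get the rounded share ...
    chunks = [share] * (parts - 1)
    # ... and the last chunk takes whatever is left after rounding.
    chunks.append(round(total - share * (parts - 1), ndigits))
    return chunks

chunks = split_evenly(100, 7)
print(chunks)
```

For the second dummy row (`work_rate` 35 over 4 steps) no correction is needed, since 35 / 4 = 8.75 is already exact at two decimals.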
    

    【Comments】:

      【Solution 3】:

      Try this. It takes a naive approach to a few things, so feel free to change it if you like:

      import datetime
      import time
      import pandas as pd
      
      def func(PlugInTime, PlugOutTime, DeliveredEnergy, ChargeDuration):
      
          # inTime calculation: parse the timestamp and convert it to epoch
          # seconds in local time (strftime('%s') is not available on every platform)
          Timein = int(datetime.datetime.strptime(str(PlugInTime), '%Y-%m-%d %H:%M:%S').timestamp())
      
          # outTime calculation (kept for reference; not used below)
          Timeout = int(datetime.datetime.strptime(str(PlugOutTime), '%Y-%m-%d %H:%M:%S').timestamp())
      
          # energy
          energy = float(DeliveredEnergy)
      
          # charge
          h, m, s = str(ChargeDuration).split(':')
          charge = int(h) * 3600 + int(m) * 60 + int(s)
          val = charge
      
          # hourly
          add = 3600
      
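          # chargeval: energy delivered during one full hour; always defined so
          # the loop below cannot hit an unbound name when val == add
          chargeval = (add / val) * energy if val else 0.0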
      
      
          list_ = []
          while charge - add >= 0:
              Timein += add
              t_ = time.localtime(Timein)
              lt = [time.strftime("%Y-%m-%d %H:%M:%S", t_), PlugOutTime, chargeval, ChargeDuration]
              list_.append(lt)
              charge -= add
      
          if charge:
              t_ = time.localtime(Timein+charge)
              lt = [time.strftime("%Y-%m-%d %H:%M:%S", t_), PlugOutTime, (charge/val) * energy, ChargeDuration]
              list_.append(lt)
      
          return list_
      
      
      if __name__ == '__main__':
      
          l = [['2019-01-02 05:53:00', '2019-01-02 14:38:00', '36.0', '05:57:00']]
      
          # input dataframe
          df = pd.DataFrame(l, columns = ['PlugInTime', 'PlugOutTime', 'DeliveredEnergy', 'ChargeDuration'])
      
          list_ = []
          for ind, row in df.iterrows():
              l = func(row['PlugInTime'], row['PlugOutTime'], row['DeliveredEnergy'], row['ChargeDuration'])
              list_.extend(l)
      
              # printing the output for a row
              for i in l:
                  print(i)
      
          # output dataframe
          df_out = pd.DataFrame(list_, columns = ['PlugInTime', 'PlugOutTime', 'DeliveredEnergy', 'ChargeDuration'])
      

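      Output (each timestamp marks the end of its interval, because Timein is advanced before the row is appended):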

      ['2019-01-02 06:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
      ['2019-01-02 07:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
      ['2019-01-02 08:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
      ['2019-01-02 09:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
      ['2019-01-02 10:53:00', '2019-01-02 14:38:00', 6.050420168067228, '05:57:00']
      ['2019-01-02 11:50:00', '2019-01-02 14:38:00', 5.747899159663865, '05:57:00']
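
The question's expected output labels each chunk with its start time rather than its end time. Below is a minimal sketch of that variant using only `datetime` and `timedelta`, which avoids the epoch-second conversions. It assumes a constant charging rate, and `split_session` is a hypothetical name that is not part of this answer.

```python
from datetime import datetime, timedelta

def split_session(plug_in, energy, duration):
    """Return (chunk_start, chunk_energy) pairs, one per started hour."""
    hour = timedelta(hours=1)
    rows = []
    elapsed = timedelta(0)
    while elapsed < duration:
        # Full hour, or whatever is left of the duration for the last chunk.
        chunk = min(hour, duration - elapsed)
        # timedelta / timedelta yields a float fraction of the whole session.
        rows.append((plug_in + elapsed, energy * (chunk / duration)))
        elapsed += chunk
    return rows

rows = split_session(datetime(2019, 1, 2, 5, 53), 36.0,
                     timedelta(hours=5, minutes=57))
for start, part in rows:
    print(start, round(part, 4))
```

The resulting pairs can be passed to `pd.DataFrame(rows, columns=[...])` in the same way as `list_` above.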
      

      【Comments】:
