【Posted】:2020-12-06 08:02:42
【Question】:
I have a df like this:
Timestamp Time Power Total Energy ID Energy
2020-04-09 06:45:00 2020-04-09 06:40:40.559719 7500 5636690.0 1 140.0
2020-04-09 06:46:00 2020-04-09 06:40:40.559719 7500 5636710.0 1 160.0
2020-04-09 06:47:00 NaT NaN NaN NaN NaN
2020-04-09 06:48:00 2020-04-09 06:40:40.559719 7500 5636960.0 1 410.0
2020-04-09 06:49:00 NaT NaN NaN NaN NaN
2020-04-09 06:50:00 NaT NaN NaN NaN NaN
2020-04-09 06:51:00 NaT NaN NaN NaN NaN
... ... ... ... ... ...
2020-04-30 23:55:00 2020-04-29 16:30:38.559871 7500 18569270.0 5 100.0
2020-04-30 23:54:00 NaT NaN NaN NaN NaN
2020-04-30 23:55:00 2020-04-29 16:30:38.559871 7500 18569370.0 5 180.0
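I have to adjust/add some values:
- Add rows covering the gap between df['Time'] and the first existing df['Timestamp']: df['Timestamp'] in 1-minute intervals; df['Time'] = the entry of df['Time']; df['Power'] = df['Energy'] / delta t (delta t = difference between Time and the existing Timestamp, in hours); df['Total Energy'], df['ID'] and df['Energy'] are handled like df['Time']
- Fill NaN/NaT values in regions where the Time does not change (using bfill or ffill)
- Fill NaN/NaT values between two different df['Time'] entries with 0, except df['Total Energy'], which is filled with the last entry (ffill)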
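The first step above (adding the leading 1-minute rows) could be sketched roughly as follows. This is only a minimal sketch under assumptions: `prepend_rows` is a hypothetical helper name, the frame holds a single session, and delta t is taken as the span from the first full minute after `Time` to the first existing timestamp, which is inferred from the value 2100 in the expected result (140 / (4/60)).

```python
import pandas as pd

# Toy frame: the first two rows of the question's data, timestamps as index.
df = pd.DataFrame(
    {
        "Time": pd.to_datetime(["2020-04-09 06:40:40.559719"] * 2),
        "Power": [7500.0, 7500.0],
        "Total Energy": [5636690.0, 5636710.0],
        "ID": [1.0, 1.0],
        "Energy": [140.0, 160.0],
    },
    index=pd.date_range("2020-04-09 06:45", periods=2, freq="min"),
)


def prepend_rows(frame):
    """Hypothetical helper: add 1-minute rows from 'Time' up to the first timestamp."""
    first_ts = frame.index[0]
    start = frame["Time"].iloc[0].ceil("min")  # 06:40:40.56 -> 06:41:00
    new_index = pd.date_range(start, first_ts - pd.Timedelta(minutes=1), freq="min")
    if len(new_index) == 0:
        return frame  # nothing to add
    # Assumption: delta t is measured from the first new row to the first existing row.
    hours = (first_ts - start) / pd.Timedelta(hours=1)
    lead = pd.DataFrame(
        {
            "Time": frame["Time"].iloc[0],
            "Power": frame["Energy"].iloc[0] / hours,
            "Total Energy": frame["Total Energy"].iloc[0],
            "ID": frame["ID"].iloc[0],
            "Energy": frame["Energy"].iloc[0],
        },
        index=new_index,
    )
    return pd.concat([lead, frame])


result = prepend_rows(df)
print(result)
```

With several sessions in one frame, the same idea would have to be applied per block of identical `Time` values.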
Expected result:
Timestamp Time Power Total Energy ID Energy
2020-04-09 06:41:00 2020-04-09 06:40:40.559719 2100 5636690.0 1 140.0
2020-04-09 06:42:00 2020-04-09 06:40:40.559719 2100 5636690.0 1 140.0
2020-04-09 06:43:00 2020-04-09 06:40:40.559719 2100 5636690.0 1 140.0
2020-04-09 06:44:00 2020-04-09 06:40:40.559719 2100 5636690.0 1 140.0
2020-04-09 06:45:00 2020-04-09 06:40:40.559719 7500 5636690.0 1 140.0
2020-04-09 06:46:00 2020-04-09 06:40:40.559719 7500 5636710.0 1 160.0
2020-04-09 06:47:00 2020-04-09 06:40:40.559719 7500 5636710.0 1 160.0
2020-04-09 06:48:00 2020-04-09 06:40:40.559719 7500 5636960.0 1 410.0
2020-04-09 06:49:00 - 0 5636960.0 - 0
2020-04-09 06:50:00 - 0 5636960.0 - 0
2020-04-09 06:51:00 - 0 5636960.0 - 0
... ... ... ... ... ...
2020-04-30 23:55:00 2020-04-29 16:30:38.559871 7500 18569270.0 5 100.0
2020-04-30 23:54:00 2020-04-29 16:30:38.559871 7500 18569270.0 5 100.0
2020-04-30 23:55:00 2020-04-29 16:30:38.559871 7500 18569370.0 5 180.0
I think the solution involves ffill() under certain conditions, but unfortunately I don't know how to formulate this.
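One way such a conditional ffill() might look, as a minimal sketch: a NaN row counts as "inside" a session when the previous and the next known `Time` are equal, otherwise it is treated as a gap between sessions. The sample values are the first eight rows of the sample frame from the edit in this post; treating trailing NaN rows (no later `Time`) as gaps is an assumption.

```python
import numpy as np
import pandas as pd

t1 = pd.Timestamp("2020-04-09 06:40:40.559719")
t2 = pd.Timestamp("2020-04-09 16:50:38.559871")
df = pd.DataFrame(
    {
        "Time": [t1, t1, pd.NaT, t1, pd.NaT, pd.NaT, pd.NaT, t2],
        "Power": [7500, 6000, np.nan, 6000, np.nan, np.nan, np.nan, 3600],
        "Total Energy": [5000, 5100, np.nan, 5300, np.nan, np.nan, np.nan, 5360],
        "ID": [1, 1, np.nan, 1, np.nan, np.nan, np.nan, 2],
        "Energy": [500, 600, np.nan, 800, np.nan, np.nan, np.nan, 60],
    },
    index=pd.date_range("2020-04-09 06:45", periods=8, freq="min"),
)

missing = df["Time"].isna()
prev_time = df["Time"].ffill()  # last known Time before each row
next_time = df["Time"].bfill()  # next known Time after each row

# NaT == NaT evaluates to False, so rows without a later Time end up in `between`.
inside = missing & (prev_time == next_time)
between = missing & ~inside

out = df.ffill()                           # fills every column, including Total Energy
out.loc[between, ["Power", "Energy"]] = 0  # no consumption between two sessions
out.loc[between, "ID"] = np.nan            # undo the ffill for the gap rows
out.loc[between, "Time"] = pd.NaT
print(out)
```

Only `Total Energy` keeps its forward-filled value in the gap rows; `Time` and `ID` stay empty there, matching the `-` entries in the expected result.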
Edit: Here is my code example:
import pandas as pd

# Sample data: the strings 'NaT' / 'NaN' mark missing entries and are converted below.
df = pd.DataFrame({"Time": ["2020-04-09 06:40:40.559719","2020-04-09 06:40:40.559719", 'NaT', "2020-04-09 06:40:40.559719", 'NaT', 'NaT', 'NaT', '2020-04-09 16:50:38.559871', 'NaT', '2020-04-29 16:50:38.559871'],
                   "Power": [7500, 6000, 'NaN', 6000, 'NaN', 'NaN', 'NaN', 3600, 'NaN', 4200],
                   "Total Energy": [5000, 5100, 'NaN', 5300, 'NaN', 'NaN', 'NaN', 5360, 'NaN', 5500],
                   "ID": [1, 1, 'NaN', 1, 'NaN', 'NaN', 'NaN', 2, 'NaN', 2],
                   "Energy": [500, 600, 'NaN', 800, 'NaN', 'NaN', 'NaN', 60, 'NaN', 200]},
                  # 1-minute index; 'min' replaces the alias 'T', which newer pandas deprecates
                  index=pd.date_range(start="2020-04-09 6:45", periods=10, freq='min'))
# Convert the string columns to proper datetime / numeric dtypes
df['Time'] = pd.to_datetime(df['Time'])
df['Power'] = pd.to_numeric(df['Power'], errors='coerce')
df['Total Energy'] = pd.to_numeric(df['Total Energy'], errors='coerce')
df['ID'] = pd.to_numeric(df['ID'], errors='coerce')
df['Energy'] = pd.to_numeric(df['Energy'], errors='coerce')
df
Expected result:
Time Power Total Energy ID Energy
2020-04-09 06:41:00 2020-04-09 06:40:40.559719 0 4500.0 1.0 0
2020-04-09 06:42:00 2020-04-09 06:40:40.559719 7500.0 4625.0 1.0 125.0
2020-04-09 06:43:00 2020-04-09 06:40:40.559719 7500.0 4750.0 1.0 250.0
2020-04-09 06:44:00 2020-04-09 06:40:40.559719 7500.0 4875.0 1.0 375.0
2020-04-09 06:45:00 2020-04-09 06:40:40.559719 7500.0 5000.0 1.0 500.0
2020-04-09 06:46:00 2020-04-09 06:40:40.559719 6000.0 5100.0 1.0 600.0
2020-04-09 06:47:00 2020-04-09 06:40:40.559719 6000.0 5200.0 1.0 700.0
2020-04-09 06:48:00 2020-04-09 06:40:40.559719 6000.0 5300.0 1.0 800.0
2020-04-09 06:49:00 - 0 5300.0 - 0
2020-04-09 06:50:00 - 0 5300.0 - 0
2020-04-09 06:51:00 2020-04-09 16:50:38.559871 0 5300.0 2.0 0
2020-04-09 06:52:00 2020-04-09 16:50:38.559871 3600.0 5360.0 2.0 60.0
2020-04-09 06:53:00 2020-04-09 16:50:38.559871 4200.0 5430.0 2.0 130.0
2020-04-09 06:54:00 2020-04-29 16:50:38.559871 4200.0 5500.0 2.0 200.0
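- df['Time']: create new rows until df['Timestamp'] = df['Time']
- Fill the new rows: df['Energy'] = 0 for the first row, otherwise fill linearly; df['Power'] = 0 for the first row, otherwise df['Power'] = df['Energy']/(1/60); df['Time'] and df['ID'] are filled with bfill(); df['Total Energy'] = sum of df['Energy']
- Boundary between two different Times: fill as shown in the expected result
- NaN values within a time series (e.g. at 2020-04-09 06:47:00): df['Time'] and df['ID'] with ffill(); df['Energy'] = difference between the existing rows (if there are several NaN rows, interpolate linearly); df['Total Energy'] = old value + df['Energy']; df['Power'] = df['Energy']/(1/60)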
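The last rule (NaN rows inside one time series) could be sketched like this, using the first four rows of the sample frame. It is a rough sketch under assumptions: `Total Energy` is interpolated linearly as well, which reproduces the expected 5200 for this sample, and the energy used for `Power` is read as the per-minute increase of `Energy`, because that is what yields the expected 6000 at 06:47.

```python
import numpy as np
import pandas as pd

t = pd.Timestamp("2020-04-09 06:40:40.559719")
df = pd.DataFrame(
    {
        "Time": [t, t, pd.NaT, t],
        "Power": [7500.0, 6000.0, np.nan, 6000.0],
        "Total Energy": [5000.0, 5100.0, np.nan, 5300.0],
        "ID": [1.0, 1.0, np.nan, 1.0],
        "Energy": [500.0, 600.0, np.nan, 800.0],
    },
    index=pd.date_range("2020-04-09 06:45", periods=4, freq="min"),
)

out = df.copy()
out["Time"] = out["Time"].ffill()
out["ID"] = out["ID"].ffill()

# Linear interpolation treats the rows as equally spaced, which holds for a 1-minute grid.
out["Energy"] = out["Energy"].interpolate(method="linear")
out["Total Energy"] = out["Total Energy"].interpolate(method="linear")

# Dividing by (1/60) h is the same as multiplying the per-minute increase by 60.
step_power = out["Energy"].diff() * 60
out["Power"] = out["Power"].fillna(step_power)  # only the missing Power values are replaced
print(out)
```

Because only missing `Power` values are replaced, the measured values in the existing rows are left untouched.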
Thanks for your help.
【Comments】: