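【Posted】: 2019-03-24 22:25:57
【Question】:
Python newbie here, looking to see whether there is a more elegant solution.
I have time-series data from a telematics device with a motion indicator. I need to expand the motion indicator to +/- 1 row around the actual start and stop of motion (shown by the motion2 column below). In SQL I did this with a case statement and the lead and lag window functions. I am now trying to convert my code to Python...
Here is the data.
import pandas as pd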
data = {'device':[1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
'time':[1,2,3,4,5,6,7,8,9,10,11,12,5,6,7,8,9,10,11,12,13,14],
'motion':[0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}
df = pd.DataFrame.from_dict(data)
df = df[['device','time','motion']]
##sort data chronologically for each device
df.sort_values(['device','time'], ascending = True, inplace = True)
This is what df looks like:
device, time, motion
1,1,0
1,2,0
1,3,1
1,4,1
1,5,1
1,6,0
1,7,0
1,8,0
1,9,1
1,10,1
1,11,0
1,12,0
2,5,0
2,6,0
2,7,0
2,8,1
2,9,1
2,10,1
2,11,0
2,12,1
2,13,0
2,14,0
What I need is the motion2 column below, added to the dataframe.
device, time, motion, motion2
1,1,0,0
1,2,0,1
1,3,1,1
1,4,1,1
1,5,1,1
1,6,0,1
1,7,0,0
1,8,0,1
1,9,1,1
1,10,1,1
1,11,0,1
1,12,0,0
2,5,0,0
2,6,0,0
2,7,0,1
2,8,1,1
2,9,1,1
2,10,1,1
2,11,0,1
2,12,1,1
2,13,0,1
2,14,0,0
Below is Python code that works. However, I am wondering whether there is a more elegant way.
##create new columns for prior and next motion indicator
df['prev_motion'] = df.groupby(['device'])['motion'].shift(1)
df['next_motion'] = df.groupby(['device'])['motion'].shift(-1)
##create the desired motion2 indicator to expand +/- 1 record of the motion start and stop
df['motion2'] = df[['prev_motion', 'motion', 'next_motion']].apply(
    lambda row: 1 if row['motion']==1 else (1 if row['prev_motion']==1 or row['next_motion']==1 else 0),
    axis=1)
##drop unwanted columns
df.drop(columns=['prev_motion', 'next_motion'], inplace = True)
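One possible simplification, offered as a sketch rather than a definitive answer: because motion is a 0/1 flag, the row-wise apply can be replaced by a boolean OR of the column with its per-device shifted copies. This assumes motion only ever holds 0 or 1; the names grouped, prev_is_moving, and next_is_moving are introduced here for illustration only.

```python
import pandas as pd

data = {'device': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
        'time':   [1,2,3,4,5,6,7,8,9,10,11,12,5,6,7,8,9,10,11,12,13,14],
        'motion': [0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}
df = pd.DataFrame(data).sort_values(['device', 'time'])

# Shift within each device so values never leak across devices.
grouped = df.groupby('device')['motion']

# The first/last row of each device gets NaN from shift; NaN == 1 is False,
# so those rows behave as "no motion" without any extra fill step.
prev_is_moving = grouped.shift(1).eq(1)
next_is_moving = grouped.shift(-1).eq(1)

# A row is flagged if it, its predecessor, or its successor is in motion.
df['motion2'] = (df['motion'].eq(1) | prev_is_moving | next_is_moving).astype(int)
```

This avoids creating and dropping the helper columns, and the comparison runs on whole columns instead of calling a Python lambda once per row.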
This is much easier in SQL using a case statement and window functions (lead and lag).
case
when motion = 1 then 1
when motion = 0 and (lead(motion) over (partition by device order by time) = 1) then 1
when motion = 0 and (lag(motion) over (partition by device order by time) = 1) then 1
else 0
end as motion2
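For comparison, the closest pandas analogue to a SQL window function is probably a grouped rolling window. The sketch below assumes motion is strictly 0/1, so that "any neighbour is 1" equals the maximum over a centered 3-row window; the helper name expand_by_one_row is hypothetical.

```python
import pandas as pd

data = {'device': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
        'time':   [1,2,3,4,5,6,7,8,9,10,11,12,5,6,7,8,9,10,11,12,13,14],
        'motion': [0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}
df = pd.DataFrame(data).sort_values(['device', 'time'])


def expand_by_one_row(motion):
    """Max over (previous, current, next) row of one device's motion flags."""
    # center=True centers the 3-row window on the current row;
    # min_periods=1 lets the first and last rows use a shorter window.
    return motion.rolling(window=3, center=True, min_periods=1).max()


# transform applies the helper per device and keeps the original row order.
df['motion2'] = (
    df.groupby('device')['motion'].transform(expand_by_one_row).astype(int)
)
```

Widening the expansion to +/- n rows would only require changing the window to 2 * n + 1, which is harder to express with a fixed set of lead and lag terms.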
【Discussion】:
Tags: python sql pandas window-functions case-statement