【Question title】: Find closest event in one dataframe before an event in another dataframe
【Posted】: 2014-11-11 08:56:20
【Question description】:

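I have the dataframes df1 and df2 below. I would like to build df3 by joining them as described here.

Both df1 and df2 contain timestamped events for specific machines.

In df3 I want every row of df1, plus one extra column holding the timestamp of the df2 event for the same machine that is closest to the df1 row's timestamp but occurs before it. If no df2 event precedes the df1 event, the new value can be null.

So this is a kind of merge, except that the link between the two tables is not only an equality on "Machine" but also an inequality on the timestamp, where the gap should be minimized in one direction.

Here is the code that generates the sample dataframes: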

import pandas as pd
df1=pd.DataFrame({"Machine":[0,2,3,0,2,3],"Status":["blah","foo","bar","blah","foo","bar"],"Date-time":["2014-02-20 11:00:19.0","2014-02-21 12:29:55.0","2014-02-20 11:00:21.0","2014-02-19 09:10:19.0","2014-02-18 12:19:47.0","2014-02-20 1:33:00.0"]})
df1["Date-time"]=pd.to_datetime(df1["Date-time"])

df2=pd.DataFrame({"Machine":[0,2,3,0,2,3],"Date of maintenance":["2014-02-20","2014-02-21","2014-02-20","2014-02-10","2014-02-07","2014-02-03"]})
df2["Date of maintenance"]=pd.to_datetime(df2["Date of maintenance"])

df3=pd.DataFrame({"Machine":[0,2,3,0,2,3],"Status":["blah","foo","bar","blah","foo","bar"],"Date-time":["2014-02-20 11:00:19.0","2014-02-21 12:29:55.0","2014-02-20 11:00:21.0","2014-02-19 09:10:19.0","2014-02-18 12:19:47.0","2014-02-20 1:33:00.0"],"Date of last maintenance":["2014-02-20","2014-02-21","2014-02-20","2014-02-10","2014-02-07","2014-02-20"]})

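Edit:

I have written the following. It produces some duplicates, but I should be able to deal with those easily. The main missing piece is how to do the matching per machine rather than across the whole table.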

import pandas as pd
import numpy as np
df1=pd.DataFrame({"Machine":[0,2,3,0,2,3,0,1,0],"Status":["blah","foo","bar","blah","foo","bar","blah","foo","bar"],"Date-time":["2014-02-20 11:00:19.0","2014-02-21 12:29:55.0","2014-02-20 11:00:21.0","2014-02-19 09:10:19.0","2014-02-18 12:19:47.0","2014-02-20 1:33:00.0","2014-02-07 04:10:19.0","2014-02-19 11:11:47.0","2014-03-20 1:23:00.0"]})
df1["Date-time"]=pd.to_datetime(df1["Date-time"])
df1=df1.sort_values(["Date-time"])  # DataFrame.sort() was removed from pandas; sort_values() replaces it
df1=df1.reset_index(drop=True)

df2=pd.DataFrame({"Machine":[0,2,3,0,2,3],"Date of maintenance":["2014-02-20","2014-02-21","2014-02-20","2014-02-10","2014-02-07","2014-02-03"]})
df2["Date of maintenance"]=pd.to_datetime(df2["Date of maintenance"])
df2=df2.sort_values(["Date of maintenance"])  # DataFrame.sort() was removed from pandas; sort_values() replaces it
df2=df2.reset_index(drop=True)


df2["searchsortindex"]=np.searchsorted(np.array(df1["Date-time"]), np.array(df2["Date of maintenance"]), side='left', sorter=None)
df3=pd.merge(df1,df2,how='left',left_index=True,right_on='searchsortindex')

【Comments】:

    Tags: python join pandas


    【Solution 1】:

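    You can use numpy.searchsorted() for this. It takes a sorted array (for example, of timestamps) and a second array of values that you want to "locate" in the first one, and returns the insertion positions that would keep the first array sorted.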
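
As a rough illustration of how searchsorted could be applied per machine, here is a minimal sketch that uses the sample data from the question. The `maintenance` dictionary and the `last_maintenance` helper are names made up for this example and are not part of the original answer. It also assumes that a maintenance event at exactly the same instant counts as "before"; passing side="left" instead would require it to be strictly earlier.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "Machine": [0, 2, 3, 0, 2, 3],
    "Status": ["blah", "foo", "bar", "blah", "foo", "bar"],
    "Date-time": pd.to_datetime([
        "2014-02-20 11:00:19", "2014-02-21 12:29:55", "2014-02-20 11:00:21",
        "2014-02-19 09:10:19", "2014-02-18 12:19:47", "2014-02-20 01:33:00",
    ]),
})
df2 = pd.DataFrame({
    "Machine": [0, 2, 3, 0, 2, 3],
    "Date of maintenance": pd.to_datetime([
        "2014-02-20", "2014-02-21", "2014-02-20",
        "2014-02-10", "2014-02-07", "2014-02-03",
    ]),
})

# Hypothetical lookup table: one sorted array of maintenance times per machine.
maintenance = {
    machine: np.sort(group["Date of maintenance"].to_numpy())
    for machine, group in df2.groupby("Machine")
}

def last_maintenance(machine, when):
    """Return the latest maintenance at or before `when`, or NaT if there is none."""
    dates = maintenance.get(machine)
    if dates is None:
        return pd.NaT  # this machine has no maintenance records at all
    # side="right" gives the number of entries <= when; the match sits just before it.
    pos = np.searchsorted(dates, when.to_datetime64(), side="right")
    return pd.Timestamp(dates[pos - 1]) if pos > 0 else pd.NaT

df3 = df1.copy()
df3["Date of last maintenance"] = [
    last_maintenance(m, t) for m, t in zip(df1["Machine"], df1["Date-time"])
]
print(df3)
```

Grouping first is what restricts the search to one machine, which is the part the question's own attempt was missing. Newer pandas versions also provide pandas.merge_asof, which accepts a by= column and direction="backward" and may be a more direct fit for this kind of join; it did not exist when the question was posted.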

    【Discussion】:
