Subtracting two dates from columns in 2 pandas dataframes: answers

【Question title】: Subtracting Two dates from columns in 2 dataframes pandas
【Posted】: 2023-04-11 04:53:01
【Question description】:

I have the following code:

for tup in unique_tuples:
    user_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == tup[1])]     

    for friend in tup[2]:
        friend_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == friend)] 

        if (friend_review.date - user_review.date) <= 62:
            tup[2].remove(friend)

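I'm pulling values out of a list of tuples and matching them against values in the dataframe's columns, then masking the dataframe down to the rows where the match is true.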

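user_review_mask is a single row representing the user's review of a business. The friend_review mask is also a single row, representing a review made by one of the user's friends.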

tup[2] is the list of friend IDs for the user_id in tup[1]. So I loop through each of the user's friends and match that friend_id to the review he wrote for the business.

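Basically, for 2 different reviews by 2 different users, I want to see whether the difference between friend_review.date and user_review.date is less than 2 months. If the difference is not less than 2 months, I want to remove the friend_id from the tup[2] list.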

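Both dates in the two dataframes/rows have dtype datetime64[ns], and each is formatted as "yyyy-mm-dd", so I figured I could simply subtract them and check whether the reviews are less than 2 months apart.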

However, I keep getting the following error:

TypeError: invalid type comparison

It also mentions that NumPy doesn't like comparisons with `None`, which confuses me a bit, because there are no null values in my columns.

Update: solution. I ended up appending to a new list instead of removing from the current one, but this works.

#to append tuples
business_reviewer_and_influenced_reviewers = []

#loop through each user and create a single row df based on a match from the reviews df and our tuple values
for tup in unique_tuples:
    user_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                (reviews_prior_to_influence_threshold.user_id == tup[1]), 'date']     

    user_review_date = user_review_date.values[0]

    #loop through list each friend of the reviewer that also reviewed the business in tup[2]
    for friend in tup[2]:
        friend_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                      (reviews_prior_to_influence_threshold.user_id == friend), 'date']

        friend_review_date = friend_review_date.values[0]
        diff = pd.to_timedelta(friend_review_date - user_review_date).days

        #append business_id, reviewer, and influenced_reviewer as a tuple to a list
        if (diff >= 0) and (diff <= 62):
            business_reviewer_and_influenced_reviewers.append((tup[0], tup[1], friend))
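
As a possible alternative to the loop above, the same filter can be sketched with two merges and one element-wise subtraction. This is only a sketch under assumptions: the sample `reviews` frame and the sample `unique_tuples` are hypothetical stand-ins for the poster's data, and each (business_id, user_id) pair is assumed to have exactly one review, which matches the `.values[0]` lookups in the update.

```python
import pandas as pd

# Hypothetical sample data standing in for reviews_prior_to_influence_threshold.
reviews = pd.DataFrame({
    "business_id": ["b1", "b1", "b1", "b2", "b2"],
    "user_id": ["u1", "f1", "f2", "u2", "f3"],
    "date": pd.to_datetime(
        ["2011-11-22", "2012-01-15", "2012-06-01", "2012-03-01", "2012-02-20"]
    ),
})

# Same shape as the question's unique_tuples: (business_id, user_id, [friend ids]).
unique_tuples = [("b1", "u1", ["f1", "f2"]), ("b2", "u2", ["f3"])]

# One row per (business, reviewer, friend) candidate pair.
pairs = pd.DataFrame(
    [(b, u, f) for b, u, friends in unique_tuples for f in friends],
    columns=["business_id", "user_id", "friend_id"],
)

# Attach the reviewer's date, then the friend's date for the same business.
pairs = pairs.merge(
    reviews.rename(columns={"date": "user_date"}),
    on=["business_id", "user_id"],
)
pairs = pairs.merge(
    reviews.rename(columns={"user_id": "friend_id", "date": "friend_date"}),
    on=["business_id", "friend_id"],
)

# Both columns live in one frame, so the subtraction is row by row.
pairs["diff_days"] = (pairs["friend_date"] - pairs["user_date"]).dt.days

# Keep friends who reviewed 0 to 62 days after the reviewer (bounds inclusive).
influenced = pairs[pairs["diff_days"].between(0, 62)]
result = list(
    influenced[["business_id", "user_id", "friend_id"]].itertuples(index=False, name=None)
)
print(result)
```

If a user can review the same business more than once, the merges would produce one row per combination of reviews, so the frame would need deduplicating first.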

【Question discussion】:

Tags: python date pandas numpy

【Solution 1】:

The dates in your dataframe may not actually be of datetime64 dtype, which would explain the invalid type comparison. You can check with df.dtypes. If that is the case, use df.date = pd.to_datetime(df.date).
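
A minimal sketch of that check and conversion, assuming a hypothetical frame whose `date` column was loaded as plain strings:

```python
import pandas as pd

# Hypothetical frame: the date column holds strings, not datetimes.
df = pd.DataFrame({"date": ["2011-11-22", "2012-01-15"]})
print(df.dtypes)  # inspect the dtype before converting

# Convert in place; bracket assignment avoids any attribute-name ambiguity.
df["date"] = pd.to_datetime(df["date"])

# Confirm the column is now a datetime64 column.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
print(is_datetime)
```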

Your dataframe may also contain some null dates, which would explain the comparison with `None`. Filter them out with df[pd.notnull(df.date)].
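
To illustrate, here is a small sketch that counts and drops missing dates. The frame and its values are made up for the example; `isna()` and `notna()` are the Series-method equivalents of `pd.isnull` and `pd.notnull`.

```python
import pandas as pd

# Hypothetical frame with one missing date (stored as NaT after conversion).
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "date": pd.to_datetime(["2011-11-22", None, "2012-01-15"]),
})

# Count the missing dates before filtering.
n_missing = int(df["date"].isna().sum())

# Keep only the rows that have a usable date.
clean = df[df["date"].notna()]
print(n_missing, len(clean))
```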

By the way: subtracting dates should give you a timedelta, so you will probably need something like (friend_review.date - user_review.date).dt.days <= 62.
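
A short sketch of the timedelta arithmetic, using the two example dates that come up in the discussion (2011/11/22 and 2012/01/15). A pandas `Timedelta` has no month or year fields, so a span that crosses a year boundary is still reported as a plain count of days. The variable names are illustrative only.

```python
import pandas as pd

user_date = pd.Timestamp("2011-11-22")
friend_date = pd.Timestamp("2012-01-15")

# Scalar minus scalar gives a single Timedelta; .days is its whole-day count.
delta = friend_date - user_date
days = delta.days

# Series minus scalar gives a timedelta Series; use the .dt accessor for days.
dates = pd.Series(pd.to_datetime(["2012-01-15", "2012-06-01"]))
series_days = (dates - user_date).dt.days
within = series_days <= 62
print(days, list(series_days), list(within))
```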

【Discussion】:

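@dsc03 check print type(friend_review), and confirm that your other variables are what you expect them to be.
Oh, interesting. @connor.xyz does timedelta also account for years? For example, friend_review.date = 2012/01/15 and user_review.date = 2011/11/22.
@connorxyz the expression you wrote above returns multiple booleans. Rather than subtracting the two, it simply evaluates each one and returns a boolean.
@dsc03 It sounds like friend_review and user_review are Series instances (this is where type() comes in). I suspect the underlying problem is your iteration scheme (i.e. you are actually working with Series where you expect to be working with single rows or values). DataFrame.iterrows() and Series.iteritems() may help.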
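
The last comment's point can be sketched as follows. When two one-row selections come from different rows of the same frame, subtracting their `date` columns aligns on index labels rather than on position, so nothing lines up and every entry is NaT. That is one plausible contributor to the behaviour in the question, not a confirmed diagnosis. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame; the rows get index labels 0 and 1.
reviews = pd.DataFrame({
    "business_id": ["b1", "b1"],
    "user_id": ["u1", "f1"],
    "date": pd.to_datetime(["2011-11-22", "2012-01-15"]),
})

user_review = reviews[reviews["user_id"] == "u1"]    # one row, index label 0
friend_review = reviews[reviews["user_id"] == "f1"]  # one row, index label 1

# Series minus Series aligns on labels; 0 and 1 never match, so all values are NaT.
misaligned = friend_review["date"] - user_review["date"]

# Extracting scalars first sidesteps alignment entirely.
user_date = user_review["date"].iloc[0]
friend_date = friend_review["date"].iloc[0]
diff_days = (friend_date - user_date).days
print(len(misaligned), diff_days)
```

This scalar extraction is what the `.values[0]` lookups in the question's update achieve. Note also that `Series.iteritems()` was removed in pandas 2.0; `Series.items()` is the current spelling.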