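[Posted]: 2025-12-08 08:55:02
[Question]:
I have a groupby object. For each group, I need to check whether a particular column contains rows with both value-A and value-B, and return only those 2 rows from the group. If I use isin or "|", I also get groups where only one of the two values is present. Right now I check the first condition, then check the second condition only if the first one holds, and concatenate the results of the two checks.
My code is as follows: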
import pandas as pd
from datetime import datetime, timedelta
from statistics import mean
# named `data` rather than `dict` so the built-in dict type is not shadowed
data = {'col-a': ['T1A', 'T1A', 'T1A', 'T1B', 'T1B', 'T1C', 'T1C', 'P1', 'P1'],
        'col-b': ['07:57:00', '09:00:00', '12:00:00', '08:00:00', '08:25:00', '08:15:00', '07:25:00', '10:00:00', '07:45:00'],
        'col-c': ['11111', '22222', '99999', '33333', '22222', '22222', '99999', '22222', '99999'],
        'col-d': ['07:58:00', '09:01:00', '12:01:00', '08:01:00', '08:26:00', '08:16:00', '07:26:00', '10:01:00', '07:46:00'],
        }
original_df = pd.DataFrame(data)
print("original df\n", original_df)
# condition 1: must contain T1 in col-a
# condition 2: must contain 22222(variable) amongst each group of col-a
# condition 3: record containing 22222 should have col-b value between 7 and 9
# condition 4: must contain 99999 (stays the same) amongst each group of col-a where the above conditions are met
no_to_check = '22222' # comes from another dataframe column
# filtering rows where col-a contains T1
filtered_df = original_df[original_df['col-a'].str.contains('T1')]
# grouping by col-a
trip_groups = filtered_df.groupby('col-a')
# checking if it contains '22222' in column c and '22222' has time between 7 and 9 in column b
trips_time_dict = {}
for group_key, group in trip_groups:
    check1 = group[(group['col-c'] == no_to_check) & (group['col-b'].between('07:00:00', '09:00:00'))]
    if len(check1) != 0:
        # checking if the group contains '99999' in column c
        check2 = group[group['col-c'] == '99999']
        if len(check2) != 0:
            all_conditions = pd.concat([check1,check2])
For each group that meets the conditions, the desired output should contain one row with 22222 and one row with 99999.
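One possible way to get that output without an explicit loop is sketched below. It is only an illustration under stated assumptions, not a confirmed answer: the helper name `rows_meeting_both` and its parameters are hypothetical, and the time window is compared as zero-padded `HH:MM:SS` strings, as in the code above. The idea is to build one boolean mask per condition, use `groupby(...).transform('any')` to mark the groups where each mask matches at least once, and keep only the matching rows of groups where both masks match.

```python
import pandas as pd

data = {
    'col-a': ['T1A', 'T1A', 'T1A', 'T1B', 'T1B', 'T1C', 'T1C', 'P1', 'P1'],
    'col-b': ['07:57:00', '09:00:00', '12:00:00', '08:00:00', '08:25:00',
              '08:15:00', '07:25:00', '10:00:00', '07:45:00'],
    'col-c': ['11111', '22222', '99999', '33333', '22222',
              '22222', '99999', '22222', '99999'],
    'col-d': ['07:58:00', '09:01:00', '12:01:00', '08:01:00', '08:26:00',
              '08:16:00', '07:26:00', '10:01:00', '07:46:00'],
}
original_df = pd.DataFrame(data)


def rows_meeting_both(df, no_to_check, fixed_no='99999',
                      start='07:00:00', end='09:00:00'):
    """Hypothetical helper: return the matching rows of every col-a group
    that contains both the variable value (inside the time window) and
    the fixed value."""
    # condition 1: keep only rows whose col-a contains 'T1'
    t1 = df[df['col-a'].str.contains('T1')]

    # conditions 2 and 3: variable value with col-b inside the window
    # (string comparison; assumes zero-padded HH:MM:SS values)
    is_variable = (t1['col-c'] == no_to_check) & t1['col-b'].between(start, end)
    # condition 4: fixed value
    is_fixed = t1['col-c'] == fixed_no

    # a group qualifies only if each mask is True for at least one of its rows
    keys = t1['col-a']
    has_both = (is_variable.groupby(keys).transform('any')
                & is_fixed.groupby(keys).transform('any'))

    # from qualifying groups, keep only the rows that matched either mask
    return t1[has_both & (is_variable | is_fixed)]


result = rows_meeting_both(original_df, '22222')
print(result)
```

With the sample data, T1A and T1C qualify, while T1B is dropped because it has no 99999 row. If a group holds several rows matching either condition, all of them are returned, so an extra deduplication step would be needed if exactly one row per value is required.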
[Comments]:
-
For clarity, could you also include your desired output in the post?
Tags: python-3.x pandas pandas-groupby