[Posted]: 2020-08-11 02:59:44
[Question]:
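I need to extract rows based on 3 conditions:
- The col1 column should contain all the words in the list list_words.
- The first row should end with the word Story.
- The next row should end with ac.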
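I have managed to get this working with the help of this question, Extract rows based on conditions Pandas Python, but the problem is that I need to extract every row ending in Story together with all the rows ending in ac that follow it.
Here is my current code: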
import pandas as pd
df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Plan Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']})
print(df)
list_words="SW Quality Plan Story"
set_words = set(list_words.split())
df["Suffix"] = df.col1.apply(lambda x: x.split()[-1])
# Condition 1: all words in col1 minus all words in set_words must be empty
df["condition_1"] = df.col1.apply(lambda x: not bool(set_words - set(x.split())))
# Condition 2: the last word should be 'Story'
df["condition_2"] = df.col1.str.endswith("Story")
# Condition 3: the last word in the next row should be ac. See `shift(-1)`
df["condition_3"] = df.col1.str.endswith("ac").shift(-1)
# Condition 4: the last word in the current row should be ac
df["condition_4"] = df.col1.str.endswith("ac")
# When all three conditions meet: new column 'conditions'
df["conditions"] = df.condition_1 & df.condition_2 & df.condition_3
df["conditions&"] = df.conditions | df.conditions.shift(1)
print(df[['condition_1', 'condition_2','condition_3' ,'condition_4']])
df.to_excel('cond.xlsx', sheet_name='Sheet1', index=True)
df["TrueFalse"] = df.conditions | df.conditions.shift(1)
df1=df[["col1", "col2", "TrueFalse", "Suffix"]][df.TrueFalse]
print(df1)
Here is my output:
    col1                                      col2  TrueFalse  Suffix
0   Draft SW Quality Assurance Plan Story     aa    True       Story
1   alex ac                                   bb    True       ac
6   Complete SW Quality Assurance Plan Story  gg    True       Story
7   celine ac                                 hh    True       ac
11  Update SW Quality Assurance Plan Story    ll    True       Story
12  joseph ac                                 mm    True       ac
Here is the desired output:
    col1                                      col2  TrueFalse  Suffix
0   Draft SW Quality Assurance Plan Story     aa    True       Story
1   alex ac                                   bb    True       ac
2   anny ac                                   cc    True       ac
3   antoine ac                                dd    True       ac
6   Complete SW Quality Assurance Plan Story  gg    True       Story
7   celine ac                                 hh    True       ac
11  Update SW Quality Assurance Plan Story    ll    True       Story
12  joseph ac                                 mm    True       ac
13  Update SW Quality Assurance Plan ac       nn    True       ac
14  joseph ac                                 oo    True       ac
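I need to extract all the rows ending in ac that follow a row ending in Story (including the second and third ones), not just the first one.
Is this feasible?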
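One possible approach, offered as a sketch rather than a definitive answer: treat every row that does not end in ac as the start of a new block, so that a run of consecutive ac rows shares a block with the row just before it. A block is then kept whole when its first row meets conditions 1 and 2 and is followed by at least one ac row. The names ends_ac, block, is_head, valid_head and keep are illustrative and do not appear in the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['Draft SW Quality Assurance Plan Story', 'alex ac', 'anny ac',
             'antoine ac', 'aze epic', 'bella ac',
             'Complete SW Quality Assurance Plan Story', 'celine ac',
             'wqas epic', 'karmen ac', 'kameilia ac',
             'Update SW Quality Assurance Plan Story', 'joseph ac',
             'Update SW Quality Assurance Plan ac', 'joseph ac'],
    'col2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh',
             'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo'],
})

set_words = set("SW Quality Plan Story".split())

# True for rows whose text ends with "ac"
ends_ac = df["col1"].str.endswith("ac")

# Every non-"ac" row starts a new block; the "ac" rows that follow it
# inherit the same block id through the cumulative sum.
block = (~ends_ac).cumsum()

# Conditions 1 and 2: the row ends with "Story" and contains every word in set_words
is_head = df["col1"].str.endswith("Story") & df["col1"].apply(
    lambda s: set_words <= set(s.split())
)

# Condition 3: the row right after the head ends with "ac"
valid_head = is_head & ends_ac.shift(-1, fill_value=False)

# Keep every row that belongs to a block with a valid head
keep = block.isin(block[valid_head])
result = df[keep]
print(result)
```

On the sample data this selects rows 0 to 3, 6 to 7 and 11 to 14, which matches the desired output. Rows 5, 9 and 10 are dropped because their blocks start with a row ending in epic rather than Story.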
[Comments]:
- Because row 13 does not contain all the words in list_words (it ends with ac instead of Story). But you are right, it should be in my desired output.
Tags: python-3.x pandas rows multiple-conditions