How to divide a DataFrame into new DataFrames according to specified conditions? Answers

【Question Title】: How to divide the dataframes into new dataframes according to the specified condition?
【Posted】: 2021-07-26 04:10:49
【Question】:

I have a DataFrame, for example:

df =

Questions                  Answers
Where is Amazon?           Brazil
Is he a scientist?         No
Did he stole my money?     Yes
What does your father do?  Business
He is a great player.      I don't think so.
She is my girlfriend.      I too agree.
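
For anyone who wants to reproduce the answers, the example frame can be rebuilt roughly as follows. This is a minimal sketch that assumes the column names `Questions` and `Answers` shown in the table; the sentences are copied verbatim from the question.

```python
import pandas as pd

# Rebuild the example frame from the question
# (column names assumed to be exactly as shown in the table).
df = pd.DataFrame(
    {
        "Questions": [
            "Where is Amazon?",
            "Is he a scientist?",
            "Did he stole my money?",
            "What does your father do?",
            "He is a great player.",
            "She is my girlfriend.",
        ],
        "Answers": [
            "Brazil",
            "No",
            "Yes",
            "Business",
            "I don't think so.",
            "I too agree.",
        ],
    }
)
print(df)
```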

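I want to create three DataFrames from the one above, each under its own condition:

Condition for df1:

The first word of df['Questions'] comes from one of these lists: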

# list of Yes/No verbs
yn_list = ['Do','Does','Did','do','does','did','Am','Are','Is','Was','Were','am','are','is','was','were',
           'Have','Has','Had','have','has','had','Will','Would','Shall','Should','Can','Could','May',
           'Might','will','would','shall','should','can','could','may','might']

# list of negative Yes/No verbs
yn_negative_list = ["Don't","Doesn't","Didn't","don't","doesn't","didn't","Aren't","Isn't","aren't","isn't",
                    "Wasn't","Weren't","wasn't","weren't","Haven't","Hasn't","Hadn't","haven't","hasn't",
                    "hadn't","Won't","Wouldn't","won't","wouldn't","Shan't","shan't","Shouldn't","Can't",
                    "Couldn't","shouldn't","can't","couldn't","may not","May not","Mightn't","mightn't"]

Condition for df2:

The first word of df['Questions'] comes from this list:

wh_list = ['who','where','what','when','why','whom','which','whose','how']

Condition for df3:

The sentence ends with a '.' (period).

【Comments】:

Tags: python python-3.x pandas list dataframe

【Solution 1】:

Your third condition (the columns in this answer are named 'Question' and 'Answer', as its output shows):

df[df['Question'].str.endswith('.')]

                Question                   Answer
4  He is a great player.        I don't think so.
5  She is my girlfriend.             I too agree.

Second condition:

df[df['Question'].str.lower().str.startswith(tuple(wh_list))]

                    Question         Answer
0           Where is Amazon?         Brazil
3  What does your father do?       Business

First condition:

df[df['Question'].str.lower().str.startswith(tuple(yn_list+yn_negative_list))]

                 Question       Answer
1      Is he a scientist?           No
2  Did he stole my money?          Yes
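
One caveat worth checking before relying on these prefix tests: `startswith` matches any prefix, not a whole first word, so a sentence beginning with "Islands" also matches "is", and "However" matches "how". The sketch below compares the prefix test with a first-word test. The series `demo` and the shortened lists `yn_sample` and `wh_sample` are hypothetical stand-ins, not data from the question.

```python
import pandas as pd

# Hypothetical sentences, chosen only to illustrate the difference.
demo = pd.Series(["Is he a scientist?", "Islands are small.", "However, he left."])

yn_sample = ["is", "did"]    # shortened stand-in for yn_list
wh_sample = ["how", "what"]  # shortened stand-in for wh_list

lowered = demo.str.lower()

# Prefix test: "islands ..." starts with "is", "however, ..." starts with "how".
prefix_yn = lowered.str.startswith(tuple(yn_sample))
prefix_wh = lowered.str.startswith(tuple(wh_sample))

# First-word test: compare the whole first token against the list.
first_word = lowered.str.split(n=1).str[0]
word_yn = first_word.isin(yn_sample)
word_wh = first_word.isin(wh_sample)

print(pd.DataFrame({
    "sentence": demo,
    "prefix_yn": prefix_yn, "word_yn": word_yn,
    "prefix_wh": prefix_wh, "word_wh": word_wh,
}))
```

On the six example rows from the question both tests select the same rows, so the difference only matters for other data.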

【Comments】:

I didn't know until now that .str.startswith accepts a tuple, though I couldn't find anything about it in the documentation. +1
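
Python's built-in `str.startswith` accepts a tuple of prefixes and returns True if any of them matches. The answer above relies on the pandas string accessor treating a tuple the same way. A quick check, using a made-up two-element series:

```python
import pandas as pd

prefixes = ("who", "where", "what")

# Built-in str.startswith: a tuple means "any of these prefixes".
print("where is amazon?".startswith(prefixes))    # True
print("is he a scientist?".startswith(prefixes))  # False

# The same tuple passed to the pandas string accessor, applied element-wise.
s = pd.Series(["where is amazon?", "is he a scientist?"])
mask = s.str.startswith(prefixes)
print(mask.tolist())
```

A list is not accepted in place of the tuple by the built-in method, which is why the answer wraps the lists in `tuple(...)`.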

【Solution 2】:

df1 = df[df["Questions"].str.split(n=1).str[0].isin(yn_list + yn_negative_list)]
print(df1)
print()


df2 = df[df["Questions"].str.lower().str.split(n=1).str[0].isin(wh_list)]
print(df2)
print()

df3 = df[df["Questions"].str.endswith(".")]
print(df3)
print()

Prints:

                Questions Answers
1      Is he a scientist?      No
2  Did he stole my money?     Yes

                   Questions   Answers
0           Where is Amazon?    Brazil
3  What does your father do?  Business

               Questions            Answers
4  He is a great player.  I don't think so.
5  She is my girlfriend.       I too agree.

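【Comments】:

Simple and direct indeed. I think it would help to create a boolean function for these complex queries, e.g. def question(row): return row.question.startswith(tuple(yn_list+yn_negative_list)), especially if you have many of them and they involve more columns (such as the answers).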
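
The boolean-function idea from this comment could be sketched roughly as below. The function name `is_yes_no_question`, the shortened tuple `yn_words`, and the two-row frame are illustrative assumptions. The column is taken to be `Questions`, as in the question, rather than the `question` attribute written in the comment.

```python
import pandas as pd

# Shortened stand-in for tuple(yn_list + yn_negative_list) from the question.
yn_words = ("Is", "Did", "Does", "Isn't", "Didn't")

def is_yes_no_question(row):
    """Return True when the row's question starts with a yes/no verb."""
    # Mirrors the prefix test from the comment; other columns of `row`
    # (for example row["Answers"]) could be inspected here as well.
    return row["Questions"].startswith(yn_words)

df = pd.DataFrame({
    "Questions": ["Is he a scientist?", "Where is Amazon?"],
    "Answers": ["No", "Brazil"],
})

# axis=1 passes each row to the function, one row at a time.
mask = df.apply(is_yes_no_question, axis=1)
df1 = df[mask]
print(df1)
```

Row-wise `apply` is usually slower than the vectorized `.str` methods used in the answers, so this form is probably best kept for conditions that really do need several columns.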