【Question title】: How to divide a dataframe into new dataframes according to specified conditions?
【Posted】: 2021-07-26 04:10:49
【Question】:

I have a dataframe, for example:

df =

Questions                  Answers
Where is Amazon?           Brazil
Is he a scientist?         No
Did he stole my money?     Yes
What does your father do?  Business
He is a great player.      I don't think so.
She is my girlfriend.      I too agree.
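
For anyone who wants to reproduce the answers, the example table can be rebuilt as follows. This is a minimal sketch that assumes the two columns are named `Questions` and `Answers`, as shown above, and that the default integer index is used.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame(
    {
        "Questions": [
            "Where is Amazon?",
            "Is he a scientist?",
            "Did he stole my money?",
            "What does your father do?",
            "He is a great player.",
            "She is my girlfriend.",
        ],
        "Answers": [
            "Brazil",
            "No",
            "Yes",
            "Business",
            "I don't think so.",
            "I too agree.",
        ],
    }
)
print(df)
```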

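I want to create three dataframes from the one above, each selected by its own condition:

Condition for df1:

The first word of df['Questions'] is in one of these lists: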

# list of Yes/No verbs
yn_list = ['Do','Does','Did','do','does','did','Am','Are','Is','Was','Were','am','are','is','was','were',
           'Have','Has','Had','have','has','had','Will','Would','Shall','Should','Can','Could','May',
           'Might','will','would','shall','should','can','could','may','might']

# list of negative Yes/No verbs
yn_negative_list = ["Don't","Doesn't","Didn't","don't","doesn't","didn't","Aren't","Isn't","aren't","isn't",
                    "Wasn't","Weren't","wasn't","weren't","Haven't","Hasn't","Hadn't","haven't","hasn't",
                    "hadn't","Won't","Wouldn't","won't","wouldn't","Shan't","shan't","Shouldn't","Can't",
                    "Couldn't","shouldn't","can't","couldn't","may not","May not","Mightn't","mightn't"]

Condition for df2:

The first word of df['Questions'] is in this list:

wh_list = ['who','where','what','when','why','whom','which','whose','how']

Condition for df3:

The sentence ends with a "." sign.

【Comments on the question】:

    Tags: python python-3.x pandas list dataframe


    【Solution 1】:

    Your third condition:

    df[df['Questions'].str.endswith('.')]
    
                   Questions            Answers
    4  He is a great player.  I don't think so.
    5  She is my girlfriend.       I too agree.
    

    The second condition:

    df[df['Questions'].str.lower().str.startswith(tuple(wh_list))]
    
                       Questions   Answers
    0           Where is Amazon?    Brazil
    3  What does your father do?  Business
    

    The first condition:

    df[df['Questions'].str.lower().str.startswith(tuple(yn_list+yn_negative_list))]
    
                    Questions Answers
    1      Is he a scientist?      No
    2  Did he stole my money?     Yes
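
One caveat with this approach: `str.startswith` tests a string prefix, not a whole word, so a sentence whose first word merely begins with one of the verbs would also match. The sketch below illustrates the difference on hypothetical rows and shows one possible whole-word alternative using `str.match` with a word boundary. The `yn_words` list is an abbreviated stand-in for `yn_list`, and the sample sentences are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical rows, chosen to show where prefix matching misfires.
s = pd.Series(["Is he a scientist?", "Island life is great.", "Where is Amazon?"])

# Abbreviated stand-in for yn_list from the question.
yn_words = ["do", "does", "did", "is", "are", "was", "were"]

# Prefix test, as in the answer: "Island ..." matches because it starts with "is".
prefix_mask = s.str.lower().str.startswith(tuple(yn_words))

# Whole-word test: str.match anchors at the start of the string, and \b
# requires a word boundary right after the verb.
pattern = r"(?:" + "|".join(re.escape(w) for w in yn_words) + r")\b"
word_mask = s.str.match(pattern, case=False)

print(prefix_mask.tolist())  # [True, True, False]
print(word_mask.tolist())    # [True, False, False]
```

Whether the stricter test is needed depends on the data; for the six rows in the question both variants select the same rows.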
    

    【Comments】:

    • TIL that .str.startswith accepts a tuple. I couldn't find anything about it in the documentation, though. +1
    【Solution 2】:
    df1 = df[df["Questions"].str.split(n=1).str[0].isin(yn_list + yn_negative_list)]
    print(df1)
    print()
    
    
    df2 = df[df["Questions"].str.lower().str.split(n=1).str[0].isin(wh_list)]
    print(df2)
    print()
    
    df3 = df[df["Questions"].str.endswith(".")]
    print(df3)
    print()
    

    Prints:

                    Questions Answers
    1      Is he a scientist?      No
    2  Did he stole my money?     Yes
    
                       Questions   Answers
    0           Where is Amazon?    Brazil
    3  What does your father do?  Business
    
                   Questions            Answers
    4  He is a great player.  I don't think so.
    5  She is my girlfriend.       I too agree.
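
Two details may be worth checking with the first-word approach. A two-word entry such as "may not" can never equal a single first word, so it has no effect under `isin`. Also, some rows might satisfy none of the three conditions. The sketch below computes the three masks once and collects the leftover rows. The data and the abbreviated word sets are hypothetical stand-ins, not the full lists from the question.

```python
import pandas as pd

# Hypothetical data: the last row fits none of the three conditions.
df = pd.DataFrame({"Questions": [
    "Where is Amazon?",
    "Is he a scientist?",
    "He is a great player.",
    "Tell me more!",
]})

# Abbreviated stand-ins for the lists defined in the question.
yn_words = {"do", "does", "did", "is", "are"}
wh_words = {"who", "where", "what", "when", "why"}

# Lower-cased first word of every question, computed once.
first = df["Questions"].str.split(n=1).str[0].str.lower()

m1 = first.isin(yn_words)               # yes/no questions
m2 = first.isin(wh_words)               # wh- questions
m3 = df["Questions"].str.endswith(".")  # statements

df1, df2, df3 = df[m1], df[m2], df[m3]

# Rows that satisfy none of the conditions.
rest = df[~(m1 | m2 | m3)]
print(rest)
```

Lower-casing the first word before the lookup also means the word lists only need their lower-case forms.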
    
    

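    【Comments】:

    • Simple and straightforward indeed. I think creating a boolean function for these complex queries would help: def question(row): return row.question.startswith(tuple(yn_list+yn_negative_list)). This is especially useful if you have many of them, or if they involve more columns (such as the answers).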
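
The boolean-function idea from the comment could look roughly like this. It is a sketch, not the commenter's exact code: the function names, the abbreviated `yn_prefixes` tuple, and the two-column rule are all hypothetical, and the column names `Questions` and `Answers` are taken from the question.

```python
import pandas as pd

# Abbreviated stand-in for yn_list + yn_negative_list from the question.
yn_prefixes = ("do", "does", "did", "is", "are", "don't", "isn't")

def is_yes_no(row):
    """Return True when the row's question starts with a yes/no verb."""
    return row["Questions"].lower().startswith(yn_prefixes)

def is_short_yes_no(row):
    """Hypothetical two-column rule: a yes/no question answered with Yes or No."""
    return is_yes_no(row) and row["Answers"] in ("Yes", "No")

df = pd.DataFrame({
    "Questions": ["Where is Amazon?", "Is he a scientist?", "Did he stole my money?"],
    "Answers": ["Brazil", "No", "Yes"],
})

# apply(axis=1) calls the function once per row and yields a boolean Series.
df1 = df[df.apply(is_yes_no, axis=1)]
print(df1)
```

Row-wise `apply` is generally slower than the vectorized `.str` methods used in the answers, so it is probably best kept for rules that really need several columns.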