【Title】: Python: Return statement inside if condition returns only last value
【Posted】: 2021-06-03 00:56:48
【Question】:

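I have a fruits dataframe with columns (Name, Color, ID) and a sentence dataframe with columns (Sentence, ID). I need to compare each record of the fruits dataframe against the sentence dataframe and, if the fruit name appears in the sentence as an exact word, insert its color immediately before the fruit name in the sentence.

This is what I have done: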

import pandas as pd
import regex as re

# create fruit dataframe 
fruit_data = [['Apple', 'Red', 1], ['Mango', 'Yellow', 2], ['Grapes', 'Green', 3]] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color', 'ID']) 
print(fruit_df)

# create sentence dataframe 
sentence = [['I like Apple', 1], ['I like ripe Mango', 2], ['Grapes are juicy', 3]] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence', 'ID']) 
print(sentence_df)


def search(desc, name, color, id):
    flag = 0
    if re.findall(r"\b" + name + r"\b", desc):
        desc_id = (sentence_df[sentence_df['Sentence'] == desc]['ID'].values[0])
        if desc_id == id:
            flag = 1
        
        if flag == 1:
            # for loop is used because fruit can appear more than once in sentence
            all_indexes = []
            for match in re.finditer(r"\b" + name + r"\b", desc):
                     all_indexes.append(match.start())
            
            arr = list(desc)
            for idx in sorted(all_indexes, reverse=True):
                       arr.insert(idx, color + " ")

            new_desc = ''.join(arr)
           
            print("modified sentence: ", new_desc)
            return new_desc 

def compare(name, color, id):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color, id))
    

fruit_df.apply(lambda x: compare(x['Name'], x['Color'], x['ID']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])

The output of the code is:

     Name    Color  ID
0   Apple     Red    1
1   Mango  Yellow    2
2  Grapes   Green    3

            Sentence  ID
0       I like Apple   1
1  I like ripe Mango   2
2   Grapes are juicy   3


modified sentence:  I like Red Apple
modified sentence:  I like ripe Yellow Mango
modified sentence:  Green Grapes are juicy


The final result is: 
0                      None
1                      None
2    Green Grapes are juicy
Name: Result, dtype: object

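The sentences are modified correctly, but the problem is that the first two sentences are not stored in the Result column of the sentence dataframe; only the last one is stored. Is this the right approach, or am I missing something?

【Comments】:

    Tags: python pandas function dataframe return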


    【Solution 1】:

    With a few modifications:

    import pandas as pd
    import re
    # create fruit dataframe 
    fruit_data = [['Apple', 'Red', 1], ['Mango', 'Yellow', 2], ['Grapes', 'Green', 3]] 
    fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color', 'ID']) 
    print(fruit_df)
    
    # create sentence dataframe 
    sentence = [['I like Apple', 1], ['I like ripe Mango', 2], ['Grapes are juicy', 3]] 
    sentence_df = pd.DataFrame(sentence, columns = ['Sentence', 'ID']) 
    print(sentence_df)
    
    
    def search(ids):
        # look up the fruit and the sentence that share this ID
        name = fruit_df[fruit_df['ID']==ids]['Name'].values[0]
        desc = sentence_df[sentence_df['ID']==ids]['Sentence'].values[0]
        color = fruit_df[fruit_df['ID']==ids]['Color'].values[0]
    
        # for loop is used because fruit can appear more than once in sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())
    
        # insert from right to left so the earlier indexes stay valid
        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")
    
        new_desc = ''.join(arr)
    
        print("modified sentence: ", new_desc)
        return new_desc
    
    
    sentence_df['Result'] = sentence_df['ID'].apply(lambda x: search(x))
        
    
    print("The final result is: ")
    print(sentence_df['Result'])
    

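    Changes:

    The main problem is here: fruit_df.apply. It calls the compare function once for every row in fruit_df, which means 3 times in this example.

    On each call, compare overwrites every entry of the Result column based on the fruit currently passed in by fruit_df.apply. Only the last call (Grapes) survives, and search returns None for the two sentences that do not match it, which is why the first two rows end up as None.

    So the first step is to make that call only once.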
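    To see the overwrite in isolation, here is a minimal sketch. The toy frame and the tag() helper are made up for illustration and only mimic the shape of the original code: a function that returns None when nothing matches, applied once per fruit to the whole column.

```python
import pandas as pd

# Toy frame standing in for sentence_df (illustrative data only)
df = pd.DataFrame({'Sentence': ['I like Apple', 'Grapes are juicy']})

def tag(sentence, name):
    # Like search() in the question: implicitly returns None when there is no match
    if name in sentence:
        return name + ' found'

history = []
for name in ['Apple', 'Grapes']:
    # Each pass replaces the whole column, discarding the previous pass
    df['Result'] = df['Sentence'].apply(lambda s: tag(s, name))
    history.append(df['Result'].tolist())

print(history[0])  # state of the column after the 'Apple' pass
print(history[1])  # state after the 'Grapes' pass: the 'Apple' entry is gone
```

    The second pass leaves None in the first row because the column is rebuilt from scratch rather than updated.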

    The other change needed is to use the foreign key: ID

    ID exists in both dataframes, so it is enough to identify name, desc and color inside the search function.
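    The same foreign-key idea can also be expressed as a join. The following is only a sketch of an alternative, not the code from this answer: add_color is a hypothetical helper, and it assumes every sentence ID has a matching fruit row (otherwise Name would be NaN after the left join and the helper would fail).

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [['Apple', 'Red', 1], ['Mango', 'Yellow', 2], ['Grapes', 'Green', 3]],
    columns=['Name', 'Color', 'ID'])
sentence_df = pd.DataFrame(
    [['I like Apple', 1], ['I like ripe Mango', 2], ['Grapes are juicy', 3]],
    columns=['Sentence', 'ID'])

def add_color(row):
    # Hypothetical helper: prefix every whole-word occurrence of Name with Color
    pattern = r"\b" + re.escape(row['Name']) + r"\b"
    return re.sub(pattern, lambda m: row['Color'] + " " + m.group(0), row['Sentence'])

# Left join keeps every sentence, in order, and attaches the fruit with the same ID
merged = sentence_df.merge(fruit_df, on='ID', how='left')

# One row-wise apply, called once, fills the whole Result column
sentence_df['Result'] = merged.apply(add_color, axis=1)
print(sentence_df['Result'].tolist())
```

    Because the join pairs each sentence with its fruit up front, the ID comparison inside the function is no longer needed.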


    Output:

          Name   Color  ID
    0   Apple     Red   1
    1   Mango  Yellow   2
    2  Grapes   Green   3
    
    
                Sentence  ID
    0       I like Apple   1
    1  I like ripe Mango   2
    2   Grapes are juicy   3
    
    
    modified sentence:  I like Red Apple
    modified sentence:  I like ripe Yellow Mango
    modified sentence:  Green Grapes are juicy
    
    
    The final result is: 
    0            I like Red Apple
    1    I like ripe Yellow Mango
    2      Green Grapes are juicy
    Name: Result, dtype: object
    

    Edit: as requested by the OP, a quick-fix version of the solution.

    Just change the bottom of your original code as follows:

    def compare(name, color, id):
        sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color, id) or x)
        
    sentence_df['Result'] = sentence_df['Sentence']
    
    fruit_df.apply(lambda x: compare(x['Name'], x['Color'], x['ID']), axis=1)
    print ("The final result is: ")
    print(sentence_df['Result'])
    

    Note: this fix does not technically solve the problem described above. It only introduces a small bypass to reach the desired output.
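    The bypass relies on how Python's `or` treats None. A small sketch, with maybe_modify as a made-up stand-in for search():

```python
def maybe_modify(text, word):
    # Hypothetical stand-in for search(): returns None when nothing matches
    if word in text:
        return text.replace(word, 'Red ' + word)

current = 'I like Apple'

# No match: the function returns None, so `or` keeps the current value
after_mango = maybe_modify(current, 'Mango') or current

# Match: the function returns a non-empty string, which `or` passes through
after_apple = maybe_modify(after_mango, 'Apple') or after_mango

print(after_mango)
print(after_apple)
```

    One caveat of this pattern: `or` also falls back on any other falsy value, such as an empty string, not only on None.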

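    【Comments】:

    • Thank you for the solution. It works great, but I am not supposed to refactor the code.
    【Solution 2】:

    The problem in your code is that you use the DataFrame.apply method in both places. apply lets you apply a function along an axis of a dataframe; it is not meant for adding a new column and assigning values to it. If you are working on the same dataframe and want to do that, you can use the .assign method, which lets you add a new column whose values are computed from the values of other columns. For your code, if you want to keep it as it is rather than refactor it as suggested above, all you need is a loop.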

    for idx, row in fruit_df.iterrows():
        # pair each fruit with the sentence at the same index label
        result = search(sentence_df.loc[idx,"Sentence"], row["Name"], row["Color"], row["ID"])
        sentence_df.loc[idx,"Result"] = result
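    For reference, a minimal sketch of the .assign method mentioned above. The Words column is purely illustrative and is not part of the original problem:

```python
import pandas as pd

sentence_df = pd.DataFrame(
    [['I like Apple', 1], ['Grapes are juicy', 3]],
    columns=['Sentence', 'ID'])

# assign returns a new DataFrame; the callable receives the frame itself
with_words = sentence_df.assign(
    Words=lambda d: d['Sentence'].str.split().str.len())

print(with_words)
```

    Note that assign leaves sentence_df unchanged, so the result has to be bound to a name (or reassigned to sentence_df) to be kept.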
    

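    【Comments】:

    • Thanks for the solution, but I have heard that dataframe.apply() is more efficient than iterrows(), which is why I wrote it with df.apply(). Is it possible to solve the problem with df.apply() alone?
    • Since the DataFrame.apply method operates on columns or rows, the answer is "no". Unless you add an empty "Result" column from the start and run your .apply method on that Result column Series.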