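【Posted】: 2021-06-03 00:56:48
【Question】:
I have a fruits dataframe with columns (Name, Color, ID) and a sentence dataframe with columns (Sentence, ID). I need to compare each record of the fruits dataframe against the sentence dataframe, and if the fruit name appears as an exact word in a sentence, insert its color before the fruit name in that sentence.
This is what I did: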
import pandas as pd
import regex as re

# create fruit dataframe
fruit_data = [['Apple', 'Red', 1], ['Mango', 'Yellow', 2], ['Grapes', 'Green', 3]]
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color', 'ID'])
print(fruit_df)

# create sentence dataframe
sentence = [['I like Apple', 1], ['I like ripe Mango', 2], ['Grapes are juicy', 3]]
sentence_df = pd.DataFrame(sentence, columns = ['Sentence', 'ID'])
print(sentence_df)

def search(desc, name, color, id):
    flag = 0
    if re.findall(r"\b" + name + r"\b", desc):
        desc_id = (sentence_df[sentence_df['Sentence'] == desc]['ID'].values[0])
        if desc_id == id:
            flag = 1
    if flag == 1:
        # for loop is used because fruit can appear more than once in sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())
        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")
        new_desc = ''.join(arr)
        print("modified sentence: ", new_desc)
        return new_desc

def compare(name, color, id):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color, id))

fruit_df.apply(lambda x: compare(x['Name'], x['Color'], x['ID']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])
The output of the code is:
     Name   Color  ID
0   Apple     Red   1
1   Mango  Yellow   2
2  Grapes   Green   3
            Sentence  ID
0       I like Apple   1
1  I like ripe Mango   2
2   Grapes are juicy   3
modified sentence: I like Red Apple
modified sentence: I like ripe Yellow Mango
modified sentence: Green Grapes are juicy
The final result is:
0                      None
1                      None
2    Green Grapes are juicy
Name: Result, dtype: object
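The sentences are modified correctly, but the problem is that the first two sentences are not stored in the Result column of the sentence dataframe; only the last one is stored. Is this the right approach, or am I missing something?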
【Discussion】:
Tags: python pandas function dataframe return
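A likely cause, judging from the code as posted: compare() assigns the whole Result column on every call, so each fruit overwrites what the previous fruit wrote, and search() returns None for every sentence whose ID does not match. Only the last fruit's pass (Grapes) survives. One possible way around this, offered as a sketch rather than the definitive fix, is to join the two dataframes on ID once and compute Result in a single pass. The helper name add_color is made up for illustration, and the sketch assumes each ID appears only once in fruit_df.

```python
import re

import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red", 1], ["Mango", "Yellow", 2], ["Grapes", "Green", 3]],
    columns=["Name", "Color", "ID"],
)
sentence_df = pd.DataFrame(
    [["I like Apple", 1], ["I like ripe Mango", 2], ["Grapes are juicy", 3]],
    columns=["Sentence", "ID"],
)


def add_color(row):
    """Hypothetical helper: prefix every whole-word fruit name with its color."""
    # A sentence whose ID has no fruit gets NaN in Name after the left merge.
    if pd.isna(row["Name"]):
        return None
    pattern = r"\b" + re.escape(row["Name"]) + r"\b"
    if not re.search(pattern, row["Sentence"]):
        return None
    # A function replacement avoids backslash surprises in the color text.
    return re.sub(pattern, lambda m: row["Color"] + " " + m.group(0), row["Sentence"])


# Assumption: IDs are unique in fruit_df, so the left merge keeps exactly
# one row per sentence, in the original sentence order.
merged = sentence_df.merge(fruit_df, on="ID", how="left")
sentence_df["Result"] = merged.apply(add_color, axis=1).tolist()

print(sentence_df["Result"])
```

Because Result is assigned exactly once, nothing gets overwritten. If you would rather keep the original loop structure, the alternative is to update only the matching rows inside compare() instead of reassigning the full column.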