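[Posted]: 2021-09-02 05:23:21
[Question]:
I am trying to apply the two functions below, but only to specific rows of my df. For example, the functions should run only when the value in column1 is the string "this is a sentence". When that value is found in column1, the functions should run on the other two columns, but only on the rows where column1 holds that value, as in this example: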
column1              column2   column3   output_required
this is a sentence   phrase1   SENT      SENT phrase1
this is a brand      phrase2   10        10
no value available   phrase3   apple     apple
this is a sentence   phrase4   Fruits    Fruits phrase4
My functions are:
# concatenate with value from column 3
col_target = 'column3'
col_restrict = 'column2'
target = df[col_target].tolist()  # this generates a list of lists
restrict = df[col_restrict].tolist()  # this also generates a list of lists

def parse_str_into_list(s):
    if str(s).startswith('[') and str(s).endswith(']'):
        return ' '.join(str(s).strip('[]').strip("'").split("', '"))
    return str(s)

def filter_restrict_words(row):
    targets = parse_str_into_list(row[0]).split(' ', -1)
    restricts = parse_str_into_list(row[1]).split(' ', -1)
    words_to_keep = []
    for word in targets:
        # condition to keep eligible words
        if word not in words_to_keep:
        # if word not in restricts and word not in words_to_keep:
            words_to_keep.append(word)
    print('words to keep')
    print(words_to_keep)
    return ' '.join(words_to_keep)

df['Title'] = df[[col_target, col_restrict]].apply(lambda x: filter_restrict_words(x), axis=1)
df['output_required'] = df['Title'].map(str) + ' ' + df['column3'].map(str)

# and remove duplicates keeping the first occurrence
def uniqueList(row):
    words = row.split(" ")
    unique = words[0]
    for w in words:
        if w.lower() not in unique.lower():
            unique = unique + " " + w
    return unique

df["Correction_Value"] = df["Correction_Value"].apply(uniqueList)
At the moment, these functions run on every row of my df.
[Comments]:
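One common way to limit an operation to matching rows is a boolean mask combined with `df.loc`. The sketch below is only an illustration under some assumptions: the sample frame is rebuilt from the table in the question, the names `mask` and `unique_words` are hypothetical, and `unique_words` compares whole words rather than using the substring test in `uniqueList`, so it is a simplified stand-in and not the original functions.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumed data).
df = pd.DataFrame({
    "column1": ["this is a sentence", "this is a brand",
                "no value available", "this is a sentence"],
    "column2": ["phrase1", "phrase2", "phrase3", "phrase4"],
    "column3": ["SENT", 10, "apple", "Fruits"],
})

def unique_words(text):
    """Drop repeated words (case-insensitive), keeping the first occurrence."""
    seen = set()
    kept = []
    for word in text.split():
        key = word.lower()
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return " ".join(kept)

# Boolean mask: True only for rows whose column1 holds the trigger string.
mask = df["column1"].eq("this is a sentence")

# Default for every row: carry column3 over as text.
df["output_required"] = df["column3"].astype(str)

# Only the masked rows get column3 + column2, followed by de-duplication.
combined = (df.loc[mask, "column3"].astype(str)
            + " "
            + df.loc[mask, "column2"].astype(str))
df.loc[mask, "output_required"] = combined.apply(unique_words)

print(df)
```

Rows where the mask is False are never passed to the function, so they keep the default value taken from column3.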
-
What is the expected output for your example?
-
@99_m4n The output_required column in the example above.
Tags: python pandas loops iteration