Python: Replace one word in a sentence with a list of words and put the new sentences in another column in pandas

【Question title】: Python: Replace one word in a sentence with a list of words and put the new sentences in another column in pandas
【Posted】: 2025-11-28 18:45:01
【Question description】:

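I have a DataFrame in which some sentences contain the word "o'clock". I want to replace the time mentioned before it with each of the hours in a list I have, and put the new sentences in another column, as follows: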

import pandas as pd

data = pd.DataFrame({"sentences": ["I have a class at ten o'clock", "she is my friend", "she goes to school at eight o'clock"]})
my_list = ['two', 'three', 'five', 'ten']

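What I would like to see is an extra column with the new sentences, as shown below, where the time is replaced by every time in the list:

Output: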

     sentences                            new_sentences
0    I have a class at ten o'clock        I have a class at two o'clock, I have a class at three o'clock,...
1    she is my friend                     she is my friend
2    she goes to school at eight o'clock  she goes to school at two o'clock,....

Duplicates in the new_sentences column are fine. I tried using np.where:

np.where(data.str.contains('o\'clock', regex=False, case=False, na=False), data["sentence"].replace()... )

But I don't know how to replace the word before "o'clock".

Thanks in advance.

【Question discussion】:

Tags: python regex pandas list dataframe

【Solution 1】:

Use:

# STEP 1
df1 = data['sentences'].str.extract(
    r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)")

# STEP 2
# regex=True is required in pandas >= 2.0, where str.replace is literal by default
df1['clock'] = df1['clock'].str.replace(
    r'\w+', ','.join(my_list), regex=True).str.split(',')

# STEP 3
data['new_sentences'] = df1.dropna().explode('clock').agg(
    ' '.join, axis=1).groupby(level=0).agg(', '.join)

# STEP 4
data['new_sentences'] = data['new_sentences'].fillna(data['sentences'])

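Explanation / steps:

Step 1: Use Series.str.extract with the given regex pattern to create a three-column DataFrame. The first column (before) holds the part of the sentence before the clock word, the middle column (clock) holds the clock word itself (e.g. ten), and the last column (after) holds the rest of the sentence, starting at o'clock. Rows that do not match the pattern are filled with NaN.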

# df1
                  before  clock    after
0      I have a class at    ten  o'clock
1                    NaN    NaN      NaN
2  she goes to school at  eight  o'clock
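To see how the named groups behave on other inputs, here is a minimal sketch that runs the same pattern on a few sentences. The last two sentences are made up for illustration and are not from the question. It suggests two things worth knowing: the after group keeps everything from o'clock onward, and a sentence that starts with the clock word is not matched, because the pattern requires whitespace before it.

```python
import pandas as pd

sentences = pd.Series([
    "I have a class at ten o'clock",
    "she is my friend",
    "Meet me at nine o'clock sharp",   # made-up example: text after o'clock
    "Ten o'clock works for me",        # made-up example: clock word comes first
])

# Same pattern as in step 1; the named groups become the column names.
pattern = r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)"
parts = sentences.str.extract(pattern)

print(parts.columns.tolist())     # ['before', 'clock', 'after']
print(parts.loc[0].tolist())      # ['I have a class at', 'ten', "o'clock"]
print(parts.loc[2].tolist())      # ['Meet me at', 'nine', "o'clock sharp"]
print(parts.loc[1].isna().all())  # True: no o'clock in the sentence
print(parts.loc[3].isna().all())  # True: no whitespace before the clock word
```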

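Step 2: Use Series.str.replace to replace the token in the clock column with all items of my_list joined by commas. Then use Series.str.split to split the result on the delimiter "," so that each matched row holds a list of hours.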

# df1
                  before                    clock    after
0      I have a class at  [two, three, five, ten]  o'clock
1                    NaN                      NaN      NaN
2  she goes to school at  [two, three, five, ten]  o'clock
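One detail that is easy to trip over in step 2: since pandas 2.0, Series.str.replace treats the pattern as a literal string unless regex=True is passed. The sketch below uses a small stand-in Series for the clock column (an assumption, not the real df1) to show both behaviours side by side.

```python
import pandas as pd

my_list = ['two', 'three', 'five', 'ten']

# Stand-in for df1['clock']; None marks a row without a match.
clock = pd.Series(['ten', None, 'eight'])

# With regex=False the pattern is a literal string, so nothing is replaced.
literal = clock.str.replace(r'\w+', ','.join(my_list), regex=False)
print(literal[0])   # ten

# With regex=True every word is swapped for the comma-joined list, then split.
replaced = clock.str.replace(r'\w+', ','.join(my_list), regex=True).str.split(',')
print(replaced[0])  # ['two', 'three', 'five', 'ten']
print(replaced[2])  # ['two', 'three', 'five', 'ten']
```

The missing row stays missing throughout, which is what lets dropna() discard it in the next step.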

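Step 3: Use DataFrame.explode to expand df1 on the clock column, so that each hour gets its own row, then use .agg along axis 1 to join the columns of each row into a sentence. Finally, group by index level 0 and aggregate again to join all sentences that came from the same original row.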

# data
                             sentences                                      new_sentences
0        I have a class at ten o'clock  I have a class at two o'clock, I have a class ...
1                     she is my friend                                                NaN
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
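The chain in step 3 is easier to follow when it is broken into intermediate results. Below is a minimal sketch on a hand-built two-row frame with a shortened hour list (both are assumptions chosen to keep the output small).

```python
import pandas as pd

# Hand-built stand-in for df1 after step 2, with only two hours.
df1 = pd.DataFrame({
    'before': ['I have a class at', None],
    'clock': [['two', 'three'], None],
    'after': ["o'clock", None],
})

# One row per hour; both new rows keep the original index label 0.
exploded = df1.dropna().explode('clock')
print(exploded.index.tolist())  # [0, 0]

# Join before / clock / after of each row into one sentence.
rows = exploded.agg(' '.join, axis=1)

# Collapse the rows that share an index label back into a single string.
joined = rows.groupby(level=0).agg(', '.join)
print(joined[0])
# I have a class at two o'clock, I have a class at three o'clock
```

Because explode repeats the index label, groupby(level=0) is enough to bring the sentences back together; no extra key column is needed.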

Step 4: Finally, use Series.fillna to fill the missing values in the new_sentences column with the corresponding values from the sentences column.

# data
                             sentences                                      new_sentences
0        I have a class at ten o'clock  I have a class at two o'clock, I have a class ...
1                     she is my friend                                   she is my friend
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
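The NaN that step 4 fills comes from index alignment: the result of step 3 has no entry for rows that were dropped, so assigning it as a column leaves those rows empty. A minimal sketch with shortened, made-up strings:

```python
import pandas as pd

data = pd.DataFrame({'sentences': ["class at ten o'clock",
                                   'she is my friend',
                                   "school at eight o'clock"]})

# Stand-in for the step 3 result: index label 1 is missing on purpose.
new = pd.Series({0: "class at two o'clock, class at three o'clock",
                 2: "school at two o'clock, school at three o'clock"})

# Assignment aligns on the index, so row 1 becomes NaN.
data['new_sentences'] = new
print(data['new_sentences'].isna().tolist())  # [False, True, False]

# fillna with another Series also aligns on the index.
data['new_sentences'] = data['new_sentences'].fillna(data['sentences'])
print(data.loc[1, 'new_sentences'])           # she is my friend
```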

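【Discussion】:

Thank you very much for your answer and the detailed explanation. I really appreciate it.

【Solution 2】:

Does this do what you expect?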

import re
data= {"sentences":["I have a class at ten o'clock", "she is my friend", "she goes to school at eight o'clock"]}
my_list=['two', 'three','five','ten']

regex = re.compile(r"(\w+) (?=o'clock)", re.IGNORECASE)
new = []

for i in data["sentences"]:
    for j in my_list:
        new.append(re.sub(regex, j + ' ', i))

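# Remove duplicates; a set is unordered, so the order of the result may vary.
new = list(set(new))

# Print one sentence per line.
print('\n'.join(new))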

Output (the order may vary between runs):

I have a class at two o'clock
I have a class at ten o'clock
she goes to school at two o'clock
she goes to school at five o'clock
I have a class at five o'clock
I have a class at three o'clock
she goes to school at ten o'clock
she goes to school at three o'clock
she is my friend

Or, equivalently:

import re
data= {"sentences":["I have a class at ten o'clock", "she is my friend", "she goes to school at eight o'clock"]}
my_list=['two', 'three','five','ten']
regex = re.compile(r"(\w+) (?=o'clock)", re.IGNORECASE)
x = list(set([re.sub(regex, j + ' ', i) for j in my_list for i in data["sentences"]]))
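If a stable order matters, one possible variation (not part of the original answer) is to deduplicate with dict.fromkeys instead of set, since dictionaries keep insertion order. The sketch also uses a slightly different pattern that matches only the word, with the space inside the lookahead, so the replacement does not need to add the space back.

```python
import re

sentences = ["I have a class at ten o'clock",
             "she is my friend",
             "she goes to school at eight o'clock"]
my_list = ['two', 'three', 'five', 'ten']

# Match only the word directly before " o'clock".
pattern = re.compile(r"\w+(?= o'clock)", re.IGNORECASE)

# dict.fromkeys drops duplicates and keeps the first-seen order.
new = list(dict.fromkeys(
    pattern.sub(hour, s) for s in sentences for hour in my_list))

print(len(new))  # 9
print(new[0])    # I have a class at two o'clock
print(new[4])    # she is my friend
```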

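【Discussion】:

Thank you very much for your reply. However, what I would like to see is a new column in the DataFrame that contains the new sentences, as a list or a string, just like the one in the question. Do you know whether there is a way to do this in a column of the df?
Apologies, I missed the part where you said it was a DataFrame. I'll give it a try.
Thank you very much. Yes, the DataFrame is the important part, because I would like to see the sentences laid out like the output above.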
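For reference, one way the regex idea from Solution 2 could be carried over to a DataFrame column is to wrap it in a function and use Series.apply. This is only a sketch: the helper name expand_hours is made up here, and it assumes the sentences column contains only strings.

```python
import re
import pandas as pd

data = pd.DataFrame({"sentences": ["I have a class at ten o'clock",
                                   "she is my friend",
                                   "she goes to school at eight o'clock"]})
my_list = ['two', 'three', 'five', 'ten']

# Match only the word directly before " o'clock".
pattern = re.compile(r"\w+(?= o'clock)", re.IGNORECASE)

def expand_hours(sentence):
    """Hypothetical helper: one copy of the sentence per hour, comma-joined."""
    if not pattern.search(sentence):
        return sentence  # no o'clock: keep the sentence unchanged
    return ', '.join(pattern.sub(hour, sentence) for hour in my_list)

data['new_sentences'] = data['sentences'].apply(expand_hours)

print(data.loc[1, 'new_sentences'])  # she is my friend
print(data.loc[2, 'new_sentences'])
```

Returning a list instead of a joined string would only require dropping the ', '.join call in the helper.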