Replacing unigrams and n-grams in Python without changing words

[Question title]: replacing unigrams and n-grams in python without changing words
[Posted]: 2015-07-29 09:50:19
[Question description]:

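This seems like it should be straightforward, but it isn't. I want to replace unigrams and n-grams in a string without changing text that sits inside other words.

For example: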

x = 'hello world'
x.replace('llo', 'll')

'hell world'

But I don't want this to happen.

Splitting the string on whitespace works for single words (unigrams), but I also want to replace n-grams.

So:

'this world is a happy place to be'

should be converted to:

'this world is a miserable cesspit to be'

and splitting on whitespace does not work.

Is there a built-in function in Python 3 that lets me do this?

I could do:

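# Multi-word phrase (n-gram): fall back to plain substring replacement.
if len(old_string.split(' ')) > 1:
    # str.replace returns a new string, so the result must be assigned.
    x = x.replace(old_string, new_string)
else:
    # Single word (unigram): replace only tokens that match exactly.
    x_array = x.split(' ')
    x_array = [new_string if y == old_string else y for y in x_array]
    x = ' '.join(x_array)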
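
The whitespace-splitting idea above can be stretched to cover n-grams by comparing a sliding window of tokens rather than single tokens. What follows is only a sketch under assumptions: the function name replace_ngram_tokens is hypothetical (it does not appear in the question), and it assumes words are separated by single spaces, exactly as in the snippet above.

```python
def replace_ngram_tokens(text, old, new):
    """Replace `old` only where its tokens match whole tokens of `text`.

    Hypothetical helper for illustration; assumes single-space separators.
    """
    tokens = text.split(' ')
    old_tokens = old.split(' ')
    n = len(old_tokens)
    out = []
    i = 0
    while i < len(tokens):
        # Compare a window of n tokens against the phrase being replaced.
        if tokens[i:i + n] == old_tokens:
            out.append(new)
            i += n  # skip past the tokens that were just replaced
        else:
            out.append(tokens[i])
            i += 1
    return ' '.join(out)


print(replace_ngram_tokens('this world is a happy place to be',
                           'happy place', 'miserable cesspit'))
# this world is a miserable cesspit to be
print(replace_ngram_tokens('hello world', 'llo', 'll'))
# hello world
```

Because tokens are compared for equality, punctuation attached to a word (for example 'place,') would prevent a match; whether that is desirable depends on the input.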

[Question comments]:

Write a regular expression with word boundaries (\b) and use re.sub instead of str.replace (see re.sub(r'\bllo\b', 'll', 'hello world'))?
More concise than my solution, thanks
Possible duplicate of Replace exact substring in python

Tags: python-3.x
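
The word-boundary suggestion in the comments could be wrapped in a small helper along these lines. This is a minimal sketch, not code from the thread: the name replace_whole_words is hypothetical, and re.escape is used on the assumption that the phrase to replace should be matched literally rather than as a pattern.

```python
import re


def replace_whole_words(text, old, new):
    """Replace `old` only where it appears as a whole word or phrase.

    Hypothetical helper for illustration; not part of the original thread.
    """
    # re.escape makes the phrase literal; \b anchors both ends at word boundaries.
    pattern = r'\b' + re.escape(old) + r'\b'
    # A function replacement keeps backslashes in `new` from being read as escapes.
    return re.sub(pattern, lambda m: new, text)


print(replace_whole_words('hello world', 'llo', 'll'))
# hello world
print(replace_whole_words('this world is a happy place to be',
                          'happy place', 'miserable cesspit'))
# this world is a miserable cesspit to be
```

Note that \b only behaves as expected when the phrase starts and ends with a word character (letter, digit, or underscore).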

[Solution 1]:

You can do it like this:

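import re

# Match 'llo' only when a non-space character sits on both sides of it.
re_search = r'(?P<pre>[^ ])llo(?P<post>[^ ])'
# The captured neighbours are consumed by the match, so write them back.
re_replace = r'\g<pre>ll\g<post>'

print(re.sub(re_search, re_replace, 'hello world'))
print(re.sub(re_search, re_replace, 'helloworld'))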

Output:

hello world
hellworld

Note how you need to add pre and post back in.

Now that I see the comments... \b would probably be better.
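
As a possible alternative to capturing the neighbouring characters and writing them back, zero-width lookarounds check the context without consuming it. The sketch below is an assumption-laden illustration, not the answerer's code: the name replace_between_spaces is hypothetical, and it treats a match as valid only when it is delimited by whitespace or the ends of the string.

```python
import re


def replace_between_spaces(text, old, new):
    """Replace `old` only when whitespace or a string end sits on both sides.

    Hypothetical helper for illustration; not part of the original thread.
    """
    # (?<!\S): no non-space character immediately before the match.
    # (?!\S):  no non-space character immediately after the match.
    pattern = r'(?<!\S)' + re.escape(old) + r'(?!\S)'
    return re.sub(pattern, lambda m: new, text)


print(replace_between_spaces('hello world', 'llo', 'll'))
# hello world
print(replace_between_spaces('say llo again', 'llo', 'll'))
# say ll again
print(replace_between_spaces('a happy place, indeed', 'happy place', 'x'))
# a happy place, indeed
```

The last call shows where this differs from \b: the comma after 'place' counts as a non-space character, so the phrase is left alone, whereas a \b-based pattern would replace it.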

[Comments]: