【Question Title】: Remove characters before and after a particular substring in a string in Python
【Posted】: 2016-11-16 16:20:07
【Question Description】:

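I'm new to Python. Maybe this can be done with a regular expression. I want to search a string for a particular substring and remove the characters immediately before and after it.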

Example 1

Input:"This is the consignment no 1234578TP43789"
Output:"This is the consignment no TP"

Example 2

Input:"Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
Output:"Consignment no TP is on its way on vehicle no MP"

I have a list of these acronyms (MP, TP) to search for in the string.

【Question Discussion】:

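  • Take a look at re.sub, the substitution function in the re module.
  • Anything before and after TP. It can contain both digits and letters. Something like 1234578TP43789 should be replaced with TP in the output.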

Tags: python regex regex-lookarounds


【Solution 1】:

You can use re.sub:

>>> import re
>>> string="This is the consignment no 1234578TP43789"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'This is the consignment no TP'

>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\d+(TP|MP)\d+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'

How does it work?

  • \d+ matches one or more digits.
  • (TP|MP) matches TP or MP and captures it as group \1. We then replace the entire matched string with this captured group.
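The role of the capturing group can be seen by running the same pattern through re.finditer and printing each match's group(1). This sketch uses only the standard re module and the example sentence from the question:

```python
import re

# Same pattern as in the answer: digits, a captured acronym, digits.
pattern = re.compile(r'\d+(TP|MP)\d+')

text = "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"

for m in pattern.finditer(text):
    # group(0) is the whole matched token; group(1) is what r'\1' refers to.
    print(m.group(0), '->', m.group(1))

# Replacing each whole match with group 1 gives the answer's result.
result = pattern.sub(r'\1', text)
print(result)  # Consignment no TP is on its way on vehicle no MP
```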

If any character (not just digits) can appear before and after TP/MP, we can use \S, which matches anything except whitespace. For example,

>>> string="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
>>> re.sub(r'\S+(TP|MP)\S+', r'\1', string)
'Consignment no TP is on its way on vehicle no MP'
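The difference between the two patterns shows up on a token that has letters around the digits. Because \d+ is not anchored to word boundaries, it matches only the digit part inside such a token and leaves the surrounding letters behind, while \S+ consumes the whole token. The sample sentence here is made up for illustration:

```python
import re

text = "ref AB12TP9X and 1234578TP43789"

# \d+ only matches the "12TP9" part of "AB12TP9X", so "AB" and "X" survive.
digits_only = re.sub(r'\d+(TP|MP)\d+', r'\1', text)

# \S+ matches any non-whitespace run around the acronym, so the whole token goes.
any_nonspace = re.sub(r'\S+(TP|MP)\S+', r'\1', text)

print(digits_only)   # ref ABTPX and TP
print(any_nonspace)  # ref TP and TP
```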

EDIT

Using a list comprehension, you can build the pattern from the list of acronyms and apply it to every string:

>>> list_1=["TP","MP","DCT"]
>>> list_2=["This is the consignment no 1234578TP43789","Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"]
>>> [ re.sub(r'\d+(' + '|'.join(list_1) + r')\d+', r'\1', string) for string in list_2 ]
['This is the consignment no TP', 'Consignment no TP is on its way on vehicle no MP']
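A slightly more defensive version of the list comprehension, as a sketch: the helper name strip_around is hypothetical, re.escape makes sure each acronym is matched literally even if it contains characters such as '.' or '+', and the pattern is compiled once. Python's re module also caches compiled patterns internally, so the explicit compile is mainly for readability.

```python
import re

def strip_around(strings, acronyms):
    # Hypothetical helper: escape each acronym, join into one alternation
    # group such as (TP|MP|DCT), and compile the pattern once.
    alternation = '|'.join(map(re.escape, acronyms))
    pattern = re.compile(r'\d+(' + alternation + r')\d+')
    # Replace every "digits + acronym + digits" run with just the acronym.
    return [pattern.sub(r'\1', s) for s in strings]

list_1 = ["TP", "MP", "DCT"]
list_2 = ["This is the consignment no 1234578TP43789",
          "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"]

print(strip_around(list_2, list_1))
```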

【Discussion】:

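  • @nu11p01n73R Thanks a lot. One more thing: list_1=["TP","MP","DCT"] list_2=["This is the consignment no 1234578TP43789","Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"]. Now I need to take TP, MP, etc. from list_1, search for them in the strings of list_2 and replace them. How do I do that?
  • @SalmanBaqri You can build the regex with join, as in '|'.join(["TP","MP","DCT"]), and use it while iterating over list_2 to produce the desired output. You could also use a list comprehension.
  • Could you explain that a bit more?
  • I would add a word boundary and turn \d+ into \w+: regex101.com/r/bCTP9R/1 - +1 nonetheless
  • @nu11p01n73R Could you share some resources where I can learn more about regular expressions?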
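The comment above suggests adding a word boundary and using \w+ instead of \d+. The exact pattern behind the regex101 link isn't reproduced here, so the following is one plausible reading of that suggestion (an assumption), compared with the \S+ version on a made-up sentence containing punctuation:

```python
import re

text = "Consignment no 1234578TP43789, vehicle AB3456MP567890X."

# Assumed reading of the comment: \w+ (letters, digits, underscore) on both
# sides, with \b so the match covers whole words. Punctuation is not a word
# character, so the comma and period are kept.
with_boundary = re.sub(r'\b\w+(TP|MP)\w+\b', r'\1', text)

# \S+ also accepts punctuation, so the trailing comma and period are swallowed.
with_nonspace = re.sub(r'\S+(TP|MP)\S+', r'\1', text)

print(with_boundary)  # Consignment no TP, vehicle MP.
print(with_nonspace)  # Consignment no TP vehicle MP
```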

【Solution 2】:

You can use strip to remove the leading and trailing digits from each word.

strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
strg=' '.join([word.strip('0123456789') for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP
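Note that str.strip only removes characters from the two ends of a word and stops at the first character not in the given set. Letters in front of the digits therefore keep those digits in place, and a word made only of digits becomes an empty string, which leaves a double space after the join. A small demonstration, using string.digits in place of the literal '0123456789' (the sample sentence is made up):

```python
import string

DIGITS = string.digits  # '0123456789'

print('1234578TP43789'.strip(DIGITS))  # TP
print('AB12TP34'.strip(DIGITS))        # AB12TP (only the right end is stripped)
print(repr('2016'.strip(DIGITS)))      # '' (an all-digit word disappears)

sentence = "Shipped in 2016 as 1234578TP43789"
joined = ' '.join(w.strip(DIGITS) for w in sentence.split())
print(repr(joined))  # 'Shipped in  as TP' (note the double space)
```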

If only the words containing one of the reserved acronyms should be stripped, loop over the reserved list:

strg="Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
reserved=['MP','TP']
for res in reserved:
    strg=' '.join([word.strip('0123456789') if (res in word) else word for word in strg.split()])
print(strg) # Consignment no TP is on its way on vehicle no MP 200DG
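The loop above splits and re-joins the whole string once per reserved word. A single-pass alternative, sketched with a hypothetical helper strip_reserved, checks each word against all reserved acronyms with any(). For acronyms made of letters it should give the same result, since stripping digits never removes the acronym itself:

```python
DIGITS = '0123456789'

def strip_reserved(text, reserved):
    # Hypothetical helper: strip leading/trailing digits only from words that
    # contain at least one reserved acronym; all other words are kept as-is.
    return ' '.join(
        word.strip(DIGITS) if any(res in word for res in reserved) else word
        for word in text.split()
    )

strg = "Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890 200DG"
print(strip_reserved(strg, ['MP', 'TP']))
# Consignment no TP is on its way on vehicle no MP 200DG
```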

【Discussion】:
