python regex inserting a space between punctuation and letters

【Question title】: python regex inserting a space between punctuation and letters
【Posted】: 2013-12-20 14:29:49
【Question description】:

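I think the best way to do this is with a regular expression, but I don't know how to write it. I'm trying to parse a string and put a space only between letters and punctuation. Consecutive punctuation marks should stay together. For example, if I have the string

"yes!!!"

I want to end up with

"yes", "!!!".

If I have the string

!!!N00bs,

I want to end up with

"!!!", "N00bs"

Is this possible? What is the best way to do it? Right now I'm parsing the string letter by letter, which is a silly way to go about it.

Thanks for your help.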

【Question comments】:

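I know you said you only want to separate out punctuation, but in case you also want to separate out other non-alphanumeric characters in one go, such as (@#$% ^&*(()+=) or whatever, you can use \w and \W. They work very well. If you're using Python 3.x, they handle Unicode characters automatically.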
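As a rough illustration of the \w / \W idea in the comment above (a sketch, not the commenter's own code), one option is to collect runs of word characters and runs of everything else that is not whitespace. The names `TOKEN` and `split_runs` are made up for this example.

```python
import re

# Hypothetical helper; the pattern is one way to apply the \w / \W idea.
# \w+      : a run of word characters (letters, digits, underscore)
# [^\w\s]+ : a run of characters that are neither word characters nor whitespace
TOKEN = re.compile(r"\w+|[^\w\s]+")

def split_runs(text):
    """Return runs of word characters and runs of other symbols, in order."""
    return TOKEN.findall(text)

print(split_runs("yes!!!"))     # ['yes', '!!!']
print(split_runs("!!!N00bs"))   # ['!!!', 'N00bs']
print(split_runs("wow@#$%ok"))  # ['wow', '@#$%', 'ok']
```

Note that `\w` also matches the underscore, so `a_b` stays in one piece with this pattern. On Python 3 with `str` patterns, `\w` covers non-ASCII letters as well, so a word like `café` is kept whole.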

Tags: python regex string

【Solution 1】:

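Something like this:

import re
txt = re.sub(r'([a-zA-Z])([,.!])', r'\1 \2', '!!!this, .is, .a .test!!!')

For the other direction (punctuation followed by a letter), swap the order of the two groups:

txt = re.sub(r'([,.!])([a-zA-Z])', r'\1 \2', txt)

You may also be able to do this with a single regular expression.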
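One possible way to fold both directions into a single pattern is to match the empty position between a letter and a punctuation mark with lookarounds, then substitute a space there. This is only a sketch under assumptions that go beyond the answer: "letter" is taken to mean ASCII letters and digits (so that `N00bs` stays whole), "punctuation" is any character that is neither a word character nor whitespace, and the names `BOUNDARY` and `space_out` are invented for the example.

```python
import re

# Assumption: "letter" = ASCII letter or digit; "punctuation" = anything that
# is neither a word character nor whitespace.
BOUNDARY = re.compile(
    r"(?<=[A-Za-z0-9])(?=[^\w\s])"   # letter/digit followed by punctuation
    r"|(?<=[^\w\s])(?=[A-Za-z0-9])"  # punctuation followed by letter/digit
)

def space_out(text):
    """Insert one space at every letter/punctuation boundary."""
    return BOUNDARY.sub(" ", text)

print(space_out("yes!!!"))    # yes !!!
print(space_out("!!!N00bs"))  # !!! N00bs
print(space_out("!!!this, .is, .a .test!!!"))
# !!! this , . is , . a . test !!!
```

Because the pattern matches positions rather than characters, runs of punctuation such as `!!!` are left untouched, and calling `.split()` on the result gives the separate pieces.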

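【Comments】:

Almost there. It doesn't seem to work when the punctuation comes first, as in the "!is" you had before you changed it. I'm trying to put a space between punctuation and alphabetic characters.
I changed it because I didn't know whether you wanted that case. I'll edit my answer.
Thanks, that was entirely my fault for being unclear.
If you wanted to ignore certain forms, such as "e.g." or other abbreviations, how would you modify it?
I'd be interested to know whether there is any solution to the question @donkey asked.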
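One possible direction for the abbreviation question, offered as a sketch rather than a settled answer: tokenize with an alternation and list the abbreviations first, so they are matched whole before the generic letter and punctuation runs get a chance. The `ABBREVIATIONS` list and the `tokenize` name are hypothetical, and the list would have to be maintained by hand.

```python
import re

# Hypothetical, hand-maintained list of abbreviations to keep intact.
ABBREVIATIONS = ["e.g.", "i.e.", "etc."]

# Abbreviations come first in the alternation so they win over the generic runs.
TOKEN = re.compile(
    "|".join(re.escape(a) for a in ABBREVIATIONS) + r"|\w+|[^\w\s]+"
)

def tokenize(text):
    """Split text into abbreviations, word runs, and punctuation runs."""
    return TOKEN.findall(text)

print(" ".join(tokenize("e.g. this,that!!!")))  # e.g. this , that !!!
```

The match is case-sensitive, so `E.g.` would still be broken up unless it is added to the list or the pattern is compiled with `re.IGNORECASE`.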

【Solution 2】:

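If you only want to add a space, could you use replace?

x = x.replace('!', ' ! ')

You may need further replace calls afterwards to remove the spaces that end up between adjacent punctuation marks.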
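A minimal sketch of what that follow-up cleanup could look like, assuming a single punctuation character is handled at a time. The function name `pad_mark` and its `mark` parameter are made up for this example. After padding, marks that were originally adjacent are separated by exactly two spaces, so those can be joined back together.

```python
def pad_mark(text, mark="!"):
    """Surround runs of `mark` with single spaces, using only str.replace."""
    # Step 1: pad every mark, e.g. "yes!!!" -> "yes !  !  ! "
    text = text.replace(mark, f" {mark} ")
    # Step 2: marks that were adjacent are now two spaces apart; rejoin them.
    doubled = f"{mark}  {mark}"
    while doubled in text:
        text = text.replace(doubled, mark * 2)
    # Step 3: normalize leftover whitespace (this also collapses any
    # multiple spaces that were already in the input).
    return " ".join(text.split())

print(pad_mark("yes!!!"))    # yes !!!
print(pad_mark("!!!N00bs"))  # !!! N00bs
```

This needs one call per punctuation character, which is why a regular expression tends to be more convenient once several marks are involved.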

【Comments】:

【Solution 3】:

I would use:

(.+)\b(.+)

It works for yes!!! and !!!N00bs

Explanation:

The regular expression:

(?-imsx:(.+)\b(.+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
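
As a hedged illustration of how the pattern explained above might be used from Python (the answer itself only gives the pattern), the sketch below applies it with `re.fullmatch` and returns the two captured groups. The names `PAIR` and `split_once` are invented for this example.

```python
import re

# The pattern from the answer: two greedy groups separated by a word boundary.
PAIR = re.compile(r"(.+)\b(.+)")

def split_once(text):
    """Split text in two at a word boundary, or return it whole if none fits."""
    m = PAIR.fullmatch(text)
    return list(m.groups()) if m else [text]

print(split_once("yes!!!"))     # ['yes', '!!!']
print(split_once("!!!N00bs"))   # ['!!!', 'N00bs']
print(split_once("!!!yes!!!"))  # ['!!!yes', '!!!']
```

The last call shows a limitation: the first group is greedy, so the string is split only once, at the last boundary that leaves something for the second group. Strings with several letter/punctuation transitions need a different approach.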

【Comments】: