Regex does not identify '#' for removing '#' from words starting with '#'

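【Question title】: Regex does not identify '#' for removing '#' from words starting with '#'
【Posted】: 2025-12-09 10:50:01
【Question description】:

How can I remove # from a word in a string when it is the first character of that word? If it appears on its own, in the middle of a word, or at the end of a word, it should be kept.

Currently I am using this regular expression: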

test = "# #DataScience"
test = re.sub(r'\b#\w\w*\b', '', test)

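to remove the # from words that start with #, but it does not work at all. It returns the string unchanged.

Can anyone tell me why the # is not being recognized and removed?

Examples: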

Test - "# #DataScience"
Expected Output - "# DataScience"

Test - "kjndjk#jnjkd"
Expected Output - "kjndjk#jnjkd"

Test - "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
Expected Output - "# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#"

【Question discussion】:

Your question is hard to read. Could you at least format it properly?

Tags: python regex python-3.x data-science

【Solution 1】:

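import re

a = '# #DataScience'
b = 'kjndjk#jnjkd'
c = "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"

# Match a '#' that follows whitespace and precedes a non-space character.
# Groups 1 and 2 keep the neighbouring characters, so only the '#' is dropped.
regex = r'(\s+)#(\S)'

print(re.sub(regex, r'\1\2', a))
print(re.sub(regex, r'\1\2', b))
print(re.sub(regex, r'\1\2', c))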
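
As for why the original pattern does nothing: `\b` only matches between a word character (`\w`) and a non-word character, and `#` is not a word character. In "# #DataScience" every `#` is preceded by a space or by the start of the string, so `\b#` never matches there. The pattern above also requires whitespace before the `#`, so it would skip a hashtag at the very start of the string. Below is a minimal sketch of a lookaround variant that covers that case as well. The name `strip_leading_hash` is made up for illustration and does not come from the question.

```python
import re

# The original pattern finds nothing: there is no word boundary (\b)
# between a space (or the start of the string) and '#'.
assert re.search(r'\b#\w\w*\b', "# #DataScience") is None

# Sketch: match a '#' that is not preceded by a non-space character
# (so it starts a word) and is followed by a word character.
HASH_AT_WORD_START = re.compile(r'(?<!\S)#(?=\w)')

def strip_leading_hash(text):
    """Remove '#' from the start of each whitespace-separated word."""
    return HASH_AT_WORD_START.sub('', text)

print(strip_leading_hash("# #DataScience"))  # # DataScience
print(strip_leading_hash("#DataScience"))    # DataScience
```

Because the lookarounds consume no characters, the replacement can simply be the empty string and no capture groups are needed.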

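【Discussion】:

Thank you very much for your help! This works really well!

【Solution 2】:

You can split the string on spaces ' ' to get a list of all the words in it. Then loop over that list, check each word against the given condition, and strip the hash where necessary. Finally, join the list with spaces ' ' to build the result string and return it.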

import re

def remove_hash(text):
    words = text.split(' ')  # split the string into a list of words
    without_hash = []  # collects the words after removing the hash
    for word in words:
        # check whether the word starts with '#' followed by at least one letter
        if re.match(r'^#[a-zA-Z]+', word) is not None:
            without_hash.append(word[1:])  # if so, drop the hash and keep the rest
        else:
            without_hash.append(word)  # otherwise keep the word unchanged
    return ' '.join(without_hash)  # join the words with spaces and return the result

Output:

>>> remove_hash('# #DataScience')
'# DataScience'
>>> remove_hash('kjndjk#jnjkd')
'kjndjk#jnjkd'
>>> remove_hash("# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#")
'# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#'

You can make the code shorter (but harder to read) by avoiding the if/else, like this:

def remove_hash(text):
    words = text.split(' ')
    without_hash = []
    for word in words:
        # strip the leading run of '#' when other text follows it
        without_hash.append(re.sub(r'^#+(.+)', r'\1', word))
    return ' '.join(without_hash)

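This gives the same results for the inputs above.

【Discussion】:

That is a very clever approach! Thanks for your help!

【Solution 3】:

Try the following pattern. It finds the run of '#' and whitespace characters at the start of the string and replaces it with '# '.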
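
One caveat, as far as I can tell: the two versions agree on the question's examples but are not strictly equivalent. The first strips a `#` only when an ASCII letter follows it, while the second strips a leading run of `#` whenever at least one character is left over. Here is a small sketch comparing them. The names `remove_hash_strict` and `remove_hash_loose` and the sample string are illustrative, not part of the answer.

```python
import re

def remove_hash_strict(text):
    # Same rule as the if/else version: strip '#' only when a letter follows.
    return ' '.join(w[1:] if re.match(r'#[a-zA-Z]', w) else w
                    for w in text.split(' '))

def remove_hash_loose(text):
    # Same rule as the shorter version: strip a leading run of '#',
    # provided at least one character is left over.
    return ' '.join(re.sub(r'^#+(.+)', r'\1', w) for w in text.split(' '))

sample = "# #DataScience #2024 ##tag"
print(remove_hash_strict(sample))  # # DataScience #2024 ##tag
print(remove_hash_loose(sample))   # # DataScience 2024 tag
```

Which behaviour is preferable depends on whether words such as `#2024` or `##tag` should count as "words starting with #", which the question does not specify.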

import re

test = "# #DataScience"
test = re.sub(r'(^[#\s]+)', '# ', test)

>>> test
'# DataScience'

You can experiment with the pattern further here: https://regex101.com/r/6hfw4t/1

【Discussion】:

I see! This helped me a lot! Thank you!
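
Note that because the pattern is anchored with `^`, it only rewrites the run of `#` and whitespace at the very start of the string. Hashtags later in the string are left as they are, and a string that starts directly with a hashtag gains a standalone "# " prefix. Here is a quick check, wrapping the same substitution in a helper whose name, `normalize_prefix`, is made up for illustration:

```python
import re

def normalize_prefix(text):
    # Replace the leading run of '#' and whitespace with a single '# '.
    return re.sub(r'^[#\s]+', '# ', text)

print(normalize_prefix("# #DataScience"))           # # DataScience
print(normalize_prefix("# #DataScience #KJSBDKJ"))  # # DataScience #KJSBDKJ
print(normalize_prefix("#DataScience"))             # # DataScience
print(normalize_prefix("kjndjk#jnjkd"))             # kjndjk#jnjkd
```

So this pattern fits the first example from the question, but the longer example with several hashtags would need one of the per-word approaches.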