How to regex a hashtag followed by digits in Python 2? Answers

【Question title】: How to regex hashtag with digits after in python2?
【Posted】: 2015-11-25 04:04:39
【Question】:

In Python 2.7 I want to find a hashtag followed by 1 to 6 digits, but my regex does not match correctly.

Here is my example:

import re

chaine = "[url=http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132:<UID>]http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132[/url:<UID>]"
regex = re.compile('http://forum.darkgyver.fr/(.*)\#(\d{1-6})')
match = regex.search(chaine)
if match:
        pos1 = match.start()
        pos2 = match.end()
else:    
        pos1 = -1
        pos2 = -1

print "pos1 %d" % pos1
print "pos2 %d" % pos2
url_tempo = chaine[pos1:pos2]
print "url_tempo %s" % url_tempo            
posPost = pos1 + url_tempo.find('#') + 1
numPost = chaine[posPost:pos2]
print "numPost %s" % numPost

The first regex returns "no match". Maybe the hashtag is not declared correctly.

So I changed my regex as follows:

regex = re.compile('http://forum.darkgyver.fr/(.*)\#([0-9]+(:| |    |\n|\[|$))')

It matches, but at the wrong position: pos2=161 when it should be pos2=80.

How can I fix my regex so it matches the hashtag and the 1 to 6 digits that follow it?

【Question discussion】:

\d{1,6} ... not \d{1-6}
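In Python's re module the bounded repetition syntax is {m,n}, with a comma. A brace group that is not a valid quantifier, such as {1-6}, is not an error: it is matched as literal text, which is why the first regex found nothing. A small check, written so it runs on Python 2.7 and 3 (the sample strings are made up for illustration):

```python
import re

text = "faisceau#265132:"

# "{1-6}" is not a valid quantifier, so it is read as the literal characters "{1-6}":
# the pattern means "#, one digit, then the text {1-6}"
wrong = re.search(r"#\d{1-6}", text)

# "{1,6}" means "between 1 and 6 digits" (greedy, so it takes as many as it can)
right = re.search(r"#(\d{1,6})", text)

# The broken pattern only matches a string that literally contains "{1-6}"
literal = re.search(r"#\d{1-6}", "#5{1-6}")

print(wrong)             # None
print(right.group(1))    # 265132
print(literal.group(0))  # #5{1-6}
```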
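Thanks, Martin, for the first answer. I tried regex = re.compile('forum.darkgyver.fr(.*)\#(\d{1,6})'). It partially matches, but pos2 should be "80", not "160", and numPost does not contain the right string; it should be only "265132": pos2=160 numPost 265132:]forum.darkgyver.fr/…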
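The 160/161 end positions come from the greedy (.*): it first runs to the end of the string, then backs off only as far as the last # followed by digits, which belongs to the second copy of the URL. Making it lazy with (.*?) stops at the first # instead. A minimal comparison on the question's string (Python 2.7/3 compatible):

```python
import re

chaine = ("[url=http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132:<UID>]"
          "http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132[/url:<UID>]")

prefix = re.escape("http://forum.darkgyver.fr/")

# Greedy: (.*) grabs as much as possible, so the match spans both URLs
greedy = re.search(prefix + r"(.*)#(\d{1,6})", chaine)

# Lazy: (.*?) grabs as little as possible, so the match ends at the first hashtag
lazy = re.search(prefix + r"(.*?)#(\d{1,6})", chaine)

print(greedy.span())  # (5, 160)
print(lazy.span())    # (5, 79)
print(lazy.group(2))  # 265132
```

Requiring a : right after the digits, as Solution 1 does, moves the lazy match's end to 80, the position the question expected.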

Tags: python regex python-2.7 hashtag digit

【Solution 1】:

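You are trying to extract a hashtag from a URL. Given the string you provided, it seems more logical to simply extract all the digits between the # and : characters. If you had a hashtag with 7 digits, would you want all 7 digits, or no match at all? Either way, I'm guessing you would not want just the first 6 digits.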
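The difference shows up with a hashtag longer than six digits. A quick check on a hypothetical seven-digit post number, #1234567: (made up for illustration; Python 2.7/3 compatible):

```python
import re

sample = "faisceau#1234567:"

# {1,6} without a terminator silently truncates to the first 6 digits
truncated = re.search(r"#(\d{1,6})", sample)

# {1,6} followed by ":" refuses to match, because the 7th character is a digit, not ":"
bounded = re.search(r"#(\d{1,6}):", sample)

# \d+ followed by ":" takes every digit up to the colon
all_digits = re.search(r"#(\d+):", sample)

print(truncated.group(1))   # 123456
print(bounded)              # None
print(all_digits.group(1))  # 1234567
```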

By using the grouping operator (), you can simply call group(1) on a successful match to get your hashtag, with no need to extract it using string slicing.
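With a capturing group, the match object also records where the captured text sits: span(1) returns its start and end, so the find('#') and slicing steps from the question are not needed even when you want positions. A sketch (Python 2.7/3 compatible):

```python
import re

chaine = ("[url=http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132:<UID>]"
          "http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132[/url:<UID>]")

match = re.search(re.escape("http://forum.darkgyver.fr") + r".*?#(\d+):", chaine)

if match:
    num_post = match.group(1)                 # text captured by (\d+)
    digits_start, digits_end = match.span(1)  # where that text sits in chaine
else:
    num_post = None
    digits_start = digits_end = -1

print(num_post)                    # 265132
print((digits_start, digits_end))  # (73, 79)
# Slicing with the group's span gives the same text as group(1)
print(chaine[digits_start:digits_end] == num_post)  # True
```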

The following shows one possible way to extract the hashtag:

import re

chaine = "[url=http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132:<UID>]http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132[/url:<UID>]"

re_hashtag = re.search(re.escape("http://forum.darkgyver.fr") + r".*?#(\d+):", chaine)

print re_hashtag.start()    
print re_hashtag.end()
print re_hashtag.group(1)

This will display the following:

5
80
265132

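The start position is 5 because the match begins at the http you chose to match, which comes after the 5-character [url= prefix.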

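Note that I used the escape() function to make sure your URL is correctly escaped. If you print the following, you will see how your initial regex should have been written: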

print re.escape("http://forum.darkgyver.fr")

Giving:

http\:\/\/forum\.darkgyver\.fr
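That output is from Python 2.7 (Python 3 before 3.7 behaves the same). Since Python 3.7, re.escape() only escapes characters with a special meaning in a regex, so the same call prints http://forum\.darkgyver\.fr; both forms match the same text. What matters is the escaped ".", because an unescaped "." matches any character. A small check (the forumXdarkgyverYfr string is made up to show the problem):

```python
import re

url = "http://forum.darkgyver.fr"
escaped = re.escape(url)

# Python 2.7 gives http\:\/\/forum\.darkgyver\.fr, Python 3.7+ gives http://forum\.darkgyver\.fr
print(escaped)

# Either form matches the literal URL
print(bool(re.match(escaped, url)))   # True

# Unescaped, each "." is a wildcard, so a different string also matches
fake = "http://forumXdarkgyverYfr"
print(bool(re.match(url, fake)))      # True
print(bool(re.match(escaped, fake)))  # False
```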

【Discussion】:

【Solution 2】:

Thanks, Martin, for the answer.

I had to escape my URL properly with the escape function.
I will use re.search to get the digits after the hashtag directly: re.search(re.escape("http://forum.darkgyver.fr") + ".*?#(\d+):", chaine)

I agree with you, this is the best solution. François
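One detail worth handling with this pattern: re.search() returns None when nothing matches, so calling .start() or .group(1) on the result directly raises AttributeError. A minimal sketch wrapping the pattern in a helper; the function name extract_post_number is hypothetical (Python 2.7/3 compatible):

```python
import re

# Compile once: escaped forum URL, lazy skip to the first "#", digits up to ":"
POST_RE = re.compile(re.escape("http://forum.darkgyver.fr") + r".*?#(\d+):")


def extract_post_number(text):
    """Return the digits between '#' and ':' after the forum URL, or None."""
    match = POST_RE.search(text)
    return match.group(1) if match else None


chaine = ("[url=http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132:<UID>]"
          "http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132[/url:<UID>]")

print(extract_post_number(chaine))                # 265132
print(extract_post_number("no forum link here"))  # None
```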

【Discussion】: