Python regex: extracting MAC addresses from a string (answers)

【Question title】: Python regex extract MAC addresses from string
【Posted】: 2014-11-12 16:31:03
【Question】:

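I need help writing a regular expression, using the Python re engine, to:

Extract all MAC addresses from a text file
Extract all strings of the following format: foo bar ... MAC:ADDRESS ... baz bat \r\n

Thanks in advance!

I tried the following to extract the MAC addresses, without success: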

import re
# The asker's pattern, left unchanged; only the Python 2 syntax (ur'' and print) is updated
p = re.compile(r'((?:(\d{1,2}|[a-fA-F]{1,2}){2})(?::|-*)){6}')
test_str = u"TEXT WITH SOME MAC ADDRESSES 00:24:17:b1:cc:cc TEXT CONTINUES WITH SOME MORE TEXT 20:89:86:9a:86:24"

found = re.findall(p, test_str)
for a in found:
    print(a)

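【Comments】:

If you have a regex question, it helps a lot to provide a few sample inputs and the expected output.
Thanks @vks ... any ideas? :/ I'm really confused by regexes..

Tags: python regex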

【Solution 1】:

I came up with the following to match MAC addresses in text: ([0-9a-fA-F]:?){12}

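Here is how it is supposed to work:

[0-9a-fA-F] matches a character used to represent a hexadecimal digit
:? matches an optional colon
(...){12} - all of this is then grouped and repeated 12 times. 12, because a MAC address consists of 6 pairs of hex digits, separated by colons

You can see it in action here.

The Python code then becomes: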

import re
p = re.compile(r'(?:[0-9a-fA-F]:?){12}')
test_str = u"TEXT WITH SOME MAC ADDRESSES 00:24:17:b1:cc:cc TEXT CONTINUES WITH SOME MORE TEXT 20:89:86:9a:86:24"

re.findall(p, test_str)

This produces:

[u'00:24:17:b1:cc:cc', u'20:89:86:9a:86:24']
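The pattern in the code above uses a non-capturing group, (?:...), while the pattern in the prose uses a plain group. The difference matters for re.findall: when a pattern contains a capturing group, findall returns the group's text rather than the whole match, and for a repeated group only the last repetition is kept. A minimal sketch of that behaviour, reusing the test string from this answer (the variable names are just for illustration):

```python
import re

test_str = "TEXT WITH SOME MAC ADDRESSES 00:24:17:b1:cc:cc TEXT CONTINUES WITH SOME MORE TEXT 20:89:86:9a:86:24"

# Capturing group: findall returns what the group captured on its last repetition
capturing = re.findall(r'([0-9a-fA-F]:?){12}', test_str)

# Non-capturing group: findall returns the full text of each match
non_capturing = re.findall(r'(?:[0-9a-fA-F]:?){12}', test_str)

print(capturing)      # ['c', '4']
print(non_capturing)  # ['00:24:17:b1:cc:cc', '20:89:86:9a:86:24']
```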

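【Comments】:

Thanks, but why can't I print the whole MAC after findall?
Changed it to use a non-capturing group. It should work now.
Once you can match the MAC addresses, you can solve the second part of the question as well. FYI this is a great website to test your regexps.
@SlothGR This is not correct for matching MAC addresses. See regex101.com/r/kP8uF5/7
A MAC address has a specific format. Your regex does not validate it correctly; it also accepts too many other strings. See the link I posted. They are not MAC addresses, but they would still be matched.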
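The objection in the last two comments can be illustrated with a short sketch. The stricter pattern below is an assumption chosen for comparison (six two-digit pairs joined by colons), not something the commenter posted, and whether twelve bare hex digits should count as a MAC address depends on your requirements.

```python
import re

# Solution 1's pattern: twelve hex digits, each optionally followed by a colon
loose = re.compile(r'(?:[0-9a-fA-F]:?){12}')

# Assumed stricter pattern for comparison: exactly six colon-separated pairs
strict = re.compile(r'(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}')

candidates = ["00:24:17:b1:cc:cc", "deadbeefcafe", "1:2:3:4:5:6:7:8:9:a:b:c"]

for s in candidates:
    # fullmatch checks whether the entire candidate string fits the pattern
    print(s, bool(loose.fullmatch(s)), bool(strict.fullmatch(s)))
```

The loose pattern accepts all three candidates, while the strict one accepts only the first.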

【Solution 2】:

([0-9a-f]{2}(?::[0-9a-f]{2}){5})

Try this. See the demo.

http://regex101.com/r/kP8uF5/5

import re
# r'' instead of the Python 2-only ur'' prefix; IGNORECASE lets [0-9a-f] match A-F too
p = re.compile(r'([0-9a-f]{2}(?::[0-9a-f]{2}){5})', re.IGNORECASE)
test_str = u"TEXT WITH SOME MAC ADDRESSES 00:24:17:b1:cc:cc TEXT CONTINUES WITH SOME MORE TEXT 20:89:86:9a:86:24"

print(re.findall(p, test_str))
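The second part of the question (extracting whole lines that contain a MAC address) is not covered by this answer. One possible sketch, assuming the input is already read into a string and reusing this answer's colon-separated pattern; the function name lines_with_mac and the sample text are hypothetical:

```python
import re

# Same shape as the pattern above, without the outer capturing group
MAC_RE = re.compile(r'[0-9a-f]{2}(?::[0-9a-f]{2}){5}', re.IGNORECASE)

def lines_with_mac(text):
    """Return every line of text that contains at least one colon-separated MAC address."""
    # splitlines() handles both \n and \r\n line endings
    return [line for line in text.splitlines() if MAC_RE.search(line)]

sample = (
    "foo bar 00:24:17:b1:cc:cc baz bat\r\n"
    "no address here\r\n"
    "start 20:89:86:9a:86:24 end\r\n"
)
print(lines_with_mac(sample))
```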

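【Comments】:

【Solution 3】:

I also had to match MAC addresses, and this works: ((?:[\da-fA-F]{2}[:\-]){5}[\da-fA-F]{2})

I tested it with this live regex tester: https://regex101.com/#python It gives a good breakdown of what each part of the regex does.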
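One thing to be aware of: because [:\-] is evaluated independently for each pair, this pattern also accepts addresses that mix separators. If that is not wanted, a backreference can require the same separator throughout. The stricter variant below is a hypothetical sketch, not part of the original answer:

```python
import re

# Solution 3's pattern, as posted
mixed_ok = re.compile(r'((?:[\da-fA-F]{2}[:\-]){5}[\da-fA-F]{2})')

# Hypothetical variant: group 1 captures the first separator, \1 requires the same one afterwards
same_sep = re.compile(r'[0-9a-fA-F]{2}([:-])(?:[0-9a-fA-F]{2}\1){4}[0-9a-fA-F]{2}')

text = "ok aa:bb:cc:dd:ee:ff, ok 11-22-33-44-55-66, odd aa:bb-cc:dd-ee:ff"

posted_matches = mixed_ok.findall(text)
# finditer + group(0) returns whole matches even though the pattern has a capturing group
strict_matches = [m.group(0) for m in same_sep.finditer(text)]

print(posted_matches)  # includes the mixed-separator address
print(strict_matches)  # only the two consistently separated addresses
```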

【Comments】:

【Solution 4】:

text = "this is aa:bb:cc:dd:01:02 test for aa-bb-cc-dd-ee-ff and AABBCCDDEEFF is a mac address without separator"

Let's extract the MAC addresses:

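import re

def extract_mac_address(text):
    pattern = r'(([0-9a-fA-F]{2}[:]){5}([0-9a-fA-F]{2})|([0-9a-fA-F]{2}[-]){5}([0-9a-fA-F]{2})|[0-9a-fA-F]{12})'
    # The pattern has capturing groups, so findall returns one tuple per match
    mac_addr_list = re.findall(pattern, text)
    # Index 0 is the outermost group, i.e. the whole address
    return list(map(lambda x: x[0], mac_addr_list))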


print(extract_mac_address(text))

The output is ['aa:bb:cc:dd:01:02', 'aa-bb-cc-dd-ee-ff', 'AABBCCDDEEFF']
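The tuple unpacking is only needed because of the capturing groups. As an alternative sketch (the names MAC_PATTERN and extract_mac_addresses are illustrative, not from the answer), the same three alternatives can be written with non-capturing groups so that findall returns the addresses directly:

```python
import re

# Same three alternatives as above: colon-separated, hyphen-separated, or 12 bare hex digits
MAC_PATTERN = re.compile(
    r'(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}'
    r'|(?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2}'
    r'|[0-9a-fA-F]{12}'
)

def extract_mac_addresses(text):
    # No capturing groups, so findall returns a list of full-match strings
    return MAC_PATTERN.findall(text)

text = "this is aa:bb:cc:dd:01:02 test for aa-bb-cc-dd-ee-ff and AABBCCDDEEFF is a mac address without separator"
print(extract_mac_addresses(text))
```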

【Comments】:

【Solution 5】:

A regex that matches a single MAC address exactly, with nothing left over at the end:

import re

# Anchored with ^ and $: six hex pairs joined by ':' or by '-' (not mixed)
regex = r"^((([a-f0-9]{2}:){5})|(([a-f0-9]{2}-){5}))[a-f0-9]{2}$"

test_str = "89:89:89:89:89:89"

# re.sub with an empty replacement would erase the address and print nothing,
# so test the string with re.match instead
match = re.match(regex, test_str, re.IGNORECASE)

if match:
    print(match.group())

Reference: https://regexpattern.com/mac-address/
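To see what the anchors buy you, here is a small sketch that checks several candidate strings. It assumes Python 3.4+ for re.fullmatch, which makes the ^ and $ anchors unnecessary; the helper name is_mac is hypothetical.

```python
import re

# Same alternatives as the anchored pattern above, with non-capturing groups
MAC_RE = re.compile(
    r'(?:(?:[a-f0-9]{2}:){5}|(?:[a-f0-9]{2}-){5})[a-f0-9]{2}',
    re.IGNORECASE,
)

def is_mac(candidate):
    # fullmatch succeeds only if the whole string is consumed by the pattern
    return MAC_RE.fullmatch(candidate) is not None

for s in ["89:89:89:89:89:89", "89-89-89-89-89-89",
          "89:89:89:89:89:89:89", "89:89:89:89:89"]:
    print(s, is_mac(s))
```

The last two candidates (seven pairs and five pairs) are rejected, whereas an unanchored search would still find six pairs inside the seven-pair string.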

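【Comments】:

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.

【Solution 6】: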

import re
print(re.search("([a-f0-9A-F]{4}[.]){2}[a-f0-9A-F]{4}", "0000.aaaa.bbbb").group())

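[a-f0-9A-F]{4} matches exactly four hexadecimal digits (0-9, a-f or A-F).

re.search finds only the first MAC address in a string. To find more than one occurrence, use re.findall; note that the group then has to become non-capturing, (?:...), otherwise findall returns only the text of the group.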
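A minimal sketch of the findall variant for the dotted format, under the assumption that addresses look like 0000.aaaa.bbbb; the sample text and variable names are made up for illustration:

```python
import re

# Three groups of four hex digits separated by dots; the repeated group is non-capturing
dotted = re.compile(r'(?:[0-9a-fA-F]{4}\.){2}[0-9a-fA-F]{4}')

text = "ports: 0000.aaaa.bbbb and 0011.22ff.EE99"
found = dotted.findall(text)
print(found)  # ['0000.aaaa.bbbb', '0011.22ff.EE99']
```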

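【Comments】:

Please add an explanation. Code-only answers are of limited use without a description of how they solve the problem.
If you want to search for multiple MAC addresses, you need to use re.findall and edit the regex so that it does not match non-hex digits.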