Python regex search for hexadecimal bytes: answers

【Question title】: Python regex search for hexadecimal bytes
【Posted】: 2014-12-29 22:54:56
【Question】:

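I'm trying to search a binary file for a series of hexadecimal values, but I've run into a couple of problems I can't quite solve. (1) I'm not sure how to search the entire file and return all the matches. Currently I f.seek only to where I think the value might be, which is no good. (2) I'd like to return the offsets of any matches in either decimal or hex, but I get 0 every time, so I'm not sure what I'm doing wrong.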

example.bin

AA BB CC DD EE FF AB AC AD AE AF BA BB BC BD BE
BF CA CB CC CD CE CF DA DB DC DD DE DF EA EB EC

Code:

# coding: utf-8
import struct
import re

with open("example.bin", "rb") as f:
    f.seek(30)
    num, = struct.unpack(">H", f.read(2))
hexaPattern = re.compile(r'(0xebec)?')
m = re.search(hexaPattern, hex(num))
if m:
   print "found a match:", m.group(1)
   print " match offset:", m.start()

Maybe there is a better way to do all of this?
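The constant 0 can be reproduced without the file. This is a minimal sketch, assuming the two bytes at offset 30 are EB EC as in example.bin; it uses Python 3 syntax, and the names text, m and m2 are illustrative only.

```python
import re

num = 0xEBEC        # the value struct.unpack(">H", ...) yields for bytes EB EC
text = hex(num)     # the 6-character string '0xebec', not the file contents

# The search runs over that short string, so start() is an index into it.
m = re.search(r'(0xebec)?', text)
print(m.group(1), m.start())    # 0xebec 0

# The trailing "?" makes the group optional, so unrelated input still
# produces an (empty) match at index 0.
m2 = re.search(r'(0xebec)?', hex(0x1234))
print(m2.group(1), m2.start())  # None 0
```

In other words, m.start() reports a position inside the string returned by hex(num), never a position in the file, and the optional group lets the pattern succeed at index 0 for any input.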

【Comments】:

How big are the files?
The file sizes range from 100 KB to 10 MB.

Tags: python regex binary seek

【Answer 1】:

I'm not sure how to search the entire file and return all the matches.

I'd like to return the offsets in either decimal or hex

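import re

# Build a test file: the 4-byte sequence AA BB EB EC, seven times.
# (Python 2 code; Python 3 needs b'' literals and print() calls.)
f = open('data.txt', 'wb')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.close()

# Read the whole file into memory as raw bytes.
f = open('data.txt', 'rb')
data = f.read()
f.close()

# The two bytes to look for.
pattern = "\xEB\xEC"
regex = re.compile(pattern)

# finditer() yields one match object per non-overlapping match.
for match_obj in regex.finditer(data):
    offset = match_obj.start()  # 0-based offset of the match in the file
    print "decimal: {}".format(offset)
    print "hex(): " + hex(offset)
    print 'formatted hex: {:02X} \n'.format(offset)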

--output:--
decimal: 2
hex(): 0x2
formatted hex: 02 

decimal: 6
hex(): 0x6
formatted hex: 06 

decimal: 10
hex(): 0xa
formatted hex: 0A 

decimal: 14
hex(): 0xe
formatted hex: 0E 

decimal: 18
hex(): 0x12
formatted hex: 12 

decimal: 22
hex(): 0x16
formatted hex: 16 

decimal: 26
hex(): 0x1a
formatted hex: 1A

Positions in a file use 0-based indexing, like a list.
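For Python 3 the same search needs bytes objects for both the data and the pattern. The following is a rough sketch rather than the answerer's code: find_offsets is a hypothetical helper name, and re.escape is added on the assumption that the byte sequence should be matched literally even if it happens to contain regex metacharacters (for example 0x2E, which is ".").

```python
import re

def find_offsets(data, needle):
    """Return the 0-based offset of every non-overlapping occurrence of needle."""
    # Escape the needle so that every byte is matched literally.
    pattern = re.compile(re.escape(needle))
    return [m.start() for m in pattern.finditer(data)]

# Same test data as the answer: AA BB EB EC repeated seven times.
data = b"\xAA\xBB\xEB\xEC" * 7
offsets = find_offsets(data, b"\xEB\xEC")
print(offsets)  # [2, 6, 10, 14, 18, 22, 26]
```

To search a real file, data would come from open(path, "rb").read(); for files of 100 KB to 10 MB, reading everything into memory is usually fine.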

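re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found.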

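Match objects support the following methods and attributes:
start([group])
end([group])
Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring).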
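"Non-overlapping" matters when the byte sequence can overlap itself. A small sketch in Python 3, using made-up sample data, shows start() and end() for ordinary matches and one possible way (a zero-width lookahead, which is not part of the answer above) to report overlapping positions as well:

```python
import re

data = b"\xAA\xAA\xAA\xAA\xBB"  # illustrative sample data

# Non-overlapping: after the match at 0..2 the scan resumes at index 2.
spans = [(m.start(), m.end()) for m in re.finditer(b"\xAA\xAA", data)]
print(spans)        # [(0, 2), (2, 4)]

# A lookahead consumes no bytes, so every position where the sequence
# begins is reported, including overlapping ones.
overlapping = [m.start() for m in re.finditer(b"(?=\xAA\xAA)", data)]
print(overlapping)  # [0, 1, 2]
```

end() is exclusive, so data[m.start():m.end()] is exactly the matched bytes.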

https://docs.python.org/2/library/re.html

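【Comments】:

Great, this works well. Thanks for the explanation, very helpful!
@DIF, if you don't want the "0x" in front of the hex offset string, you can use format(): print '{:02X}'.format(offset) (this also pads every hex value to at least two digits).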
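A few standard format specifications for offsets, shown side by side as a quick sketch (the value 26 is taken from the last match in the output above; the wider widths are only examples):

```python
offset = 26

plain = hex(offset)                  # '0x1a'   (lowercase, with prefix)
two = "{:02X}".format(offset)        # '1A'     (uppercase, at least 2 digits)
eight = "{:08X}".format(offset)      # '0000001A' (fixed 8-digit width)
prefixed = "{:#06x}".format(offset)  # '0x001a' (width 6 includes the '0x')

print(plain, two, eight, prefixed)
```

A fixed width such as 08X keeps the offsets aligned in a listing when the file is larger than 255 bytes.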

【Answer 2】:

Try

import re

with open("example.bin", "rb") as f:
    # re.search() returns only the first match, or None if there is none
    f1 = re.search(b'\xEB\xEC', f.read())

if f1:
    print "found a match:", f1.group()
    print " match offset:", f1.start()

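【Comments】:

Thanks, this is almost perfect. Is there a way to display f1.group() as hex?
To print f1.group() as hex: print(f1.group().hex()). Note that bytes.hex() requires Python 3.5 or later.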
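Putting the two pieces together in Python 3, as a sketch only: the file contents of example.bin from the question are rebuilt in memory with bytes.fromhex instead of being read from disk, and binascii.hexlify is shown as an alternative that also exists in Python 2.

```python
import binascii
import re

# The 32 bytes of example.bin from the question, rebuilt in memory.
data = bytes.fromhex(
    "AA BB CC DD EE FF AB AC AD AE AF BA BB BC BD BE "
    "BF CA CB CC CD CE CF DA DB DC DD DE DF EA EB EC"
)

f1 = re.search(b"\xEB\xEC", data)
if f1:
    as_hex = f1.group().hex()                # 'ebec'
    as_hexlify = binascii.hexlify(f1.group())  # b'ebec'
    print("found a match:", as_hex)
    print(" match offset:", f1.start())      # 30
```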