[Posted]: 2025-11-27 17:05:01
[Question]:
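So I have a ZIP reader library, and I start reading a ZIP file by locating the EOCD record (the standard "read from the tail" way). I have to look for a pattern that looks roughly like this: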
4byte_magic_number, fixed_n_bytes, 2_bytes_of_comment_size, comment
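The byte size of comment is given in 2_bytes_of_comment_size. Scanning for the magic number alone is not enough, because I eagerly read a large chunk of the file's tail (basically the maximum size of a ZIP EOCD record, i.e. the 22-byte fixed part plus up to 65,535 bytes of comment) and then look for this pattern inside that chunk.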
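To make the pattern concrete, here is a small sketch of the fixed part of the EOCD record as the 'VvvvvVVv' unpack template reads it. The build_eocd helper and the directory values it writes are made up purely for illustration; only the field order and widths come from the ZIP EOCD layout.

```ruby
# Fixed 22-byte part of the EOCD record (all fields little-endian):
#   V  signature (0x06054b50, i.e. "PK\x05\x06" on disk)
#   v  number of this disk
#   v  disk where the central directory starts
#   v  central directory entries on this disk
#   v  total central directory entries
#   V  central directory size in bytes
#   V  central directory offset
#   v  comment length, followed by that many comment bytes
EOCD_SIGNATURE = 0x06054b50
EOCD_FIXED_PATTERN = 'VvvvvVVv' # 4 + 4*2 + 4 + 4 + 2 = 22 bytes
EOCD_FIXED_SIZE = 22

# Hypothetical helper: builds an EOCD record with made-up directory values.
def build_eocd(comment)
  comment = comment.b
  fields = [EOCD_SIGNATURE, 0, 0, 1, 1, 46, 100, comment.bytesize]
  fields.pack(EOCD_FIXED_PATTERN) + comment
end

record = build_eocd("hello")
fixed = record.byteslice(0, EOCD_FIXED_SIZE).unpack(EOCD_FIXED_PATTERN)
puts record.bytesize                    # prints 27
puts fixed.last                         # prints 5 (the comment length)
puts record.start_with?("PK\x05\x06".b) # prints true
```

The last field of the fixed part is what ties the record to the end of the file: a real EOCD record starts exactly comment_length + 22 bytes before the end.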
This is what I have come up with so far:
def locate_eocd_signature(in_str)
  # We have to scan from the _very_ tail. We read the very minimum size
  # the EOCD record can have (up to and including the comment size), using
  # a sliding window. Once our end offset matches the comment size we found our
  # EOCD marker.
  eocd_signature_int = 0x06054b50
  unpack_pattern = 'VvvvvVVv'
  minimum_record_size = 22
  end_location = minimum_record_size * -1
  loop do
    # If the window is nil, we have rolled off the start of the string, nothing to do here.
    # We use negative values because if we used positive slice indices
    # we would have to detect the rollover ourselves.
    # byteslice counts bytes even if in_str is not binary-encoded, matching bytesize below.
    window = in_str.byteslice(end_location, minimum_record_size)
    break unless window
    window_location = in_str.bytesize + end_location
    unpacked = window.unpack(unpack_pattern)
    # If we found the signature, pick up the comment size, and check if the size of the window
    # plus that comment size is where we are in the string. If we are - bingo.
    if unpacked[0] == eocd_signature_int
      comment_size = unpacked[-1]
      assumed_eocd_location = in_str.bytesize - comment_size - minimum_record_size
      # If the comment size is where we should be at - we found our EOCD
      return assumed_eocd_location if assumed_eocd_location == window_location
    end
    end_location -= 1 # Shift the window back by one byte and try again.
  end
  nil # No EOCD record found
end
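But it just screams "ugly" at me. Is there a better way to do something like this? Is there a pack specifier meaning "all the remaining binary bytes up to the end of the string" that I don't know about? Then I could append it to the end of the pack specifier... I'm a bit at a loss here.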
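On the pack question: Ruby's String#unpack does have such a directive. 'a' with a '*' count takes all remaining bytes as a binary string. A minimal sketch, assuming the tail chunk is already in memory, that appends 'a*' to the template and uses String#rindex to jump straight to each candidate signature instead of sliding one byte at a time. The method name locate_eocd and the constant names are hypothetical.

```ruby
# Hypothetical constant names; the values come from the ZIP EOCD layout.
EOCD_SIGNATURE_BYTES = [0x06054b50].pack('V').freeze # "PK\x05\x06"
EOCD_FIXED_SIZE = 22
# 'a*' at the end unpacks every remaining byte (the comment) as a binary string.
EOCD_UNPACK_PATTERN = 'VvvvvVVva*'

# Returns the byte offset of the EOCD record within in_str, or nil if none is found.
def locate_eocd(in_str)
  bytes = in_str.b # binary copy, so string indices are byte offsets
  # The last position where a complete 22-byte fixed part could still start.
  search_from = bytes.bytesize - EOCD_FIXED_SIZE
  while search_from >= 0
    offset = bytes.rindex(EOCD_SIGNATURE_BYTES, search_from)
    return nil unless offset
    *_fields, comment_size, comment = bytes.byteslice(offset..-1).unpack(EOCD_UNPACK_PATTERN)
    # Accept the candidate only if its declared comment runs exactly to the end.
    return offset if comment_size == comment.bytesize
    search_from = offset - 1 # keep looking further towards the start
  end
  nil
end

tail = ("x" * 50).b + [0x06054b50, 0, 0, 1, 1, 46, 100, 2].pack('VvvvvVVv') + "hi"
p locate_eocd(tail) # prints 50
```

A signature-like sequence inside the comment is skipped because its declared comment length does not reach exactly to the end of the buffer. A deliberately crafted comment could still fool this check, just as it could fool the sliding-window loop.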
[Discussion]:
-
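Maybe you could use a regex, but if you want to avoid ugliness, that may be the wrong way to go. One way to clean this up is to turn the magic values into actual constants and encapsulate everything in a class or module. Also, use your constants instead of sprinkling the same magic numbers throughout the code.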
-
I do use plenty of constants in the module this comes from :-), but point taken. In this case a regex actually seems like a viable option...
-
Just be careful not to regex yourself to death. It can happen to the best of us.
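The comments suggest both a regex and real constants inside a module. A rough sketch of the two together, assuming a hypothetical EOCDLocator module: the regex finds every signature followed by at least 18 more bytes, while the comment-length check stays in Ruby, because a regex cannot compare a captured length with the number of bytes that remain. The fixed fields sit inside a lookahead, so scan consumes only the 4 signature bytes and cannot skip a real record that starts within 22 bytes of a false match.

```ruby
# Hypothetical module name; constants replace the magic numbers.
module EOCDLocator
  SIGNATURE = [0x06054b50].pack('V').freeze # "PK\x05\x06"
  FIXED_SIZE = 22
  # The signature, then (inside a lookahead, so scan only consumes the signature)
  # 16 bytes of fixed fields and the 2-byte comment length captured as group 1.
  # /m lets . match any byte, including "\n".
  PATTERN = /#{Regexp.escape(SIGNATURE)}(?=.{16}(..))/m

  module_function

  # Returns the byte offset of the EOCD record within in_str, or nil.
  def locate(in_str)
    bytes = in_str.b # binary copy, so match offsets are byte offsets
    candidates = []
    bytes.scan(PATTERN) do
      match = Regexp.last_match
      candidates << [match.begin(0), match[1].unpack1('v')]
    end
    # Walk from the tail; the comment must end exactly at the end of the buffer.
    found = candidates.reverse.find do |offset, comment_size|
      offset + FIXED_SIZE + comment_size == bytes.bytesize
    end
    found&.first
  end
end

record = [0x06054b50, 0, 0, 1, 1, 46, 100, 0].pack('VvvvvVVv')
p EOCDLocator.locate(("x" * 30).b + record) # prints 30
```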
Tags: ruby algorithm zip substring