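【Posted】:2015-05-29 15:42:13
【Question】:
I am new to the pyparsing module, and I am trying to parse the sample strings shown below from a file of more than 60,000 lines. I need to extract data from every line. However, my current implementation seems too slow. Is there anything redundant in my code that could be optimized? So far, a 23 MB file takes about 2 minutes even with multiprocessing, and my main performance bottleneck is the parsing routine.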
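One way to check that the parsing routine really is the bottleneck is to profile it on its own, outside the multiprocessing setup. The following is only a minimal sketch, written for Python 3: `parse_line` is a hypothetical stand-in for the real per-line parsing call, and `profile_parsing` and `SAMPLE_LINES` are names made up for this illustration.

```python
import cProfile
import io
import pstats

# The two sample lines from this question.
SAMPLE_LINES = [
    "Mar 16 14:12:25.989 [ABC] [ID=0][core#16] [65536][3131927075092] random: message",
    "Mar 23 13:57:07.888 [123] [core#2 ] [00][3851708823] random message 2",
]


def parse_line(line):
    # Hypothetical stand-in: replace the body with the real parsing call.
    return line.split()


def profile_parsing(lines, repeat=1000):
    """Run parse_line over the lines `repeat` times and return a profile report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        for line in lines:
            parse_line(line)
    profiler.disable()

    # Collect the ten most expensive entries by cumulative time as text.
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(10)
    return report.getvalue()
```

Printing `profile_parsing(SAMPLE_LINES)` lists the calls that dominate the run time, which should show whether the time goes into the grammar itself or into the code around it.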
Example strings:
Mar 16 14:12:25.989 [ABC] [ID=0][core#16] [65536][3131927075092] random: message
or
Mar 23 13:57:07.888 [123] [core#2 ] [00][3851708823] random message 2
Grammar:
nums:: '0'..'9'
num:: (nums+)
words:: 'a'..'z' 'A'..'Z'
word:: (words+)
colon:: ':'
time:: ((num) + colon)+ (num) + '.' + (num)
date:: (word) + (num) + (time)
open brace:: "["
close brace:: "]"
is AP:: (open brace) + (word) + (close brace)
is BP:: (open brace) + (num) + (close brace)
oct id:: (open brace) + (word) + ("=") + (num) + (close brace)
core id:: (open brace) + (word) + ("#") + (num) + (close brace)
ppm id:: (open brace) + (num) + (close brace)
oct timestamp:: (open brace) + (num) + (close brace)
hexnum:: (hexnums+)
pcap dump:: (hexnum +(":")) + (hexnum)+
tags:: (date) + (is AP|is BP)? + (oct id)? + (core id) + (ppm id)? + (oct timestamp)? + (pcap dump)?
'''
self.num = Word(nums)
self.word = Word(alphas)
self.open_brace = Suppress(Literal("["))
self.close_brace = Suppress(Literal("]"))
self.colon = Literal(":")
self.stime = Combine(OneOrMore(self.num + self.colon) + self.num + Literal(".") + self.num)
self.date = OneOrMore(self.word) + self.num + self.stime
self.is_ap = self.open_brace + self.word + self.close_brace
self.is_bp = self.open_brace + self.num + self.close_brace
self.oct_id = self.open_brace + Suppress(self.word) + Suppress(Literal("=")) \
    + self.num + self.close_brace
self.core_id = self.open_brace + Suppress(self.word) + Suppress(Literal("#")) \
    + self.num + self.close_brace
self.ppm_id = self.open_brace + self.num + self.close_brace
self.oct_ts = self.open_brace + self.num + self.close_brace
self.dump = Suppress(Word(hexnums) + Literal(":")) + OneOrMore(Word(hexnums))
self.opening = Suppress(self.date) \
    + Optional(self.is_ap.setResultsName("AP") | self.is_bp.setResultsName("BP")) \
    + Optional(self.oct_id.setResultsName("oct").setParseAction(lambda toks: int(toks[0]))) \
    + self.core_id.setResultsName("core").setParseAction(lambda toks: int(toks[0])) \
    + Optional(self.ppm_id.setResultsName("ppm").setParseAction(lambda toks: int(toks[0])) \
        + self.oct_ts.setResultsName("timestamp").setParseAction(lambda toks: int(toks[0]))) \
    + Optional(self.dump.setResultsName("pcap"))
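For comparison, the same fields can also be pulled out with a single compiled regular expression from the standard `re` module. This is a different technique from the pyparsing grammar above, offered only as a rough baseline sketch: the names `LINE_RE` and `extract_fields` are made up for this illustration, the optional pcap dump is left out, and the pattern has only been checked against the two sample lines in this question.

```python
import re

# Mirrors the "tags" rule: date, optional [word] or [num], optional [word=num],
# required [word#num], optional [num][num]. The pcap dump is not handled.
LINE_RE = re.compile(r"""
    ^[A-Za-z]+\s+\d+\s+(?:\d+:)+\d+\.\d+                  # date and time, not captured
    \s*(?:\[\s*(?:(?P<AP>[A-Za-z]+)|(?P<BP>\d+))\s*\])?   # optional AP (word) or BP (number)
    \s*(?:\[\s*[A-Za-z]+\s*=\s*(?P<oct>\d+)\s*\])?        # optional oct id
    \s*\[\s*[A-Za-z]+\s*\#\s*(?P<core>\d+)\s*\]           # required core id
    \s*(?:\[\s*(?P<ppm>\d+)\s*\]                          # optional ppm id ...
       \s*\[\s*(?P<timestamp>\d+)\s*\])?                  # ... followed by oct timestamp
    """, re.VERBOSE)

# Fields converted to int, matching the parse actions in the pyparsing version.
INT_FIELDS = ("oct", "core", "ppm", "timestamp")


def extract_fields(line):
    """Return a dict of the fields found in the line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = {}
    for name, value in match.groupdict().items():
        if value is None:
            continue  # optional field absent from this line
        fields[name] = int(value) if name in INT_FIELDS else value
    return fields
```

On the first sample line this yields the AP, oct, core, ppm and timestamp fields; on the second it yields BP, core, ppm and timestamp. Timing it against the pyparsing version on the same file would show how much of the 2 minutes is attributable to the grammar.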
【Discussion】:
Tags: multiprocessing python-2.6 pyparsing