[Posted]: 2019-07-13 21:56:53
[Question]:
I have data in the following format:
testing 25 `this is a test`
hello `world hello world`
log "log1" "log2" `third log`
I'm currently using a combination of a regular expression and shlex, but I run into problems with data like the above:
import re, shlex

def tokenize(line):
    graveKeyPattern = re.compile(r'^ *(.*) (`.*`) *')
    if '`' in line:
        tokens = re.split(graveKeyPattern, line)
        tokens = tokens[1:3]
    else:
        tokens = shlex.split(line)
    #end if/else
    print(tokens)
    return tokens
#end tokenize

lines = []
lines.append('testing 25 `this is a test`')
lines.append('hello `world hello world`')
lines.append('log "log1" "log2" `third log`')
lines.append('testing2 "testing2 in quotes" 5')
for line in lines:
    tokenize(line)
This is the output I get:
['testing 25', '`this is a test`']
['hello', '`world hello world`']
['log "log1" "log2"', '`third log`']
['testing2', 'testing2', 'in', 'quotes', '5']
This is the output I need:
['testing', '25', '`this is a test`']
['hello', '`world hello world`']
['log', 'log1', 'log2', '`third log`']
['testing2', 'testing2 in quotes', '5']
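For comparison, here is a minimal sketch that stays close to the regex-plus-shlex idea. It assumes (this is not stated in the question) that a line contains at most one backtick-delimited section and that it is the last thing on the line. The name tokenize_with_shlex is hypothetical. Everything before the first backtick goes through shlex.split, and the backtick section is appended untouched.

```python
import shlex

def tokenize_with_shlex(line):
    # Split at the first backtick; `sep` is '' when the line has no backtick.
    head, sep, tail = line.partition('`')
    # shlex handles the plain words and the "double quoted" tokens.
    tokens = shlex.split(head)
    if sep:
        # Assumption: the backtick section runs to the end of the line.
        tokens.append('`' + tail.rstrip())
    return tokens

if __name__ == '__main__':
    for sample in ['testing 25 `this is a test`',
                   'log "log1" "log2" `third log`']:
        print(tokenize_with_shlex(sample))
```

If a line can hold several backtick sections, or text after one, this sketch is not enough and a single tokenizing pattern is probably the simpler route.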
[Comments]:
-
Try
["{}{}{}{}".format(x.group(1),x.group(2),x.group(3),x.group(4)) for x in re.finditer(r'''`([^`]*)`|"([^"]*)"|'([^']*)'|(\S+)''', line)] or ["{}{}{}{}".format(a,b,c,d) for a,b,c,d in re.findall(r'''`([^`]*)`|"([^"]*)"|'([^']*)'|(\S+)''', line)]
-
Ah,
"`([^`]*)`" actually has to be (`[^`]*`), because the backticks should stay inside the token.
Tags: python regex python-3.x lexical-analysis
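The pattern suggested in the comments, with the correction that the backticks belong inside the capture group, can be sketched as a single tokenizer. The names TOKEN_PATTERN and tokenize below are illustrative. The sketch uses Match.lastindex to pick whichever alternative matched, rather than formatting all four groups (with re.finditer, groups that did not take part in the match are None, so formatting them would insert the text "None").

```python
import re

# One alternative per token kind, tried left to right:
#   1. a backtick section, kept with its backticks
#   2. a double-quoted string, quotes dropped
#   3. a single-quoted string, quotes dropped
#   4. any other run of non-whitespace characters
TOKEN_PATTERN = re.compile(r'''(`[^`]*`)|"([^"]*)"|'([^']*)'|(\S+)''')

def tokenize(line):
    # Exactly one group takes part in each match; lastindex is its number.
    return [m.group(m.lastindex) for m in TOKEN_PATTERN.finditer(line)]

if __name__ == '__main__':
    for sample in ['testing 25 `this is a test`',
                   'hello `world hello world`',
                   'log "log1" "log2" `third log`',
                   'testing2 "testing2 in quotes" 5']:
        print(tokenize(sample))
```

On the four sample lines this should produce the requested output. Note that it does not handle escaped quotes, and a quote that starts in the middle of a word (for example foo"bar baz") is swallowed by the \S+ alternative instead of starting a quoted token.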