You can find the subquery names and the fields associated with them, then build the desired dictionaries:
import re, collections
# Table names to look for
sub = ['apple', 'pear', 'strawberry']
qry = '\nwith\n\nqry_1 as ( select some code, var as var_1 from apple where code.. and code..\n),\nqry_2 as ( select some code, where var as var_2 from pear where code.. and code..\n),\nqry_3 as ( select some code from strawberry join some code, from apple where var as var_3, )\n)\n'
d, d1 = collections.defaultdict(list), {}
# Split after every "),\n" so that each chunk holds one subquery
for i in re.split(r'(?<=\),)\n', qry):
    # First match is the subquery name, the rest are the names that follow "from"
    a, *_b = re.findall(r'\w+(?=\sas\s\()|(?<=from\s)\w+', i)
    b = [j for j in _b if j in sub]
    for k in b:
        d[k].append(a)
    d1[a] = b
print(dict(d))
print(dict(d1))
Output:
{'apple': ['qry_1', 'qry_3'], 'pear': ['qry_2'], 'strawberry': ['qry_3']}
{'qry_1': ['apple'], 'qry_2': ['pear'], 'qry_3': ['strawberry', 'apple']}
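To see what each half of the pattern contributes, the two alternatives can be run separately on a single chunk. This is only a minimal sketch: the sample string chunk and the names name_pat and table_pat are made up for illustration and are not part of the code above.

```python
import re

# Hypothetical chunk, shaped like one element of the re.split() result above.
chunk = "qry_3 as ( select x from strawberry join t on 1=1, from apple where var as var_3 )"

# Alternative 1: a word directly followed by " as (" is the subquery name.
name_pat = r'\w+(?=\sas\s\()'
# Alternative 2: a word directly preceded by "from " is a source table.
table_pat = r'(?<=from\s)\w+'

names = re.findall(name_pat, chunk)
tables = re.findall(table_pat, chunk)
combined = re.findall(name_pat + '|' + table_pat, chunk)

print(names)     # ['qry_3']
print(tables)    # ['strawberry', 'apple']
print(combined)  # ['qry_3', 'strawberry', 'apple']
```

Note that the table t introduced by join is not captured, because the lookbehind only accepts from. That is one reason the regex approach breaks down on more complex queries.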
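Edit: since your queries are complex, I suggest using the sqlparse package. sqlparse builds a navigable token structure that can be traversed to get the information you need.
First, install sqlparse: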
pip3 install sqlparse
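Then parse and traverse the query. The function get_fields searches for identifiers that appear after a from or join keyword. These identifiers can be table names or queries. The all_identifiers parameter makes get_fields yield every identifier it meets, whether or not it follows from or join. For the parsing problem in the question, setting it to True also returns the fields chosen in the select block, in addition to the identifiers after from or join: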
import re, collections
import sqlparse
from sqlparse import tokens as T
sub = ['apple.apple','event.pear','strawberry']
qry = """
with qry_1 as (
select a.* from apple.apple a
),
with qry_2 as (
select a.* from apple a join strawberry s on a.id = s.id
),
with qry_3 as (
select a.* from (select k.* from event.pear p) l join apple.apple a on l.id = a.id join (select x.* s from strawberry s where s.m = (select max(l) from ignore_field where l.id = s.id)) k3 on k3 = a.id
)
"""
def get_fields(block, all_identifiers = False):
    # seen_id becomes True once a from/join keyword has been passed in this block
    seen_id = all_identifiers
    for i in getattr(block, 'tokens', []):
        if i.ttype == T.Keyword and i.value.lower() in {'from', 'join'}:
            seen_id = True
        if seen_id and isinstance(i, sqlparse.sql.Identifier):
            yield i.get_alias()
            if any(isinstance(k, sqlparse.sql.Parenthesis) for k in getattr(i, 'tokens', [])):
                # The identifier wraps a parenthesised subquery: descend into it
                yield from get_fields(i, all_identifiers = seen_id)
            else:
                # Plain identifier: yield the dotted name, then any remaining words
                yield from re.findall(r'^[\w+\.]+|\w+', str(i))
        elif seen_id:
            yield from get_fields(i, all_identifiers = seen_id)
p = sqlparse.parse(qry)
# Map each top-level "name as ( ... )" identifier to the names found in its body
k = {i.tokens[0].value:list(get_fields(i.tokens[-1])) for j in p for i in j.tokens if isinstance(i, sqlparse.sql.Identifier)}
d1, d2 = collections.defaultdict(list), {}
for a, _b in k.items():
    # Keep only the names listed in sub (the walrus operator needs Python 3.8+)
    for i in (b:=[j for j in _b if j in sub]):
        d1[i].append(a)
    d2[a] = b
print(dict(d1))
print(dict(d2))
Output:
{'apple.apple': ['qry_1', 'qry_3'], 'strawberry': ['qry_2', 'qry_3'], 'event.pear': ['qry_3']}
{'qry_1': ['apple.apple'], 'qry_2': ['strawberry'], 'qry_3': ['event.pear', 'apple.apple', 'strawberry']}
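The final step, which filters the yielded names against sub and builds both mappings, can be looked at in isolation. The sketch below uses a hand-written found dictionary in place of the parser output (its contents are assumed for illustration, not actual sqlparse output), and build_maps is a hypothetical helper name.

```python
import collections

def build_maps(found, wanted):
    """Return (table -> queries, query -> tables), keeping only names in `wanted`."""
    by_table, by_query = collections.defaultdict(list), {}
    for query, names in found.items():
        # Aliases, None values and unrelated identifiers are dropped here.
        kept = [n for n in names if n in wanted]
        for table in kept:
            by_table[table].append(query)
        by_query[query] = kept
    return dict(by_table), by_query

# Hand-written stand-in for the parser output (assumed values).
found = {
    'qry_1': ['a', 'apple.apple'],
    'qry_2': ['a', 'apple', 's', 'strawberry'],
    'qry_3': [None, 'p', 'event.pear', 'a', 'apple.apple'],
}
wanted = ['apple.apple', 'event.pear', 'strawberry']

by_table, by_query = build_maps(found, wanted)
print(by_table)
print(by_query)
```

Because the filter is a plain membership test, an alias only survives if it happens to coincide with a name in wanted.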
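Notes:
- Currently, only identifiers after the from/join keywords are searched. To also search the field names selected after the select keyword, use list(get_fields(i.tokens[-1], True)).
- get_fields also yields subquery/table aliases, i.e. if apple.apple a is present, then a is yielded along with apple.apple. If you do not want this behavior, simply comment out yield i.get_alias().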