[Posted]: 2021-12-12 00:17:27
[Question]:
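I have to pull millions of rows from 21 SQL tables to build a list of strings. The query also uses group_concat, joins, and so on. I need to run it in chunks so that it stays memory-friendly. I'm using Python 2. Can anyone suggest a solution? Here is my query, just as an example: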
query = """
SELECT
v.id AS varid,
v.chrom AS chrom,
v.vcf_pos AS vcf_pos,
v.vcf_ref AS vcf_ref,
v.vcf_alt AS vcf_alt,
group_concat(distinct term.term) as HPO_terms,
group_concat(distinct term.name) as HPO_names,
group_concat(distinct pp.patientid,"-",pp.person_status,"-",pp.affected_status) as family_label,
if(vcc.AF_Pat between 0 AND 1, vcc.AF_Pat, NULL) AS AF_Pat,
replace(vcc.analysistypelist,',',';') AS analysistypelist,
vcc2.HomPatCount AS AC,
vcc2.HetPatCount AS HAC,
vcc2.TotalPatCount AS TAAC,
if(vcc2.AF_Pat between 0 AND 1, vcc2.AF_Pat, NULL) AS AF_Assay,
vcc3.HomUnaffCount AS HCC,
vcc3.HetUnaffCount AS HCC1,
vcc3.TotalUnaffCount AS TTCC,
if(vcc3.AF_healthy between 0 AND 1, vcc3.AF_healthy, NULL) AS AF_Control,
g.gene_name as gene_name,
t.tx_name as tx_name,
ta.*,
va.*,
vc.*,
ga.*,
group_concat(
concat_ws(':', ifnull(g.gene_name,'.'), ifnull(t.tx_name,'.'), ifnull(ta.hgvsc,'.'), ifnull(ta.hgvsp,'.'))
SEPARATOR '|'
) as `AllTranscriptAnnotations`
FROM {} AS v
LEFT JOIN table1 vcc ON vcc.variant_id=v.id
LEFT JOIN table2 vcc3 ON vcc3.variant_id=v.id
LEFT JOIN table3 va on v.id=va.variant_id and va.status='active'
LEFT JOIN table4 vc on v.id=vc.variant_id
LEFT JOIN table5 ta on v.id=ta.variant_id and ta.status='active'
LEFT JOIN table6 t on t.id=ta.transcript_id and t.status='active'
LEFT JOIN table7 g on g.id=t.gene_id and g.status='active'
.
.
.
LEFT JOIN table21 pt on pt.term_id=term.id
GROUP BY v.id,s.id
HAVING 1 {}
""".format(v1,sq)
where v1 and sq are search strings (v1 is substituted into the FROM clause and sq into the HAVING clause).
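Now I need to make the query above memory-efficient, or otherwise optimize it: the extraction currently takes more than 4 hours.
I'm looking for some kind of divide-and-conquer approach.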
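A minimal sketch of one divide-and-conquer approach, keyset chunking on the driving table's id, assuming v.id is an integer key. It uses an in-memory sqlite3 database as a stand-in for the real MySQL server, and the tables variants and hpo and the function iter_variant_chunks are hypothetical names for illustration only. Each iteration picks the next chunk_size ids from the driving table, then runs the aggregating join only for that id range, so GROUP BY works on a small slice and rows are yielded chunk by chunk instead of being built up all at once.

```python
import sqlite3


def iter_variant_chunks(conn, chunk_size=1000):
    """Yield aggregated rows, one chunk of variant ids at a time.

    Hypothetical schema: variants(id, chrom) and hpo(variant_id, term)
    stand in for the real multi-table MySQL join.
    """
    # Next chunk of driving-table ids after the last one processed.
    id_sql = "SELECT id FROM variants WHERE id > ? ORDER BY id LIMIT ?"
    # Heavy aggregating join, restricted to one id range before GROUP BY.
    agg_sql = """
        SELECT v.id, v.chrom, group_concat(DISTINCT h.term) AS hpo_terms
        FROM variants AS v
        LEFT JOIN hpo AS h ON h.variant_id = v.id
        WHERE v.id BETWEEN ? AND ?
        GROUP BY v.id
        ORDER BY v.id
    """
    last_id = -1  # assumes ids are non-negative integers
    while True:
        ids = [row[0] for row in conn.execute(id_sql, (last_id, chunk_size))]
        if not ids:
            break
        # Iterate the cursor rather than calling fetchall() on the result.
        for row in conn.execute(agg_sql, (ids[0], ids[-1])):
            yield row
        last_id = ids[-1]


# Toy data: 10 variants, HPO terms only on variants 1 and 5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (id INTEGER PRIMARY KEY, chrom TEXT);
    CREATE TABLE hpo (variant_id INTEGER, term TEXT);
""")
conn.executemany("INSERT INTO variants VALUES (?, ?)",
                 [(i, "chr1") for i in range(1, 11)])
conn.executemany("INSERT INTO hpo VALUES (?, ?)",
                 [(1, "HP:0001"), (1, "HP:0002"), (1, "HP:0001"), (5, "HP:0003")])

# Collected into a list only for this demo; 4 chunks of 3, 3, 3 and 1 ids.
rows = list(iter_variant_chunks(conn, chunk_size=3))
```

Applied to the real query, the same idea would mean adding a WHERE v.id BETWEEN ... AND ... condition before GROUP BY v.id,s.id and writing each chunk's rows out before fetching the next one. How much this helps the 4-hour runtime is not something the sketch can show; it likely depends on indexes on the variant_id join columns.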
[Discussion]:
-
Does v control the rows? That is, does every LEFT JOIN contribute exactly 1 row (possibly all NULL)?
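To illustrate the point behind this comment: if two of the LEFT JOINs are one-to-many for the same v row, the joined rows multiply before GROUP BY collapses them, which costs both time and memory. The small sqlite3 sketch below uses hypothetical tables (variants, transcripts, hpo) in which one variant has 2 transcripts and 3 HPO terms, producing 6 joined rows. If the elided joins fan out like this, the non-DISTINCT group_concat behind AllTranscriptAnnotations in the question could repeat entries in the same way.

```python
import sqlite3

# Hypothetical tables: one variant with 2 transcripts and 3 HPO terms.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (id INTEGER PRIMARY KEY);
    CREATE TABLE transcripts (variant_id INTEGER, tx_name TEXT);
    CREATE TABLE hpo (variant_id INTEGER, term TEXT);
""")
conn.execute("INSERT INTO variants VALUES (1)")
conn.executemany("INSERT INTO transcripts VALUES (?, ?)",
                 [(1, "NM_A"), (1, "NM_B")])
conn.executemany("INSERT INTO hpo VALUES (?, ?)",
                 [(1, "HP:1"), (1, "HP:2"), (1, "HP:3")])

joins = """
    FROM variants AS v
    LEFT JOIN transcripts AS t ON t.variant_id = v.id
    LEFT JOIN hpo AS h ON h.variant_id = v.id
"""

# Rows produced by the joins before any GROUP BY: 1 x 2 x 3.
fanout = conn.execute("SELECT COUNT(*) " + joins).fetchone()[0]

# After GROUP BY v.id, group_concat without DISTINCT keeps every duplicate.
tx_list = conn.execute(
    "SELECT group_concat(t.tx_name) " + joins + " GROUP BY v.id"
).fetchone()[0].split(",")
```

Here each transcript name appears three times in tx_list, once per HPO term it was joined against.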