【Question title】: What is the best way to join the same table twice in PostgreSQL?
【Posted】: 2020-11-10 14:40:53
【Question】:

Joining the same table a second time cuts performance almost in half.

SELECT * FROM party_party_relationship AS ppr 
    LEFT JOIN party_role AS r1 ON r1.party_role_uid = ppr.party_role_uid
    LEFT JOIN party_role AS r2 ON r2.party_role_uid = ppr.party_role_uid_related

Performance with only the first join:

"Hash Left Join  (cost=288.18..547.72 rows=10972 width=144) (actual time=5.281..17.781 rows=11192 loops=1)"
"  Hash Cond: (ppr.party_role_uid = r1.party_role_uid)"
"  ->  Seq Scan on party_party_relationship ppr  (cost=0.00..230.72 rows=10972 width=98) (actual time=0.020..2.438 rows=11192 loops=1)"
"  ->  Hash  (cost=181.97..181.97 rows=8497 width=46) (actual time=5.186..5.187 rows=9946 loops=1)"
"        Buckets: 16384  Batches: 1  Memory Usage: 823kB"
"        ->  Seq Scan on party_role r1  (cost=0.00..181.97 rows=8497 width=46) (actual time=0.010..2.073 rows=9946 loops=1)"
"Planning Time: 0.472 ms"
"Execution Time: 18.765 ms"

With the second join on the same table, execution time almost doubles:

"Hash Left Join  (cost=576.37..864.71 rows=10972 width=190) (actual time=9.871..31.986 rows=11192 loops=1)"
"  Hash Cond: (ppr.party_role_uid_related = r2.party_role_uid)"
"  ->  Hash Left Join  (cost=288.18..547.72 rows=10972 width=144) (actual time=5.163..18.437 rows=11192 loops=1)"
"        Hash Cond: (ppr.party_role_uid = r1.party_role_uid)"
"        ->  Seq Scan on party_party_relationship ppr  (cost=0.00..230.72 rows=10972 width=98) (actual time=0.015..2.735 rows=11192 loops=1)"
"        ->  Hash  (cost=181.97..181.97 rows=8497 width=46) (actual time=5.091..5.092 rows=9946 loops=1)"
"              Buckets: 16384  Batches: 1  Memory Usage: 823kB"
"              ->  Seq Scan on party_role r1  (cost=0.00..181.97 rows=8497 width=46) (actual time=0.008..2.030 rows=9946 loops=1)"
"  ->  Hash  (cost=181.97..181.97 rows=8497 width=46) (actual time=4.644..4.644 rows=9946 loops=1)"
"        Buckets: 16384  Batches: 1  Memory Usage: 823kB"
"        ->  Seq Scan on party_role r2  (cost=0.00..181.97 rows=8497 width=46) (actual time=0.014..1.810 rows=9946 loops=1)"
"Planning Time: 0.925 ms"
"Execution Time: 32.920 ms"
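
For reference, plans like the two above are what PostgreSQL prints for EXPLAIN with the ANALYZE option. A minimal sketch for reproducing the second measurement follows; the BUFFERS option is an addition not used in the original output, and it only adds block hit/read counts to each node.

```sql
-- Runs the query for real and reports actual timings per plan node.
-- BUFFERS is optional and was not part of the plans shown above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM party_party_relationship AS ppr
LEFT JOIN party_role AS r1 ON r1.party_role_uid = ppr.party_role_uid
LEFT JOIN party_role AS r2 ON r2.party_role_uid = ppr.party_role_uid_related;
```

Running it several times and comparing the later runs reduces the effect of a cold cache on the reported times.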

The queries above are only part of the full query:

SELECT * FROM party_party_relationship AS ppr 
    INNER JOIN party_role AS r1 ON r1.party_role_uid = ppr.party_role_uid
        INNER JOIN party AS p1 ON p1.party_uid = r1.party_uid
                LEFT JOIN party_name AS n1 ON n1.party_uid = p1.party_uid AND n1.end_date IS NULL
                LEFT JOIN business_number AS b1 ON b1.party_uid = p1.party_uid AND b1.business_number_cd = p1.business_number_cd AND b1.end_date IS NULL

    INNER JOIN party_role AS r2 ON r2.party_role_uid = ppr.party_role_uid_related
        INNER JOIN party AS p2 ON p2.party_uid = r2.party_uid
                LEFT JOIN party_name AS n2 ON n2.party_uid = p2.party_uid AND n2.end_date IS NULL
                LEFT JOIN business_number AS b2 ON b2.party_uid = p2.party_uid AND b2.business_number_cd = p2.business_number_cd AND b2.end_date IS NULL
                
                WHERE ppr.case_uid = 9

Execution plan:

"Nested Loop Left Join  (cost=1113.46..3576.37 rows=915 width=772) (actual time=19.687..76.911 rows=919 loops=1)"
"  ->  Nested Loop Left Join  (cost=1113.31..3270.33 rows=915 width=694) (actual time=19.616..56.253 rows=919 loops=1)"
"        Join Filter: (n1.end_date IS NULL)"
"        ->  Hash Left Join  (cost=1113.03..2415.51 rows=915 width=547) (actual time=19.588..51.236 rows=915 loops=1)"
"              Hash Cond: (r1.party_uid = p2.party_uid)"
"              ->  Hash Left Join  (cost=856.60..2156.68 rows=915 width=481) (actual time=15.192..45.391 rows=915 loops=1)"
"                    Hash Cond: (ppr.party_role_uid_related = r2.party_role_uid)"
"                    ->  Nested Loop Left Join  (cost=568.42..1866.09 rows=915 width=435) (actual time=9.743..38.415 rows=915 loops=1)"
"                          ->  Nested Loop Left Join  (cost=568.27..1560.05 rows=915 width=357) (actual time=9.665..17.956 rows=915 loops=1)"
"                                ->  Hash Left Join  (cost=567.99..705.23 rows=915 width=210) (actual time=9.639..12.460 rows=915 loops=1)"
"                                      Hash Cond: (r1.party_uid = p1.party_uid)"
"                                      ->  Hash Left Join  (cost=311.56..446.40 rows=915 width=144) (actual time=5.314..7.056 rows=915 loops=1)"
"                                            Hash Cond: (ppr.party_role_uid = r1.party_role_uid)"
"                                            ->  Bitmap Heap Scan on party_party_relationship ppr  (cost=23.38..155.81 rows=915 width=98) (actual time=0.111..0.536 rows=915 loops=1)"
"                                                  Recheck Cond: (insolvency_case_uid = 9)"
"                                                  Heap Blocks: exact=18"
"                                                  ->  Bitmap Index Scan on ixfk_party_party_relationship_insolvency_case  (cost=0.00..23.15 rows=915 width=0) (actual time=0.097..0.097 rows=926 loops=1)"
"                                                        Index Cond: (insolvency_case_uid = 9)"
"                                            ->  Hash  (cost=181.97..181.97 rows=8497 width=46) (actual time=5.149..5.149 rows=9960 loops=1)"
"                                                  Buckets: 16384  Batches: 1  Memory Usage: 824kB"
"                                                  ->  Seq Scan on party_role r1  (cost=0.00..181.97 rows=8497 width=46) (actual time=0.009..1.979 rows=9960 loops=1)"
"                                      ->  Hash  (cost=161.19..161.19 rows=7619 width=66) (actual time=4.290..4.290 rows=7449 loops=1)"
"                                            Buckets: 8192  Batches: 1  Memory Usage: 701kB"
"                                            ->  Seq Scan on party p1  (cost=0.00..161.19 rows=7619 width=66) (actual time=0.013..1.680 rows=7449 loops=1)"
"                                ->  Index Scan using ixfk_party_name_party on party_name n1  (cost=0.28..0.92 rows=1 width=147) (actual time=0.004..0.005 rows=1 loops=915)"
"                                      Index Cond: (party_uid = p1.party_uid)"
"                                      Filter: (end_date IS NULL)"
"                                      Rows Removed by Filter: 0"
"                          ->  Index Scan using ex_business_number_end_date on business_number b1  (cost=0.15..0.32 rows=1 width=78) (actual time=0.020..0.021 rows=1 loops=915)"
"                                Index Cond: ((party_uid = p1.party_uid) AND (business_number_cd = p1.business_number_cd))"
"                    ->  Hash  (cost=181.97..181.97 rows=8497 width=46) (actual time=5.293..5.293 rows=9960 loops=1)"
"                          Buckets: 16384  Batches: 1  Memory Usage: 824kB"
"                          ->  Seq Scan on party_role r2  (cost=0.00..181.97 rows=8497 width=46) (actual time=0.010..1.799 rows=9960 loops=1)"
"              ->  Hash  (cost=161.19..161.19 rows=7619 width=66) (actual time=4.313..4.314 rows=7449 loops=1)"
"                    Buckets: 8192  Batches: 1  Memory Usage: 701kB"
"                    ->  Seq Scan on party p2  (cost=0.00..161.19 rows=7619 width=66) (actual time=0.011..1.587 rows=7449 loops=1)"
"        ->  Index Scan using ixfk_party_name_party on party_name n2  (cost=0.28..0.92 rows=1 width=147) (actual time=0.003..0.003 rows=1 loops=915)"
"              Index Cond: (party_uid = p2.party_uid)"
"  ->  Index Scan using ex_business_number_end_date on business_number b2  (cost=0.15..0.32 rows=1 width=78) (actual time=0.020..0.020 rows=1 loops=919)"
"        Index Cond: ((party_uid = p2.party_uid) AND (business_number_cd = p2.business_number_cd))"
"Planning Time: 4.499 ms"
"Execution Time: 77.433 ms"

[Figure: the execution plan shown as a diagram]

Is there a better way to do this? The table is expected to grow very quickly.

【Comments】:

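  • Your query retrieves all rows from both tables. If the tables grow, this will be slow, and there is not much you can do about it. One thing you could try is joining only once: LEFT JOIN party_role AS r ON r.party_role_uid = ANY (ARRAY[ppr.party_role_uid, ppr.party_role_uid_related]). The result is slightly different, but it at least removes one Seq Scan.
  • Which execution plan belongs to the full query?
  • The last block, which I just added (Planning Time: 4.499 ms & Execution Time: 77.433 ms).
  • The full query joins both party_role aliases with "INNER JOIN", but the partial query uses "LEFT JOIN". Is "LEFT JOIN" correct there? Are there party_role_uid values that have no match in party_role? Also, "INNER JOIN party AS p2 ON p2.party_uid = r1.party_uid" and "LEFT JOIN party_name AS n2 ON n2.party_uid = p2.party_uid AND n1.end_date IS NULL" use r1 and n1. Is that correct? If it is not, one option is to join party_role, party, party_name and business_number up front in a WITH clause. That way you can reduce the number of joins.
  • What indexes do the tables have?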
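
The WITH-clause idea from the comments can be sketched roughly as follows. This is only an illustration under assumptions: the CTE name role_party is made up, and n.party_name and b.business_number are assumed column names (the question never lists the columns of those tables).

```sql
-- Sketch only: role_party is a hypothetical name; n.party_name and
-- b.business_number are assumed column names.
WITH role_party AS (
    -- Resolve each party role to its party, current name and
    -- current business number once.
    SELECT r.party_role_uid,
           p.party_uid,
           n.party_name,
           b.business_number
    FROM party_role AS r
    INNER JOIN party AS p ON p.party_uid = r.party_uid
    LEFT JOIN party_name AS n
           ON n.party_uid = p.party_uid
          AND n.end_date IS NULL
    LEFT JOIN business_number AS b
           ON b.party_uid = p.party_uid
          AND b.business_number_cd = p.business_number_cd
          AND b.end_date IS NULL
)
SELECT ppr.case_uid,
       rp1.party_name      AS p1_party_name,
       rp1.business_number AS p1_business_number,
       rp2.party_name      AS p2_party_name,
       rp2.business_number AS p2_business_number
FROM party_party_relationship AS ppr
INNER JOIN role_party AS rp1 ON rp1.party_role_uid = ppr.party_role_uid
INNER JOIN role_party AS rp2 ON rp2.party_role_uid = ppr.party_role_uid_related
WHERE ppr.case_uid = 9;
```

Because the CTE is referenced twice, PostgreSQL 12 and later materialize it by default (earlier versions always do), so it is computed once for every party role. Whether that beats the original query when the WHERE clause selects only a few rows is something to check with EXPLAIN ANALYZE rather than assume.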

Tags: postgresql performance join database-design database-performance


【Answer 1】:

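It is not surprising that joining the same table an extra time (and so doing twice the work) takes twice as long. Your question is hard to answer in general, though, because it always depends, and on a lot. Still, I have a few observations that I hope will help: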

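  • You are selecting everything (*) in the SELECT. That may be just for this example, but in my experience, changing * to the specific columns I need can cut query time by up to 20%, depending on the number of columns and tables. An example from my own database: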
Gather  (cost=43909.56..166695.17 rows=160724 width=704) (actual time=767.822..2555.382 rows=159158 loops=1)

becomes

Gather  (cost=38729.56..136383.17 rows=160724 width=65) (actual time=354.540..1603.087 rows=159158 loops=1)
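  • You join party in order to get to party_name and business_number, and I am guessing a party can have only one of each. You also filter on something that looks like an id, so you probably want only a few rows back. Assuming you need just one value from each of party_name and business_number, you can move them into scalar subqueries inside the SELECT. Then you do not need the JOIN there, and because a subquery with no result returns NULL, you do not need the LEFT JOIN either: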
-- Each scalar subquery must return at most one row, otherwise the query fails.
-- The party_uid condition is carried over from the join in the question.
SELECT ppr.case_uid,
       (SELECT b.business_number
        FROM business_number AS b
        WHERE b.party_uid = p1.party_uid
          AND b.business_number_cd = p1.business_number_cd
          AND b.end_date IS NULL
       ) AS p1_business_number,
       (SELECT n.party_name
        FROM party_name AS n
        WHERE n.party_uid = p1.party_uid
          AND n.end_date IS NULL
       ) AS p1_party_name,
       (SELECT b.business_number
        FROM business_number AS b
        WHERE b.party_uid = p2.party_uid
          AND b.business_number_cd = p2.business_number_cd
          AND b.end_date IS NULL
       ) AS p2_business_number,
       (SELECT n.party_name
        FROM party_name AS n
        WHERE n.party_uid = p2.party_uid
          AND n.end_date IS NULL
       ) AS p2_party_name
FROM party_party_relationship AS ppr
INNER JOIN party_role AS r1 ON r1.party_role_uid = ppr.party_role_uid
INNER JOIN party AS p1 ON p1.party_uid = r1.party_uid
INNER JOIN party_role AS r2 ON r2.party_role_uid = ppr.party_role_uid_related
INNER JOIN party AS p2 ON p2.party_uid = r2.party_uid
WHERE ppr.case_uid = 9;
  • If you do not need anything else from party, you can also drop that JOIN and move it into the scalar subquery for business_number. After all, you already know party_uid from party_role.
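
The last point can be sketched as follows. It is an illustration under assumptions, not a tested query: b.business_number and n.party_name are assumed column names, and dropping the INNER JOIN to party gives the same rows only if every party_role row has a matching party (for example through a foreign key).

```sql
-- Sketch only: b.business_number and n.party_name are assumed column names.
-- party is no longer joined in the outer query; it is looked up inside the
-- business_number subquery, where business_number_cd is needed.
SELECT ppr.case_uid,
       (SELECT b.business_number
        FROM party AS p
        INNER JOIN business_number AS b
                ON b.party_uid = p.party_uid
               AND b.business_number_cd = p.business_number_cd
        WHERE p.party_uid = r1.party_uid
          AND b.end_date IS NULL
       ) AS p1_business_number,
       (SELECT n.party_name
        FROM party_name AS n
        WHERE n.party_uid = r1.party_uid
          AND n.end_date IS NULL
       ) AS p1_party_name,
       (SELECT b.business_number
        FROM party AS p
        INNER JOIN business_number AS b
                ON b.party_uid = p.party_uid
               AND b.business_number_cd = p.business_number_cd
        WHERE p.party_uid = r2.party_uid
          AND b.end_date IS NULL
       ) AS p2_business_number,
       (SELECT n.party_name
        FROM party_name AS n
        WHERE n.party_uid = r2.party_uid
          AND n.end_date IS NULL
       ) AS p2_party_name
FROM party_party_relationship AS ppr
INNER JOIN party_role AS r1 ON r1.party_role_uid = ppr.party_role_uid
INNER JOIN party_role AS r2 ON r2.party_role_uid = ppr.party_role_uid_related
WHERE ppr.case_uid = 9;
```

Each subquery runs once per result row, so this form tends to suit queries that return few rows, as the case_uid filter here suggests. If a party can have more than one open name or business number, the subqueries raise an error instead of silently duplicating rows.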

【Comments】:
