Optimizing a one-to-many query that uses a subselect to paginate

【Question title】: Optimizing a one to many query that's using a subselect to paginate
【Posted】: 2018-08-13 23:42:14
【Question description】:

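I'm hoping to get some expert eyes on this to see why my query's performance varies so much.

The problem I'm trying to solve is that I need orders, each of which can contain one-to-many items. These orders need to be paginated.

To do this, I have taken the following approach. I use a subquery to filter the orders by the required item attributes. Then I join the items back in to get their required fields. This means that when paginating I won't incorrectly cut off order rows when an order contains 2 or more items.

I'm seeing intermittently slow queries. They are much faster the second time they run. I assume this is because Postgres is loading the indexes etc. into memory?

From the EXPLAIN output I don't fully understand what is happening. It looks like it needs to scan every order to see whether it has items that match the subquery? I'm a bit confused by the following line. Is it saying it needs to scan 286853 rows but also only 165?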

Index Scan Backward using orders_created_at_idx on orders  (cost=0.42..2708393.65 rows=286853 width=301) (actual time=64.598..2114.676 rows=165 loops=1)

Is there a way to make Postgres filter by items first, or am I reading this wrong and it is already doing that?

Query:

SELECT
  "orders"."id_orders" as "orders.id_orders",
  "items"."id_items" as "items.id_items",
  ..., 
  orders.created_at, orders.updated_at 
FROM (
  SELECT 
    orders.id_orders,
    orders.created_at,
    orders.updated_at
  FROM orders 
  WHERE orders.status in ('completed','pending') AND 
  (
    SELECT fk_vendor_id FROM items
    WHERE (
      items.fk_order_id = orders.id_orders AND
      items.fk_vendor_id = '0012800001YVccUAAT' AND
      items.fk_offer = '0060I00000RAKFYQA5' AND
      items.status IN ('completed','cancelled')
    ) LIMIT 1
  ) IS NOT NULL ORDER BY orders.created_at DESC LIMIT 50 OFFSET 150
) as orders INNER JOIN items ON items.fk_order_id = orders.id_orders;

First EXPLAIN:

Nested Loop  (cost=1417.11..2311.77 rows=67 width=1705) (actual time=2785.221..17025.325 rows=17 loops=1)
  ->  Limit  (cost=1416.68..1888.77 rows=50 width=301) (actual time=2785.216..17024.918 rows=15 loops=1)
        ->  Index Scan Backward using orders_created_at_idx on orders  (cost=0.42..2708393.65 rows=286853 width=301) (actual time=1214.013..17024.897 rows=165 loops=1)
              Filter: ((status = ANY ('{completed,pending}'::orders_status_enum[])) AND ((SubPlan 1) IS NOT NULL))
              Rows Removed by Filter: 313631
              SubPlan 1
                ->  Limit  (cost=0.42..8.45 rows=1 width=19) (actual time=0.047..0.047 rows=0 loops=287719)
                      ->  Index Scan using items_fk_order_id_index on items items_1  (cost=0.42..8.45 rows=1 width=19) (actual time=0.047..0.047 rows=0 loops=287719)
                            Index Cond: (fk_order_id = orders.id_orders)
                            Filter: ((status = ANY ('{completed,cancelled}'::items_status_enum[])) AND (fk_vendor_id = '0012800001YVccUAAT'::text) AND (fk_offer = '0060I00000RAKFYQA5'::text))
                            Rows Removed by Filter: 1
  ->  Index Scan using items_fk_order_id_index on items  (cost=0.42..8.44 rows=1 width=1404) (actual time=0.002..0.026 rows=1 loops=15)
        Index Cond: (fk_order_id = orders.id_orders)
Planning time: 1.791 ms
Execution time: 17025.624 ms
(15 rows)

Second EXPLAIN:

Nested Loop  (cost=1417.11..2311.77 rows=67 width=1705) (actual time=115.659..2114.739 rows=17 loops=1)
  ->  Limit  (cost=1416.68..1888.77 rows=50 width=301) (actual time=115.654..2114.691 rows=15 loops=1)
        ->  Index Scan Backward using orders_created_at_idx on orders  (cost=0.42..2708393.65 rows=286853 width=301) (actual time=64.598..2114.676 rows=165 loops=1)
              Filter: ((status = ANY ('{completed,pending}'::orders_status_enum[])) AND ((SubPlan 1) IS NOT NULL))
              Rows Removed by Filter: 313631
              SubPlan 1
                ->  Limit  (cost=0.42..8.45 rows=1 width=19) (actual time=0.006..0.006 rows=0 loops=287719)
                      ->  Index Scan using items_fk_order_id_index on items items_1  (cost=0.42..8.45 rows=1 width=19) (actual time=0.006..0.006 rows=0 loops=287719)
                            Index Cond: (fk_order_id = orders.id_orders)
                            Filter: ((status = ANY ('{completed,cancelled}'::items_status_enum[])) AND (fk_vendor_id = '0012800001YVccUAAT'::text) AND (fk_offer = '0060I00000RAKFYQA5'::text))
                            Rows Removed by Filter: 1
  ->  Index Scan using items_fk_order_id_index on items  (cost=0.42..8.44 rows=1 width=1404) (actual time=0.002..0.002 rows=1 loops=15)
        Index Cond: (fk_order_id = orders.id_orders)
Planning time: 2.011 ms
Execution time: 2115.052 ms
(15 rows)

Indexes on orders:

"cart_pkey" PRIMARY KEY, btree (id_orders)
"orders_legacy_id_uindex" UNIQUE, btree (legacy_id_orders)
"orders_transaction_key_uindex" UNIQUE, btree (transaction_key)
"orders_created_at_idx" btree (created_at)
"orders_customer_email_idx" gin (customer_email gin_trgm_ops)
"orders_customer_full_name_idx" gin (customer_full_name gin_trgm_ops)
Referenced by:
TABLE "items" CONSTRAINT "items_fk_order_id_fkey" FOREIGN KEY (fk_order_id) REFERENCES orders(id_orders) ON DELETE RESTRICT
TABLE "items_log" CONSTRAINT "items_log_fk_order_id_fkey" FOREIGN KEY (fk_order_id) REFERENCES orders(id_orders)

Indexes on items:

"items_pkey" PRIMARY KEY, btree (id_items)
"items_fk_vendor_id_booking_number_unique" UNIQUE, btree (fk_vendor_id, booking_number) WHERE legacy_id_items IS NULL
"items_legacy_id_uindex" UNIQUE, btree (legacy_id_items)
"items_transaction_key_uindex" UNIQUE, btree (transaction_key)
"items_booking_number_index" btree (booking_number)
"items_fk_order_id_index" btree (fk_order_id)
"items_fk_vendor_id_index" btree (fk_vendor_id)
"items_status_index" btree (status)

Foreign-key constraints:
"items_fk_order_id_fkey" FOREIGN KEY (fk_order_id) REFERENCES orders(id_orders) ON DELETE RESTRICT

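【Question discussion】:

My own understanding is somewhat limited, but I'd follow the "Index Scan Backward" line you pointed out: the gap there is between what the optimizer expected the query to return and what it actually returned. Look into query hints and see whether you can tell the query planner that you are only returning a few rows from the orders_created_at_idx index.

Tags: sql postgresql optimization query-optimization

【Solution 1】: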

The difference in execution time may really be an effect of caching. You can use EXPLAIN (ANALYZE, BUFFERS) to see how many pages were found in the database cache.
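
To check the caching explanation, the paginated subselect from the question can be re-run with buffer statistics. This is only a sketch: it reuses the table and column names from the question and leaves out the outer join to items. In the output, each plan node gains a "Buffers:" line, where "shared hit" counts pages found in PostgreSQL's shared buffers and "read" counts pages that had to be fetched from outside them. The second statement simply multiplies the SubPlan 1 loop count by the per-loop times reported in the two plans in the question; those per-loop times are rounded averages, so the result is approximate.

```sql
-- Run the paginated subselect and report buffer usage per plan node.
-- Note that EXPLAIN ANALYZE actually executes the statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT orders.id_orders, orders.created_at, orders.updated_at
FROM orders
WHERE orders.status IN ('completed', 'pending')
  AND (
    SELECT fk_vendor_id FROM items
    WHERE items.fk_order_id = orders.id_orders
      AND items.fk_vendor_id = '0012800001YVccUAAT'
      AND items.fk_offer = '0060I00000RAKFYQA5'
      AND items.status IN ('completed', 'cancelled')
    LIMIT 1
  ) IS NOT NULL
ORDER BY orders.created_at DESC
LIMIT 50 OFFSET 150;

-- Rough time spent in SubPlan 1: loops * average time per loop (ms),
-- using the figures from the two plans in the question.
SELECT 287719 * 0.047 AS first_run_subplan_ms,   -- 13522.793
       287719 * 0.006 AS second_run_subplan_ms;  -- 1726.314
```

That is roughly 13.5 s of the 17 s first run and roughly 1.7 s of the 2.1 s second run, so nearly all of the difference comes from the 287,719 executions of the subplan being slower the first time, which is consistent with pages not yet being cached.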

To make your query more readable, you should rewrite

WHERE (
   SELECT fk_vendor_id FROM items
   WHERE (
     items.fk_order_id = orders.id_orders AND
     items.fk_vendor_id = '0012800001YVccUAAT' AND
     items.fk_offer = '0060I00000RAKFYQA5' AND
     items.status IN ('completed','cancelled')
   ) LIMIT 1
) IS NOT NULL

as

WHERE EXISTS
   (SELECT 1 FROM items
    WHERE items.fk_order_id = orders.id_orders
      AND items.fk_vendor_id = '0012800001YVccUAAT'
      AND items.fk_offer = '0060I00000RAKFYQA5'
      AND items.status IN ('completed','cancelled')
   )
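
For context, here is a sketch of how the paginated subselect might look with an EXISTS test in place of the scalar subquery. The original condition keeps orders that have at least one matching item, which is what EXISTS expresses. Table and column names are taken from the question, and the outer join back to items is left out.

```sql
SELECT orders.id_orders,
       orders.created_at,
       orders.updated_at
FROM orders
WHERE orders.status IN ('completed', 'pending')
  -- Keep only orders that have at least one matching item.
  AND EXISTS (
        SELECT 1
        FROM items
        WHERE items.fk_order_id = orders.id_orders
          AND items.fk_vendor_id = '0012800001YVccUAAT'
          AND items.fk_offer = '0060I00000RAKFYQA5'
          AND items.status IN ('completed', 'cancelled')
      )
ORDER BY orders.created_at DESC
LIMIT 50 OFFSET 150;
```

This is mainly a readability change. Written this way the planner may also consider a semi-join, but there is no guarantee that the resulting plan differs.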

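The best way to speed up the query is to create an index, so that the subplan can check fk_vendor_id and fk_offer in the index itself instead of filtering the rows it has fetched: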

CREATE INDEX ON items(fk_order_id, fk_vendor_id, fk_offer);
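
The statement above lets PostgreSQL choose the index name. A variant with an explicit name, built without blocking writes to items, could look like the following. The index name items_order_vendor_offer_idx is made up for illustration.

```sql
-- CONCURRENTLY avoids blocking writes on items while the index is built,
-- but it cannot run inside a transaction block and takes longer to finish.
CREATE INDEX CONCURRENTLY items_order_vendor_offer_idx
    ON items (fk_order_id, fk_vendor_id, fk_offer);
```

If the planner picks this index up, the subplan's "Index Cond" should cover fk_order_id, fk_vendor_id and fk_offer, leaving only the status test in "Filter". Re-running EXPLAIN (ANALYZE, BUFFERS) is the way to confirm that.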

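【Discussion】:

Thanks Laurenz. I'll test adding the multicolumn index. As for the subquery, am I right in assuming that SQL is smart enough to limit the orders scanned by filtering the items first? Or does it check every order for whether it has a valid item?
With a [NOT] EXISTS clause, the scan of items stops as soon as the first result is found.
Cheers, that makes sense. Correct me if I'm wrong, but this is how I think the query is evaluated. First the subplan is processed and finds a set of items. Next, every order with an item foreign key in the subplan is sorted, and a subset is returned according to the limit and offset. Then the join happens. I just want to confirm that my thinking is correct and that the subplan is completed first to filter the orders.
Almost. The plan is created before execution starts. One row is scanned from orders_created_at_idx. If the corresponding table row passes the status filter, the subplan is executed and items items_1 is scanned to look for a match. If a match is found, items is scanned for matching rows and a result row is formed. This is repeated until no more rows can be found. PostgreSQL does not materialize the whole result set; it sends each result row as soon as it is available.
Ah OK, so if only a small number of items will usually match the subquery, say 100 rows out of 300,000, it would be more efficient to write a query that filters the items first and then joins back to the orders?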
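
One way such an items-first query might be written is sketched below. It is not from the answer. It assumes that only a few items match the vendor and offer, and the supporting index (with the made-up name items_vendor_offer_order_idx) is likewise an assumption. The outer select list is shortened to a few columns from the question.

```sql
-- Hypothetical index so the matching items can be found without touching orders.
CREATE INDEX items_vendor_offer_order_idx
    ON items (fk_vendor_id, fk_offer, fk_order_id);

SELECT page.id_orders, page.created_at, page.updated_at, items.id_items
FROM (
    SELECT o.id_orders, o.created_at, o.updated_at
    FROM (
        -- Step 1: collect the ids of orders that have a matching item.
        SELECT DISTINCT fk_order_id
        FROM items
        WHERE fk_vendor_id = '0012800001YVccUAAT'
          AND fk_offer = '0060I00000RAKFYQA5'
          AND status IN ('completed', 'cancelled')
    ) AS matching
    JOIN orders AS o ON o.id_orders = matching.fk_order_id
    -- Step 2: filter, sort and paginate only those orders.
    WHERE o.status IN ('completed', 'pending')
    ORDER BY o.created_at DESC
    LIMIT 50 OFFSET 150
) AS page
-- Step 3: join back to items to fetch every item of the selected orders.
JOIN items ON items.fk_order_id = page.id_orders;
```

Whether this beats the EXISTS form depends on the data. With few matching items it avoids walking orders_created_at_idx and probing items once per order, but it has to sort the matching orders itself. Comparing both with EXPLAIN (ANALYZE, BUFFERS) on real data is the safest way to decide.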