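【Posted】: 2017-04-22 16:50:06
【Question】:
I have a table targeting with a column marital_status of type text[] and another column data of type jsonb. The two columns hold the same content in different formats (this is only for demonstration). Sample data: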
id | marital_status | data
----+--------------------------+---------------------------------------------------
1 | null | {}
2 | {widowed} | {"marital_status": ["widowed"]}
3 | {never_married,divorced} | {"marital_status": ["never_married", "divorced"]}
...
The table contains more than 690K records with random combinations.
Lookup on the text[] column
EXPLAIN ANALYZE SELECT marital_status
FROM targeting
WHERE marital_status @> '{widowed}'::text[]
Without an index
Typically takes:
Seq Scan on targeting (cost=0.00..172981.38 rows=159061 width=28) (actual time=0.017..840.084 rows=158877 loops=1)
Filter: (marital_status @> '{widowed}'::text[])
Rows Removed by Filter: 452033
Planning time: 0.150 ms
Execution time: 845.731 ms
With an index
After creating the following index:
CREATE INDEX targeting_marital_status_idx ON targeting ("marital_status");
the query typically takes:
Index Only Scan using targeting_marital_status_idx on targeting (cost=0.42..23931.35 rows=159061 width=28) (actual time=3.528..143.848 rows=158877 loops=1)
Filter: (marital_status @> '{widowed}'::text[])
Rows Removed by Filter: 452033
Heap Fetches: 0
Planning time: 0.217 ms
Execution time: 148.506 ms
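Note that in the plan above the @> condition appears as a Filter rather than an Index Cond: a B-tree index has no operator support for array containment, so the whole index is read and every entry is checked. As a minimal sketch, assuming a GIN index on this column is acceptable, the default GIN operator class for arrays does support @>. The index name below is hypothetical, and the plans above do not show whether the planner would prefer such an index for a predicate that matches roughly a quarter of the table.

```sql
-- Hypothetical index name; GIN's default array operator class supports @>.
CREATE INDEX targeting_marital_status_gin_idx
    ON targeting USING GIN (marital_status);

-- Re-run the original query and check whether the condition now shows up
-- as an Index Cond inside a Bitmap Index Scan instead of a Filter.
EXPLAIN ANALYZE
SELECT marital_status
FROM targeting
WHERE marital_status @> '{widowed}'::text[];
```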
Lookup on the jsonb column
EXPLAIN ANALYZE SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb
Without an index
Typically takes:
Seq Scan on targeting (cost=0.00..174508.65 rows=611 width=403) (actual time=0.095..5399.112 rows=158877 loops=1)
Filter: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Rows Removed by Filter: 452033
Planning time: 0.172 ms
Execution time: 5408.326 ms
With an index
After creating the following index:
CREATE INDEX targeting_data_marital_status_idx ON targeting USING GIN ((data->'marital_status'));
the query typically takes:
Bitmap Heap Scan on targeting (cost=144.73..2482.75 rows=611 width=403) (actual time=85.966..3694.834 rows=158877 loops=1)
Recheck Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Rows Removed by Index Recheck: 201080
Heap Blocks: exact=33723 lossy=53028
-> Bitmap Index Scan on targeting_data_marital_status_idx (cost=0.00..144.58 rows=611 width=0) (actual time=78.851..78.851 rows=158877 loops=1)
Index Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Planning time: 0.257 ms
Execution time: 3703.492 ms
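The Heap Blocks line in this plan reports lossy=53028, which means the bitmap did not fit in work_mem and fell back to page-level granularity for those blocks, so every row on them had to be rechecked (hence Rows Removed by Index Recheck: 201080). The following is a sketch of one experiment, assuming the session is allowed to change work_mem. The value 64MB is an arbitrary illustration, not a recommendation, and no resulting timing is claimed here.

```sql
-- Assumed value for illustration only; choose one that suits the server.
SET work_mem = '64MB';

-- Re-run the jsonb query; with enough memory the Heap Blocks line
-- should report only exact blocks and the recheck removals should drop.
EXPLAIN (ANALYZE, BUFFERS)
SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb;

-- Restore the session default afterwards.
RESET work_mem;
```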
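Questions
- Why does the text[] column perform so well, even without an index?
- Why does adding an index to the jsonb column improve performance by only 35%?
- Is there a more efficient way to look up the jsonb column?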
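For the last question, one candidate worth measuring is the jsonb_path_ops GIN operator class, which supports the @> operator and usually produces a smaller index than the default jsonb_ops class. This is only a sketch: the index name is hypothetical, and whether it actually helps on this data set would have to be confirmed with EXPLAIN ANALYZE.

```sql
-- Hypothetical index name; jsonb_path_ops is a built-in GIN operator class
-- for jsonb that supports containment (@>).
CREATE INDEX targeting_data_marital_status_path_idx
    ON targeting USING GIN ((data -> 'marital_status') jsonb_path_ops);

-- Same query as before, to compare against the earlier plan.
EXPLAIN ANALYZE
SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb;
```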
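【Comments】:
- One difference is the data being returned: you select text in one query and jsonb in the other. How about running both with SELECT 1 FROM ... so that the output is exactly the same? Does that make any difference?
- GIN indexes are generally less efficient than b-tree, so that is to be expected. What surprises me is how slow it is without an index. Is all of the time spent on CPU?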
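The comparison suggested in the comments could be sketched as follows. Both queries return the same constant, so any remaining gap is not caused by the width of the returned column. The BUFFERS option is an addition here, not something the commenters asked for: it reports how many pages were found in shared buffers versus read from outside them, which can help judge how much of the time is I/O rather than CPU.

```sql
-- text[] column, constant output
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM targeting
WHERE marital_status @> '{widowed}'::text[];

-- jsonb column, constant output
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb;
```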
Tags: postgresql indexing postgresql-9.4 jsonb indices