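【Posted】:2020-12-16 10:42:06
【Problem description】:
I have an application with patients and therapists. They are all in the same users table. Patients should be able to see their therapist, and therapists should be able to see their patients.
I have set up a materialized view (user_access_pairs) containing pairs of user IDs. If two users have a row in the view, they should have access to each other.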
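A materialized view like this usually carries a unique index on the ID pair, so that lookups by (id1, id2) are cheap and the view can be refreshed with CONCURRENTLY, which PostgreSQL only allows when the view has a unique index. The statements below are a sketch reconstructed from the \d output that follows, assuming a plain btree index with no WHERE clause. They are not the actual DDL used here.

```sql
-- Reconstructed from the \d output (assumption: default btree, no partial index).
create unique index index_user_access_pairs
  on user_access_pairs (id1, id2);

-- Refresh without blocking readers; needs at least one unique index on the view.
refresh materialized view concurrently user_access_pairs;

-- Update planner statistics after the refresh.
analyze user_access_pairs;
```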
database> \d user_access_pairs
+----------+---------+-------------+
| Column | Type | Modifiers |
|----------+---------+-------------|
| id1 | integer | |
| id2 | integer | |
+----------+---------+-------------+
Indexes:
"index_user_access_pairs" UNIQUE, btree (id1, id2)
Here is the definition of the users table. It has many more columns that are not relevant to this question.
database> \d users
+-----------------------------+-----------------------------+-----------------------------------------------------+
| Column | Type | Modifiers |
|-----------------------------+-----------------------------+-----------------------------------------------------|
| id | integer | not null default nextval('users_id_seq'::regclass) |
| first_name | character varying(255) | |
| last_name | character varying(255) | |
+-----------------------------+-----------------------------+-----------------------------------------------------+
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
I created an RLS policy that restricts which users rows can be read, based on the user ID in the JWT token.
create policy select_users_policy
  on public.users
  for select using (
    (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
      select id1, id2 from user_access_pairs
    )
  );
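For a policy like this to take effect, row-level security has to be enabled on the table, and the querying role must be neither a superuser, nor a BYPASSRLS role, nor the table owner. The following is a minimal reproduction sketch. The role name `app_user` is hypothetical and is not part of my actual setup.

```sql
-- Policies are only enforced once RLS is enabled on the table.
alter table public.users enable row level security;

-- Hypothetical non-superuser role used only for this illustration.
create role app_user nologin;
grant select on public.users to app_user;
-- The sub-select in the policy runs with the caller's privileges,
-- so the role also needs to read the materialized view.
grant select on user_access_pairs to app_user;

-- Run the query as the restricted role.
set role app_user;
set jwt.claims.user_id to '2222';
select first_name, last_name from users;
reset role;
```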
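This seems to work logically, but I'm getting terrible performance. The query planner does a sequential scan on user_access_pairs, even though there is an index on it.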
database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
select first_name, last_name
from users
+------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users (cost=231.84..547.19 rows=2386 width=14) (actual time=5.481..6.418 rows=2 loops=1) |
| Output: users.first_name, users.last_name |
| Filter: (hashed SubPlan 1) |
| Rows Removed by Filter: 4769 |
| SubPlan 1 |
| -> Seq Scan on public.user_access_pairs (cost=0.00..197.67 rows=13667 width=8) (actual time=0.005..1.107 rows=13667 loops=1) |
| Output: user_access_pairs.id1, user_access_pairs.id2 |
| Planning Time: 0.072 ms |
| Execution Time: 6.521 ms |
+------------------------------------------------------------------------------------------------------------------------------------+
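However, if I switch to a superuser role that bypasses RLS and apply the same filter manually, I get much better performance. Shouldn't the two be the same?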
database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
select first_name, last_name
from users
where (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
  select id1, id2 from user_access_pairs
)
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| QUERY PLAN
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Nested Loop (cost=4.59..27.86 rows=2 width=14) (actual time=0.041..0.057 rows=2 loops=1)
| Output: users.first_name, users.last_name
| Inner Unique: true
| -> Bitmap Heap Scan on public.user_access_pairs (cost=4.31..11.26 rows=2 width=4) (actual time=0.029..0.036 rows=2 loops=1)
| Output: user_access_pairs.id1, user_access_pairs.id2
| Filter: ((current_setting('jwt.claims.user_id'::text, true))::integer = user_access_pairs.id1)
| Heap Blocks: exact=2
| -> Bitmap Index Scan on index_user_access_pairs (cost=0.00..4.31 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1)
| Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)
| -> Index Scan using users_pkey on public.users (cost=0.28..8.30 rows=1 width=18) (actual time=0.008..0.008 rows=1 loops=2)
| Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask, users.reset_password_token, users.reset_password_sent_at, users.remember_created_at, users.sign_in_count, users.current_sign_in_at, users.last_sign_in_at,
| Index Cond: (users.id = user_access_pairs.id2)
| Planning Time: 0.526 ms
| Execution Time: 0.116 ms
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Why doesn't the query use the index when it goes through RLS?
PS: I am using PostgreSQL version 12.4
database> select version()
+-------------------------------------------------------------------------------------------------------------------------------+
| version |
|-------------------------------------------------------------------------------------------------------------------------------|
| PostgreSQL 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, 64-bit |
+-------------------------------------------------------------------------------------------------------------------------------+
Edit
Thanks to Laurenz for the reply. It improved performance a lot, but I am still getting some seq scans.
Here is the updated policy, as Laurenz suggested.
create policy select_users_policy
  on public.users
  for select using (
    exists (
      select 1
      from user_access_pairs
      where
        id1 = current_setting('jwt.claims.user_id'::text, true)::integer
        and id2 = users.id
    )
  );
Even though the exists query in the policy now uses the index, querying the table through RLS still does a seq scan on the users table.
database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
select first_name, last_name
from users
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users (cost=0.00..40048.81 rows=2394 width=14) (actual time=0.637..1.216 rows=2 loops=1) |
| Output: users.first_name, users.last_name |
| Filter: (alternatives: SubPlan 1 or hashed SubPlan 2) |
| Rows Removed by Filter: 4785 |
| SubPlan 1 |
| -> Index Only Scan using index_user_access_pairs on public.user_access_pairs (cost=0.29..8.31 rows=1 width=0) (never executed) |
| Index Cond: ((user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) AND (user_access_pairs.id2 = users.id)) |
| Heap Fetches: 0 |
| SubPlan 2 |
| -> Bitmap Heap Scan on public.user_access_pairs user_access_pairs_1 (cost=4.31..11.26 rows=2 width=4) (actual time=0.075..0.083 rows=2 loops=1) |
| Output: user_access_pairs_1.id2 |
| Recheck Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) |
| Heap Blocks: exact=2 |
| -> Bitmap Index Scan on index_user_access_pairs_on_id1 (cost=0.00..4.31 rows=2 width=0) (actual time=0.064..0.064 rows=2 loops=1) |
| Index Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) |
| Planning Time: 0.572 ms |
| Execution Time: 1.295 ms |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
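Here is the same query done "manually", without RLS, for comparison. This time there is no seq scan and performance is significantly better (especially when run on larger datasets).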
database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
select first_name, last_name
from users
where exists (
  select 1
  from user_access_pairs
  where
    id1 = current_setting('jwt.claims.user_id'::text, true)::integer
    and id2 = users.id
)
+---------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|---------------------------------------------------------------------------------------------------------------------------------------------|
| Nested Loop (cost=4.59..27.86 rows=2 width=14) (actual time=0.020..0.033 rows=2 loops=1) |
| Output: users.first_name, users.last_name |
| Inner Unique: true |
| -> Bitmap Heap Scan on public.user_access_pairs (cost=4.31..11.26 rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1) |
| Output: user_access_pairs.id1, user_access_pairs.id2 |
| Recheck Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) |
| Heap Blocks: exact=2 |
| -> Bitmap Index Scan on index_user_access_pairs_on_id1 (cost=0.00..4.31 rows=2 width=0) (actual time=0.010..0.010 rows=2 loops=1) |
| Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) |
| -> Index Scan using users_pkey on public.users (cost=0.28..8.30 rows=1 width=18) (actual time=0.006..0.006 rows=1 loops=2) |
| Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask |
| Index Cond: (users.id = user_access_pairs.id2) |
| Planning Time: 0.464 ms |
| Execution Time: 0.075 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------+
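I would have guessed that the query planner would treat these two queries as identical. Why do they differ, and how can I avoid the seq scan?
【Discussion】: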
-
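You are doing a full scan on USERS because that's what you told it to do. Nothing in your policy restricts which rows of USERS it needs to look at - you've said "take current_setting('jwt.claims.user_id'...), pair it with every ID in the USERS table, then see which pairs exist in USER_ACCESS_PAIRS". The policy uses every row in USERS, so of course it's a full scan. At that point I'd expect the database to say: "To hell with it - I'm reading the whole USERS table anyway, so I might as well do a full scan of USER_ACCESS_PAIRS too".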
-
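In the last example in the question, the query planner is clearly able to do this without a seq scan. Here it is 17 times faster. On larger datasets it is thousands of times faster. I'm curious why the planner doesn't consider the two queries equivalent when they are built from the same parts.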
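One plausible explanation is that a sub-select inside a policy is attached to the table as a security qual and is not flattened into a join the way a sub-select in a top-level WHERE clause can be, so it stays a per-row SubPlan filter. A workaround sometimes suggested for this situation is to make the sub-select uncorrelated and wrap it in ARRAY(...), so that it can be evaluated once and the resulting `id = any (...)` condition can be matched against users_pkey. The rewrite below is a sketch under that assumption. It has not been verified against this schema, and the resulting plan should be checked with EXPLAIN.

```sql
-- Hypothetical rewrite of select_users_policy; not from the original post.
drop policy if exists select_users_policy on public.users;

create policy select_users_policy
  on public.users
  for select using (
    -- The ARRAY(...) sub-select does not reference users,
    -- so it can be evaluated once per query rather than once per row.
    id = any (array(
      select id2
      from user_access_pairs
      where id1 = current_setting('jwt.claims.user_id'::text, true)::integer
    ))
  );

-- Check the resulting plan as a role that is subject to RLS.
set jwt.claims.user_id to '2222';
explain analyze
select first_name, last_name
from users;
```

Whether the planner actually picks an index scan on users depends on table statistics and the PostgreSQL version, so compare the plan against the manual query before relying on this.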
Tags: postgresql performance row-level-security