【Question title】: Why is Row Level Security (RLS) not using indexes?
【Posted】: 2020-12-16 10:42:06
【Question】:

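I have an application with patients and therapists. They are both stored in the same users table. Patients should be able to see their therapists, and therapists should be able to see their patients.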

I've set up a materialized view (user_access_pairs) containing pairs of user IDs. If two users have a row in the view, they should have access to each other.

database> \d user_access_pairs
+----------+---------+-------------+
| Column   | Type    | Modifiers   |
|----------+---------+-------------|
| id1      | integer |             |
| id2      | integer |             |
+----------+---------+-------------+
Indexes:
    "index_user_access_pairs" UNIQUE, btree (id1, id2)

Here is the definition of the users table. It has many more columns that are irrelevant to this question.

database> \d users
+-----------------------------+-----------------------------+-----------------------------------------------------+
| Column                      | Type                        | Modifiers                                           |
|-----------------------------+-----------------------------+-----------------------------------------------------|
| id                          | integer                     |  not null default nextval('users_id_seq'::regclass) |
| first_name                  | character varying(255)      |                                                     |
| last_name                   | character varying(255)      |                                                     |
+-----------------------------+-----------------------------+-----------------------------------------------------+
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)

I've created an RLS policy that uses the JWT token to restrict which users can be read.

create policy select_users_policy
  on public.users
  for select using (
    (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
      select id1, id2 from user_access_pairs
    )
  );

This seems logical, but performance is terrible. The query planner does a sequential scan on user_access_pairs even though there is an index on it.

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
    select first_name, last_name
    from users
+------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                         |
|------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users  (cost=231.84..547.19 rows=2386 width=14) (actual time=5.481..6.418 rows=2 loops=1)                       |
|   Output: users.first_name, users.last_name                                                                                        |
|   Filter: (hashed SubPlan 1)                                                                                                       |
|   Rows Removed by Filter: 4769                                                                                                     |
|   SubPlan 1                                                                                                                        |
|     ->  Seq Scan on public.user_access_pairs  (cost=0.00..197.67 rows=13667 width=8) (actual time=0.005..1.107 rows=13667 loops=1) |
|           Output: user_access_pairs.id1, user_access_pairs.id2                                                                     |
| Planning Time: 0.072 ms                                                                                                            |
| Execution Time: 6.521 ms                                                                                                           |
+------------------------------------------------------------------------------------------------------------------------------------+

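However, if I switch to a superuser role that bypasses RLS and apply the same filter manually, I get much better performance. Shouldn't the two be the same?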

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
   select first_name, last_name
   from users
   where (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
     select id1, id2 from user_access_pairs
   )
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| QUERY PLAN
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Nested Loop  (cost=4.59..27.86 rows=2 width=14) (actual time=0.041..0.057 rows=2 loops=1)
|   Output: users.first_name, users.last_name
|   Inner Unique: true
|   ->  Bitmap Heap Scan on public.user_access_pairs  (cost=4.31..11.26 rows=2 width=4) (actual time=0.029..0.036 rows=2 loops=1)
|         Output: user_access_pairs.id1, user_access_pairs.id2
|         Filter: ((current_setting('jwt.claims.user_id'::text, true))::integer = user_access_pairs.id1)
|         Heap Blocks: exact=2
|         ->  Bitmap Index Scan on index_user_access_pairs  (cost=0.00..4.31 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1)
|               Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)
|   ->  Index Scan using users_pkey on public.users  (cost=0.28..8.30 rows=1 width=18) (actual time=0.008..0.008 rows=1 loops=2)
|         Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask, users.reset_password_token, users.reset_password_sent_at, users.remember_created_at, users.sign_in_count, users.current_sign_in_at, users.last_sign_in_at,
|         Index Cond: (users.id = user_access_pairs.id2)
| Planning Time: 0.526 ms
| Execution Time: 0.116 ms
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Why doesn't the query use the index when the filter comes from RLS?

PS: I'm using PostgreSQL 12.4.

database> select version()
+-------------------------------------------------------------------------------------------------------------------------------+
| version                                                                                                                       |
|-------------------------------------------------------------------------------------------------------------------------------|
| PostgreSQL 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, 64-bit |
+-------------------------------------------------------------------------------------------------------------------------------+

EDIT

Thanks to Laurenz for the reply. It improved performance a lot, but I'm still getting some seq scans.

Here is the updated policy suggested by Laurenz.

create policy select_users_policy
  on public.users
  for select using (
    exists (
      select 1
      from user_access_pairs
      where
        id1 = current_setting('jwt.claims.user_id'::text, true)::integer
        and id2 = users.id
    )
  );

Even though the exists subquery in the policy now uses an index, querying this table under RLS still does a seq scan on the users table.

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
  select first_name, last_name
  from users
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                            |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users  (cost=0.00..40048.81 rows=2394 width=14) (actual time=0.637..1.216 rows=2 loops=1)                                          |
|   Output: users.first_name, users.last_name                                                                                                           |
|   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)                                                                                               |
|   Rows Removed by Filter: 4785                                                                                                                        |
|   SubPlan 1                                                                                                                                           |
|     ->  Index Only Scan using index_user_access_pairs on public.user_access_pairs  (cost=0.29..8.31 rows=1 width=0) (never executed)                  |
|           Index Cond: ((user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) AND (user_access_pairs.id2 = users.id)) |
|           Heap Fetches: 0                                                                                                                             |
|   SubPlan 2                                                                                                                                           |
|     ->  Bitmap Heap Scan on public.user_access_pairs user_access_pairs_1  (cost=4.31..11.26 rows=2 width=4) (actual time=0.075..0.083 rows=2 loops=1) |
|           Output: user_access_pairs_1.id2                                                                                                             |
|           Recheck Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                      |
|           Heap Blocks: exact=2                                                                                                                        |
|           ->  Bitmap Index Scan on index_user_access_pairs_on_id1  (cost=0.00..4.31 rows=2 width=0) (actual time=0.064..0.064 rows=2 loops=1)         |
|                 Index Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                  |
| Planning Time: 0.572 ms                                                                                                                               |
| Execution Time: 1.295 ms                                                                                                                              |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

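For comparison, here is the same query done "manually", without RLS. This time there is no seq scan, and performance is noticeably better (especially on larger data sets).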

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
    select first_name, last_name
    from users
    where exists (
       select 1
       from user_access_pairs
       where
         id1 = current_setting('jwt.claims.user_id'::text, true)::integer
         and id2 = users.id
     )

+---------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                  |
|---------------------------------------------------------------------------------------------------------------------------------------------|
| Nested Loop  (cost=4.59..27.86 rows=2 width=14) (actual time=0.020..0.033 rows=2 loops=1)                                                   |
|   Output: users.first_name, users.last_name                                                                                                 |
|   Inner Unique: true                                                                                                                        |
|   ->  Bitmap Heap Scan on public.user_access_pairs  (cost=4.31..11.26 rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1)             |
|         Output: user_access_pairs.id1, user_access_pairs.id2                                                                                |
|         Recheck Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                |
|         Heap Blocks: exact=2                                                                                                                |
|         ->  Bitmap Index Scan on index_user_access_pairs_on_id1  (cost=0.00..4.31 rows=2 width=0) (actual time=0.010..0.010 rows=2 loops=1) |
|               Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                            |
|   ->  Index Scan using users_pkey on public.users  (cost=0.28..8.30 rows=1 width=18) (actual time=0.006..0.006 rows=1 loops=2)              |
|         Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask                        |
|         Index Cond: (users.id = user_access_pairs.id2)                                                                                      |
| Planning Time: 0.464 ms                                                                                                                     |
| Execution Time: 0.075 ms                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------+

I would have expected the query planner to treat these two queries the same. Why are they different, and how can I avoid the seq scan?

【Comments on the question】:

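You're doing a full scan of USERS because that's what you told it to do. Nothing in your policy limits the rows it needs to look at in USERS. You've said: "take current_setting('jwt.claims.user_id'...), pair it with every ID in the USERS table, and then see which of those pairs exist in USER_ACCESS_PAIRS". The policy uses every row in USERS, so of course it's a full scan. I'd expect the database to say at that point: "Screw it, I'm reading the whole USERS table anyway, so I might as well do a full scan of USER_ACCESS_PAIRS too."
In the last example in the question, the query planner is clearly able to do this without a seq scan. There it is 17 times faster, and on larger data sets it is thousands of times faster. I'm curious why the planner doesn't treat the two queries as equivalent when they are built from the same parts.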

Tags: postgresql performance row-level-security

【Answer 1】:

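The reason you don't get the same plan as for the seemingly equivalent query without the RLS policy is that subquery pullup happens before the RLS policy is taken into account. This is a quirk of the planner.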

In short, RLS policies and subqueries unfortunately do not get along well when it comes to performance.

For reference, you can see similar behavior when comparing these two queries:

SELECT ... FROM my_table WHERE                     EXISTS(SELECT ...);
SELECT ... FROM my_table WHERE CASE WHEN true THEN EXISTS(SELECT ...) END;

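Although the two queries are equivalent, the second one produces a (hashed) SubPlan for the subquery, because the unnecessary CASE WHEN true is only folded away after subquery pullup has already happened.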
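A sketch of that comparison using the tables from the question. Run it as a role that bypasses RLS (as in the question's superuser test) so that only the two WHERE clauses differ. Going by the explanation above, the first form would be expected to become a join that can use the indexes, and the second a filter with a (hashed) SubPlan; the exact plans will depend on the PostgreSQL version and table statistics.

```sql
-- Form 1: plain EXISTS at the top level of WHERE.
-- The planner can pull this subquery up into a semi-join.
EXPLAIN
SELECT first_name, last_name
FROM users
WHERE EXISTS (
  SELECT 1
  FROM user_access_pairs
  WHERE id1 = current_setting('jwt.claims.user_id'::text, true)::integer
    AND id2 = users.id
);

-- Form 2: logically identical, but the EXISTS is wrapped in CASE WHEN true.
-- The CASE is only simplified after pullup has run, so the subquery
-- is expected to stay a SubPlan evaluated per row of users.
EXPLAIN
SELECT first_name, last_name
FROM users
WHERE CASE WHEN true THEN EXISTS (
  SELECT 1
  FROM user_access_pairs
  WHERE id1 = current_setting('jwt.claims.user_id'::text, true)::integer
    AND id2 = users.id
) END;
```

An RLS policy expression is in a similar position to the EXISTS in form 2: it is added to the query after pullup has already been attempted.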

Disclaimer: I got this information from RhodiumToad on IRC (#postgresql), but explained and simplified it in my own words.


【Answer 2】:

I can't pin down the difference, but I think you should get a better plan with a smarter policy:

CREATE POLICY select_users_policy ON public.users
  FOR SELECT
  USING (
     EXISTS (SELECT 1 FROM user_access_pairs
             WHERE id1 = current_setting('jwt.claims.user_id'::text, true)::integer
               AND id2 = users.id)
  );

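I'd also like to point out that basing row-level security on a placeholder variable that the user can change at any time is questionable security.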

【Comments】:

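Thank you very much for your reply! It did improve performance a lot, but it's still not as good as I expected. I've updated the question to include your suggestion.
@Laurenz Albe In this case the "users" have no access to the database (so they can't change the variable). This policy is used in the context of PostgREST.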

【Answer 3】:

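The author of this comment came up (through trial and error) with the solution of turning the subquery into an ARRAY. I'm not at all sure it applies to your case, but it shows that very unexpected tricks can apparently scare the optimizer into doing its job.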

So you could try:

create policy select_users_policy
on public.users
for select using (
  users.id = any (
    array(
        -- no reference to users here, so the array is computed once per query
        select id2
        from user_access_pairs
        where
            id1 = current_setting('jwt.claims.user_id'::text, true)::integer
        )
    )
);

Awkward, but who knows...


【Answer 4】:

The question doesn't say so, but I'm assuming that the reads from public.users are triggered from another, API-facing schema (let's call it api).

Someone on the subZero Slack shared:

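I had the same problem, and defining the RLS in terms of my api views solved the seq scan issue. But it's a bit of a pain to maintain when those views change, because for a migration I first have to drop the RLS policy, change the view, and then recreate the policy. ... I use the api views whenever subqueries are involved in RLS.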

So they use exactly the same rules, but reference the api.foo and api.bar views instead of the public.foo and public.bar tables.

In your case, you could try:

create policy select_users_policy
  on public.users
  for select using (
    exists (
      select 1
      from api.user_access_pairs
      where
        id1 = current_setting('jwt.claims.user_id'::text, true)::integer
        and id2 = users.id
    )
  );

This assumes that you have a users view in the api schema mirroring public.users, and that you move user_access_pairs to api as well (or create a view there that references it).
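If you take the "create a view that references it" route rather than moving the materialized view, the api-side object can be very small. A sketch, assuming the column list from the question's \d output:

```sql
-- Expose the existing materialized view through the api schema
-- instead of moving it there.
create schema if not exists api;

create view api.user_access_pairs as
  select id1, id2
  from public.user_access_pairs;
```

The policy then reads from api.user_access_pairs. As the quoted Slack message warns, the policy depends on this view, so changing the view in a migration may require dropping and recreating the policy.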

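It isn't clear to me whether this works because the query is triggered from a view or function in the api schema in the first place, so that referencing views in that schema is less confusing for the optimizer, or whether it is just a trick that gets the optimizer going regardless of where the query comes from. (The latter seems more likely to me, but who knows.)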


【Answer 5】:

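Another user on the subZero Slack shared a solution that wraps the lookup of the current user's permissions in a function. In your case, that would look something like: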

create policy select_users_policy
  on public.users
  for select using (
    id IN (
      select * from current_user_read_users()
    )
  );

You would create a current_user_read_users() function that looks up the user_id from the JWT and, based on user_access_pairs, returns the set of users the current user is allowed to read.

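It may or may not matter whether this function has the same owner as the user_access_pairs view, or whether it is declared SECURITY DEFINER (so that it bypasses RLS). The part that matters may simply be pulling the subquery out into a function (which somehow helps the optimizer); the other measures were reported as helping with other performance problems.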
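The answer does not show the function itself. A minimal sketch, assuming it returns setof integer (the ids the current user may read) and is declared SECURITY DEFINER as discussed above; the return type, LANGUAGE sql, STABLE and the search_path setting are my assumptions, not details from the Slack report.

```sql
-- Hypothetical implementation of the helper used by the policy above.
-- Returns the ids of the users that the JWT user may read.
create or replace function public.current_user_read_users()
  returns setof integer
  language sql
  stable                    -- same result for every row within one query
  security definer          -- runs with the function owner's privileges,
                            -- not the caller's
  set search_path = public  -- pin the search path, advisable for SECURITY DEFINER
as $$
  select id2
  from public.user_access_pairs
  where id1 = current_setting('jwt.claims.user_id'::text, true)::integer
$$;
```

Because the function takes no argument from users, the policy's `id IN (select * from current_user_read_users())` is an uncorrelated subquery. Whether that actually changes the plan on your data is something to confirm with EXPLAIN ANALYZE.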

Finally, you may want to try putting this into an api view as well, as in the other solution I reported.

One caveat:

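The permissions table itself has a circular dependency problem, so I had to write a special-case policy for it. That one doesn't have any performance problems, though, so it's fine.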

(Note that in their case the permissions are stored in a table that admin users can edit, rather than being generated as in your case.)


【Answer 6】:

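One solution (based on this post, which has several other good suggestions and benchmarks) is to not use RLS at all and build the filtering into a view instead: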

create view api.allowed_users
with (security_barrier)
as
  select id, first_name, last_name, favorite_color
  from public.users
  join public.user_access_pairs uap
    on uap.id1 = current_setting('jwt.claims.user_id'::text, true)::integer
   and uap.id2 = users.id;

You have already expressed your access policy in the user_access_pairs view, so arguably the RLS rule doesn't really add anything.

(security_barrier is there to prevent potential information leaks, but it comes with a performance cost, so check whether you need it in your case.)
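For the view to replace RLS, the API role should read from api.allowed_users and have no direct access to public.users. A sketch, where web_user is a hypothetical name for the role PostgREST switches to; the view itself runs with its owner's privileges, so the owner needs SELECT on public.users and public.user_access_pairs.

```sql
-- "web_user" is a placeholder for your PostgREST role.
-- Expose only the filtering view, not the base table.
revoke all on public.users from web_user;
grant usage on schema api to web_user;
grant select on api.allowed_users to web_user;

-- The access filter is now an ordinary join inside the view definition;
-- check whether index_user_access_pairs and users_pkey are used.
set jwt.claims.user_id to '2222';
explain analyze
select first_name, last_name
from api.allowed_users;
```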
