【Question title】: PostgreSQL: count query takes too much time
【Posted】: 2018-05-30 17:26:51
【Question】:

I have a problem with my query: it takes far too long (2,636,124 ms!):

 SELECT COUNT(*) AS "__count" 
 FROM "dictionary_dictionary" 
 WHERE NOT ("dictionary_dictionary"."id" IN (SELECT U1."word_id" AS Col1 
                                             FROM "dictionary_frequencydata" U1 
                                             WHERE U1."user_id" = 1));

This query is generated by the ORM (Django). When I try to run it through the ORM, my application hangs; when I run it in psql, psql hangs.

EXPLAIN ANALYZE:

Aggregate  (cost=329583550.40..329583550.41 rows=1 width=8) (actual time=2636109.932..2636109.933 rows=1 loops=1)
   ->  Seq Scan on dictionary_dictionary  (cost=0.00..329583390.76 rows=63856 width=0) (actual time=2636109.922..2636109.922 rows=0 loops=1)
           Filter: (NOT (SubPlan 1))
           Rows Removed by Filter: 127712
           SubPlan 1
             ->  Materialize  (cost=0.00..4821.74 rows=135828 width=4) (actual time=0.006..12.453 rows=63856 loops=127712)
                ->  Seq Scan on dictionary_frequencydata u1  (cost=0.00..3611.60 rows=135828 width=4) (actual time=0.299..95.915 rows=127712 loops=1)
                     Filter: (user_id = 1)
                     Rows Removed by Filter: 28054
 Planning time: 0.277 ms
 Execution time: 2636124.744 ms
 (11 rows)
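The figures in this plan already account for most of the runtime. The filter reads NOT (SubPlan 1) rather than NOT (hashed SubPlan 1), so for every one of the 127,712 dictionary_dictionary rows PostgreSQL walks the materialized list of user 1's word_ids (loops=127712), reading on average 63,856 rows per loop. That average is exactly half of the 127,712 rows produced by the inner Seq Scan, which is consistent with each lookup stopping at its match about halfway through the list. A quick back-of-the-envelope check, using only numbers copied from the output above:

```python
# Figures copied from the EXPLAIN ANALYZE output above.
outer_rows = 127_712          # dictionary_dictionary rows = loops of SubPlan 1
avg_rows_per_loop = 63_856    # Materialize "actual rows", averaged per loop
total_ms = 2_636_109.933      # actual time of the Aggregate node

# Each outer row triggers a linear scan of the materialized subquery result,
# so the total work is (outer rows) x (rows read per scan).
rows_visited = outer_rows * avg_rows_per_loop
ns_per_row = total_ms * 1e6 / rows_visited  # ms -> ns, spread over all visits

print(f"rows visited: {rows_visited:,}")               # rows visited: 8,155,177,472
print(f"time per visited row: {ns_per_row:.0f} ns")    # time per visited row: 323 ns
```

About 8 billion row visits at roughly a third of a microsecond each is the 44 minutes observed. PostgreSQL normally hashes a NOT IN subplan only when it expects the hash table to fit in work_mem; with an estimate of 135,828 subquery rows it most likely decided it would not, which is a plausible reason why changing the subquery changes the plan so dramatically.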

My Django models:

from django.conf import settings
from django.db import models
# Assumption: `_` is Django's lazy gettext, and DateTimeModel, BaseDictionary,
# Version and Source are the poster's own models defined elsewhere.
from django.utils.translation import gettext_lazy as _

class Dictionary(DateTimeModel):
    base_word = models.ForeignKey(BaseDictionary, related_name=_('dict_words'))
    word = models.CharField(max_length=64)
    version = models.ForeignKey(Version)

class FrequencyData(DateTimeModel):
    word = models.ForeignKey(Dictionary, related_name=_('frequency_data'))
    count = models.BigIntegerField(null=True, blank=True)
    source = models.ForeignKey(Source, related_name=_('frequency_data'), null=True, blank=True)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name=_('frequency_data'))
    user_ip_address = models.GenericIPAddressField(null=True, blank=True)
    date_of_checking = models.DateTimeField(null=True, blank=True)
    is_checked = models.BooleanField(default=False)

Table definitions:

\d+ dictionary_dictionary
                                                                  Table "public.dictionary_dictionary"
        Column        |           Type           | Collation  |  Nullable  |                              Default                               |    Storage     | Stats target  | Description
----------------------+--------------------------+------------+------------+--------------------------------------------------------------------+----------------+---------------+------
 id                   | integer                  |            | not null   | nextval('dictionary_dictionary_id_seq'::regclass) | plain          |               | 
 date_created         | timestamp with time zone |            | not null   |                                                                    | plain          |               | 
 date_modified        | timestamp with time zone |            | not null   |                                                                    | plain          |               | 
 word                 | character varying(64)    |            | not null   |                                                                    | extended       |               | 
 algorithm_version_id | integer                  |            | not null   |                                                                    | plain          |               | 
 base_word_id         | integer                  |            | not null   |                                                                    | plain          |               | 

Indexes:
    "dictionary_dictionary_pkey" PRIMARY KEY, btree (id)
    "dictionary_phonet_algorithm_version_id_0f0af100" btree (algorithm_version_id)
    "dictionary_dictionary_base_word_id_8db15cb4" btree (base_word_id)

Foreign-key constraints:
    "dictionary__algorithm_version_id_0f0af100_fk_phonetic_" FOREIGN KEY (algorithm_version_id) REFERENCES dictionary_algorithmversion(id) DEFERRABLE INITIALLY DEFERRED
    "dictionary__base_word_id_8db15cb4_fk_phonetic_" FOREIGN KEY (base_word_id) REFERENCES dictionary_grammaticaldictionary(id) DEFERRABLE INITIALLY DEFERRED

Referenced by:
    TABLE "dictionary_frequencydata" CONSTRAINT "dictionary__word_id_c231110d_fk_phonetic_" FOREIGN KEY (word_id) REFERENCES dictionary_dictionary(id) DEFERRABLE INITIALLY DEFERRED

=========
\d+ dictionary_frequencydata
                                                                Table "public.dictionary_frequencydata"
      Column      |           Type           | Collation  |  Nullable  |                            Default                            |    Storage     | Stats target  | Description
------------------+--------------------------+------------+------------+---------------------------------------------------------------+----------------+---------------+------
 id               | integer                  |            | not null   | nextval('dictionary_frequencydata_id_seq'::regclass) | plain          |               | 
 date_created     | timestamp with time zone |            | not null   |                                                               | plain          |               | 
 date_modified    | timestamp with time zone |            | not null   |                                                               | plain          |               | 
 count            | bigint                   |            |            |                                                               | plain          |               | 
 user_ip_address  | inet                     |            |            |                                                               | main           |               | 
 date_of_checking | timestamp with time zone |            |            |                                                               | plain          |               | 
 is_checked       | boolean                  |            | not null   |                                                               | plain          |               | 
 source_id        | integer                  |            |            |                                                               | plain          |               | 
 user_id          | integer                  |            | not null   |                                                               | plain          |               | 
 word_id          | integer                  |            | not null   |                                                               | plain          |               | 

Indexes:
    "dictionary_frequencydata_pkey" PRIMARY KEY, btree (id)
    "dictionary_frequencydata_source_id_38bb205a" btree (source_id)
    "dictionary_frequencydata_user_id_c6dfedce" btree (user_id)
    "dictionary_frequencydata_word_id_c231110d" btree (word_id)

Foreign-key constraints:
    "dictionary__source_id_38bb205a_fk_phonetic_" FOREIGN KEY (source_id) REFERENCES dictionary_frequencysource(id) DEFERRABLE INITIALLY DEFERRED
    "dictionary__user_id_c6dfedce_fk_auth_user" FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
    "dictionary__word_id_c231110d_fk_phonetic_" FOREIGN KEY (word_id) REFERENCES dictionary_dictionary(id) DEFERRABLE INITIALLY DEFERRED

This is shared hosting. The dictionary table has about 120k rows; FrequencyData has about 160k rows.

【Question comments】:

  • How long do these two queries take: select count(*) from dictionary_dictionary; and select count(DISTINCT d.id) from dictionary_dictionary d join dictionary_frequencydata f on d.id = f.word_id WHERE f.user_id = 1
  • First: 54 ms, second: 345 ms. EXPLAIN: pastebin.com/T96Q3ipt
  • How long does SELECT U1."word_id" AS Col1 FROM "dictionary_frequencydata" U1 WHERE U1."user_id" = 1 take to run?
  • Have you tried it with the DISTINCT keyword? For example: SELECT COUNT(*) AS "__count" FROM "dictionary_dictionary" WHERE NOT ("dictionary_dictionary"."id" IN (SELECT distinct U1."word_id" AS Col1 FROM "dictionary_frequencydata" U1 WHERE U1."user_id" = 1));
  • DISTINCT works. Thanks! If you post this as an answer, I will mark it as accepted.

Tags: sql postgresql


【Answer 1】:

Try adding the DISTINCT keyword; this should shrink the set of ids that each row is checked against:

SELECT COUNT(*) AS "__count" 
FROM "dictionary_dictionary" 
WHERE NOT ("dictionary_dictionary"."id" IN (SELECT distinct U1."word_id" AS Col1
                                            FROM "dictionary_frequencydata" U1 
                                            WHERE U1."user_id" = 1));
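Adding DISTINCT does not change which rows are counted: id IN (subquery) only asks whether a matching value exists, so duplicate word_ids in the subquery cannot change the result, and any speed-up comes from the plan PostgreSQL chooses. The sketch below checks that equivalence on made-up rows, using Python's built-in sqlite3 module as a stand-in for PostgreSQL; it demonstrates only the semantics, not PostgreSQL's plans or timings.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL; table and column
# names mirror the question, the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dictionary_dictionary (id INTEGER PRIMARY KEY);
    CREATE TABLE dictionary_frequencydata (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        word_id INTEGER NOT NULL REFERENCES dictionary_dictionary(id)
    );
""")
conn.executemany("INSERT INTO dictionary_dictionary (id) VALUES (?)",
                 [(i,) for i in range(1, 11)])
# User 1 has repeated rows for the same words; user 2 has other words.
conn.executemany(
    "INSERT INTO dictionary_frequencydata (user_id, word_id) VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (1, 3), (1, 3), (1, 3), (2, 4), (2, 5)],
)

plain = """
    SELECT COUNT(*) FROM dictionary_dictionary
    WHERE NOT (id IN (SELECT U1.word_id FROM dictionary_frequencydata U1
                      WHERE U1.user_id = 1))
"""
with_distinct = """
    SELECT COUNT(*) FROM dictionary_dictionary
    WHERE NOT (id IN (SELECT DISTINCT U1.word_id FROM dictionary_frequencydata U1
                      WHERE U1.user_id = 1))
"""
count_plain = conn.execute(plain).fetchone()[0]
count_distinct = conn.execute(with_distinct).fetchone()[0]
print(count_plain, count_distinct)  # 7 7
```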

【Comments】:

【Answer 2】:

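In this case your query should be much faster if you rewrite it as below, because both subqueries are fast. The final result is equivalent to the query Django generates, since word_id is declared NOT NULL (NOT IN behaves differently when the subquery can return NULLs).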

The sequential scan on dictionary_dictionary with the filter seems to be very expensive, whereas a plain sequential scan is very fast. I don't know why that is.

SELECT 
tot - excl
from (select count(*) tot
      from dictionary_dictionary) t1
, (select count(DISTINCT d.id) excl
   from dictionary_dictionary d 
   join dictionary_frequencydata f
     on d.id = f.word_id 
   where f.user_id = 1 ) t2
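The rewrite relies on a simple identity: the number of dictionary ids not used by user 1 equals the total number of ids minus the number of distinct ids that are used. A small sketch on made-up rows, with Python's sqlite3 module standing in for PostgreSQL, compares both forms and also shows why word_id being NOT NULL matters: NOT IN yields NULL, not true, when the list contains a NULL and no match.

```python
import sqlite3

# Invented sample data; names mirror the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dictionary_dictionary (id INTEGER PRIMARY KEY);
    CREATE TABLE dictionary_frequencydata (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        word_id INTEGER NOT NULL
    );
""")
conn.executemany("INSERT INTO dictionary_dictionary (id) VALUES (?)",
                 [(i,) for i in range(1, 11)])
conn.executemany(
    "INSERT INTO dictionary_frequencydata (user_id, word_id) VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (1, 3), (1, 3), (2, 4), (2, 5)],
)

# Shape of the Django-generated query: NOT IN over a subquery.
not_in = conn.execute("""
    SELECT COUNT(*) FROM dictionary_dictionary
    WHERE NOT (id IN (SELECT word_id FROM dictionary_frequencydata
                      WHERE user_id = 1))
""").fetchone()[0]

# Rewrite from this answer: total rows minus distinct excluded ids.
rewritten = conn.execute("""
    SELECT tot - excl
    FROM (SELECT COUNT(*) AS tot FROM dictionary_dictionary) t1,
         (SELECT COUNT(DISTINCT d.id) AS excl
          FROM dictionary_dictionary d
          JOIN dictionary_frequencydata f ON d.id = f.word_id
          WHERE f.user_id = 1) t2
""").fetchone()[0]

# With a NULL in the list and no match, NOT IN is NULL (None in Python),
# so such a row would be filtered out instead of counted.
null_case = conn.execute("SELECT 1 NOT IN (2, NULL)").fetchone()[0]

print(not_in, rewritten, null_case)  # 7 7 None
```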

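If rows are not inserted into dictionary_dictionary often, its total count should rarely change. It would then be more efficient to cache the result of select count(*) from dictionary_dictionary and subtract the count of excluded ids from it. The cache has to be updated whenever rows are inserted into or deleted from dictionary_dictionary; this can be done automatically with insert and delete triggers on dictionary_dictionary.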
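A trigger-maintained count cache of this kind can be sketched as follows. This is a minimal illustration, not the answerer's code: it uses Python's built-in sqlite3 module so it runs anywhere, and the one-row table dictionary_rowcount and the trigger names are hypothetical. In PostgreSQL the same idea would be a PL/pgSQL trigger function attached to dictionary_dictionary with CREATE TRIGGER ... AFTER INSERT OR DELETE ... FOR EACH ROW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dictionary_dictionary (id INTEGER PRIMARY KEY, word TEXT);

    -- Hypothetical one-row table caching COUNT(*) of dictionary_dictionary.
    CREATE TABLE dictionary_rowcount (n INTEGER NOT NULL);
    INSERT INTO dictionary_rowcount (n) VALUES (0);

    -- SQLite triggers fire once per affected row.
    CREATE TRIGGER dictionary_count_ins AFTER INSERT ON dictionary_dictionary
    BEGIN
        UPDATE dictionary_rowcount SET n = n + 1;
    END;
    CREATE TRIGGER dictionary_count_del AFTER DELETE ON dictionary_dictionary
    BEGIN
        UPDATE dictionary_rowcount SET n = n - 1;
    END;
""")

conn.executemany("INSERT INTO dictionary_dictionary (word) VALUES (?)",
                 [("alpha",), ("beta",), ("gamma",)])
conn.execute("DELETE FROM dictionary_dictionary WHERE word = 'beta'")

# The cached value should always match a real COUNT(*).
cached = conn.execute("SELECT n FROM dictionary_rowcount").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM dictionary_dictionary").fetchone()[0]
print(cached, actual)  # 2 2
```

With the cached total in place, the count from the question becomes the cached value minus the count(DISTINCT d.id) subquery for the user. Two trade-offs to keep in mind for PostgreSQL: every insert or delete updates the same counter row, so concurrent writers queue on that row's lock, and TRUNCATE does not fire row-level delete triggers, so the counter would need resetting after one.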

【Comments】:

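  • Thanks, it is faster (288 ms). COUNT was just a simple example; the main problem is not COUNT but SELECTing some of the records. I'll try rewriting that using your suggestion (original post: stackoverflow.com/questions/50604128/django-orm-exclude-fails). But why is the filter operation so expensive?
  • Please update the other question with the proper table descriptions (\d+ tablename) and the EXPLAIN ANALYZE output of the query you are trying to run.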