How to do pagination in PostgreSQL? Answers

【Question title】: How to do pagination in postgres sql?
【Posted】: 2019-01-15 16:52:45
【Question】:

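I have a Python script that runs SQL queries. The problem is that my virtual machine only has 2 GB of RAM, and some of the SQL queries are too RAM-intensive, so the kernel automatically kills the script. How can I make this code more efficient? I would like to implement pagination in my postgres sql code. How can I do that? Does anyone know of a simple implementation? Your help is much appreciated!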

Updated code

from __future__ import print_function

try:
    import psycopg2
except ImportError:
    raise ImportError('\n\033[33mpsycopg2 library missing. pip install psycopg2\033[1;m\n')
    sys.exit(1)


import re
import sys
import json
import pprint
import time

outfilepath = "crtsh_output/crtsh_flat_file"

DB_HOST = 'crt.sh'
DB_NAME = 'certwatch'
DB_USER = 'guest'

# DELAY = 0


def connect_to_db():
    start = 0
    offset = 10
    flag = True
    while flag:
        filepath = 'forager.txt'
        with open(filepath) as fp:
            unique_domains = ''
            try:
                conn = psycopg2.connect("dbname={0} user={1} host={2}".format(DB_NAME, DB_USER, DB_HOST))
                cursor = conn.cursor()
                cursor.itersize = 10000
                for cnt, domain_name in enumerate(fp):
                    print("Line {}: {}".format(cnt, domain_name))
                    print(domain_name)
                    domain_name = domain_name.rstrip()

                    cursor.execute('''SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate), x509_notBefore(c.certificate), x509_notAfter(c.certificate), x509_issuerName(c.certificate), x509_keyAlgorithm(c.certificate), x509_keySize(c.certificate), x509_publicKeyMD5(c.certificate), x509_publicKey(c.certificate), x509_rsaModulus(c.certificate), x509_serialNumber(c.certificate), x509_signatureHashAlgorithm(c.certificate), x509_signatureKeyAlgorithm(c.certificate), x509_subjectName(c.certificate), x509_name(c.certificate), x509_name_print(c.certificate), x509_commonName(c.certificate), x509_subjectKeyIdentifier(c.certificate), x509_extKeyUsages(c.certificate), x509_certPolicies(c.certificate), x509_canIssueCerts(c.certificate), x509_getPathLenConstraint(c.certificate), x509_altNames(c.certificate), x509_altNames_raw(c.certificate), x509_cRLDistributionPoints(c.certificate), x509_authorityInfoAccess(c.certificate), x509_print(c.certificate), x509_anyNamesWithNULs(c.certificate), x509_extensions(c.certificate), x509_tbscert_strip_ct_ext(c.certificate), x509_hasROCAFingerprint(c.certificate)
                    FROM certificate c, certificate_identity ci WHERE
                    c.id= ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) =
                    lower(%s) AND x509_notAfter(c.certificate) > statement_timestamp()''', (domain_name,))


                # query db with start and offset
                unique_domains = cursor.fetchall()
                if not unique_domains:
                    flag = False
                else:
                        # do processing with your data

                    pprint.pprint(unique_domains)

                    outfilepath = "crtsh2" + ".json"
                    with open(outfilepath, 'a') as outfile:
                            outfile.write(json.dumps(unique_domains, sort_keys=True, indent=4, default=str, ensure_ascii = False))
                    offset += limit


            except Exception as error:
                print(str(error))

if __name__ == "__main__":
    connect_to_db()

【Comments】:

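Use something like cur.fetchmany(n). It returns the next 'n' rows from your query.
@Mokadillion Thanks for your reply! In which part of the code should I implement cur.fetchmany(n)?
One odd thing is that cursor.fetchall() is called outside the loop, which means the database never gets a chance to release the resources consumed by the previously run queries. You should process the query results for each execute() call: append to a list, update a set, etc.
@bimsapi Thanks for your help! To get cursor.fetchall() inside the loop, does it need to be indented one level?
Yes, indent it to the same level as the other statements in the loop. I would also avoid opening and closing crtsh2.json on every iteration. For simplicity, manage both files at once via with open(filepath) as fp, open('crtsh2.json') as outfile: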
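
The fetchmany(n) suggestion in these comments can be sketched as a small generator. This is only a minimal sketch, not the commenters' code: iter_batches and the cursor name cert_stream are hypothetical names, and it assumes the cursor has already run execute(). With psycopg2, an unnamed cursor transfers the whole result set to the client on execute(), so the memory saving depends on using a named (server-side) cursor; the itersize attribute likewise only applies to named cursors.

```python
def iter_batches(cursor, batch_size=1000):
    """Yield lists of at most batch_size rows from an already-executed cursor.

    Works with any DB-API cursor that provides fetchmany().
    iter_batches is a hypothetical helper name, not part of psycopg2.
    """
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            # An empty list means the result set is exhausted.
            return
        yield rows


# Possible usage with psycopg2 (not run here; needs a reachable database):
#
#   with conn.cursor(name="cert_stream") as cur:  # named => server-side cursor
#       cur.execute(query, (domain_name,))
#       for batch in iter_batches(cur, 500):
#           ...  # write this batch to the output file, then let it go
```

Each batch can be written to disk and discarded before the next one is fetched, so only batch_size rows need to be held in memory at a time.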

Tags: python sql python-3.x postgresql pagination

【Answer 1】:

Maybe something like this:

limit = 10
offset = 0
flag = True
while flag:
    # query db with limit and offset, example: select * from domains limit %limit% offset %offset%
    unique_domains = cursor.fetchall()
    if not unique_domains:
        flag = False
    else:
        # do processing with your data
        offset += limit

【Comments】:

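@Ajay_Gupta Thanks for your reply. Are you saying I should put this code where I have "unique_domains = cursor.fetchall()"?
@bedford This is pseudocode; you have to run the query according to the number of records to fetch, as defined in limit.
@Ajay_Gupta Thanks again for your reply! What exactly does it mean that this is pseudocode?
@bedford It just means that you have to plug your own code into this logic.
@Ajay_Gupta I updated the code above. However, my code doesn't work. It just stops at the first query and does nothing. Any help would be greatly appreciated!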

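【Answer 2】:

I found a link on pagination in Postgres: Five ways to paginate in Postgres, from the basic to the exotic

Here is an example from it, on keyset pagination. The techniques above can paginate any kind of query, including queries without order clauses. If we are willing to forgo this generality, we reap optimizations. In particular, when ordering by indexed column(s), the client can use values in the current page to choose which items to show in the next page. This is called keyset pagination.

For example, let's return to the medley example: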

-- Add an index for keyset pagination (btrees support inequality)
CREATE INDEX n_idx ON medley USING btree (n);
SELECT * FROM medley ORDER BY n ASC LIMIT 5;
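
The query above only fetches the first page. Following pages would filter on the last n already seen, which is the step a Python loop could automate. The following is a minimal sketch under stated assumptions, not code from the linked article: keyset_pages is a hypothetical helper name, the cursor is assumed to use psycopg2-style %s placeholders, n is assumed to be the first column of medley, and its values are assumed to be unique (with duplicates, rows sharing the boundary value could be skipped).

```python
def keyset_pages(cursor, page_size=5):
    """Yield pages of medley rows using keyset pagination on column n.

    keyset_pages is a hypothetical helper. Assumptions: n is the first
    column of each returned row and its values are unique.
    """
    last_n = None
    while True:
        if last_n is None:
            # First page: no lower bound yet.
            cursor.execute(
                "SELECT * FROM medley ORDER BY n ASC LIMIT %s",
                (page_size,),
            )
        else:
            # Later pages: continue strictly after the last n seen.
            cursor.execute(
                "SELECT * FROM medley WHERE n > %s ORDER BY n ASC LIMIT %s",
                (last_n, page_size),
            )
        rows = cursor.fetchall()
        if not rows:
            break
        yield rows
        last_n = rows[-1][0]  # remember where this page ended
```

Because each query starts from an indexed value instead of an offset, the server does not need to scan and discard earlier rows, so the cost per page should stay roughly constant however deep the pagination goes.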

【Comments】: