【Question title】: Optimize session.commit with SQLAlchemy
【Posted】: 2020-12-17 04:42:16
【Question】:

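I am running into a performance problem when writing to an SQL database with SQLAlchemy. We have thousands of records to write, and each record has many relationships. While investigating, we realized that a separate INSERT is issued for every record. Here is a small data model as an example: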

Model declaration

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy.orm import relationship,backref,sessionmaker
from sqlalchemy import create_engine

engine= create_engine('sqlite:///model.db',echo=True)

Session=sessionmaker(bind=engine)
session=Session()

Base=declarative_base()
metagate=Base.metadata

class Parent(Base):
    __tablename__='PARENT'

    PK_ID=Column(Integer,primary_key=True)
    attribute=Column(String(10))

    relation1=relationship('Child', cascade="all, delete-orphan" ,single_parent=True, back_populates="relation2")

    def __init__(self, attribute=None):
        self.attribute=attribute

class Child(Base):
    __tablename__='CHILD'

    PK_ID=Column(Integer,primary_key=True)
    attribute=Column(String(10))    

    relation2_ID=Column(Integer, ForeignKey('PARENT.PK_ID'))
    relation2 = relationship("Parent", cascade="all, delete-orphan",single_parent=True, back_populates="relation1") 

    def __init__(self, attribute=None):
        self.attribute=attribute

Base.metadata.create_all(engine)

Running the snippet

obj1=Parent('foo')
    
for attribute in range(10):
    obj2=Child(str(attribute))
    obj1.relation1.append(obj2)
        
session.add(obj1)
session.commit()

Generated SQL

SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('0', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('1', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('2', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('3', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('4', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('5', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('6', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('7', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('8', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('9', 3)
COMMIT
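The log above shows one INSERT per CHILD row. A likely cause, in SQLAlchemy 1.3/1.4, is that the unit of work has to read back each newly generated primary key, so it cannot group those rows; when the key values are already set on the objects, it can send them as a single executemany call instead. A minimal sketch, assuming you can generate the keys yourself (the hard-coded values below are placeholders for a real key source such as a sequence or a reserved ID range), using an in-memory SQLite database and simplified versions of the two models:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import relationship, sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # SQLAlchemy 1.3
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Parent(Base):
    __tablename__ = "PARENT"
    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))
    relation1 = relationship("Child", cascade="all, delete-orphan", back_populates="relation2")


class Child(Base):
    __tablename__ = "CHILD"
    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))
    relation2_ID = Column(Integer, ForeignKey("PARENT.PK_ID"))
    relation2 = relationship("Parent", back_populates="relation1")


engine = create_engine("sqlite://", echo=True)  # echo lets you compare the emitted INSERTs
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Placeholder keys: in practice they would come from a sequence or a reserved ID block.
parent = Parent(PK_ID=1, attribute="foo")
for i in range(10):
    # With PK_ID already set, the flush does not need to read back a key per row.
    parent.relation1.append(Child(PK_ID=100 + i, attribute=str(i)))

session.add(parent)
session.commit()

child_count = session.query(func.count(Child.PK_ID)).scalar()
```

With echo enabled, the CHILD rows should then appear as one INSERT statement followed by a list of parameter tuples rather than ten separate statements.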

What we expected was something like this:

SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES 
('0', 3),
('1', 3),
...
('9', 3)
COMMIT

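We initially called Session.add for each child, then tried Session.bulk_save_objects(), and are now relying on the "save-update" cascade, but none of these gave any performance benefit. Is there a way to insert all the related rows in a single query?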
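If the bulk rows do not need to become ORM objects, one option is to bypass the unit of work and use SQLAlchemy Core. The sketch below runs against in-memory SQLite with Core Table definitions equivalent to the question's models. It inserts a parent, reads back its key with inserted_primary_key, and then sends ten children in two ways: as an executemany call (a list of dicts passed to execute()), and as a single INSERT with a multi-row VALUES clause (insert().values(list_of_dicts)), which matches the shape of the expected SQL above. Which variant is faster on SQL Server 2012 through pyodbc would have to be measured; count_children is a hypothetical helper used only to check the result.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, create_engine, text

metadata = MetaData()

# Core Table objects mirroring the PARENT/CHILD tables from the question.
parent_table = Table(
    "PARENT",
    metadata,
    Column("PK_ID", Integer, primary_key=True),
    Column("attribute", String(10)),
)
child_table = Table(
    "CHILD",
    metadata,
    Column("PK_ID", Integer, primary_key=True),
    Column("attribute", String(10)),
    Column("relation2_ID", Integer, ForeignKey("PARENT.PK_ID")),
)

engine = create_engine("sqlite://", echo=True)  # in-memory stand-in for SQL Server
metadata.create_all(engine)


def count_children(conn, parent_id):
    """Return how many CHILD rows point at the given parent (hypothetical helper)."""
    query = text('SELECT COUNT(*) FROM "CHILD" WHERE "relation2_ID" = :pid')
    return conn.execute(query, {"pid": parent_id}).scalar()


with engine.begin() as conn:  # a single transaction, committed when the block exits
    # Option 1: executemany, i.e. one INSERT statement sent with a list of parameter sets.
    foo_id = conn.execute(parent_table.insert().values(attribute="foo")).inserted_primary_key[0]
    conn.execute(
        child_table.insert(),
        [{"attribute": str(i), "relation2_ID": foo_id} for i in range(10)],
    )

    # Option 2: one INSERT whose VALUES clause lists every row.
    bar_id = conn.execute(parent_table.insert().values(attribute="bar")).inserted_primary_key[0]
    conn.execute(
        child_table.insert().values(
            [{"attribute": str(i), "relation2_ID": bar_id} for i in range(10)]
        )
    )

    foo_count = count_children(conn, foo_id)
    bar_count = count_children(conn, bar_id)
```

The trade-off is that rows inserted this way are not tracked by the Session, so the related keys have to be wired up by hand, as with relation2_ID here.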

If it helps, the final database will be SQL Server 2012. Our first attempt took more than an hour to save a single record:

  • Data size: about 800 kB
  • Number of tables involved: 15
  • Number of related records: about 20,000

Thanks in advance,

BLH

【Question discussion】:

    Tags: python sql sql-server orm sqlalchemy


    【Answer 1】:

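    According to the SQLAlchemy documentation, the pyodbc driver supports a fast executemany mode for MSSQL (the fast_executemany engine option, available since SQLAlchemy 1.3). You may therefore get better performance when testing against an MSSQL server with this mode enabled: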

    engine = create_engine(
        "mssql+pyodbc://scott:tiger@mssql2017:1433/test?driver=ODBC+Driver+13+for+SQL+Server",
        fast_executemany=True)
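On SQLAlchemy versions where create_engine() does not accept fast_executemany (before 1.3), a commonly used alternative is to set the pyodbc cursor attribute from a before_cursor_execute event listener. A minimal sketch; the function names enable_fast_executemany and attach_fast_executemany are hypothetical, and the hook only has an effect on pyodbc cursors:

```python
from sqlalchemy import create_engine, event


def enable_fast_executemany(conn, cursor, statement, parameters, context, executemany):
    """before_cursor_execute hook: turn on pyodbc's fast_executemany for batched calls."""
    if executemany:
        # pyodbc cursors expose a fast_executemany attribute; other DBAPIs may reject it.
        cursor.fast_executemany = True


def attach_fast_executemany(engine):
    """Register the hook on an existing Engine."""
    event.listen(engine, "before_cursor_execute", enable_fast_executemany)


# Usage, assuming pyodbc and the SQL Server ODBC driver are installed:
# engine = create_engine(
#     "mssql+pyodbc://scott:tiger@mssql2017:1433/test?driver=ODBC+Driver+13+for+SQL+Server")
# attach_fast_executemany(engine)
```

The flag only changes how an executemany call is transmitted; statements the ORM emits one row at a time are not affected by it.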
    

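    The documentation warns that this mode may not be suitable for very large batches of data, because the data is buffered in memory, but 800 kB should be fine.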
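Whichever route is taken, the roughly 20,000 related rows can be sent in bounded batches so that no single call buffers everything. For the multi-row VALUES form this also matters for correctness: SQL Server accepts at most 1,000 rows in one VALUES list and at most 2,100 parameters per request. A minimal sketch of a chunking helper (chunk_rows is a hypothetical name) that respects both limits, keeping one parameter of headroom as a conservative margin:

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

MAX_PARAMS = 2100  # SQL Server: maximum parameters in one request
MAX_VALUES_ROWS = 1000  # SQL Server: maximum rows in one VALUES list

Row = Dict[str, Any]


def chunk_rows(rows: Iterable[Row], columns_per_row: int) -> Iterator[List[Row]]:
    """Yield lists of rows small enough for one multi-row INSERT on SQL Server."""
    if not 1 <= columns_per_row < MAX_PARAMS:
        raise ValueError("columns_per_row must be between 1 and MAX_PARAMS - 1")
    # Rows per chunk: capped by the VALUES limit and by the parameter limit (minus one).
    size = min(MAX_VALUES_ROWS, (MAX_PARAMS - 1) // columns_per_row)
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


rows = [{"attribute": str(i), "relation2_ID": 1} for i in range(20000)]
two_col_chunks = list(chunk_rows(rows, columns_per_row=2))  # 20 chunks of 1000 rows
three_col_chunks = list(chunk_rows(rows, columns_per_row=3))  # 28 chunks of 699 rows, then 428
```

Each chunk can then be passed to insert().values(chunk). With a plain executemany call the per-statement limits apply to a single row's parameters, so there chunking mainly serves to bound memory.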

    【Comments】:

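    • Hi snakecharmerb, it didn't work as-is, but it led me to this post on enabling fast_executemany. That cut the computation time by about 20%, but it still takes about 4400 seconds. The echo output still shows a large number of INSERTs, so the SQL requests themselves have not improved. We may end up storing this data another way. Thanks for the tip.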