How to calculate percentage of a group in SQLAlchemy?

【Question title】: How to calculate percentage of a group in SQLAlchemy?
【Posted】: 2021-06-25 06:56:57
【Question】:

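I am building a "quiz app" in Python, and I need to store the results in an SQL database. I want to use the SQLAlchemy Python library to interact with the database. Each user of my app will be asked 3 questions, randomly selected from a predetermined set of 100 possible questions. Each question can only be answered "Yes" or "No" (i.e. True or False). I store the answers in a table defined as follows: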

class Answer(Base):
    __tablename__ = "Answers"
    
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("Users.id"), nullable=False)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)
    
    user = relationship("User", back_populates="answers")

After all users have completed the quiz, I calculate how many times each question has been answered:

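from sqlalchemy import func

# Assumption: count_questions is not defined anywhere in the post;
# a plain COUNT over the answer rows is assumed here.
count_questions = func.count(Answer.id)

tot_each_question = (db_session
                     .query(Answer.question_id,
                            count_questions.label("tot_answers_for_question"))
                     .group_by(Answer.question_id)
                     )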

I can also calculate how many times each question has been answered "Yes" (i.e. True):

tot_true_for_question = (db_session
                         .query(Answer.question_id,
                                count_questions.label("tot_true_for_question"))
                         .filter(Answer.answer == True)
                         .group_by(Answer.question_id)
                         )

How can I use SQLAlchemy to calculate, for each question, the percentage of users who answered "Yes"? I can easily do this with basic Python dictionaries:

dict_tot_each_question = {row.question_id: row.tot_answers_for_question
                          for row in tot_each_question.all()}

dict_tot_true_for_question = {row.question_id: row.tot_true_for_question
                              for row in tot_true_for_question.all()}

dict_percent_true_for_question = {}
for question_id, tot_answers in dict_tot_each_question.items():
    tot_true = dict_tot_true_for_question.get(question_id, 0)
    percent_true = tot_true / tot_answers * 100
    dict_percent_true_for_question[question_id] = percent_true

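But I would prefer to use SQLAlchemy functionality to obtain the same result. Is it possible to do this in SQLAlchemy? Would doing it in SQLAlchemy be convenient and efficient, or would my solution based on Python dictionaries be better?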

【Question comments】:

Tags: python sql sqlalchemy

【Solution 1】:

You can get the desired result by combining the two expressions from the two queries you already have into a single query:

q = (
    session.query(
        Question.id,
        (100 * func.sum(cast(Answer.answer, Integer)) / func.count(Answer.answer)).label("perc_true"),
    )
    .outerjoin(Answer)
    .group_by(Question.id)
)

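As you can see above, I use the COUNT function to count all the answers, and SUM over the boolean column (cast to Integer) to count the True ones.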
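
To make the query above easy to try out, here is a minimal, self-contained sketch against an in-memory SQLite database. The sample rows and the `results` name are made up for illustration, the models are a trimmed version of the ones in this answer, and the join is spelled out through the `Question.answers` relationship rather than relying on the foreign key being inferred.

```python
from sqlalchemy import (Boolean, Column, ForeignKey, Integer, String, cast,
                        create_engine, func)
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Question(Base):
    __tablename__ = "questions"

    id = Column(Integer, primary_key=True)
    question = Column(String, nullable=False)
    answers = relationship("Answer", back_populates="question")


class Answer(Base):
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    question_id = Column(Integer, ForeignKey("questions.id"))
    answer = Column(Boolean, nullable=False)
    question = relationship("Question", back_populates="answers")


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Hypothetical sample data: one True out of two, one True out of one, no answers.
    session.add_all([
        Question(id=1, question="Q1",
                 answers=[Answer(answer=True), Answer(answer=False)]),
        Question(id=2, question="Q2", answers=[Answer(answer=True)]),
        Question(id=3, question="Q3"),
    ])
    session.commit()

    q = (
        session.query(
            Question.id,
            (100 * func.sum(cast(Answer.answer, Integer))
             / func.count(Answer.answer)).label("perc_true"),
        )
        .outerjoin(Question.answers)  # LEFT JOIN keeps unanswered questions
        .group_by(Question.id)
        .order_by(Question.id)
    )
    results = {row.id: row.perc_true for row in q}
```

With this data, question 1 comes out at 50, question 2 at 100, and the unanswered question 3 as None; the exact numeric type of the values may vary with the SQLAlchemy version.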

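Another thing to note is that my query starts from Question and JOINs the Answer table. The reason is that if a Question has no answers, you still get a (#id, NULL) row back, whereas querying only the Answers table would return no row at all for it. However, if you do not care about this corner case, you can do it your way: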

q = (
    session.query(
        Answer.question_id,
        (100 * func.sum(Answer.answer) / func.count(Answer.answer)).label("perc_true"),
    )
    .group_by(Answer.question_id)
)

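Lastly, one more assumption I made is that your database treats true as 1 once it is cast to Integer, so that SUM works correctly. If that is not the case, please refer to the multiple answers to this question on how to handle it: postgresql - sql - count of `true` values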
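
If the cast cannot be relied on, one commonly used alternative is to count the True rows with an explicit CASE expression instead of summing the boolean. This is only a sketch of that swapped-in technique, not something taken from the linked question; the reduced `Answer` model, the sample rows, and the `true_count` and `counts` names are assumptions for illustration.

```python
from sqlalchemy import Boolean, Column, Integer, case, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Answer(Base):
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

# CASE maps every row to 1 or 0 explicitly, so SUM never sees a boolean.
true_count = func.sum(case((Answer.answer.is_(True), 1), else_=0))

with Session(engine) as session:
    # Hypothetical sample data.
    session.add_all([
        Answer(question_id=1, answer=True),
        Answer(question_id=1, answer=False),
        Answer(question_id=1, answer=True),
        Answer(question_id=2, answer=False),
    ])
    session.commit()

    q = (
        session.query(
            Answer.question_id,
            true_count.label("tot_true"),
            func.count(Answer.id).label("tot_answers"),
        )
        .group_by(Answer.question_id)
        .order_by(Answer.question_id)
    )
    counts = {row.question_id: (row.tot_true, row.tot_answers) for row in q}
```

The positional `case((condition, value), else_=...)` form used here is the 1.4-and-later spelling; `true_count` can be dropped into the percentage expression in place of the `func.sum(...)` term.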

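Bonus:

When I find myself asking aggregate-related questions at the model level, I often implement them directly on the model using the Hybrid Attributes extension.

The code below shows how this could be used in your case: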

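from sqlalchemy import (Boolean, Column, ForeignKey, Integer, Numeric, String,
                        case, cast, func, select)
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Answer(Base):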
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    # user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    question_id = Column(Integer, ForeignKey("questions.id"))
    answer = Column(Boolean, nullable=False)

    # user = relationship("User", back_populates="answers")
    question = relationship("Question", back_populates="answers")


class Question(Base):
    __tablename__ = "questions"

    id = Column(Integer, primary_key=True)
    question = Column(String, nullable=False)

    answers = relationship("Answer", back_populates="question")

    @hybrid_property
    def answers_cnt(self):
        return len(list(self.answers))

    @hybrid_property
    def answers_yes(self):
        return len(list(_ for _ in self.answers if _.answer))

    @hybrid_property
    def answers_yes_percentage(self):
        return (
            100.0 * self.answers_yes / self.answers_cnt if self.answers_cnt != 0 else None
        )

    @answers_cnt.expression
    def answers_cnt(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .label("answers_cnt")
        )

    @answers_yes.expression
    def answers_yes(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .where(Answer.answer == True)
            .label("answers_yes")
        )

    @answers_yes_percentage.expression
    def answers_yes_percentage(cls):
        return (
            case(
                # "when" tuples are passed positionally; the list form is deprecated in 1.4
                (cls.answers_cnt == 0, None),
                else_=(
                    100
                    * cast(cls.answers_yes, Numeric)
                    / cast(cls.answers_cnt, Numeric)
                ),
            )
        ).label("answers_yes_percentage")

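With this in place, you can do the calculation either in Python or with a query.

Python (this loads all the answers from the database, so it is not efficient if the data is not already loaded in memory):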
```
 q = session.query(Question)
 for question in q:
     print(question, question.answers_yes_percentage)
```
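Database: this is very efficient, because you run just one query, similar to the standalone query earlier in this answer, but the percentage is returned as a separate value next to each model instance and is also available as an attribute on the model: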
```
 q = session.query(Question, Question.answers_yes_percentage)
 for question, percentage in q:
     print(question, percentage)
```

Note that the above works with version 1.4 of sqlalchemy, but earlier versions may require a different syntax.
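
Because the class-level side of a hybrid is an ordinary SQL expression, it can also be used in `filter()` or `order_by()`. The following is a minimal sketch of that idea, reduced to a single `answers_cnt` hybrid; the sample rows and the `busy_ids` and `python_side_count` names are hypothetical, and `scalar_subquery()` is used here instead of `label()` purely for illustration.

```python
from sqlalchemy import (Boolean, Column, ForeignKey, Integer, create_engine,
                        func, select)
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Answer(Base):
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    question_id = Column(Integer, ForeignKey("questions.id"))
    answer = Column(Boolean, nullable=False)


class Question(Base):
    __tablename__ = "questions"

    id = Column(Integer, primary_key=True)
    answers = relationship("Answer")

    @hybrid_property
    def answers_cnt(self):
        # Instance level: counts the loaded relationship in Python.
        return len(self.answers)

    @answers_cnt.expression
    def answers_cnt(cls):
        # Class level: a correlated scalar subquery evaluated by the database.
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .scalar_subquery()
        )


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Hypothetical sample data: two answers, one answer, no answers.
    session.add_all([
        Question(id=1, answers=[Answer(answer=True), Answer(answer=False)]),
        Question(id=2, answers=[Answer(answer=True)]),
        Question(id=3),
    ])
    session.commit()

    # SQL side: the hybrid is used directly as a filter criterion.
    busy_ids = [
        row.id
        for row in session.query(Question.id)
        .filter(Question.answers_cnt >= 2)
        .order_by(Question.id)
    ]
    # Python side: the same attribute name, evaluated on a loaded instance.
    python_side_count = session.get(Question, 1).answers_cnt
```

Only question 1 has at least two answers in this data, so it is the only id that passes the filter.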

【Discussion】:

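Thanks @van, it works well! I really appreciate the explanation and the two options for achieving the desired result. Please note: your approach performs "integer division": the fractional part of the number is discarded, so the result is always rounded down. Is there a way to perform "floating-point division" and get the same numbers I get with the Python / operator?
Try replacing 100 with 100.1. If this is not good enough, I will change the answer to use the CAST operator.
Yes, it works well! Using 100.0001 changes the correct number by a negligible amount.
Out of curiosity, I also tried the cast function: from sqlalchemy import cast, Float. Then replacing func.sum(Answer.answer) with cast(func.sum(Answer.answer), Float) returns exactly the same floating-point number obtained with basic Python!
You are welcome. I will amend the answer with another usage which you might find useful; feel free to explore it further using the sqlalchemy documentation.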
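
The Float cast mentioned in the comments can be sketched end to end as follows. The reduced `Answer` model, the sample rows, and the `true_count` and `perc_true` names are assumptions for illustration, and the inner cast to Integer is kept so the sketch does not depend on the database being able to SUM a boolean. As far as I know, SQLAlchemy 2.0 changed `/` on integer expressions to perform true division, so the truncation discussed above mainly concerns 1.4 and earlier; the explicit cast should be harmless either way.

```python
from sqlalchemy import (Boolean, Column, Float, Integer, cast, create_engine,
                        func)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Answer(Base):
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Hypothetical sample data: one True answer out of three.
    session.add_all([
        Answer(question_id=1, answer=True),
        Answer(question_id=1, answer=False),
        Answer(question_id=1, answer=False),
    ])
    session.commit()

    # Casting the SUM to Float forces floating-point division in the database.
    true_count = cast(func.sum(cast(Answer.answer, Integer)), Float)
    q = (
        session.query(
            Answer.question_id,
            (100 * true_count / func.count(Answer.answer)).label("perc_true"),
        )
        .group_by(Answer.question_id)
    )
    perc_true = {row.question_id: row.perc_true for row in q}
```

With one True answer out of three, the value for question 1 should match Python's `100 / 3` up to floating-point precision, instead of being truncated to 33.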