django annotate with queryset: answers

【Question title】: django annotate with queryset
【Posted】: 2019-12-30 19:37:21
【Question】:

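I have users who take surveys periodically. The system has multiple surveys, and each one is issued at a set interval counted from the submitted date of the last issued survey of that particular type.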

class Survey(Model):
    name = CharField()
    description = TextField()
    interval = DurationField()  
    users = ManyToManyField(User, related_name='registered_surveys')
    ...

class SurveyRun(Model):
    ''' A user's answers for one taken survey '''
    user = ForeignKey(User, related_name='runs')
    survey = ForeignKey(Survey, related_name='runs')
    created = models.DateTimeField(auto_now_add=True)
    submitted = models.DateTimeField(null=True, blank=True)
    # answers = ReverseForeignKey...

So, with the models above, a user should be alerted to take survey A next on this date:

A.interval + SurveyRun.objects.filter(
    user=user, 
    survey=A
).latest('submitted').submitted

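I want to run a daily periodic task that queries all users and creates new runs for every user who has a survey due according to these criteria:

For each survey the user is registered to:

If no runs exist for that user-survey combination, create the first run for that combination and alert the user.
If runs exist for that survey but none is open (an open run has been created but not submitted, so submitted=None) and the latest submitted date plus the survey's interval is on or before today, create a new run for that user-survey combination and alert the user.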
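
To pin down the intended behaviour before moving it into the database, the rules for a single user-survey combination can be sketched in plain Python. This is only an illustration: the function name needs_new_run and its arguments are hypothetical and not part of the models above, and "due" is assumed to mean that the latest submission plus the interval is on or before the current time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def needs_new_run(
    submitted_dates: Sequence[Optional[datetime]],
    interval: timedelta,
    now: datetime,
) -> bool:
    """Decide whether one user-survey combination is due for a new run.

    submitted_dates holds the `submitted` value of every existing run for
    the combination; None marks an open (unsubmitted) run.
    """
    # No runs at all: the first run is due.
    if not submitted_dates:
        return True
    # An open run blocks the creation of another one.
    if any(submitted is None for submitted in submitted_dates):
        return False
    # Every run is submitted: due once the interval has elapsed
    # since the most recent submission.
    latest = max(submitted_dates)
    return latest + interval <= now
```

A database-side version should agree with this function for every combination, which makes it a convenient reference when checking annotations by hand.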

Ideally, I could create a manager method that annotates users with a surveys_due field, for example:

users_with_surveys_due = User.objects.with_surveys_due().filter(surveys_due__isnull=False)

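The annotated field would be a queryset of Survey objects for which the user needs to submit a new round of answers. I could then issue alerts like this: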

for user in users_with_surveys_due.all():
    for survey in user.surveys_due:
        new_run = SurveyRun.objects.create(
            user=user,
            survey=survey
        )
        alert_user(user, new_run)

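However, I would settle for a boolean flag annotation on the User object indicating that one of the registered_surveys needs a new run created.

How would I go about implementing something like a with_surveys_due() manager method so that Postgres does all the heavy lifting? Is it possible to annotate with a collection of objects, like a reverse FK?

UPDATE:

For clarity, here is my current task in Python: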

def make_new_runs_and_alert_users():
    runs = []
    Srun = apps.get_model('surveys', 'SurveyRun')
    for user in get_user_model().objects.prefetch_related('registered_surveys', 'runs').all():
        for srvy in user.registered_surveys.all():
            runs_for_srvy = user.runs.filter(survey=srvy)
            # no runs exist for this registered survey, create first run
            if not runs_for_srvy.exists():
                runs.append(Srun(user=user, survey=srvy))
                ...

            # check this survey has no open runs
            elif not runs_for_srvy.filter(submitted=None).exists():
                latest = runs_for_srvy.latest('submitted')
                if (latest.submitted + srvy.interval) <= timezone.now():
                    runs.append(Srun(user=user, survey=srvy))
    Srun.objects.bulk_create(runs)

UPDATE #2:

While trying Dirk's solution, I used this simple example:

In [1]: test_user.runs.values_list('survey__name', 'submitted')
Out[1]: <SurveyRunQuerySet [('Test', None)]>
In [2]: test_user.registered_surveys.values_list('name', flat=True)
Out[2]: <SurveyQuerySet ['Test']>

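The user has one open run (submitted=None) for the Test survey and is registered to one survey (Test). He/she should not be flagged for a new run, because there is an outstanding unsubmitted run for the only survey he/she is registered to. So I created a function wrapping Dirk's solution, called get_users_with_runs_due: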

In [10]: get_users_with_runs_due()
Out[10]: <UserQuerySet [<User: test@gmail.com>]>  # <-- should be an empty queryset

In [107]: for user in _:
              print(user.email, user.has_survey_due)
test@gmail.com True  # <-- should be false

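UPDATE #3:

In my previous update I had made some changes to the logic to properly match what I wanted, but neglected to mention or show them. Below is the query function, with comments next to the changes: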

def get_users_with_runs_due():
    today = timezone.now()

    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=OuterRef(OuterRef('pk'))
    ).order_by('-submitted')

    pending_survey_runs = survey_runs.filter(submitted__isnull=True)

    surveys = Survey.objects.filter(
        users=OuterRef('pk')
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    ).filter(
        Q(has_survey_runs=False) | # either has no runs for this survey or
        ( # has no pending runs and submission date meets criteria
            Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
        )
    )

    return User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)

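UPDATE #4:

I tried to isolate the problem by creating a function that applies most of the annotations to the surveys of a single user, so that I could check the annotations at that level before using them to query the User model.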

def annotate_surveys_for_user(user):
    today = timezone.now()

    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=user
    ).order_by('-submitted')

    pending_survey_runs = survey_runs.filter(submitted=None)

    return Survey.objects.filter(
            users=user
        ).annotate(
            latest_submission_date=Subquery(
                survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
            )
        ).annotate(
            has_survey_runs=Exists(survey_runs)
        ).annotate(
            has_pending_runs=Exists(pending_survey_runs)
        )

This works as expected. The annotations are accurate, and filtering with:

result.filter(
    Q(has_survey_runs=False) |
    (
        Q(has_pending_runs=False) &
        Q(latest_submission_date__lte=today - F('interval'))
    )
)

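yields the expected result: an empty queryset when the user should not have any runs due, and vice versa. Why does this not work when it is set up as a subquery and queried from the User model?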

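【Comments】:

Are you using PostgreSQL?
@dirkgroten Yes
Which version of Django are you using? One of the answers refers to Subquery, but that is a relatively new feature.
@JulienKieffer 2.2
The query for has_pending_runs was checking for True instead of False; fixed below.

Tags: django django-models

【Solution 1】:

To annotate whether a user has a survey due, I suggest using a Subquery expression: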

from django.db.models import Q, F, OuterRef, Subquery, Exists
from django.utils import timezone

today = timezone.now()

survey_runs = SurveyRun.objects.filter(survey=OuterRef('pk'), user=OuterRef(OuterRef('pk'))).order_by('-submitted')

pending_survey_runs = survey_runs.filter(submitted__isnull=True)

surveys = (
    Survey.objects.filter(users=OuterRef('pk'))
    .annotate(latest_submission_date=Subquery(survey_runs.filter(submitted__isnull=False).values('submitted')[:1]))
    .annotate(has_survey_runs=Exists(survey_runs))
    .annotate(has_pending_runs=Exists(pending_survey_runs))
    # Due when no runs exist, or when no run is open and the interval has elapsed.
    .filter(Q(has_survey_runs=False) | (Q(has_pending_runs=False) & Q(latest_submission_date__lte=today - F('interval'))))
)

users_with_survey_due = (
    User.objects.annotate(has_survey_due=Exists(surveys))
    .filter(has_survey_due=True)
)
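
The filter above mixes | and & on Q objects. Python's operator precedence is fixed by the grammar, so & binds tighter than | for Q objects exactly as it does for built-in types. A small sketch with sets (the names and values are made up for illustration) shows how an unparenthesised expression is grouped:

```python
# Hypothetical sets of user names standing in for the three conditions.
no_runs = {"alice"}
interval_elapsed = {"bob", "carol"}
no_pending = {"carol"}

# & binds tighter than |, so these two expressions are equivalent.
implicit = no_runs | interval_elapsed & no_pending
explicit = no_runs | (interval_elapsed & no_pending)

# Grouping the | first gives a different result.
other = (no_runs | interval_elapsed) & no_pending

print(implicit == explicit)  # True
print(implicit == other)     # False
```

Writing the parentheses explicitly in the Q expression does not change the query, but it makes the intended grouping visible to the reader.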

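I am still trying to figure out how to do the other part. You cannot annotate a queryset with another queryset; the value must be the equivalent of a field. Unfortunately, you also cannot use a Subquery as the queryset argument to Prefetch. Since you are using PostgreSQL, you could use an ArrayField to list the ids of the surveys in the annotated value, but I have not found a way to do that, because you cannot use aggregate inside such an expression.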
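
Until a database-side collection annotation is found, one possible fallback is to fetch flat (user_id, survey_id) pairs for the due combinations and group them in Python. The sketch below assumes such pairs are already available, for example from a values_list call on some query of due combinations; that query is not shown, and the function name group_surveys_by_user is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_surveys_by_user(rows: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group flat (user_id, survey_id) rows into {user_id: [survey_id, ...]}."""
    surveys_due = defaultdict(list)
    for user_id, survey_id in rows:
        surveys_due[user_id].append(survey_id)
    return dict(surveys_due)


# Made-up rows, shaped like the result of values_list('user_id', 'survey_id').
rows = [(1, 10), (1, 11), (2, 10)]
print(group_surveys_by_user(rows))  # {1: [10, 11], 2: [10]}
```

This keeps the filtering in the database and leaves only the cheap grouping step to Python.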

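【Discussion】:

I get TypeError: cannot unpack non-iterable Exists object. I suspect it is because of this: ~Q(Exists(survey_runs)). Can Q be used without specifying a field?
I made a test user with 1 registered survey A (interval: 2 weeks) and 2 runs: 'Run1' with submitted 2019-08-15 17:10:54.096004+00:00 and 'Run2' with submitted None, and this query annotates has_survey_due as True.
Ah, I missed that. But it is easy to add. One second.
I will not be online for the rest of the evening. Back tomorrow. Maybe my answer gives you some ideas and you can work it out yourself.
I also added a filter to the first annotation, because null values for the date could mess up the ordering; that might make a difference. I added a link to the documentation so you can look for more clues yourself.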