【Question title】: Django ORM: the raw SQL of individual queries differs when they are combined with OR
【Posted】: 2019-08-21 22:33:47
【Question】:

Apologies in advance for the unsightly formatting. Feel free to edit my question for better readability.

I have four models:

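from django.db import models

class Datasets(models.Model):
    name = models.CharField(max_length=150)
    description = models.TextField()

class Assay(models.Model):
    # on_delete is required since Django 2.0; CASCADE here is an assumed choice
    dataset = models.ForeignKey(Datasets, on_delete=models.CASCADE)
    name = models.CharField(max_length=150)
    type = models.CharField(max_length=150)

class Compounds(models.Model):
    dataset = models.ForeignKey(Datasets, on_delete=models.CASCADE)
    name = models.TextField()
    deleted = models.BooleanField(default=False)  # soft-delete flag

class Assays(models.Model):
    # One measured value of one Assay for one compound; reachable from
    # Compounds through the reverse relation name "assays"
    compound = models.ForeignKey(Compounds, on_delete=models.CASCADE)
    assay = models.ForeignKey(Assay, on_delete=models.CASCADE)
    value = models.DecimalField(max_digits=30, decimal_places=16)
    deleted = models.BooleanField(default=False)  # soft-delete flag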

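I'm building the query from user input, depending on which Assay is selected. I'm using JOINs to filter the results through the reverse relation: the user selects an Assay, and I filter the compounds based on that selection. The user can also select a "No Assay" option, which should return the compounds that have no registered assay (i.e., there is no entry for the compound in the Assays model).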
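A plain-Python sketch of the intended filtering rule may make the requirement concrete. It is illustrative only: `AssayRow` and `filter_compounds` are hypothetical names, and soft-deleted rows (`deleted=True`) are treated as absent, matching the `assays__deleted` filters used in the query.

```python
from dataclasses import dataclass


@dataclass
class AssayRow:
    """Hypothetical stand-in for one row of the Assays model."""
    compound_id: int
    assay_id: int
    deleted: bool = False


def filter_compounds(compound_ids, assay_rows, selected_assay_id, no_assay_selected):
    """Return the compound ids the UI selection should match (sketch only)."""
    live = [r for r in assay_rows if not r.deleted]  # ignore soft-deleted rows
    with_selected = {r.compound_id for r in live if r.assay_id == selected_assay_id}
    with_any_live = {r.compound_id for r in live}
    result = []
    for cid in compound_ids:
        if cid in with_selected:
            result.append(cid)  # has the selected assay
        elif no_assay_selected and cid not in with_any_live:
            result.append(cid)  # "No Assay": no live Assays rows at all
    return result


rows = [AssayRow(1, 5), AssayRow(2, 7), AssayRow(4, 5, deleted=True)]
print(filter_compounds([1, 2, 3, 4], rows, 5, False))    # [1]
print(filter_compounds([1, 2, 3, 4], rows, None, True))  # [3, 4]
print(filter_compounds([1, 2, 3, 4], rows, 5, True))     # [1, 3, 4]
```

The third call is the combination the question is about: its result should be exactly the union of the first two.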

from django.db.models import F, Q

# Compounds is the model defined above
selected_assay_id = 5  # Received from frontend
no_assay_option_selected = True  # True or False, received from frontend
dataset_id = 1
filter_query = Q()

# Compounds that have the selected assay
filter_query.add(Q(**{
        'assays__assay__id': selected_assay_id,
        'assays__compound__id': F('id'),
        'assays__deleted': False
    }), Q.OR)

# ...OR compounds without any (non-deleted) assay
if no_assay_option_selected:
    filter_query.add(~Q(**{
        'assays__deleted': False,
        'assays__compound__id': F('id')
    }), Q.OR)

compounds = Compounds.objects.filter(filter_query, dataset__id=dataset_id).distinct()

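When I select an assay, it works fine. When I select "No Assay", it also works fine. However, when I select both an assay and "No Assay", all compounds are returned, instead of only the compounds that have the selected assay plus those that have no assay. When I inspected the raw SQL, I noticed that the latter query has an extra part: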

-- Only 'No Assay' option selected
SELECT DISTINCT * FROM "compounds" WHERE (
  NOT ("compounds"."id" IN
        (SELECT U1."compound_id" FROM "compounds" U0
          INNER JOIN "assays" U1 ON (U0."id" = U1."compound_id")
          WHERE U1."compound_id" = (U0."id")) AND "compounds"."id" IN
            (SELECT U1."compound_id" FROM "assays" U1
              WHERE U1."deleted" = False)
      )
  AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = False
)


-- An assay is selected
SELECT DISTINCT * FROM "compounds" 
INNER JOIN "assays" ON ("compounds"."id" = "assays"."compound_id") 
WHERE (
  "assays"."assay_id" = 5
  AND "assays"."compound_id" = ("compounds"."id") 
  AND "assays"."deleted" = False 
  AND "compounds"."dataset_id" = 1 
  AND "compounds"."deleted" = False
)


-- An assay and 'No Assay' option selected
SELECT DISTINCT * FROM "compounds" 
LEFT OUTER JOIN "assays" ON ("compounds"."id" = "assays"."compound_id") 
WHERE (
  (
    (
      "assays"."assay_id" = 5 
      AND "assays"."compound_id" = ("compounds"."id") 
      AND "assays"."deleted" = False
    ) OR NOT (
      "compounds"."id" IN 
      (SELECT U1."compound_id" FROM "compounds" U0 
        INNER JOIN "assays" U1 ON (U0."id" = U1."compound_id") 
        WHERE (U1."compound_id" = (U0."id") AND U1."id" = ("assays"."id"))
      ) AND "compounds"."id" IN 
        (SELECT U1."compound_id" FROM "assays" U1 
          WHERE (U1."deleted" = False AND U1."id" = ("assays"."id"))
      )
    )
  )
  AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = False
)

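This is the extra part in the last query: AND U1."id" = ("assays"."id"), and it causes the strange results. When I remove it and run the raw query, I get the desired result.

My question is: why does Django do this, and how can I fix it?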
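The effect of that predicate can be reproduced outside Django. The sketch below uses Python's built-in `sqlite3` module on made-up rows, with the table and column names taken from the SQL above and `0`/`1` standing in for `False`/`True`; `run` is a hypothetical helper that executes the third query with or without the extra `U1."id" = ("assays"."id")` predicate. With the predicate, the `NOT (...)` branch is evaluated against each joined `assays` row on its own, so on this toy data compound 5 passes as "No Assay" through its deleted row, even though it also has a live assay.

```python
import sqlite3

# Toy copy of the two tables (columns reduced to what the query uses).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compounds (id INTEGER PRIMARY KEY, dataset_id INTEGER, deleted INTEGER);
CREATE TABLE assays (id INTEGER PRIMARY KEY, compound_id INTEGER,
                     assay_id INTEGER, deleted INTEGER);
INSERT INTO compounds VALUES (1, 1, 0), (2, 1, 0), (3, 1, 0), (4, 1, 0), (5, 1, 0);
INSERT INTO assays VALUES
  (10, 1, 5, 0),  -- compound 1: live assay 5              -> wanted
  (11, 2, 7, 0),  -- compound 2: only a live assay 7       -> not wanted
  (12, 4, 5, 1),  -- compound 4: only a deleted assay      -> wanted ("No Assay")
  (13, 5, 7, 0),  -- compound 5: live assay 7 ...
  (14, 5, 8, 1);  -- ... plus a deleted assay 8            -> not wanted
-- compound 3 has no assays at all                         -> wanted
""")


def run(correlate):
    # The predicate Django added; leaving it out gives the query that worked.
    extra = ' AND U1."id" = ("assays"."id")' if correlate else ""
    sql = f"""
    SELECT DISTINCT "compounds"."id" FROM "compounds"
    LEFT OUTER JOIN "assays" ON ("compounds"."id" = "assays"."compound_id")
    WHERE (
      ("assays"."assay_id" = 5 AND "assays"."compound_id" = ("compounds"."id")
       AND "assays"."deleted" = 0)
      OR NOT (
        "compounds"."id" IN (SELECT U1."compound_id" FROM "compounds" U0
          INNER JOIN "assays" U1 ON (U0."id" = U1."compound_id")
          WHERE (U1."compound_id" = (U0."id"){extra}))
        AND "compounds"."id" IN (SELECT U1."compound_id" FROM "assays" U1
          WHERE (U1."deleted" = 0{extra}))
      )
    ) AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = 0
    ORDER BY "compounds"."id"
    """
    return [row[0] for row in conn.execute(sql)]


print(run(correlate=True))   # [1, 3, 4, 5]
print(run(correlate=False))  # [1, 3, 4]
```

Without the predicate, the two subqueries reduce to "compounds that have at least one non-deleted assay", which is the condition "No Assay" is meant to negate.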

【Question discussion】:

    Tags: sql django


    【Solution 1】:

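    You may have stumbled upon an actual Django bug here. The behavior persists even with an INNER JOIN: if you change the filter construction in your no_assay_option_selected if block to use Q.AND (even though such a query makes no logical sense), you'll see that an INNER JOIN query is generated and the AND U1."id" = ("assays"."id") clause is still there.

    However, there is a workaround you can use: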

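    from django.db.models import Q

    # Compounds that have the selected (non-deleted) assay...
    filter_query = Q(
        assays__assay__id=selected_assay_id,
        assays__deleted=False
    )

    if no_assay_option_selected:
        # ...OR compounds that have no non-deleted Assays rows at all
        filter_query |= ~Q(
            id__in=Assays.objects.filter(deleted=False).values_list('compound_id')
        )

    compounds = Compounds.objects.filter(filter_query, dataset__id=dataset_id).distinct()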
    

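    Note that I also removed the 'assays__compound__id': F('id') you used when building the query. It is unnecessary: that is already the condition the JOIN is made ON, so repeating it in the WHERE clause achieves nothing.

    The SQL generated by the above should be: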

    SELECT DISTINCT * FROM "compounds" 
    LEFT OUTER JOIN "assays" ON ("compounds"."id" = "assays"."compound_id") 
    WHERE (
      (
        (
          "assays"."assay_id" = 5 
          AND "assays"."deleted" = False
        ) OR NOT (
          "compounds"."id" IN
          (SELECT U0."compound_id" FROM "assays" U0 WHERE U0."deleted" = false)
        )
      )
      AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = False
    )
    

    As far as I can tell, this is what you want. The join still needs to be a LEFT OUTER JOIN so that compounds without assays are included.
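    The role of the LEFT OUTER JOIN can be checked with a small `sqlite3` sketch of the workaround's SQL (made-up rows, table and column names taken from the query above, `0` standing in for `False`; `workaround` is a hypothetical helper). The INNER JOIN variant corresponds to what the `assays__isnull=False` filter discussed below produces: compound 3, which has no assays at all, disappears, while compound 4, whose only assay is deleted, is still returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compounds (id INTEGER PRIMARY KEY, dataset_id INTEGER, deleted INTEGER);
CREATE TABLE assays (id INTEGER PRIMARY KEY, compound_id INTEGER,
                     assay_id INTEGER, deleted INTEGER);
INSERT INTO compounds VALUES (1, 1, 0), (2, 1, 0), (3, 1, 0), (4, 1, 0);
INSERT INTO assays VALUES
  (10, 1, 5, 0),  -- compound 1: live assay 5
  (11, 2, 7, 0),  -- compound 2: live assay 7 only
  (12, 4, 5, 1);  -- compound 4: deleted assay only; compound 3 has no rows
""")


def workaround(join):
    # join is "LEFT OUTER" or "INNER"; everything else mirrors the SQL above.
    sql = f"""
    SELECT DISTINCT "compounds"."id" FROM "compounds"
    {join} JOIN "assays" ON ("compounds"."id" = "assays"."compound_id")
    WHERE (
      ("assays"."assay_id" = 5 AND "assays"."deleted" = 0)
      OR NOT ("compounds"."id" IN
        (SELECT U0."compound_id" FROM "assays" U0 WHERE U0."deleted" = 0))
    ) AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = 0
    ORDER BY "compounds"."id"
    """
    return [row[0] for row in conn.execute(sql)]


print(workaround("LEFT OUTER"))  # [1, 3, 4]
print(workaround("INNER"))       # [1, 4]
```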

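    【Comments】:

    • It works, thanks! However, I still need my own solution (stackoverflow.com/a/57600739) to make it work. I think I need the INNER JOIN; I don't think it works with an OUTER JOIN.
    • Glad to hear it works! I'm surprised you need assays__isnull=False (or an INNER JOIN in general), since that excludes compounds without assays (although it does include compounds whose assays were deleted).
    • Actually, I take back my previous comment: I don't need .filter(assays__isnull=False). I made a mistake yesterday. Your solution works perfectly!
    【Solution 2】: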

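    I think the Django ORM prefers an OUTER JOIN over an INNER JOIN, even when an inner join is exactly what you want, as in my case. Fortunately, there is a way to force the ORM to use an inner join: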

    Compounds.objects.filter(assays__isnull=False).filter(filter_query, dataset__id=dataset_id).distinct()
    

    The first filter tells the ORM that an INNER JOIN is safe to use (Reference).

    【Comments】:
