【Question title】: PostgreSQL - Select only 1 row for each ID
【Posted】: 2017-02-09 06:06:02
【Question】:

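Situation

I am developing a travel engine website and am writing a complex query that matches visitors' search queries to their bookings by IP address, destination, and date, so that I can calculate conversion rates later.

Problem

We need several conversion rates, each based on a parameter (in this case, the utm_source that I extract from the RequestUrl stored in the searches table). The problem is that some users run multiple searches from different places. Sometimes the request contains a utm_source and sometimes it does not, yet each booking must be matched only once. See the screenshot of the query results below for a better understanding:

Rows 3 and 4 have the same booking ID and so on, but different values in the Value column. I need to select only one of them, not both. Basically, if there is more than one row for a booking, I need the one whose Value is not 'N/A'.

My query: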

SELECT DISTINCT "B"."Id" AS "BookingId", "PQ"."IPAddress", "PQ"."To", "PQ"."SearchDate", "PQ"."Value"
FROM
(
    SELECT DISTINCT "IPAddress", "To", "CreatedAt"::date AS "SearchDate", COALESCE(SUBSTRING("RequestUrl", 'utm_source=([^&]*)'), 'N/A') AS "Value"
    FROM dbo."PackageQueries"
    WHERE "SiteId" = '<The ID>'
    AND "CreatedAt" >= '<Start Date>'
    AND "CreatedAt" < '<End Date>'
) AS "PQ"
INNER JOIN dbo."Bookings" AS "B"
    ON "PQ"."IPAddress" = "B"."IPAddress"
    AND "B"."To" = "PQ"."To"
    AND "B"."BookingDate"::date = "PQ"."SearchDate"
WHERE "B"."SiteId" = '<The ID>'
AND "B"."BookingStatus" = 2
AND "B"."BookingDate" >= '<Start Date>'
AND "B"."BookingDate" < '<End Date>'
ORDER BY "B"."Id", "PQ"."IPAddress", "PQ"."To";

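【Comments】:

  • @a_horse_with_no_name, thank you for the links, rather than the downvote. :-D This case is slightly more complicated than those. For one thing, I cannot simply order by some ready-made integer or date/time value, so I don't think it deserved a downvote. But so be it. I did find a solution, and I will post my own answer shortly...
  • I did not downvote.
  • @a_horse_with_no_name, my apologies... I assumed incorrectly.

Tags: sql postgresql greatest-n-per-group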


【Solution 1】:

I found a solution, based on what I found here: Return rows that are max of one column in Postgresql and here: Postgres CASE in ORDER BY using an alias

My solution is as follows:

SELECT "BookingId", "IPAddress", "To", "SearchDate", "Value"
FROM
(
    SELECT DISTINCT
        "B"."Id" AS "BookingId",
        "PQ"."IPAddress",
        "PQ"."To",
        "PQ"."SearchDate",
        "PQ"."Value",
        RANK() OVER
        (
            PARTITION BY "B"."Id"
            ORDER BY
            CASE
                WHEN "PQ"."Value" = 'N/A' THEN 1
                ELSE 0
            END
        ) AS "RowNumber"
    FROM
    (
        SELECT DISTINCT "IPAddress", "To", "CreatedAt"::date AS "SearchDate", COALESCE(SUBSTRING("RequestUrl", 'utm_source=([^&]*)'), 'N/A') AS "Value"
        FROM dbo."PackageQueries"
        WHERE "SiteId" = '<Site ID>'
        AND "CreatedAt" >= '<Start Date>'
        AND "CreatedAt" < '<End Date>'
    ) AS "PQ"
    INNER JOIN dbo."Bookings" AS "B"
        ON "PQ"."IPAddress" = "B"."IPAddress"
        AND "B"."To" = "PQ"."To"
        AND "B"."BookingDate"::date = "PQ"."SearchDate"
    WHERE "B"."SiteId" = '<Site ID>'
    AND "B"."BookingStatus" = 2
    AND "B"."BookingDate" >= '<Start Date>'
    AND "B"."BookingDate" < '<End Date>'
) T
WHERE "RowNumber" = 1
ORDER BY "BookingId", "IPAddress", "To";

It is a bit long-winded, but it works well. I hope it helps someone else.
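
To see the ranking idea in isolation, here is a minimal sketch that runs without the real tables. The CTE name matches and its rows are hypothetical stand-ins for the result of joining searches to bookings; only the RANK() / CASE part mirrors the query above.

```sql
-- Hypothetical toy data: one row per (booking, extracted utm_source) match.
WITH matches("BookingId", "Value") AS (
    VALUES
        (101, 'N/A'),
        (101, 'google'),
        (102, 'N/A'),
        (103, 'newsletter')
)
SELECT "BookingId", "Value"
FROM (
    SELECT
        "BookingId",
        "Value",
        RANK() OVER (
            PARTITION BY "BookingId"
            -- Real utm_source values (0) sort ahead of the 'N/A' placeholder (1).
            ORDER BY CASE WHEN "Value" = 'N/A' THEN 1 ELSE 0 END
        ) AS "RowNumber"
    FROM matches
) AS t
WHERE "RowNumber" = 1
ORDER BY "BookingId";
```

With this data, booking 101 keeps 'google', booking 102 keeps 'N/A' because it has nothing else, and booking 103 keeps 'newsletter'.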

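Edit

The story does not end there: in some cases I still got more than one value per booking. RANK() gives tied rows the same rank, so two different non-'N/A' values for one booking both come out as rank 1. The answer was to modify the CASE statement so that it generates a unique number for each text value. The solution can be found here: PostgreSQL - Assign integer value to string in case statement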
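
One possible way to break such ties, sketched here on hypothetical toy data, is to swap RANK() for ROW_NUMBER() and add "Value" itself as a second sort key. This is an assumption about what would work, not the approach from the linked answer, and which non-'N/A' value wins is arbitrary (alphabetical here).

```sql
-- Hypothetical toy data: booking 201 matched two different utm_source values.
WITH matches("BookingId", "Value") AS (
    VALUES
        (201, 'N/A'),
        (201, 'google'),
        (201, 'bing'),
        (202, 'N/A')
)
SELECT "BookingId", "Value"
FROM (
    SELECT
        "BookingId",
        "Value",
        -- ROW_NUMBER() never repeats a number within a partition,
        -- so exactly one row per booking gets 1.
        ROW_NUMBER() OVER (
            PARTITION BY "BookingId"
            ORDER BY
                CASE WHEN "Value" = 'N/A' THEN 1 ELSE 0 END,  -- 'N/A' last
                "Value"                                       -- tie-breaker
        ) AS "RowNumber"
    FROM matches
) AS t
WHERE "RowNumber" = 1
ORDER BY "BookingId";
```

Each booking now yields a single row: booking 201 keeps one of its two real values instead of both, and booking 202 falls back to 'N/A'. If a specific source should win, the tie-breaker could instead be a CASE that maps each source to a priority number.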
