【Question title】: Cross join remaining combinations
【Posted】: 2021-03-14 10:41:02
【Question】:

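I am trying to build a table that lists, for each customer, every product I could sell them, based on the products they currently have.

Product status table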

+-------------+--------------+----------------+
| customer_id | product_name | product_status |
+-------------+--------------+----------------+
|           1 | A            | Active         |
|           2 | B            | Active         |
|           2 | C            | Active         |
|           3 | A            | Cancelled      |
+-------------+--------------+----------------+

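Now I am trying to cross join it with a hard-coded table, so that each customer_id gets 4 rows: one for each of the 4 products in our portfolio, together with the status I want to apply.

Portfolio table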


+--------------+------------+----------+
| product_name |  status_1  | status_2 |
+--------------+------------+----------+
| A            | Inelegible | Inactive |
| B            | Inelegible | Inactive |
| C            | Ineligible | Inactive |
| D            | Inelegible | Inactive |
+--------------+------------+----------+


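In my code, I tried to use a CROSS JOIN to get 4 rows per customer_id. Unfortunately, customers who have more than one product end up with two or three times as many rows.

Here is my code: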

SELECT
    p.customer_id,
    CASE WHEN p.product_name = pt.product_name THEN p.product_name ELSE pt.product_name END AS product_name,
    CASE 
        WHEN p.product_name = pt.product_name THEN p.product_status 
        ELSE pt.status_1
    END AS product_status
FROM 
    products AS p
CROSS JOIN
    portfolio as pt

This is my current output:


+----+-------------+--------------+----------------+
| #  | customer_id | product_name | product_status |
+----+-------------+--------------+----------------+
|  1 |           1 | A            | Active         |
|  2 |           1 | B            | Inelegible     |
|  3 |           1 | C            | Inelegible     |
|  4 |           1 | D            | Inelegible     |
|  5 |           2 | A            | Ineligible     |
|  6 |           2 | A            | Ineligible     |
|  7 |           2 | B            | Active         |
|  8 |           2 | B            | Ineligible     |
|  9 |           2 | C            | Active         |
| 10 |           2 | C            | Ineligible     |
| 11 |           2 | D            | Ineligible     |
| 12 |           2 | D            | Ineligible     |
| 13 |           3 | A            | Cancelled      |
| 14 |           3 | B            | Ineligible     |
| 15 |           3 | C            | Ineligible     |
| 16 |           3 | D            | Ineligible     |
+----+-------------+--------------+----------------+

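As you can see, customer_id 2 has two rows for each product, and for products B and C one of the two rows carries a status that differs from the one in the product_status table. This happens because the CROSS JOIN pairs every row of the product status table with every portfolio row, and customer_id 2 has two rows in that table.

What I want to achieve in this case is a 12-row table that shows the current product/status pairs from the product_status table, with the remaining products and their statuses added from the portfolio table.

Expected output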
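To make the row multiplication concrete, here is a minimal sketch that loads the sample data into an in-memory SQLite database and counts the rows the CROSS JOIN produces per customer. SQLite and Python's `sqlite3` module are assumptions made only so the query can be run locally; they are not Hive. The sample data spells the status both "Inelegible" and "Ineligible", and the sketch uses "Ineligible" throughout.

```python
import sqlite3

# In-memory stand-in for the two tables from the question (not Hive).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (customer_id INTEGER, product_name TEXT, product_status TEXT);
INSERT INTO products VALUES
    (1, 'A', 'Active'), (2, 'B', 'Active'), (2, 'C', 'Active'), (3, 'A', 'Cancelled');
CREATE TABLE portfolio (product_name TEXT, status_1 TEXT, status_2 TEXT);
INSERT INTO portfolio VALUES
    ('A', 'Ineligible', 'Inactive'), ('B', 'Ineligible', 'Inactive'),
    ('C', 'Ineligible', 'Inactive'), ('D', 'Ineligible', 'Inactive');
""")

# Every products row is paired with all 4 portfolio rows, so a customer
# with n products gets 4 * n rows instead of 4.
rows_per_customer = dict(conn.execute("""
    SELECT p.customer_id, COUNT(*)
    FROM products AS p
    CROSS JOIN portfolio AS pt
    GROUP BY p.customer_id
""").fetchall())

print(rows_per_customer)
```

Customers 1 and 3 each own one product and get 4 rows, while customer 2 owns two products and gets 8, which accounts for the 16 rows in the current output.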


+----+-------------+--------------+----------------+
| #  | customer_id | product_name | product_status |
+----+-------------+--------------+----------------+
|  1 |           1 | A            | Active         |
|  2 |           1 | B            | Inelegible     |
|  3 |           1 | C            | Inelegible     |
|  4 |           1 | D            | Inelegible     |
|  5 |           2 | A            | Ineligible     |
|  6 |           2 | B            | Active         |
|  7 |           2 | C            | Active         |
|  8 |           2 | D            | Ineligible     |
|  9 |           3 | A            | Cancelled      |
| 10 |           3 | B            | Ineligible     |
| 11 |           3 | C            | Ineligible     |
| 12 |           3 | D            | Ineligible     |
+----+-------------+--------------+----------------+

I'm not sure whether CROSS JOIN is the best option, but I'm running out of ideas.

【Comments】:

  • Expected output?
  • @Srinivas I'm using plain Hive, not Spark.

Tags: hive hiveql cross-join


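【Solution 1】:

Edit:

I thought of another, cleaner solution. First cross join the distinct customer_ids with the portfolio table, then right join the products table to that result on customer_id and product_name, and finally coalesce the product status so that the customer's own status wins when it exists.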

SELECT customer_id, product_name, coalesce(product_status, status_1) AS product_status
FROM products p
RIGHT JOIN (
    SELECT * 
    FROM (SELECT DISTINCT customer_id FROM products) pro
    CROSS JOIN portfolio
) pt
USING (customer_id, product_name)
ORDER BY customer_id, product_name
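The same idea can be checked locally on the sample data. The sketch below is an adaptation, not the query above verbatim: it runs on an in-memory SQLite database (an assumption, used only because it ships with Python), writes the join as a LEFT JOIN from the customer/product grid to products, which is equivalent to the RIGHT JOIN above, and spells out the join condition with ON instead of USING. The status is normalized to "Ineligible".

```python
import sqlite3

# In-memory stand-in for the two tables from the question (not Hive).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (customer_id INTEGER, product_name TEXT, product_status TEXT);
INSERT INTO products VALUES
    (1, 'A', 'Active'), (2, 'B', 'Active'), (2, 'C', 'Active'), (3, 'A', 'Cancelled');
CREATE TABLE portfolio (product_name TEXT, status_1 TEXT, status_2 TEXT);
INSERT INTO portfolio VALUES
    ('A', 'Ineligible', 'Inactive'), ('B', 'Ineligible', 'Inactive'),
    ('C', 'Ineligible', 'Inactive'), ('D', 'Ineligible', 'Inactive');
""")

result = conn.execute("""
    SELECT g.customer_id,
           g.product_name,
           -- Keep the customer's own status if there is one,
           -- otherwise fall back to the portfolio default.
           COALESCE(p.product_status, g.status_1) AS product_status
    FROM (
        -- One row per (customer, portfolio product): the full grid.
        SELECT c.customer_id, pt.product_name, pt.status_1
        FROM (SELECT DISTINCT customer_id FROM products) AS c
        CROSS JOIN portfolio AS pt
    ) AS g
    LEFT JOIN products AS p
        ON p.customer_id = g.customer_id
       AND p.product_name = g.product_name
    ORDER BY g.customer_id, g.product_name
""").fetchall()

for row in result:
    print(row)
```

The key difference from the query in the question is that the CROSS JOIN starts from the distinct customer_ids rather than from the products rows, so each customer gets exactly 4 rows regardless of how many products they own.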

Old answer: The idea is to collect all of a customer_id's product names into a list, and then check whether each product in the portfolio is in that list.

(SELECT customer_id, pt_product_name as product_name, first(status_1) as product_status
FROM (
    SELECT
        customer_id,
        p.product_name as p_product_name,
        pt.product_name as pt_product_name,
        product_status,
        status_1,
        status_2,
        collect_list(p.product_name) over (partition by customer_id) AS product_list
    FROM products p
    CROSS JOIN portfolio pt
    )
WHERE NOT array_contains(product_list, pt_product_name)
GROUP BY customer_id, product_name)

UNION ALL

(SELECT customer_id, p_product_name as product_name, first(product_status) as product_status
FROM (
    SELECT
        customer_id,
        p.product_name as p_product_name,
        pt.product_name as pt_product_name,
        product_status,
        status_1,
        status_2,
        collect_list(p.product_name) over (partition by customer_id) AS product_list 
    FROM products p
    CROSS JOIN portfolio pt)
WHERE array_contains(product_list, pt_product_name)
GROUP BY customer_id, product_name)

ORDER BY customer_id, product_name;

which gives:

+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
|          1|           A|        Active|
|          1|           B|    Inelegible|
|          1|           C|    Ineligible|
|          1|           D|    Inelegible|
|          2|           A|    Inelegible|
|          2|           B|        Active|
|          2|           C|        Active|
|          2|           D|    Inelegible|
|          3|           A|     Cancelled|
|          3|           B|    Inelegible|
|          3|           C|    Ineligible|
|          3|           D|    Inelegible|
+-----------+------------+--------------+

For reference, the block before the UNION ALL gives:

+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
|          1|           B|    Inelegible|
|          1|           C|    Ineligible|
|          1|           D|    Inelegible|
|          2|           A|    Inelegible|
|          2|           D|    Inelegible|
|          3|           B|    Inelegible|
|          3|           C|    Ineligible|
|          3|           D|    Inelegible|
+-----------+------------+--------------+

The block after the UNION ALL gives:

+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
|          1|           A|        Active|
|          2|           B|        Active|
|          2|           C|        Active|
|          3|           A|     Cancelled|
+-----------+------------+--------------+
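The membership idea behind the old answer can also be sketched outside SQL. The plain-Python version below is only an illustration of the logic, not a replacement for the query: it collects what each customer owns, then walks the portfolio and takes the customer's status when the product is owned and the portfolio's status_1 otherwise. The variable names are hypothetical, and the status is normalized to "Ineligible".

```python
# Sample data from the question; status_2 is omitted because it is not used.
products = [
    (1, "A", "Active"),
    (2, "B", "Active"),
    (2, "C", "Active"),
    (3, "A", "Cancelled"),
]
portfolio = [
    ("A", "Ineligible"),
    ("B", "Ineligible"),
    ("C", "Ineligible"),
    ("D", "Ineligible"),
]

# customer_id -> {product_name: product_status}, the per-customer "list".
owned = {}
for customer_id, product_name, product_status in products:
    owned.setdefault(customer_id, {})[product_name] = product_status

rows = []
for customer_id in sorted(owned):
    for product_name, status_1 in portfolio:
        if product_name in owned[customer_id]:
            # Second half of the UNION ALL: the customer has the product.
            status = owned[customer_id][product_name]
        else:
            # First half of the UNION ALL: fall back to the portfolio status.
            status = status_1
        rows.append((customer_id, product_name, status))

for row in rows:
    print(row)
```

The two branches of the `if` correspond to the two halves of the UNION ALL, which is why the combined result has one row per customer and portfolio product.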

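Hope that helps!

【Comments】:

  • Great insight with the array and FIRST. I had to tweak it a little, because only first_value is available in Hive, but my results are as expected! Thanks
  • @HeberBrandao I added a better solution - see if that helps!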