【Posted】: 2020-08-02 09:50:14
【Question】:
My SalesPerson DataFrame looks like this. There are 54 salesperson columns in total; I show only 3 of them as an example.
Schema of SalesPerson table.
root
|-- col: struct (nullable = false)
| |-- SalesPerson_1: string (nullable = true)
| |-- SalesPerson_2: string (nullable = true)
| |-- SalesPerson_3: string (nullable = true)
Data of the SalesPerson view:
SalesPerson_1|SalesPerson_2|SalesPerson_3
Customer_1793|Customer_202 |Customer_2461
Customer_2424|Customer_130 |Customer_787
Customer_1061|Customer_318 |Customer_706
My salesplace DataFrame looks like this:
Schema of salesplace
root
|-- Place: string (nullable = true)
|-- Customer: string (nullable = true)
Data of salesplace
Place|Customer
Online| Customer_1793
Retail| Customer_1793
Retail| Customer_130
Online| Customer_130
Online| Customer_2461
Retail| Customer_2461
Online| Customer_2461
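I am trying to find which customers from the SalesPerson table appear in the SalesPlace table.
The result should have two additional columns: one showing which salesperson the customer belongs to,
and one with the number of times that customer occurs in the SalesPlace table.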
Expected output:
CustomerBelongstoSalesperson|Customer     |occurrence
SalesPerson_1 |Customer_1793|2
SalesPerson_2 |Customer_130 |2
SalesPerson_3 |Customer_2461|3
SalesPerson_2 |Customer_202 |0
SalesPerson_1 |Customer_2424|0
SalesPerson_1 |Customer_1061|0
SalesPerson_2 |Customer_318 |0
SalesPerson_3 |Customer_787 |0
Code:
Error:
The number of aliases supplied in the AS clause does not match the number of columns output by the UDTF expected 54 aliases but got Salesperson,Customer ;
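This seems like it shouldn't be that hard in Spark. I'm not sure whether it's possible to turn column names into values in a column... Could someone please give me some ideas on how to do this? Thanks.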
【Discussion】:
Tags: apache-spark apache-spark-sql