【Posted】: 2020-06-16 11:26:22
【Question】:
I need to select all the non-null columns from a Hive table and insert them into HBase. For example, consider the following table:
Name     | Place       | Department | Experience
================================================
Ram      | Ramgarh     | Sales      | 14
Lakshman | Lakshmanpur | Operations |
Sita     | Sitapur     |            | 14
Ravan    |             |            | 25
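I have to write all the non-null columns of the table above to HBase. So I wrote some logic that collects the names of the non-null columns into a single DataFrame column, as shown below. The Name column is mandatory.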
Name     | Place       | Department | Experience | Not_null_columns
================================================================================
Ram      | Ramgarh     | Sales      | 14         | Name, Place, Department, Experience
Lakshman | Lakshmanpur | Operations |            | Name, Place, Department
Sita     | Sitapur     |            | 14         | Name, Place, Experience
Ravan    |             |            | 25         | Name, Experience
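The logic behind Not_null_columns is not shown in the question. A minimal PySpark sketch of one way it could be built follows. It assumes the blank cells are real NULLs rather than empty strings; the function names `non_null_names` and `with_not_null_columns` are hypothetical, while `when`, `lit`, `col` and `concat_ws` are standard `pyspark.sql.functions` calls.

```python
from typing import Mapping, Optional, Sequence


def non_null_names(row: Mapping[str, Optional[object]], columns: Sequence[str]) -> str:
    """Plain-Python statement of the rule for one row (handy for unit tests)."""
    return ", ".join(c for c in columns if row.get(c) is not None)


def with_not_null_columns(df, columns, out_col="Not_null_columns"):
    """Add a column listing the names of the non-null columns of each row."""
    # Lazy import so the helper above works even where Spark is not installed.
    from pyspark.sql import functions as F

    # when() without otherwise() evaluates to NULL when the condition is false.
    names = [F.when(F.col(c).isNotNull(), F.lit(c)) for c in columns]
    # concat_ws() skips NULL inputs, so only the surviving names are joined.
    return df.withColumn(out_col, F.concat_ws(", ", *names))
```

Calling `with_not_null_columns(df, ["Name", "Place", "Department", "Experience"])` keeps everything in built-in column expressions, so no UDF is needed.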
Now my requirement is to create a DataFrame column that holds the names and values of all the non-null columns together, as shown below.
Name     | Place       | Department | Experience | Not_null_columns_values
================================================================================
Ram      | Ramgarh     | Sales      | 14         | Name: Ram, Place: Ramgarh, Department: Sales, Experience: 14
Lakshman | Lakshmanpur | Operations |            | Name: Lakshman, Place: Lakshmanpur, Department: Operations
Sita     | Sitapur     |            | 14         | Name: Sita, Place: Sitapur, Experience: 14
Ravan    |             |            | 25         | Name: Ravan, Experience: 25
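Once I have the DataFrame above, I will write it to HBase with Name as the row key and the last column as the value.
Please let me know if there is a better way to do this.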
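As a possible starting point, the Not_null_columns_values column can be sketched in PySpark with built-in functions only. This is an illustration under assumptions, not a definitive solution: blank cells are taken to be real NULLs (empty strings would need an extra `!= ""` check), and the names `format_non_null` and `with_not_null_values` are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def format_non_null(row: Mapping[str, Optional[object]], columns: Sequence[str]) -> str:
    """Plain-Python reference of the rule: join "col: value" for non-null columns."""
    return ", ".join(f"{c}: {row[c]}" for c in columns if row.get(c) is not None)


def with_not_null_values(df, columns, out_col="Not_null_columns_values"):
    """Add the "col: value" summary column to a Spark DataFrame."""
    # Lazy import so the helper above can be used and tested without Spark.
    from pyspark.sql import functions as F

    parts = [
        # NULL column -> when() yields NULL; otherwise build "col: value".
        F.when(
            F.col(c).isNotNull(),
            F.concat(F.lit(f"{c}: "), F.col(c).cast("string")),
        )
        for c in columns
    ]
    # concat_ws() drops the NULL parts, leaving only the non-null columns.
    return df.withColumn(out_col, F.concat_ws(", ", *parts))
```

With `cols = ["Name", "Place", "Department", "Experience"]`, `with_not_null_values(df, cols).select("Name", "Not_null_columns_values")` would give the key and value pair described above. The intermediate Not_null_columns column is not required for this step.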
【Discussion】:
Tags: apache-spark apache-spark-sql