【Posted】: 2020-09-22 14:56:25
【Question】:
I am trying to create multiple DataFrames from the two lists below:
val paths = ListBuffer("s3://abc_xyz_tableA.json",
"s3://def_xyz_tableA.json",
"s3://abc_xyz_tableB.json",
"s3://def_xyz_tableB.json",
"s3://abc_xyz_tableC.json",....)
val tableNames = ListBuffer("tableA","tableB","tableC","tableD",....)
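I want to create a separate DataFrame for each table name by grouping together all the S3 paths that end with the same table name, since each such group has its own unique schema.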
For example, once each table name is matched with its related paths:
"tableADF" will hold all the data from "s3://abc_xyz_tableA.json" and "s3://def_xyz_tableA.json", since both paths contain "tableA".
"tableBDF" will hold all the data from "s3://abc_xyz_tableB.json" and "s3://def_xyz_tableB.json", since both paths contain "tableB".
And so on; there can be many table names and paths.
I have tried several approaches, but none has worked so far. Any pointers toward a solution would be a great help. Thanks!
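To make the requested grouping concrete, here is a minimal sketch in plain Scala, not a definitive solution. It assumes every path ends in "<tableName>.json", as in the sample paths. The helper name `groupPathsByTable` and the value `pathsByTable` are hypothetical, and the lists contain only the entries spelled out in the question. The Spark step appears only as a comment and assumes a `SparkSession` named `spark` is already in scope.

```scala
// Hypothetical helper: for each table name, collect the paths that end with "<tableName>.json".
// Table names with no matching path are dropped from the result.
def groupPathsByTable(paths: Seq[String], tableNames: Seq[String]): Map[String, Seq[String]] =
  tableNames
    .map(table => table -> paths.filter(_.endsWith(s"$table.json")))
    .filter { case (_, matched) => matched.nonEmpty }
    .toMap

// Only the entries listed explicitly in the question; the elided ones are left out.
val paths = Seq(
  "s3://abc_xyz_tableA.json",
  "s3://def_xyz_tableA.json",
  "s3://abc_xyz_tableB.json",
  "s3://def_xyz_tableB.json",
  "s3://abc_xyz_tableC.json"
)
val tableNames = Seq("tableA", "tableB", "tableC", "tableD")

val pathsByTable: Map[String, Seq[String]] = groupPathsByTable(paths, tableNames)

// Assumed follow-up step (needs Spark and a SparkSession named `spark`):
// DataFrameReader.json accepts several paths, so each group can become one DataFrame.
//   val dataFrames = pathsByTable.map { case (table, ps) => table -> spark.read.json(ps: _*) }
```

Keeping the result in a `Map` keyed by table name avoids having to generate variable names such as "tableADF" at runtime. Note that a plain `endsWith` check would also match a longer table name that ends with the same text, so a stricter match (for example on "_tableA.json") may be needed if such names exist.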
【Comments】:
-
I have added a solution; please check it.
Tags: json scala apache-spark amazon-s3