[Posted]: 2026-02-21 08:50:02
[Question]:
I have two JavaRDDs of the same object type, and I want to merge their data into a single RDD. They are:
Domain
public class User {
String name;
String email;
String profession;
Integer age;
// constructor
// setters and getters
}
RDD 1
User user1 = new User ("Name", "email@email.com");
User user2 = new User ("Name2", "email2@email.com");
List<User> userList = new ArrayList<>();
userList.add(user1);
userList.add(user2);
JavaRDD<User> leftUserJavaRDD = sc.parallelize(userList);
RDD 2
User user3 = new User ("email@email.com", "Software Engineer", 26);
User user4 = new User ("email2@email.com", "Lawyer", 35);
List<User> userList2 = new ArrayList<>();
userList2.add(user3);
userList2.add(user4);
JavaRDD<User> rightUserJavaRDD = sc.parallelize(userList2);
I want to join the two RDDs on their shared email address. The combined RDD I want is:
User user1and3 = new User (
"Name",
"email@email.com",
"Software Engineer",
26);
User user2and4 = new User (
"Name2",
"email2@email.com",
"Lawyer",
35);
How can I do this in Spark using Java?
I tried union and cartesian, but neither produced the merged records.
[Discussion]:
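One possible approach, offered as a sketch rather than a definitive answer: key both sides by email, inner-join them, and merge each matched pair into one User. The block below runs the same logic on plain Java lists so it can be tried without a cluster; the Spark form of the pipeline, using `mapToPair`, `join`, and `values` on `JavaPairRDD`, is shown in a comment. The class name `EmailJoinSketch`, the helpers `merge` and `joinByEmail`, and the four-argument constructor are assumptions made for illustration, and the stand-in `User` is not the class from the question. In Spark, `User` would typically also need to implement `java.io.Serializable` so it can be shuffled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmailJoinSketch {

    // Minimal stand-in for the User class in the question (hypothetical constructor).
    static class User {
        final String name;
        final String email;
        final String profession;
        final Integer age;

        User(String name, String email, String profession, Integer age) {
            this.name = name;
            this.email = email;
            this.profession = profession;
            this.age = age;
        }
    }

    // Combine two partial records that share an email,
    // taking each field from the left side unless it is null there.
    static User merge(User left, User right) {
        return new User(
                left.name != null ? left.name : right.name,
                left.email,
                left.profession != null ? left.profession : right.profession,
                left.age != null ? left.age : right.age);
    }

    // Local inner join on email. Assumes each email appears at most once on the right.
    // With Spark, the analogous pipeline would be roughly:
    //   leftUserJavaRDD.mapToPair(u -> new Tuple2<>(u.getEmail(), u))
    //       .join(rightUserJavaRDD.mapToPair(u -> new Tuple2<>(u.getEmail(), u)))
    //       .values()
    //       .map(pair -> merge(pair._1(), pair._2()));
    static List<User> joinByEmail(List<User> left, List<User> right) {
        Map<String, User> rightByEmail = new HashMap<>();
        for (User u : right) {
            rightByEmail.put(u.email, u);
        }
        List<User> merged = new ArrayList<>();
        for (User l : left) {
            User r = rightByEmail.get(l.email);
            if (r != null) { // unmatched left records are dropped, as in an inner join
                merged.add(merge(l, r));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<User> left = new ArrayList<>();
        left.add(new User("Name", "email@email.com", null, null));
        left.add(new User("Name2", "email2@email.com", null, null));

        List<User> right = new ArrayList<>();
        right.add(new User(null, "email@email.com", "Software Engineer", 26));
        right.add(new User(null, "email2@email.com", "Lawyer", 35));

        for (User u : joinByEmail(left, right)) {
            System.out.println(u.name + ", " + u.email + ", " + u.profession + ", " + u.age);
        }
    }
}
```

This also suggests why union and cartesian did not help: union only concatenates the two RDDs without matching records, and cartesian pairs every left record with every right record regardless of email. A keyed join restricts the pairs to those sharing an email.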
Tags: java apache-spark rdd