[Question Title]: Spark: Combine two Java object RDDs into one
[Posted]: 2026-02-21 08:50:02
[Question Description]:

I have two JavaRDDs of the same object type, and I want to merge their data into one. They are:

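// Serializable so that Spark's default Java serializer can ship User objects to executors
public class User implements java.io.Serializable {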
    String name;
    String email;
    String profession;
    Integer age;

    // constructor

    // setters and getters
}

RDD 1

User user1 = new User ("Name", "email@email.com");
User user2 = new User ("Name2", "email2@email.com");

List<User> userList = new ArrayList<>();
userList.add(user1);
userList.add(user2);

JavaRDD<User> leftUserJavaRDD = sc.parallelize(userList);

RDD 2

User user3 = new User ("email@email.com", "Software Engineer", 26);
User user4 = new User ("email2@email.com", "Lawyer", 35);

List<User> userList2 = new ArrayList<>();
userList2.add(user3);
userList2.add(user4);

JavaRDD<User> rightUserJavaRDD = sc.parallelize(userList2);

I want to combine the two RDDs on their common email address. The combined RDD I want is:

User user1and3 = new User (
        "Name",
        "email@email.com",
        "Software Engineer",
        26);

User user2and4 = new User (
        "Name2",
        "email2@email.com",
        "Lawyer",
        35);

How can I do this in Spark using Java? I tried union and cartesian, but neither worked.

[Discussion]:

    Tags: java apache-spark rdd


    [Solution 1]:

    I got help from a colleague, and this is the solution we came up with.

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function2;
    import scala.Tuple2;
    
    import java.util.List;
    
    public JavaRDD<User> getCombinedUsers(JavaRDD<User> leftUserJavaRDD, JavaRDD<User> rightUserJavaRDD) {
    
         JavaPairRDD<String, User> leftUserJavaPairRDD =
                    leftUserJavaRDD.mapToPair(user -> new Tuple2<>(user.getEmail(), user));
    
         JavaPairRDD<String, User> rightUserJavaPairRDD =
                    rightUserJavaRDD.mapToPair(user -> new Tuple2<>(user.getEmail(), user));
    
         return leftUserJavaPairRDD
                    .union(rightUserJavaPairRDD)
                    .reduceByKey(merge).values();
    }
    
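    /**
     * Reduce function that merges two User records sharing the same email.
     * reduceByKey may pass the two records in either order, so each field is
     * taken from whichever record has it set (this assumes unset fields are null)
     * instead of assuming the left record has the name and the right one the rest.
     */
    private static Function2<User, User, User> merge =
                (User a, User b) ->
                        new User(
                                a.getName() != null ? a.getName() : b.getName(),
                                a.getEmail(),
                                a.getProfession() != null ? a.getProfession() : b.getProfession(),
                                a.getAge() != null ? a.getAge() : b.getAge());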
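
One thing to watch with this approach: reduceByKey expects a commutative and associative function and does not promise which of the two users arrives as the first argument, so it is safer for the merge not to depend on argument order. The following is only a sketch of that idea without Spark, using plain Java collections in place of the RDDs. The `MergeSketch` class and its `firstNonNull`, `merge` and `combine` helpers are hypothetical names, the `User` class is a stand-in for the one in the question, and it assumes that fields missing from a record are null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the question's User class; fields a record lacks are left null.
final class User {
    final String name;
    final String email;
    final String profession;
    final Integer age;

    User(String name, String email, String profession, Integer age) {
        this.name = name;
        this.email = email;
        this.profession = profession;
        this.age = age;
    }
}

final class MergeSketch {
    // Returns a if it is non-null, otherwise b.
    static <T> T firstNonNull(T a, T b) {
        return a != null ? a : b;
    }

    // Takes each field from whichever side has it. The result does not depend
    // on argument order as long as the two records never disagree on a field.
    static User merge(User a, User b) {
        return new User(
                firstNonNull(a.name, b.name),
                firstNonNull(a.email, b.email),
                firstNonNull(a.profession, b.profession),
                firstNonNull(a.age, b.age));
    }

    // Plain-Java analogue of union(...).reduceByKey(merge).values():
    // concatenate both lists, then fold records with the same email together.
    static List<User> combine(List<User> left, List<User> right) {
        List<User> all = new ArrayList<>(left);
        all.addAll(right);
        Map<String, User> byEmail = new LinkedHashMap<>();
        for (User u : all) {
            byEmail.merge(u.email, u, MergeSketch::merge);
        }
        return new ArrayList<>(byEmail.values());
    }

    public static void main(String[] args) {
        List<User> left = Arrays.asList(
                new User("Name", "email@email.com", null, null),
                new User("Name2", "email2@email.com", null, null));
        List<User> right = Arrays.asList(
                new User(null, "email@email.com", "Software Engineer", 26),
                new User(null, "email2@email.com", "Lawyer", 35));
        // Passing the lists in the opposite order yields the same merged users.
        for (User u : combine(right, left)) {
            System.out.println(u.name + ", " + u.email + ", " + u.profession + ", " + u.age);
        }
    }
}
```

In the Spark version the same `merge` body could be used as the `Function2` passed to reduceByKey, with the getters in place of direct field access.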
    
