Set union and intersection using Java streams (answers)

【Question title】: Set union and intersection using Java streams
【Posted】: 2019-03-13 09:24:55
【Question】:

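I currently have a Java program that uses nested for loops to compute the union and intersection of sets taken from two lists of integer sets. How can I do this with Java parallel streams? My current code is below: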

for (Set<Integer> x : listA) {
    for (Set<Integer> y : listB) {
        Set<Integer> u = Sets.union(x, y);        // Guava: com.google.common.collect.Sets
        Set<Integer> i = Sets.intersection(x, y); // both return unmodifiable views, not copies
    }
}

I want to speed this up because listA and listB are large.

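【Question discussion】:

Since you are not doing anything with the results, simply removing the entire code would be the fastest solution. Otherwise, you should include the actual operations that are supposed to run in parallel.
@Holger Isn't it possible to make the union operation itself parallel?
There would be no significant performance gain. If you perform union or intersection in parallel, performance will most likely get worse. As you said yourself, listA and listB are large, so you should focus on processing those lists in parallel.
Parallelization is not easy. Maybe this approach is good enough: stackoverflow.com/questions/7574311/…
@Holger Thanks for your comment.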

Tags: java java-8 set java-stream

【Solution 1】:

You don't need streams for the union, but you can use them for the intersection, for example:

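import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

Set<Integer> setA = new HashSet<>(Arrays.asList(1, 2, 3));
Set<Integer> setB = new HashSet<>(Arrays.asList(2, 3, 4));

// Union: copy setA, then add everything from setB.
Set<Integer> union = new HashSet<>(setA);
union.addAll(setB);

// Intersection: keep the elements of setA that setB also contains.
Set<Integer> intersection = setA.parallelStream()
        .filter(setB::contains)
        .collect(Collectors.toSet());

System.out.println("Union : " + union);
System.out.println("Intersection : " + intersection);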
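For symmetry, the union can also be written as a stream: concatenate both sets and collect into a new set, which drops the duplicates. This is a minimal sketch; the class name `StreamUnion` is hypothetical. For small sets the `addAll` version above is simpler and most likely faster, and a parallel stream only has a chance of paying off when the sets are large.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamUnion {

    // Union as a stream: concatenating two parallel streams yields a parallel
    // stream; collecting into a Set removes the elements present in both.
    public static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        return Stream.concat(a.parallelStream(), b.parallelStream())
                .collect(Collectors.toSet());
    }

    // Intersection as in the answer above: keep the elements of a that b contains.
    public static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        return a.parallelStream()
                .filter(b::contains)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Integer> setA = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> setB = new HashSet<>(Arrays.asList(2, 3, 4));
        System.out.println("Union : " + union(setA, setB));
        System.out.println("Intersection : " + intersection(setA, setB));
    }
}
```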

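Update

The code above finds the intersection and union using the Java standard library and streams. If you have lists of sets, however, you can wrap the code above in a method and call it from streams that iterate over both lists, for example: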

private static void unionAndIntersection(Set<Integer> setA, Set<Integer> setB) {
    Set<Integer> union = new HashSet<>();
    union.addAll(setA);
    union.addAll(setB);

    Set<Integer> intersection = setA.parallelStream()
            .filter(setB::contains)
            .collect(Collectors.toSet());

    System.out.println("Union : " + union);
    System.out.println("Intersection : " +intersection);
}

public static void main(String[] args){ 
    List<Set<Integer>> listA = new ArrayList<>();
    List<Set<Integer>> listB = new ArrayList<>();
    listA.stream()
        .forEach(a -> {
            listB.stream()
            .forEach(b -> unionAndIntersection(a, b));
        });
}
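Note that the nested `forEach` calls above still run sequentially, since neither `listA.stream()` nor `listB.stream()` is parallel. Following the advice in the question comments to parallelize over the lists rather than inside each set operation, here is a minimal sketch that makes only the outer list parallel and returns the results instead of printing them. The `Result` class and the `allPairs` method are hypothetical names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PairwiseSetOps {

    // Hypothetical holder for the results of one (a, b) pair.
    public static final class Result {
        public final Set<Integer> union;
        public final Set<Integer> intersection;

        Result(Set<Integer> union, Set<Integer> intersection) {
            this.union = union;
            this.intersection = intersection;
        }
    }

    // Union and intersection of a single pair, computed sequentially; the
    // parallelism comes from processing many pairs at once.
    static Result unionAndIntersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        Set<Integer> intersection = a.stream()
                .filter(b::contains)
                .collect(Collectors.toSet());
        return new Result(union, intersection);
    }

    // Only listA is streamed in parallel; for each of its elements, listB is
    // streamed sequentially. collect(toList()) keeps the encounter order of the
    // nested loops: all pairs of listA[0] first, then listA[1], and so on.
    public static List<Result> allPairs(List<Set<Integer>> listA, List<Set<Integer>> listB) {
        return listA.parallelStream()
                .flatMap(a -> listB.stream().map(b -> unionAndIntersection(a, b)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Set<Integer>> listA = new ArrayList<>();
        listA.add(new HashSet<>(Arrays.asList(1, 2, 3)));
        List<Set<Integer>> listB = new ArrayList<>();
        listB.add(new HashSet<>(Arrays.asList(2, 3, 4)));
        listB.add(new HashSet<>(Arrays.asList(5)));
        for (Result r : allPairs(listA, listB)) {
            System.out.println("Union : " + r.union + ", Intersection : " + r.intersection);
        }
    }
}
```

Whether this is actually faster depends on the number of pairs and the cost of each pair; measuring on real data is the only reliable way to tell.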

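【Discussion】:

"How to do this using java parallel streams?" Isn't that the question?
@DarshanMehta Well, the updated answer replaces two nested for loops with two nested forEach calls... I don't see how that would perform any better.
@BackSlash Fair enough, but without knowing what the OP wants to do with the results, there is no better option.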

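【Solution 2】:

If you make sure that y (and x) are sorted, or turn them into sorted sets such as TreeSet, the following can benefit from a special bulk-copy path (the internal method addAllForTreeSet), which TreeSet uses when a sorted set is added to an empty TreeSet.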

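// Assumes listA and listB hold SortedSet<Integer> instances (e.g. TreeSets).
for (SortedSet<Integer> x : listA) {
    for (SortedSet<Integer> y : listB) {
        SortedSet<Integer> u = new TreeSet<>(x); // copy of a sorted set
        u.addAll(y);                             // union
        SortedSet<Integer> i = new TreeSet<>(x);
        i.retainAll(y);                          // intersection
    }
}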

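I'm not sure whether this is actually faster, though.

If the integers are not too wild (ideally below about 10_000) and the values are non-negative, you can use a BitSet directly instead of a Set<Integer>.

That is very hard to beat. Use the BitSet constructor that takes the expected capacity (such as 10_000).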
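A self-contained version of the TreeSet approach for a single pair, as a minimal sketch (the class name `SortedSetOps` is hypothetical). Passing a `SortedSet` to the `TreeSet` constructor selects the `TreeSet(SortedSet)` overload, which in OpenJDK copies the already-sorted elements in bulk; the second `addAll` into the now non-empty set inserts element by element.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedSetOps {

    // Union: copy x (a sorted set, so the bulk-copy path applies), then add y.
    public static SortedSet<Integer> union(SortedSet<Integer> x, SortedSet<Integer> y) {
        SortedSet<Integer> u = new TreeSet<>(x);
        u.addAll(y);
        return u;
    }

    // Intersection: copy x, then keep only the elements that y also contains.
    public static SortedSet<Integer> intersection(SortedSet<Integer> x, SortedSet<Integer> y) {
        SortedSet<Integer> i = new TreeSet<>(x);
        i.retainAll(y);
        return i;
    }

    public static void main(String[] args) {
        SortedSet<Integer> x = new TreeSet<>(Arrays.asList(1, 2, 3));
        SortedSet<Integer> y = new TreeSet<>(Arrays.asList(2, 3, 4));
        System.out.println("Union : " + union(x, y));               // Union : [1, 2, 3, 4]
        System.out.println("Intersection : " + intersection(x, y)); // Intersection : [2, 3]
    }
}
```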

for (BitSet x : listA) {
    for (BitSet y : listB) {
        BitSet u = (BitSet) x.clone(); // clone() returns Object
        u.or(y);                       // union
        BitSet i = (BitSet) x.clone();
        i.and(y);                      // intersection
    }
}
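A runnable sketch of the BitSet operations for one pair, assuming non-negative values below the capacity hint of 10_000 mentioned in the answer; the class name `BitSetOps` and the helper `bitSetOf` are hypothetical. Because `or` and `and` modify the receiver, each result starts from a clone so that `x` stays unchanged.

```java
import java.util.BitSet;

public class BitSetOps {

    // Hypothetical helper: builds a BitSet from non-negative ints, using the
    // capacity constructor so the bit array does not have to grow.
    public static BitSet bitSetOf(int... values) {
        BitSet bs = new BitSet(10_000);
        for (int v : values) {
            bs.set(v);
        }
        return bs;
    }

    // Union: clone x (clone() returns Object, hence the cast), then OR in y.
    public static BitSet union(BitSet x, BitSet y) {
        BitSet u = (BitSet) x.clone();
        u.or(y);
        return u;
    }

    // Intersection: clone x, then AND with y.
    public static BitSet intersection(BitSet x, BitSet y) {
        BitSet i = (BitSet) x.clone();
        i.and(y);
        return i;
    }

    public static void main(String[] args) {
        BitSet x = bitSetOf(1, 2, 3);
        BitSet y = bitSetOf(2, 3, 4);
        System.out.println("Union : " + union(x, y));               // Union : {1, 2, 3, 4}
        System.out.println("Intersection : " + intersection(x, y)); // Intersection : {2, 3}
    }
}
```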

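You could use a parallel stream to gain up to a factor equal to the number of processors:

listA.parallelStream().forEach(x -> { /* inner loop over listB, as above */ });

This is a minor optimization.

I haven't used Guava in recent years; doesn't it have a set type for the primitive int?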

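【Discussion】:

Whether BitSet yields better performance depends on the specific value distribution. For many practical use cases it does. But consider a set containing only zero and Integer.MAX_VALUE...
@Joop Eggen If you don't mind, could you show me an example of how to convert a set of integers to a BitSet? Assume my integers range from 0 to 15000.
@Koba BitSet bs = set.stream().collect(BitSet::new, BitSet::set, BitSet::or); and Set<Integer> set = bs.stream().boxed().collect(Collectors.toSet()); for the other direction.
@Holger Thanks, you beat me to it, and with streams too.
@Holger Thanks again for your help. I'll try it soon.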
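Holger's two one-liners wrapped into a runnable round trip, as a minimal sketch (the class `BitSetConversion` and its method names are hypothetical). It converts two sets in the 0 to 15000 range from the question, intersects them as BitSets, and converts back. All values must be non-negative, since `BitSet.set` throws `IndexOutOfBoundsException` for negative indices.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class BitSetConversion {

    // Set<Integer> -> BitSet: a mutable reduction with BitSet as the container;
    // BitSet::or merges the partial results if the stream runs in parallel.
    public static BitSet toBitSet(Set<Integer> set) {
        return set.stream().collect(BitSet::new, BitSet::set, BitSet::or);
    }

    // BitSet -> Set<Integer>: BitSet.stream() yields the indices of the set bits.
    public static Set<Integer> toSet(BitSet bs) {
        return bs.stream().boxed().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(0, 42, 15000));
        Set<Integer> b = new HashSet<>(Arrays.asList(7, 42, 15000));
        BitSet i = toBitSet(a);
        i.and(toBitSet(b));          // intersection, done on the bits
        System.out.println(toSet(i));
    }
}
```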

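【Solution 3】:

It is worth noting that you don't have to use streams for union and intersection. There is the retainAll method, which retains only the elements of this set that are contained in the specified collection: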

Set<Integer> setA = new HashSet<>(Arrays.asList(1, 2, 3));
Set<Integer> setB = new HashSet<>(Arrays.asList(2, 3, 4));

setA.retainAll(setB);  // setA is modified in place and now holds the intersection
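Because `retainAll` (and `addAll` for the union) modify the set they are called on, a copy is needed when the inputs must stay intact. A minimal sketch, with the hypothetical class name `CopyingSetOps`: for `HashSet`, `retainAll` is inherited from `AbstractCollection`, which walks the receiver and calls `contains` on the argument once per element, so copying the smaller of the two sets keeps the number of lookups down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CopyingSetOps {

    // Intersection without touching the inputs: copy the smaller set and let
    // retainAll drop every element the larger set does not contain.
    public static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> smaller = a.size() <= b.size() ? a : b;
        Set<Integer> larger = (smaller == a) ? b : a;
        Set<Integer> result = new HashSet<>(smaller);
        result.retainAll(larger);
        return result;
    }

    // Union without touching the inputs: copy one set, then add the other.
    public static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> setA = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> setB = new HashSet<>(Arrays.asList(2, 3, 4));
        System.out.println("Union : " + union(setA, setB));
        System.out.println("Intersection : " + intersection(setA, setB));
        System.out.println("setA unchanged : " + setA);
    }
}
```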

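【Discussion】:

This should be a comment. The question is: how can the union and intersection be computed with parallel streams instead of nested loops? You are not answering that question; you are suggesting how to avoid using an external library for the job.

【Solution 4】:

Intersection: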

List<T> intersect = list1.stream()
                         .filter(list2::contains)
                         .collect(Collectors.toList());

Union:

List<T> union = Stream.concat(list1.stream(), list2.stream())
                      .distinct()
                      .collect(Collectors.toList());
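With lists, `list2::contains` is a linear scan, so the intersection above costs O(n * m) comparisons. A minimal sketch of the same operation that first copies `list2` into a `HashSet`, making each lookup constant time on average; it gives the same result for element types with consistent `equals`/`hashCode`, and keeps duplicates and order from `list1` just like the original. The class name `ListSetOps` is hypothetical; `union` is the answer's code unchanged.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListSetOps {

    // Intersection: filter list1 against a HashSet copy of list2 instead of
    // calling List.contains, which scans list2 on every call.
    public static <T> List<T> intersect(List<T> list1, List<T> list2) {
        Set<T> lookup = new HashSet<>(list2);
        return list1.stream()
                .filter(lookup::contains)
                .collect(Collectors.toList());
    }

    // Union exactly as in the answer: concatenate, then drop duplicates.
    public static <T> List<T> union(List<T> list1, List<T> list2) {
        return Stream.concat(list1.stream(), list2.stream())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list1 = Arrays.asList(1, 2, 2, 3);
        List<Integer> list2 = Arrays.asList(2, 3, 4);
        System.out.println("Intersection : " + intersect(list1, list2)); // Intersection : [2, 2, 3]
        System.out.println("Union : " + union(list1, list2));            // Union : [1, 2, 3, 4]
    }
}
```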

【Discussion】: