Most efficient way to find unique intersections from two different ArrayLists? Answers

【Question title】: Most efficient way to find unique intersections from two different ArrayLists?
【Posted】: 2015-01-28 15:41:25
【Question description】:

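I have two ArrayLists, A and B.

ArrayList A consists of classes containing a set of data, including an identifier called categoryID. Multiple items in A can have the same categoryID. The categoryIDs of the items in A can look like this: [1, 1, 2, 2, 3, 4, 7].

ArrayList B consists of a different class containing a different set of data, also including categoryID. The categoryID is unique for each item in this list. Example: [1, 2, 3, 4, 5, 6, 7].

Both lists are sorted by categoryID, which hopefully makes this easier.

What I am trying to do is come up with a new list, C, made up of the items from listB that have at least one intersection with listA. So list C should contain the items [1, 2, 3, 4, 7] for the input given above.

My strategy so far is to iterate over both lists. I don't believe this is the most efficient way, so I am asking what other alternatives I could look at.

My approach: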

ArrayList<classB> results = new ArrayList<classB>();
for (classA itemA : listA){
  int categoryID = itemA.categoryID;
  for (classB itemB : listB){
    if (itemB.categoryID == categoryID){
      if (!results.contains(itemB)){
        results.add(itemB);
      }
      break;
    }
  }
}

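I iterate over listA first and grab the categoryID, then iterate over listB to find the matching categoryID. When I find it, I check whether the results list already contains that item from listB. If it does not, I add it to results, break out of the inner for loop, and continue through listA. If the results list already contains itemB, I simply break out of the inner for loop and continue through listA. This approach is O(n^2), which is not good for large data sets. Any ideas for improvement?

【Question comments】:

Tags: java performance optimization arraylist

【Solution 1】: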

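Add all the categoryIDs from listA to a Set; let's call it setACategories. Then loop through listB, and if setACategories contains the categoryID of an element of listB, add that element of listB to results.

results should also be a Set, since it looks like you only want one match from listB to go into results rather than multiple matches (which lets you avoid calling !results.contains(itemB)).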
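
The two steps above could look roughly like the following. This is only a sketch: ItemA and ItemB are hypothetical stand-ins for the question's classA and classB, reduced to a single categoryID field. Because the question states that categoryIDs in listB are unique, a single pass over listB adds each item at most once, so a plain List is used for the results here; a Set would work equally well.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetIntersectionSketch {
    // Hypothetical stand-ins for the question's classA and classB.
    static class ItemA {
        final int categoryID;
        ItemA(int categoryID) { this.categoryID = categoryID; }
    }

    static class ItemB {
        final int categoryID;
        ItemB(int categoryID) { this.categoryID = categoryID; }
    }

    // Returns the items of listB whose categoryID occurs at least once in listA.
    static List<ItemB> intersect(List<ItemA> listA, List<ItemB> listB) {
        // Step 1: collect the categoryIDs of listA; duplicates collapse in the set.
        Set<Integer> setACategories = new HashSet<>();
        for (ItemA a : listA) {
            setACategories.add(a.categoryID);
        }
        // Step 2: one pass over listB with a hash lookup per item.
        List<ItemB> results = new ArrayList<>();
        for (ItemB b : listB) {
            if (setACategories.contains(b.categoryID)) {
                results.add(b);
            }
        }
        return results;
    }

    // Runs the example input from the question and returns the matching IDs.
    static List<Integer> demo() {
        List<ItemA> listA = new ArrayList<>();
        for (int id : new int[] {1, 1, 2, 2, 3, 4, 7}) listA.add(new ItemA(id));
        List<ItemB> listB = new ArrayList<>();
        for (int id = 1; id <= 7; id++) listB.add(new ItemB(id));
        List<Integer> ids = new ArrayList<>();
        for (ItemB b : intersect(listA, listB)) ids.add(b.categoryID);
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 3, 4, 7]
    }
}
```

Each list is traversed once and every lookup is a hash probe, so the expected cost grows with the combined size of the two lists rather than with their product.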

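【Comments】:

There is no need to put the second array into a set.
How efficient is this? I need to loop through listA to add the elements to the set. Then I need to loop through listB to check for matching categoryIDs. So O(n)?
@h_k Well, I think it now depends on whether categoryID maps one-to-one to classB or classA, and whether they share the same parent class (I assumed they do; if not, my answer is not useful).
@NESPowerGlove The categoryID of classB has a one-to-many relationship with classA, if that clarifies things.
@h_k OK, let me edit my answer to fit what I think you need.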

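【Solution 2】:

Add the categoryID values from listA to a Set, then iterate over listB and select the elements whose categoryID is in your set.

【Comments】:

【Solution 3】:

The best way at the moment is to use Java streams: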

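import java.util.*;
import java.util.stream.Collectors;

// contains() relies on equals(), so foo must override it for distinct instances to match
List<foo> list1 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<foo> list2 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<foo> common = list1.stream().filter(f -> list2.contains(f)).collect(Collectors.toList());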
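
One caveat with the stream above: list2.contains(f) on an ArrayList is a linear scan, so the filter can perform up to list1.size() × list2.size() comparisons. A possible variation, sketched below under the assumption that the element type implements equals() and hashCode() consistently, copies list2 into a HashSet first. The class name, the intersect helper and the distinct() call (added so that repeated elements of list1 appear only once, as the question asks) are illustrative additions and not part of the original answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StreamIntersectionSketch {
    // Keeps the elements of list1 that also occur in list2, without duplicates.
    static <T> List<T> intersect(List<T> list1, List<T> list2) {
        // Hash-based lookup replaces the linear scan of list2.contains(...).
        Set<T> lookup = new HashSet<>(list2);
        return list1.stream()
                .filter(lookup::contains)
                .distinct() // drop repeated matches, keeping first-seen order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 2, 2, 3, 4, 7);
        List<Integer> b = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(intersect(a, b)); // prints [1, 2, 3, 4, 7]
    }
}
```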

However, I use the Apache Commons library myself for this kind of thing:

https://commons.apache.org/proper/commons-collections/javadocs/api-3.2.1/org/apache/commons/collections/CollectionUtils.html

【Comments】:

【Solution 4】:

Have you tried:

public void test() {
    Collection<String> c1 = new ArrayList<>();
    Collection<String> c2 = new ArrayList<>();

    c1.add("Text 1");
    c1.add("Text 2");
    c1.add("Text 3");
    c1.add("Text 4");
    c1.add("Text 5");

    c2.add("Text 3");
    c2.add("Text 4");
    c2.add("Text 5");
    c2.add("Text 6");
    c2.add("Text 7");

    // Keep only the elements of c1 that are also in c2 (modifies c1 in place)
    c1.retainAll(c2);

    for (String next : c1) {
        System.out.println(next);  // Output: Text 3, Text 4, Text 5 (one per line)
    }
}

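【Comments】:

But not the null initializations. You should remove those; they are only misleading. Otherwise +1 for the best answer.

【Solution 5】:

Try Sets.intersection(Set<E> set1, Set<?> set2) from Google Guava.

Of course, you can use Sets.newHashSet(Iterable<? extends E> elements) to convert your lists to sets.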

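【Comments】:

【Solution 6】:

See the code below. I implemented an intersection that improves on the top answer's approach by exploiting the fact that the lists are sorted.

It is somewhat like the merge step in merge sort, except that it checks for intersections. There is probably still room for improvement; I wrote it in 30 minutes.

With the current data, it runs about 17 times faster than the top answer's approach. It also saves O(n) memory, because it only needs one set.

See also: The intersection of two sorted arrays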

import java.util.*;

public class test {
    public static void main (String[] args) {
        List<Integer> a1 = new ArrayList<Integer>();
        List<Integer> a2 = new ArrayList<Integer>();
        Random r = new Random();

        for(int i = 0; i < 1000000; i++) {
            a1.add(r.nextInt(1000000));
            a2.add(r.nextInt(1000000));
        }

        Collections.sort(a1);
        Collections.sort(a2);

        System.out.println("Starting");

        long t1 = System.currentTimeMillis();
        Set<Integer> set1 = func1(a1, a2);
        long t2 = System.currentTimeMillis();

        System.out.println("Func1 done in: " + (t2-t1) + " milliseconds.");

        long t3 = System.currentTimeMillis();
        Set<Integer> set2 = func2(a1, a2);
        long t4 = System.currentTimeMillis();

        System.out.println("Func2 done in: " + (t4-t3) + " milliseconds.");

        if(set1.size() != set2.size()) {
            System.out.println("ERROR - sizes not equal");
            System.exit(1);
        }

        for(Integer t : set1) {
            if (!set2.contains(t)) {
                System.out.println("ERROR");
                System.exit(1);
            }
        }
    }

    public static Set<Integer> func1(List<Integer> a1, List<Integer> a2) {
        Set<Integer> intersection = new HashSet<Integer>();

        // index walks forward through a2 and never moves back
        int index = 0;
        for(Integer a : a1) {
            // skip the elements of a2 that are smaller than the current a
            while(index < a2.size() && a2.get(index) < a) {
                index++;
            }

            // a2 is exhausted, so no further matches are possible
            if(index == a2.size()) {
                break;
            }
            if (a2.get(index).equals(a)) {
                intersection.add(a);
            }
        }

        return intersection;
    }

    public static Set<Integer> func2(List<Integer> a1, List<Integer> a2) {
        Set<Integer> intersection = new HashSet<Integer>();
        Set<Integer> tempSet = new HashSet<Integer>();
        for(Integer a : a1) {
            tempSet.add(a);
        }

        for(Integer b : a2) {
            if(tempSet.contains(b)) {
                intersection.add(b);
            }
        }

        return intersection;
    }
}
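
The same merge-style walk can be applied to the situation in the question, where listA may repeat a categoryID, listB holds each categoryID once, and both lists are sorted ascending by categoryID. The sketch below is an illustration under those assumptions, not code from this answer; ItemA and ItemB are hypothetical stand-ins for the question's classA and classB. Because listB has no duplicate IDs, no set is needed at all.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedMergeSketch {
    // Hypothetical stand-ins for the question's classA and classB.
    static class ItemA {
        final int categoryID;
        ItemA(int categoryID) { this.categoryID = categoryID; }
    }

    static class ItemB {
        final int categoryID;
        ItemB(int categoryID) { this.categoryID = categoryID; }
    }

    // Assumes both lists are sorted ascending by categoryID and that
    // categoryIDs in listB are unique.
    static List<ItemB> intersect(List<ItemA> listA, List<ItemB> listB) {
        List<ItemB> results = new ArrayList<>();
        int i = 0; // position in listA
        int j = 0; // position in listB
        while (i < listA.size() && j < listB.size()) {
            int idA = listA.get(i).categoryID;
            int idB = listB.get(j).categoryID;
            if (idA < idB) {
                i++; // also skips repeated IDs in listA after a match
            } else if (idA > idB) {
                j++; // this listB item has no partner in listA
            } else {
                results.add(listB.get(j));
                j++; // IDs in listB are unique, so each item is added once
            }
        }
        return results;
    }

    // Runs the example input from the question and returns the matching IDs.
    static List<Integer> demo() {
        List<ItemA> listA = new ArrayList<>();
        for (int id : new int[] {1, 1, 2, 2, 3, 4, 7}) listA.add(new ItemA(id));
        List<ItemB> listB = new ArrayList<>();
        for (int id = 1; id <= 7; id++) listB.add(new ItemB(id));
        List<Integer> ids = new ArrayList<>();
        for (ItemB b : intersect(listA, listB)) ids.add(b.categoryID);
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 3, 4, 7]
    }
}
```

Each index only moves forward, so the loop runs at most listA.size() + listB.size() times.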

【Comments】: