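Run some tests comparing the posted method against constructing a new HashSet. That is, let A be the smaller of the two sets and B the larger; then, for each item in A, if it also exists in B, add it to C (a new HashSet). If only the count is needed, the intermediate set C can be skipped.
Just like the posted method, this should cost O(|A|), since the iteration is O(|A|) and each probe into B is O(1). I have no idea how it will compare with the clone-and-remove approach.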
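The two variants described above could be sketched generically as follows. This is only an illustration: the class name `IntersectionSketch` and the method names `intersectionSize` and `intersection` are made up here, and the O(1) probe assumes that the larger set is hash-based.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class IntersectionSketch {

    // Counts the elements common to both sets without building a third set.
    static <T> int intersectionSize(Set<T> s1, Set<T> s2) {
        // Iterate the smaller set (A) and probe the larger one (B).
        Set<T> a = s1.size() <= s2.size() ? s1 : s2;
        Set<T> b = (a == s1) ? s2 : s1;
        int count = 0;
        for (T e : a) {
            if (b.contains(e)) {
                count++;
            }
        }
        return count;
    }

    // Same loop, but collecting the common elements into a new set C.
    static <T> Set<T> intersection(Set<T> s1, Set<T> s2) {
        Set<T> a = s1.size() <= s2.size() ? s1 : s2;
        Set<T> b = (a == s1) ? s2 : s1;
        Set<T> c = new HashSet<>();
        for (T e : a) {
            if (b.contains(e)) {
                c.add(e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Set<Integer> x = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> y = new HashSet<>(Arrays.asList(3, 4, 5));
        System.out.println(intersectionSize(x, y)); // prints 2
        System.out.println(intersection(x, y).size()); // prints 2
    }
}
```

Neither input set is modified, which is why no defensive copy is needed in either variant.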
Happy coding, and post some results ;-)
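Actually, on further thought, I believe this has better bounds than the method in the post: O(|A|) vs O(|A| + |B|). I have no idea whether this makes any difference (or improvement) in practice; I would only expect it to matter when |A| <<< |B|.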
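Okay, so I was really bored. At least on JDK 7 (Windows 7 x64), it appears that the method presented in the post is slower than the approach above by a good (though apparently mostly constant) factor. My eyeball estimate is that it is about four times slower than the suggestion above that uses only a counter, and about twice as slow as the variant that creates a new HashSet. This seems to be "roughly consistent" across the different initial set sizes.
(Please remember that, as Voo pointed out, the numbers above and this micro-benchmark assume that a HashSet is being used! And, as always, micro-benchmarks are dangerous. YMMV.)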
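Here are the ugly results (nominally in milliseconds; note that the benchmark divides nanoseconds by 10e6, which is 1e7, so each figure is really in units of 10 ms, which does not affect the ratios):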
Running tests for 1x1
IntersectTest$PostMethod@6cc2060e took 13.9808544 count=1000000
IntersectTest$MyMethod1@7d38847d took 2.9893732 count=1000000
IntersectTest$MyMethod2@9826ac5 took 7.775945 count=1000000
Running tests for 1x10
IntersectTest$PostMethod@67fc9fee took 12.4647712 count=734000
IntersectTest$MyMethod1@7a67f797 took 3.1567252 count=734000
IntersectTest$MyMethod2@3fb01949 took 6.483941 count=734000
Running tests for 1x100
IntersectTest$PostMethod@16675039 took 11.3069326 count=706000
IntersectTest$MyMethod1@58c3d9ac took 2.3482693 count=706000
IntersectTest$MyMethod2@2207d8bb took 4.8687103 count=706000
Running tests for 1x1000
IntersectTest$PostMethod@33d626a4 took 10.28656 count=729000
IntersectTest$MyMethod1@3082f392 took 2.3478658 count=729000
IntersectTest$MyMethod2@65450f1f took 4.109205 count=729000
Running tests for 10x2
IntersectTest$PostMethod@55c4d594 took 10.4137618 count=736000
IntersectTest$MyMethod1@6da21389 took 2.374206 count=736000
IntersectTest$MyMethod2@2bb0bf9a took 4.9802039 count=736000
Running tests for 10x10
IntersectTest$PostMethod@7930ebb took 25.811083 count=4370000
IntersectTest$MyMethod1@47ac1adf took 6.9409306 count=4370000
IntersectTest$MyMethod2@74184b3b took 14.2603248 count=4370000
Running tests for 10x100
IntersectTest$PostMethod@7f423820 took 25.0577691 count=4251000
IntersectTest$MyMethod1@5472fe25 took 6.1376042 count=4251000
IntersectTest$MyMethod2@498b5a73 took 13.9880385 count=4251000
Running tests for 10x1000
IntersectTest$PostMethod@3033b503 took 25.0312716 count=4138000
IntersectTest$MyMethod1@12b0f0ae took 6.0932898 count=4138000
IntersectTest$MyMethod2@1e893918 took 13.8332505 count=4138000
Running tests for 100x1
IntersectTest$PostMethod@6366de01 took 9.4531628 count=700000
IntersectTest$MyMethod1@767946a2 took 2.4284762 count=700000
IntersectTest$MyMethod2@140c7272 took 4.7580235 count=700000
Running tests for 100x10
IntersectTest$PostMethod@3351e824 took 24.9788668 count=4192000
IntersectTest$MyMethod1@465fadce took 6.1462852 count=4192000
IntersectTest$MyMethod2@338bd37a took 13.1742654 count=4192000
Running tests for 100x100
IntersectTest$PostMethod@297630d5 took 193.0121077 count=41047000
IntersectTest$MyMethod1@e800537 took 45.2652397 count=41047000
IntersectTest$MyMethod2@76d66550 took 120.8494766 count=41047000
Running tests for 100x1000
IntersectTest$PostMethod@33576738 took 199.6269531 count=40966000
IntersectTest$MyMethod1@2f39a7dd took 45.5255814 count=40966000
IntersectTest$MyMethod2@723bb663 took 122.1704975 count=40966000
Running tests for 1x1
IntersectTest$PostMethod@35e3bdb5 took 9.5598373 count=1000000
IntersectTest$MyMethod1@7abbd1b6 took 2.6359174 count=1000000
IntersectTest$MyMethod2@40c542ad took 6.1091794 count=1000000
Running tests for 1x10
IntersectTest$PostMethod@3c33a0c5 took 9.4648528 count=733000
IntersectTest$MyMethod1@61800463 took 2.302116 count=733000
IntersectTest$MyMethod2@1ba03197 took 5.4803628 count=733000
Running tests for 1x100
IntersectTest$PostMethod@71b8da5 took 9.4971057 count=719000
IntersectTest$MyMethod1@21f04f48 took 2.2983538 count=719000
IntersectTest$MyMethod2@27e51160 took 5.3926902 count=719000
Running tests for 1x1000
IntersectTest$PostMethod@2fe7106a took 9.4702331 count=692000
IntersectTest$MyMethod1@6ae6b7b7 took 2.3013066 count=692000
IntersectTest$MyMethod2@51278635 took 5.4488882 count=692000
Running tests for 10x2
IntersectTest$PostMethod@223b2d85 took 9.5660879 count=743000
IntersectTest$MyMethod1@5b298851 took 2.3481445 count=743000
IntersectTest$MyMethod2@3b4ac99 took 4.8268489 count=743000
Running tests for 10x10
IntersectTest$PostMethod@51c700a0 took 23.0709476 count=4326000
IntersectTest$MyMethod1@5ffa3251 took 5.5460785 count=4326000
IntersectTest$MyMethod2@22fd9511 took 13.4853948 count=4326000
Running tests for 10x100
IntersectTest$PostMethod@46b49793 took 25.1295491 count=4256000
IntersectTest$MyMethod1@7a4b5828 took 5.8520418 count=4256000
IntersectTest$MyMethod2@6888e8d1 took 14.0856942 count=4256000
Running tests for 10x1000
IntersectTest$PostMethod@5339af0d took 25.1752685 count=4158000
IntersectTest$MyMethod1@7013a92a took 5.7978328 count=4158000
IntersectTest$MyMethod2@1ac73de2 took 13.8914112 count=4158000
Running tests for 100x1
IntersectTest$PostMethod@3d1344c8 took 9.5123442 count=717000
IntersectTest$MyMethod1@3c08c5cb took 2.34665 count=717000
IntersectTest$MyMethod2@63f1b137 took 4.907277 count=717000
Running tests for 100x10
IntersectTest$PostMethod@71695341 took 24.9830339 count=4180000
IntersectTest$MyMethod1@39d90a92 took 5.8467864 count=4180000
IntersectTest$MyMethod2@584514e9 took 13.2197964 count=4180000
Running tests for 100x100
IntersectTest$PostMethod@21b8dc91 took 195.1796213 count=41060000
IntersectTest$MyMethod1@6f98c4e2 took 44.5775162 count=41060000
IntersectTest$MyMethod2@16a60aab took 121.1754402 count=41060000
Running tests for 100x1000
IntersectTest$PostMethod@20b37d62 took 200.973133 count=40940000
IntersectTest$MyMethod1@67ecbdb3 took 45.4832226 count=40940000
IntersectTest$MyMethod2@679a6812 took 121.791293 count=40940000
Running tests for 1x1
IntersectTest$PostMethod@237aa07b took 9.2210288 count=1000000
IntersectTest$MyMethod1@47bdfd6f took 2.3394042 count=1000000
IntersectTest$MyMethod2@a49a735 took 6.1688936 count=1000000
Running tests for 1x10
IntersectTest$PostMethod@2b25ffba took 9.4103967 count=736000
IntersectTest$MyMethod1@4bb82277 took 2.2976994 count=736000
IntersectTest$MyMethod2@25ded977 took 5.3310813 count=736000
Running tests for 1x100
IntersectTest$PostMethod@7154a6d5 took 9.3818786 count=704000
IntersectTest$MyMethod1@6c952413 took 2.3014931 count=704000
IntersectTest$MyMethod2@33739316 took 5.3307998 count=704000
Running tests for 1x1000
IntersectTest$PostMethod@58334198 took 9.3831841 count=736000
IntersectTest$MyMethod1@d178f65 took 2.3071236 count=736000
IntersectTest$MyMethod2@5c7369a took 5.4062184 count=736000
Running tests for 10x2
IntersectTest$PostMethod@7c4bdf7c took 9.4040537 count=735000
IntersectTest$MyMethod1@593d85a4 took 2.3584088 count=735000
IntersectTest$MyMethod2@5610ffc1 took 4.8318229 count=735000
Running tests for 10x10
IntersectTest$PostMethod@46bd9fb8 took 23.004925 count=4331000
IntersectTest$MyMethod1@4b410d50 took 5.5678172 count=4331000
IntersectTest$MyMethod2@1bd125c9 took 14.6517184 count=4331000
Running tests for 10x100
IntersectTest$PostMethod@75d6ecff took 25.0114913 count=4223000
IntersectTest$MyMethod1@716195c9 took 5.798676 count=4223000
IntersectTest$MyMethod2@3db0f946 took 13.8064737 count=4223000
Running tests for 10x1000
IntersectTest$PostMethod@761d8e2a took 25.1910652 count=4292000
IntersectTest$MyMethod1@e60a3fb took 5.8621189 count=4292000
IntersectTest$MyMethod2@6aadbb1c took 13.8150282 count=4292000
Running tests for 100x1
IntersectTest$PostMethod@48a50a6e took 9.4141906 count=736000
IntersectTest$MyMethod1@4b4fe104 took 2.3507252 count=736000
IntersectTest$MyMethod2@693df43c took 4.7506854 count=736000
Running tests for 100x10
IntersectTest$PostMethod@4f7d29df took 24.9574096 count=4219000
IntersectTest$MyMethod1@2248183e took 5.8628954 count=4219000
IntersectTest$MyMethod2@2b2fa007 took 12.9836817 count=4219000
Running tests for 100x100
IntersectTest$PostMethod@545d7b6a took 193.2436192 count=40987000
IntersectTest$MyMethod1@4551976b took 44.634367 count=40987000
IntersectTest$MyMethod2@6fac155a took 119.2478037 count=40987000
Running tests for 100x1000
IntersectTest$PostMethod@7b6c238d took 200.4385174 count=40817000
IntersectTest$MyMethod1@78923d48 took 45.6225227 count=40817000
IntersectTest$MyMethod2@48f57fcf took 121.0602757 count=40817000
Running tests for 1x1
IntersectTest$PostMethod@102c79f4 took 9.0931408 count=1000000
IntersectTest$MyMethod1@57fa8a77 took 2.3309466 count=1000000
IntersectTest$MyMethod2@198b7c1 took 5.7627226 count=1000000
Running tests for 1x10
IntersectTest$PostMethod@8c646d0 took 9.3208571 count=726000
IntersectTest$MyMethod1@11530630 took 2.3123797 count=726000
IntersectTest$MyMethod2@61bb4232 took 5.405318 count=726000
Running tests for 1x100
IntersectTest$PostMethod@1a137105 took 9.387384 count=710000
IntersectTest$MyMethod1@72610ca2 took 2.2938749 count=710000
IntersectTest$MyMethod2@41849a58 took 5.3865938 count=710000
Running tests for 1x1000
IntersectTest$PostMethod@100001c8 took 9.4289031 count=696000
IntersectTest$MyMethod1@7074f9ac took 2.2977923 count=696000
IntersectTest$MyMethod2@fb3c4e2 took 5.3724119 count=696000
Running tests for 10x2
IntersectTest$PostMethod@52c638d6 took 9.4074124 count=775000
IntersectTest$MyMethod1@53bd940e took 2.3544881 count=775000
IntersectTest$MyMethod2@43434e15 took 4.9228549 count=775000
Running tests for 10x10
IntersectTest$PostMethod@73b7628d took 23.2110252 count=4374000
IntersectTest$MyMethod1@ca75255 took 5.5877838 count=4374000
IntersectTest$MyMethod2@3d0e50f0 took 13.5902641 count=4374000
Running tests for 10x100
IntersectTest$PostMethod@6d6bbba9 took 25.1999918 count=4227000
IntersectTest$MyMethod1@3bed8c5e took 5.7879144 count=4227000
IntersectTest$MyMethod2@689a8e0e took 13.9617882 count=4227000
Running tests for 10x1000
IntersectTest$PostMethod@3da3b0a2 took 25.1627329 count=4222000
IntersectTest$MyMethod1@45a17b4b took 5.8319523 count=4222000
IntersectTest$MyMethod2@6ca59ca3 took 13.8885479 count=4222000
Running tests for 100x1
IntersectTest$PostMethod@360202a5 took 9.5115367 count=705000
IntersectTest$MyMethod1@3dfbba56 took 2.3470254 count=705000
IntersectTest$MyMethod2@598683e4 took 4.8955489 count=705000
Running tests for 100x10
IntersectTest$PostMethod@21426d0d took 25.8234298 count=4231000
IntersectTest$MyMethod1@1005818a took 5.8832067 count=4231000
IntersectTest$MyMethod2@597b933d took 13.3676148 count=4231000
Running tests for 100x100
IntersectTest$PostMethod@6d59b81a took 193.676662 count=41015000
IntersectTest$MyMethod1@1d45eb0c took 44.6519088 count=41015000
IntersectTest$MyMethod2@594a6fd7 took 119.1646115 count=41015000
Running tests for 100x1000
IntersectTest$PostMethod@6d57d9ac took 200.1651432 count=40803000
IntersectTest$MyMethod1@2293e349 took 45.5311168 count=40803000
IntersectTest$MyMethod2@1b2edf5b took 120.1697135 count=40803000
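To put numbers on the eyeball estimate, the slowdown factors can be computed for a few rows. The sketch below hard-codes three rows copied from the second pass of the output above (1x10, 10x10 and 100x100); the class name `RatioCheck` and the helper `ratio` are made up for illustration.

```java
class RatioCheck {

    // How many times slower `slower` is compared with `faster`.
    static double ratio(double slower, double faster) {
        return slower / faster;
    }

    public static void main(String[] args) {
        String[] labels = {"1x10", "10x10", "100x100"};
        // Columns: PostMethod, MyMethod1 (counter only), MyMethod2 (new HashSet).
        double[][] samples = {
            {9.4648528, 2.302116, 5.4803628},
            {23.0709476, 5.5460785, 13.4853948},
            {195.1796213, 44.5775162, 121.1754402},
        };
        for (int i = 0; i < samples.length; i++) {
            double post = samples[i][0];
            System.out.printf("%s: post/counter=%.2f post/newSet=%.2f%n",
                    labels[i],
                    ratio(post, samples[i][1]),
                    ratio(post, samples[i][2]));
        }
    }
}
```

On these three rows the posted method comes out roughly 4.1 to 4.4 times slower than the counter-only variant and roughly 1.6 to 1.7 times slower than the variant that builds a new HashSet, which is in line with the "four times" and "twice" guesses.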
And here is the ugly (and possibly flawed) micro-benchmark:
import java.util.*;

public class IntersectTest {

    static Random rng = new Random();

    // One intersection strategy; accumulates total time and total count across runs.
    static abstract class RunIt {
        public long count;
        public long nsTime;
        abstract int Run (Set<Long> s1, Set<Long> s2);
    }

    // As presented in the post
    static class PostMethod extends RunIt {
        public int Run(Set<Long> set1, Set<Long> set2) {
            boolean set1IsLarger = set1.size() > set2.size();
            Set<Long> cloneSet = new HashSet<Long>(set1IsLarger ? set2 : set1);
            cloneSet.retainAll(set1IsLarger ? set1 : set2);
            return cloneSet.size();
        }
    }

    // No intermediate HashSet
    static class MyMethod1 extends RunIt {
        public int Run (Set<Long> set1, Set<Long> set2) {
            Set<Long> a;
            Set<Long> b;
            if (set1.size() <= set2.size()) {
                a = set1;
                b = set2;
            } else {
                a = set2;
                b = set1;
            }
            int count = 0;
            for (Long e : a) {
                if (b.contains(e)) {
                    count++;
                }
            }
            return count;
        }
    }

    // With intermediate HashSet
    static class MyMethod2 extends RunIt {
        public int Run (Set<Long> set1, Set<Long> set2) {
            Set<Long> a;
            Set<Long> b;
            Set<Long> res = new HashSet<Long>();
            if (set1.size() <= set2.size()) {
                a = set1;
                b = set2;
            } else {
                a = set2;
                b = set1;
            }
            for (Long e : a) {
                if (b.contains(e)) {
                    res.add(e);
                }
            }
            return res.size();
        }
    }

    // Makes `count` insertion attempts with values drawn from [0, max(1, count * load)),
    // so the resulting set may hold fewer than `count` elements.
    static Set<Long> makeSet (int count, float load) {
        Set<Long> s = new HashSet<Long>();
        for (int i = 0; i < count; i++) {
            s.add((long)rng.nextInt(Math.max(1, (int)(count * load))));
        }
        return s;
    }

    // really crummy ubench stuff
    public static void main(String[] args) {
        int[][] bounds = {
            {1, 1},
            {1, 10},
            {1, 100},
            {1, 1000},
            {10, 2},
            {10, 10},
            {10, 100},
            {10, 1000},
            {100, 1},
            {100, 10},
            {100, 100},
            {100, 1000},
        };
        int totalReps = 4;
        int cycleReps = 1000;
        int subReps = 1000;
        float load = 0.8f;
        for (int tc = 0; tc < totalReps; tc++) {
            for (int[] bound : bounds) {
                int set1size = bound[0];
                int set2size = bound[1];
                System.out.println("Running tests for " + set1size + "x" + set2size);
                ArrayList<RunIt> allRuns = new ArrayList<RunIt>(
                    Arrays.asList(
                        new PostMethod(),
                        new MyMethod1(),
                        new MyMethod2()));
                for (int r = 0; r < cycleReps; r++) {
                    // Fresh sets each cycle; the methods are timed in random order.
                    ArrayList<RunIt> runs = new ArrayList<RunIt>(allRuns);
                    Set<Long> set1 = makeSet(set1size, load);
                    Set<Long> set2 = makeSet(set2size, load);
                    while (runs.size() > 0) {
                        int runIdx = rng.nextInt(runs.size());
                        RunIt run = runs.remove(runIdx);
                        long start = System.nanoTime();
                        int count = 0;
                        for (int s = 0; s < subReps; s++) {
                            count += run.Run(set1, set2);
                        }
                        long time = System.nanoTime() - start;
                        run.nsTime += time;
                        run.count += count;
                    }
                }
                for (RunIt run : allRuns) {
                    // Note: 10e6 == 1e7, so despite the name this value is in units of 10 ms
                    // (neither seconds nor milliseconds). Kept as-is to match the posted output.
                    double sec = run.nsTime / (10e6);
                    System.out.println(run + " took " + sec + " count=" + run.count);
                }
            }
        }
    }
}