Java: Searching IDs in a HashSet or a String

【Question title】: Java: Searching IDs from a HashSet or a String
【Posted】: 2014-08-11 02:33:47
【Question】:

I have a large number of IDs that can be stored either in a HashSet or in a String, i.e.

String strIds=",1,2,3,4,5,6,7,8,.,.,.,.,.,.,.,1000,";
    Or
HashSet<String> setOfids = new HashSet<String>();
setOfids.add("1");
setOfids.add("2");
.
.
.
setOfids.add("1000");

I also want to search for IDs.

Which one should I use for better performance (faster and more memory-efficient)?

1) strIds.indexOf("someId");
    or
2) setOfids.contains("someId");

Please also tell me about any other way to do this. Thanks for taking a look :)

【Comments】:

HashSet is the way to go!

Tags: java string search collections hashset

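【Solution 1】:

A Set is the better choice. Reasons:

With a Set, a lookup is O(1) on average. With a String, it is O(N).
The Set's lookup performance does not degrade as the data grows.
If you want to modify the data (add or remove IDs), the String will use more memory, because a String is immutable and every change creates a new copy.
indexOf can also give you false positives.

Say 1000 is present but 100 is not: indexOf("100") still returns a position inside 1000, because 100 is a substring of 1000.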
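The substring problem can be reproduced in a few lines. This is only a minimal sketch; the class name SubstringFalsePositiveDemo and the helper foundByIndexOf are made up for illustration and are not part of the original answer.

```java
import java.util.HashSet;
import java.util.Set;

class SubstringFalsePositiveDemo {

    // Naive substring check, as in option 1 of the question.
    static boolean foundByIndexOf(String strIds, String id) {
        return strIds.indexOf(id) >= 0;
    }

    public static void main(String[] args) {
        String strIds = "1,2,3,1000"; // note: 100 is NOT in the list
        Set<String> setOfIds = new HashSet<>();
        for (String id : strIds.split(",")) {
            setOfIds.add(id);
        }
        System.out.println(foundByIndexOf(strIds, "100")); // true  (false positive)
        System.out.println(setOfIds.contains("100"));      // false (correct)
    }
}
```

The Set compares whole elements, so it cannot match part of a longer ID.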

Simple proof-of-concept code for measuring performance:

import java.util.HashSet;
import java.util.Set;

public class TimeComputationTest {

  public static void main(String[] args) {
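    String strIds = null;
    Set<String> setOfids = new HashSet<String>();
    // StringBuilder is enough here; StringBuffer's synchronization is not needed
    StringBuilder sb = new StringBuilder();

    // Store the IDs 1..1000 both in the set and in a comma-separated string
    for (int i = 1; i <= 1000; i++) {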
      setOfids.add(String.valueOf(i));
      if (sb.length() != 0) {
        sb.append(",");
      }
      sb.append(i);
    }
    strIds = sb.toString();

    testTime(strIds, setOfids, "1");
    testTime(strIds, setOfids, "100");
    testTime(strIds, setOfids, "500");
    testTime(strIds, setOfids, "1000");
  }

  private static void testTime(String strIds, Set<String> setOfids, String string) {
    long startTime = System.nanoTime();
    strIds.indexOf(string);
    long endTime = System.nanoTime();

    System.out.println("String search time for (" + string + ") is " + (endTime - startTime));

    startTime = System.nanoTime();
    setOfids.contains(string);
    endTime = System.nanoTime();

    System.out.println("HashSet search time for (" + string + ") is " + (endTime - startTime));
  }
}

The output will be (approximately, in nanoseconds):

String search time for (1) is 3000
HashSet search time for (1) is 7000
String search time for (100) is 6000
HashSet search time for (100) is 2000
String search time for (500) is 33000
HashSet search time for (500) is 2000
String search time for (1000) is 71000
HashSet search time for (1000) is 1000
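Each figure above comes from a single call, so JIT warm-up and timer noise can dominate; that may be why the very first HashSet lookup looks slower than the String search. A rough sketch that averages over many calls is shown below. The class name, method names and the ROUNDS value are assumptions chosen for illustration, and this is still not a rigorous benchmark (a dedicated harness such as JMH would be more trustworthy).

```java
import java.util.HashSet;
import java.util.Set;

class AveragedLookupTiming {

    static final int ROUNDS = 100_000; // arbitrary repetition count (assumption)

    // Average nanoseconds per strIds.indexOf(id) call.
    static long avgStringNanos(String strIds, String id) {
        long hits = 0; // consume the results so the loop is not trivially removable
        long start = System.nanoTime();
        for (int i = 0; i < ROUNDS; i++) {
            if (strIds.indexOf(id) >= 0) hits++;
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) throw new IllegalStateException(); // never true; keeps hits live
        return elapsed / ROUNDS;
    }

    // Average nanoseconds per setOfIds.contains(id) call.
    static long avgSetNanos(Set<String> setOfIds, String id) {
        long hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < ROUNDS; i++) {
            if (setOfIds.contains(id)) hits++;
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) throw new IllegalStateException();
        return elapsed / ROUNDS;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        Set<String> setOfIds = new HashSet<>();
        for (int i = 1; i <= 1000; i++) {
            if (sb.length() != 0) sb.append(",");
            sb.append(i);
            setOfIds.add(String.valueOf(i));
        }
        String strIds = sb.toString();
        for (String id : new String[] {"1", "100", "500", "1000"}) {
            avgStringNanos(strIds, id); // warm-up pass, result discarded
            avgSetNanos(setOfIds, id);
            System.out.println(id + ": String " + avgStringNanos(strIds, id)
                    + " ns, HashSet " + avgSetNanos(setOfIds, id) + " ns");
        }
    }
}
```

The absolute numbers depend on the machine and JVM, so no expected output is given here.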

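【Comments】:

【Solution 2】:

I think HashSet is the better choice. It has two advantages:

It does not allow duplicates.
HashSet is backed internally by a HashMap, so lookups are fast.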
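The no-duplicates property can be seen in a short sketch. The class DuplicateIdDemo and the helper countDistinct are hypothetical names used only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateIdDemo {

    // Adds every ID to a HashSet and returns how many distinct IDs remain.
    static int countDistinct(String... ids) {
        Set<String> setOfIds = new HashSet<>();
        for (String id : ids) {
            setOfIds.add(id); // add() returns false and leaves the set unchanged on a duplicate
        }
        return setOfIds.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("1", "2", "2", "3")); // 3
    }
}
```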

【Comments】:

【Solution 3】:

This will work faster:

String strIds=",1,2,3,4,5,6,7,8,.,.,.,.,.,.,.,1000,";
String searchStr = "9";
boolean searchFound = strIds.contains(","+searchStr +",");
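Wrapping the search term in commas means only a whole entry can match, provided the string itself starts and ends with a comma as in the question. The sketch below illustrates this with a hypothetical helper containsId. Note that String.contains still scans the string, so the lookup remains linear in the length of strIds.

```java
class DelimitedSearchDemo {

    // Wraps the ID in commas so that only a whole entry can match.
    // Assumes strIds starts and ends with a comma, as in the question.
    static boolean containsId(String strIds, String id) {
        return strIds.contains("," + id + ",");
    }

    public static void main(String[] args) {
        String strIds = ",1,2,3,1000,";
        System.out.println(containsId(strIds, "100"));  // false
        System.out.println(containsId(strIds, "1000")); // true
        System.out.println(containsId(strIds, "1"));    // true
    }
}
```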

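【Comments】:

【Solution 4】:

Performance aside, you should not use a String like this. It is creative, but a String is not designed for this kind of indexing. What happens if the format of the IDs changes?

To improve the HashSet's performance and save memory, you can of course use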

HashSet<Integer> instead of HashSet<String>
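A minimal sketch of that suggestion follows, assuming every ID is numeric and fits in an int. The class IntegerIdSetDemo and the helper buildIds are illustrative names, not part of the original answer.

```java
import java.util.HashSet;
import java.util.Set;

class IntegerIdSetDemo {

    // Builds a set holding the IDs 1..max as Integer objects.
    static Set<Integer> buildIds(int max) {
        Set<Integer> ids = new HashSet<>();
        for (int i = 1; i <= max; i++) {
            ids.add(i); // autoboxed to Integer
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Integer> ids = buildIds(1000);
        System.out.println(ids.contains(500));  // true
        System.out.println(ids.contains(1001)); // false
        // An ID that arrives as text has to be parsed before the lookup.
        System.out.println(ids.contains(Integer.parseInt("42"))); // true
    }
}
```

One thing to watch: contains takes an Object, so passing the String "500" to a Set<Integer> compiles but never matches.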

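【Comments】:

【Solution 5】:

A hash table lookup is "constant time", i.e. it does not grow with the number of IDs.

But a single compact string holding all the IDs needs the least memory.

So make up your mind: fastest retrieval or least storage!

【Comments】: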