[Posted]: 2021-01-15 05:29:47
[Question]:
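I have a large HashSet<string> named resultList (about a million records).
I need to find matches against a Dictionary named sampleList that contains about 10,000 records. There is not necessarily a match.
On a 12-thread CPU this takes about 40-50 seconds.
I keep loading new data into sampleList and comparing it against resultList.
My question is: can this be done faster or more elegantly?
Here is my code: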
HashSet<string> resultList = new HashSet<string>()
{
    "0000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000001",
    "0000000000000000000000000000000000000002",
    "0000000000000000000000000000000000000003",
    "0000000000000000000000000000000000000004",
    "0000000000000000000000000000000000000005"
    //... this list is about a million records
};
Dictionary<string, string> sampleList = new Dictionary<string, string>()
{
    { "0000000003000000300000000000000000000005", "This is a value" },
    { "0000000000100000000000002000000800000001", "This is a value 1" },
    { "0000000000000000000000000000000000000004", "This is what I'm trying to match" },
    { "0000000200000000100000000000000000000000", "This is a value 2" },
    { "0000005000000000000000000050000000000004", "This is a value 3" },
    { "0000000080000000000200000000000000000004", "This is a value 4" },
    { "0000000000200000000000000000800000000004", "This is a value 5" }
    //... this list is about 10,000 records
};
// First try to find any match: I found that Any is faster than Where, and the chance of a match is small, so...
if (resultList.AsParallel().WithDegreeOfParallelism(MaxDegreeOfParallelism).Any(x => sampleList.Any(y => x == y.Key)))
{
    // Then, if there is a match, fetch it.
    foreach (var found in resultList.AsParallel().WithDegreeOfParallelism(MaxDegreeOfParallelism).Where(x => sampleList.Any(y => x == y.Key)))
    {
        // do something with the found matches
    }
}
[Comments]:
-
Have you tried Intersect?
-
What is the format of the keys?
-
I would iterate over the dictionary, since it has fewer entries, and look up each of its keys in the hash set instead.
-
Are they really string values?
-
I'm not sure I understand the question. In any case, Jonathan gave you a good answer. Why don't you try it?
Tags: c# list dictionary search
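
The two suggestions from the comments above (iterate the smaller dictionary and look each key up in the hash set, or use Intersect) can be sketched roughly as follows. The code in the question scans all of sampleList for every element of resultList, which is on the order of a million times 10,000 string comparisons, whereas a hash lookup per dictionary key needs only about 10,000 lookups. The class and method names here (MatchFinder, FindMatches, FindMatchingKeys) are hypothetical and not from the original post; this is a minimal sketch that assumes the keys are ordinary strings compared with the default comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration; not part of the original post.
public static class MatchFinder
{
    // Returns the entries of sampleList whose key is present in resultList.
    // Iterates the smaller collection and does one hash lookup per key.
    public static List<KeyValuePair<string, string>> FindMatches(
        HashSet<string> resultList,
        Dictionary<string, string> sampleList)
    {
        var matches = new List<KeyValuePair<string, string>>();
        foreach (KeyValuePair<string, string> entry in sampleList)
        {
            if (resultList.Contains(entry.Key))
            {
                matches.Add(entry);
            }
        }
        return matches;
    }

    // Alternative using LINQ Intersect; returns only the matching keys.
    // Note that Intersect has to enumerate resultList on every call.
    public static List<string> FindMatchingKeys(
        HashSet<string> resultList,
        Dictionary<string, string> sampleList)
    {
        return sampleList.Keys.Intersect(resultList).ToList();
    }
}
```

With the sample data from the question, FindMatches would be expected to return the single entry whose key ends in ...0004 ("This is what I'm trying to match"). Because resultList is reused while sampleList keeps changing, the Contains loop is probably the better fit of the two; whether parallelism still helps at this size would need to be measured.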