Flatten a Dictionary<int, List<object>>: answers

【Question title】: Flatten a Dictionary<int, List<object>>
【Posted】: 2018-10-14 18:02:22
【Question】:

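I have a dictionary whose key is an integer representing a year and whose value is a list of Channel objects. I need to flatten the data and create a new object from it.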

Currently my code looks like this:

Dictionary<int, List<Channel>> myDictionary;

foreach(var x in myDictionary)
{
    var result = (from a in x.Value
                  from b in anotherList
                  where a.ChannelId == b.ChannelId
                  select new NewObject
                  {
                      NewObjectYear = x.Key,
                      NewObjectName = a.First().ChannelName,
                  }).ToList();
    list.AddRange(result);
}

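Note that I use the Key as the value of the NewObjectYear property. I want to get rid of the foreach, because the dictionary contains a lot of data and doing joins inside each iteration makes it very slow. So I decided to refactor and came up with this: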

var flatten = myDictionary.SelectMany(x => x.Value.Select(y => 
                  new KeyValuePair<int, Channel>(x.Key, y))).ToList();

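But with this I can't get the Key directly. Using something like flatten.Select(x => x.Key) is definitely not the right way. So I looked for other ways of flattening that would suit my scenario, but failed. I also thought about creating a class that holds the year and the flattened list, but I don't know how. Please help me with this.

Also, is there another way that doesn't require creating a new class?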

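【Question discussion】:

What do you mean, you "can't get the Key directly"? What you've done looks fine.
"...definitely not the right way": so what is the right way?
I don't get it. In from a in x.Value, a is a Channel, not a List<Channel>. Indeed you use a.ChannelId, but how can you then use a.First()?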
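As the first comment points out, the KeyValuePair flattening in the question already keeps the year next to each channel. If you would rather not create a new class (or use KeyValuePair), a named value tuple can carry the key. This is a minimal sketch assuming C# 7 or later; the Channel class below is a hypothetical shape with only ChannelId and ChannelName, since the question does not show the real one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape; the real Channel class is not shown in the question.
public class Channel
{
    public int ChannelId { get; set; }
    public string ChannelName { get; set; } = "";
}

public static class FlattenDemo
{
    // Flattens year -> channels into (Year, Channel) pairs.
    // The outer lambda's 'entry' is still in scope inside the inner Select,
    // so entry.Key can be copied into every flattened item.
    public static List<(int Year, Channel Channel)> Flatten(Dictionary<int, List<Channel>> byYear)
    {
        return byYear
            .SelectMany(entry => entry.Value.Select(channel => (Year: entry.Key, Channel: channel)))
            .ToList();
    }

    public static void Main()
    {
        var byYear = new Dictionary<int, List<Channel>>
        {
            [2017] = new List<Channel> { new Channel { ChannelId = 1, ChannelName = "A" } },
            [2018] = new List<Channel>
            {
                new Channel { ChannelId = 2, ChannelName = "B" },
                new Channel { ChannelId = 3, ChannelName = "C" },
            },
        };

        foreach (var item in Flatten(byYear))
        {
            // item.Year is the original dictionary key; item.Channel is one element of its list.
            Console.WriteLine($"{item.Year}: {item.Channel.ChannelName}");
        }
    }
}
```

Each flattened item exposes item.Year and item.Channel, so filtering and projecting to NewObject can then be done on the flat sequence without going back to the dictionary.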

Tags: c# linq dictionary key-value

【Solution 1】:

It looks to me like you just want to filter; you don't need a join:

var anotherListIDs = new HashSet<int>(anotherList.Select(c => c.ChannelId));            

foreach (var x in myDictionary)
{
    list.AddRange(x.Value
        .Where(c => anotherListIDs.Contains(c.ChannelId))
        .Select(c => new NewObject
        {
            NewObjectYear = x.Key,
            NewObjectName = c.ChannelName, // c is a single Channel, not a list
        }));
}
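If the goal is also to drop the explicit foreach, the same HashSet filter can be written as a single SelectMany: the outer lambda's parameter stays in scope, so its Key can be copied into each result. This is a sketch, not a measured improvement. It does the same work as the loop, and any speed-up still comes from the HashSet. The Channel, OtherItem and NewObject shapes are hypothetical, and the sketch assumes the name of the matching channel itself (c.ChannelName) is what should go into NewObjectName.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes matching the names used in the question.
public class Channel { public int ChannelId { get; set; } public string ChannelName { get; set; } = ""; }
public class OtherItem { public int ChannelId { get; set; } }
public class NewObject { public int NewObjectYear { get; set; } public string NewObjectName { get; set; } = ""; }

public static class FilterDemo
{
    public static List<NewObject> Extract(Dictionary<int, List<Channel>> byYear, IEnumerable<OtherItem> anotherList)
    {
        // Build the lookup once; Contains on a HashSet is O(1) on average.
        var anotherListIds = new HashSet<int>(anotherList.Select(o => o.ChannelId));

        // 'entry' from the outer lambda is visible in the inner Select,
        // so the year travels with every matching channel.
        return byYear
            .SelectMany(entry => entry.Value
                .Where(c => anotherListIds.Contains(c.ChannelId))
                .Select(c => new NewObject
                {
                    NewObjectYear = entry.Key,
                    NewObjectName = c.ChannelName, // assumption: the matching channel's own name
                }))
            .ToList();
    }

    public static void Main()
    {
        var byYear = new Dictionary<int, List<Channel>>
        {
            [2017] = new List<Channel> { new Channel { ChannelId = 1, ChannelName = "A" } },
            [2018] = new List<Channel>
            {
                new Channel { ChannelId = 2, ChannelName = "B" },
                new Channel { ChannelId = 3, ChannelName = "C" },
            },
        };
        var anotherList = new[] { new OtherItem { ChannelId = 2 }, new OtherItem { ChannelId = 3 } };

        foreach (NewObject o in Extract(byYear, anotherList))
            Console.WriteLine($"{o.NewObjectYear}: {o.NewObjectName}");
    }
}
```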

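【Discussion】:

【Solution 2】:

You do realize that if the second element of the list in a particular dictionary entry has a matching ChannelId, you return the first element of that list, don't you?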

var otherList = new OtherItem[]
{
    new OtherItem() {ChannelId = 1, ...}
};
var dictionary = new Dictionary<int, List<Channel>>
{
    { 10,                             // Key
      new List<Channel>()             // Value
      {
          new Channel() {ChannelId = 100, ChannelName = "100"},
          new Channel() {ChannelId = 1, ChannelName = "1"},
      }
    },
};

Although the second element has the matching ChannelId, you return the name of the first element.
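A small reproduction of that behaviour, assuming a.First().ChannelName in the question was meant as x.Value.First().ChannelName (as written it does not compile, because a is a Channel). The class shapes are hypothetical; OtherItem only needs a ChannelId here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes matching the names used in the question.
public class Channel { public int ChannelId { get; set; } public string ChannelName { get; set; } = ""; }
public class OtherItem { public int ChannelId { get; set; } }
public class NewObject { public int NewObjectYear { get; set; } public string NewObjectName { get; set; } = ""; }

public static class FirstElementDemo
{
    // The question's join, with the name taken from the first channel of the list
    // (x.Value.First()) rather than from the channel 'a' that actually matched.
    public static List<NewObject> Extract(Dictionary<int, List<Channel>> dictionary, IEnumerable<OtherItem> otherList)
    {
        return (from x in dictionary
                from a in x.Value
                from b in otherList
                where a.ChannelId == b.ChannelId
                select new NewObject
                {
                    NewObjectYear = x.Key,
                    NewObjectName = x.Value.First().ChannelName,
                }).ToList();
    }

    public static void Main()
    {
        var otherList = new[] { new OtherItem { ChannelId = 1 } };
        var dictionary = new Dictionary<int, List<Channel>>
        {
            {
                10,
                new List<Channel>
                {
                    new Channel { ChannelId = 100, ChannelName = "100" },
                    new Channel { ChannelId = 1, ChannelName = "1" },
                }
            },
        };

        foreach (NewObject result in Extract(dictionary, otherList))
        {
            // Only channel 1 matches, yet the name printed is "100".
            Console.WriteLine($"{result.NewObjectYear}: {result.NewObjectName}");
        }
    }
}
```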

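Anyway, let's assume this is what you really want. You are right that your function is not very efficient.

Your dictionary implements IEnumerable<KeyValuePair<int, List<Channel>>>. So every x in your foreach is a KeyValuePair<int, List<Channel>>, and every x.Value is a List<Channel>.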
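To make those types explicit: enumerating the dictionary yields KeyValuePair<int, List<Channel>> values, with the year in Key and the whole list in Value. A minimal sketch with a hypothetical Channel class:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape; only the properties used in this thread.
public class Channel { public int ChannelId { get; set; } public string ChannelName { get; set; } = ""; }

public static class EnumerationDemo
{
    // Each iteration variable is a KeyValuePair<int, List<Channel>>:
    // x.Key is the year, x.Value is the complete List<Channel> for that year.
    public static List<string> Describe(Dictionary<int, List<Channel>> dictionary)
    {
        var lines = new List<string>();
        foreach (KeyValuePair<int, List<Channel>> x in dictionary)
        {
            List<Channel> channels = x.Value;
            lines.Add($"{x.Key}: {channels.Count} channel(s)");
        }
        return lines;
    }

    public static void Main()
    {
        var dictionary = new Dictionary<int, List<Channel>>
        {
            [2017] = new List<Channel> { new Channel { ChannelId = 1, ChannelName = "A" } },
            [2018] = new List<Channel>
            {
                new Channel { ChannelId = 2, ChannelName = "B" },
                new Channel { ChannelId = 3, ChannelName = "C" },
            },
        };

        foreach (string line in Describe(dictionary))
            Console.WriteLine(line);
    }
}
```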

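So for every element of your dictionary (that is, every KeyValuePair<int, List<Channel>>) you take the complete list, do a full inner join of that complete list with otherList, and take the Key of the KeyValuePair and the first element of the List in the KeyValuePair.

Even if you only use the first result or the first few (for example with FirstOrDefault() or Take(3)), you still do this for every element of every list in your dictionary.

Indeed, your query could be more efficient.

Since you use the ChannelIds in OtherList only to find out whether a ChannelId exists, one of the main improvements is to convert the ChannelIds of OtherList into a HashSet<int>. This gives you fast lookup when checking whether the ChannelId of a value in the Dictionary is in the HashSet.

So for every element in the dictionary, you only need to check the ChannelIds in its list to see whether one of them is in the HashSet. As soon as you find one, you can stop and return only the first element of the List and the Key.

My solution is an extension method of Dictionary<int, List<Channel>>. See Extension Methods Demystified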

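using System.Collections.Generic;
using System.Linq;

// Extension methods must be declared inside a non-generic static class.
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,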
     IEnumerable<OtherItem> otherList)
{
    // I'll only use the ChannelIds of the otherList, so extract them
    IEnumerable<int> otherChannelIds = otherList
        .Select(otherItem => otherItem.ChannelId);
    return dictionary.ExtractNewObjects(otherChannelIds);
}

This calls the other ExtractNewObjects:

public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
     IEnumerable<int> otherChannelIds)
{
    var otherChannelIdsSet = new HashSet<int>(otherChannelIds);
    // duplicate channelIds will be removed automatically

    foreach (KeyValuePair<int, List<Channel>> keyValuePair in dictionary)
    {
        // is any ChannelId in the list also in otherChannelIdsSet?
        // every keyValuePair.Value is a List<Channel>
        // every Channel has a ChannelId
        // channelId found if any of these ChannelIds is in the HashSet
        bool channelIdFound = keyValuePair.Value
           .Any(channel => otherChannelIdsSet.Contains(channel.ChannelId));
        if (channelIdFound)
        {
            yield return new NewObject()
            {
                NewObjectYear = keyValuePair.Key,
                NewObjectName = keyValuePair.Value
                                .Select(channel => channel.ChannelName)
                                .FirstOrDefault(),
            };
        }
    }
}

Usage:

IEnumerable<OtherItem> otherList = ...
Dictionary<int, List<Channel>> dictionary = ...

IEnumerable<NewObject> extractedNewObjects = dictionary.ExtractNewObjects(otherList);

var someNewObjects = extractedNewObjects
    .Take(5)      // here we see the benefit from the yield return
    .ToList();

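We can see four efficiency gains:

Using a HashSet<int> makes it very fast to look up whether a ChannelId is in OtherList.
Using Any() stops enumerating the List<Channel> as soon as a matching ChannelId is found in the HashSet.
Using yield return means you don't enumerate more elements of the dictionary than you actually use.
Using Select and FirstOrDefault when creating NewObjectName prevents an exception if the List<Channel> is empty.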
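The yield return point can be checked by counting how many dictionary entries the iterator actually visits. This is a minimal sketch: the Counter class is a hypothetical helper added only for this measurement, and the source is typed as IEnumerable<KeyValuePair<int, List<Channel>>> so that a list with a fixed order can stand in for the dictionary (a Dictionary<int, List<Channel>> can be passed as well).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape; only the properties used in this thread.
public class Channel { public int ChannelId { get; set; } public string ChannelName { get; set; } = ""; }

// Hypothetical helper used only to observe how far the iterator runs.
public class Counter
{
    public int Visited;
}

public static class LazinessDemo
{
    // Same shape as the answer's loop: HashSet lookup, Any() short-circuit, yield return.
    public static IEnumerable<string> MatchingNames(
        IEnumerable<KeyValuePair<int, List<Channel>>> source, HashSet<int> ids, Counter counter)
    {
        foreach (var entry in source)
        {
            counter.Visited++; // one increment per entry actually pulled from the source
            if (entry.Value.Any(c => ids.Contains(c.ChannelId)))
            {
                yield return entry.Value.Select(c => c.ChannelName).FirstOrDefault() ?? "";
            }
        }
    }

    public static void Main()
    {
        var source = new List<KeyValuePair<int, List<Channel>>>
        {
            new KeyValuePair<int, List<Channel>>(2016, new List<Channel> { new Channel { ChannelId = 5, ChannelName = "E" } }),
            new KeyValuePair<int, List<Channel>>(2017, new List<Channel> { new Channel { ChannelId = 1, ChannelName = "A" } }),
            new KeyValuePair<int, List<Channel>>(2018, new List<Channel> { new Channel { ChannelId = 1, ChannelName = "B" } }),
        };
        var ids = new HashSet<int> { 1 };

        var counter = new Counter();
        // Take(1) stops pulling from the iterator once one name has been produced.
        List<string> first = MatchingNames(source, ids, counter).Take(1).ToList();
        Console.WriteLine($"{string.Join(",", first)} after visiting {counter.Visited} entries");
    }
}
```

Enumerating the whole result instead (for example calling ToList() directly on MatchingNames) visits all three entries.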

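【Discussion】:

The OP's question contains a.First().ChannelName, where a is a Channel. So Channel would have to implement IEnumerable<something>, not the IEnumerable<Channel> you treat it as. From there on, I'm afraid everything falls apart.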