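Performance of Skip (and similar functions, like Take)

【Question title】: Performance of Skip (and similar functions, like Take)
【Posted】: 2013-11-28 22:59:40
【Question】:

I just had a look at the source code of the .NET Framework's Skip/Take extension methods (on the IEnumerable<T> type) and found that the internal implementation works with the GetEnumerator method: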

// .NET framework
    public static IEnumerable<TSource> Skip<TSource>(this IEnumerable<TSource> source, int count)  
    {
        if (source == null) throw Error.ArgumentNull("source"); 
        return SkipIterator<TSource>(source, count); 
    }

    static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count) 
    {
        using (IEnumerator<TSource> e = source.GetEnumerator()) 
        {
            while (count > 0 && e.MoveNext()) count--;
            if (count <= 0) 
            { 
                while (e.MoveNext()) yield return e.Current;
            } 
        } 
    }

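Suppose I have an IEnumerable<T> with 1000 elements (the underlying type is List<T>). What happens if I call list.Skip(990).Take(10)? Will it iterate through the first 990 elements before taking the last 10? (That is how I understand it.) If so, I don't understand why Microsoft didn't implement the Skip method like this: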

    // Not tested... just to show the idea
    public static IEnumerable<T> Skip<T>(this IEnumerable<T> source, int count)
    {
        if (source is IList<T>)
        {
            IList<T> list = (IList<T>)source;
            for (int i = count; i < list.Count; i++)
            {
                yield return list[i];
            }
        }
        else if (source is IList)
        {
            IList list = (IList)source;
            for (int i = count; i < list.Count; i++)
            {
                yield return (T)list[i];
            }
        }
        else
        {
            // .NET framework
            using (IEnumerator<T> e = source.GetEnumerator())
            {
                while (count > 0 && e.MoveNext()) count--;
                if (count <= 0)
                {
                    while (e.MoveNext()) yield return e.Current;
                }
            }
        }
    }

In fact, that is exactly what they did for the Count method:

    // .NET Framework...
    public static int Count<TSource>(this IEnumerable<TSource> source) 
    {
        if (source == null) throw Error.ArgumentNull("source");

        ICollection<TSource> collectionoft = source as ICollection<TSource>; 
        if (collectionoft != null) return collectionoft.Count;

        ICollection collection = source as ICollection; 
        if (collection != null) return collection.Count; 

        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        { 
            checked 
            {
                while (e.MoveNext()) count++;
            }
        } 
        return count;
    }

So what is the reason?
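To make the cost in question concrete, here is a minimal sketch that counts how many elements Skip/Take actually pull from a source. The names SkipCostDemo, Numbers, and Pulled are hypothetical and exist only for this illustration. The source is deliberately a plain iterator rather than a List<T>, so the count does not depend on any list-specific shortcut a particular runtime might apply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipCostDemo
{
    // Number of elements handed out by Numbers() so far (hypothetical counter).
    public static int Pulled;

    // A plain iterator, not an IList<T>: the only way to reach element 990
    // is to call MoveNext() repeatedly from the start.
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        Pulled = 0;
        List<int> page = Numbers(1000).Skip(990).Take(10).ToList();

        // The page starts at 990, but the skipped prefix was still enumerated.
        Console.WriteLine($"first = {page[0]}, count = {page.Count}, pulled = {Pulled}");
    }
}
```

The skipped elements are never yielded to the caller, but the enumerator still has to step over each of them, so Pulled ends up covering the 990 skipped items as well as the 10 returned ones.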

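【Comments】:

I find it best to assume these methods are never optimized. Even Count() is optimized for ICollection<> but not for IReadOnlyCollection<>. If you need the optimization, write it yourself.
Because they never bothered to add the optimization? If you find that it helps, I see nothing wrong with doing it yourself. Note, however, that myList.Select(..).Skip(100) will be slower than myList.Skip(100).Select(..), even though the two are functionally identical.
Also note that in Linq-To-SQL and EF, Skip and Take are pushed down into the SQL query, so the preceding items are not iterated. (SQL may go through a table/index scan, but LINQ won't.)
In that case you are calling the Skip/Take methods on IQueryable<T> (not IEnumerable<T>), which has a different implementation...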
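The remark about Select(..).Skip(100) versus Skip(100).Select(..) can be checked by counting how often the projection runs. The sketch below uses hypothetical names (SelectOrderDemo, Square, CountCalls). It assumes the classic behavior described in this thread; newer runtimes may optimize some of these query shapes, so the count for the Select-first query can differ between runtimes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectOrderDemo
{
    // How many times the projection has run (hypothetical counter).
    public static int Calls;

    public static int Square(int x)
    {
        Calls++;
        return x * x;
    }

    // Enumerates the query fully and returns the number of projection calls.
    public static int CountCalls(Func<List<int>, IEnumerable<int>> query)
    {
        List<int> source = Enumerable.Range(0, 1000).ToList();
        Calls = 0;
        long sum = 0;
        foreach (int value in query(source)) sum += value;
        return Calls;
    }

    public static void Main()
    {
        int skipFirst = CountCalls(list => list.Skip(100).Select(Square));
        int selectFirst = CountCalls(list => list.Select(Square).Skip(100));

        // Skip-first projects only the 900 surviving items; Select-first may
        // also project the 100 items that are then thrown away.
        Console.WriteLine($"Skip then Select: {skipFirst} calls");
        Console.WriteLine($"Select then Skip: {selectFirst} calls");
    }
}
```

Both queries produce the same sequence; only the amount of work spent on the discarded prefix differs.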

Tags: c# performance linq ienumerable skip-take

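【Solution 1】:

I assume they wanted to throw an InvalidOperationException ("Collection was modified...") when the underlying collection is modified in the meantime by another thread. Your version does not do that, and it would produce terrible results.

This is a standard practice that MSFT follows throughout the .NET Framework for all collections that are not thread-safe (with a few exceptions).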
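The difference this answer describes can be seen in a small sketch. It mutates the list from the same thread rather than from another one, purely so the result is deterministic; the names VersionCheckDemo, EnumeratorDetectsModification, and IndexLoopVisits are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class VersionCheckDemo
{
    // Returns true if List<T>'s enumerator notices the modification and throws.
    public static bool EnumeratorDetectsModification()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int item in list)
            {
                if (item == 1) list.Add(4); // modify while enumerating
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // "Collection was modified; enumeration operation may not execute."
            return true;
        }
    }

    // Index-based access performs no such check: the loop just keeps going
    // and returns the number of elements it visited.
    public static int IndexLoopVisits()
    {
        var list = new List<int> { 1, 2, 3 };
        int visited = 0;
        for (int i = 0; i < list.Count; i++)
        {
            if (list[i] == 1) list.Add(4); // same modification, no exception
            visited++;
        }
        return visited;
    }

    public static void Main()
    {
        Console.WriteLine(EnumeratorDetectsModification());
        Console.WriteLine(IndexLoopVisits());
    }
}
```

An index-based Skip inherits the second behavior: a concurrent modification goes unnoticed instead of failing fast.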

【Comments】:

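【Solution 2】:

In Jon Skeet's excellent tutorial on re-implementing LINQ, he (briefly) discusses this very question:

"Although most of these operations can't be sensibly optimized, it would make sense to optimize Skip when the source implements IList. We can skip the skipping, so to speak, and go straight to the appropriate index. This wouldn't spot the case where the source was modified between iterations, which may be one reason it's not implemented in the framework as far as I'm aware."

That seems like a reasonable justification for holding off on the optimization, but I agree that in specific cases, if you can guarantee that your source can't or won't be modified, it may be worth making it.

【Comments】:

OK, I see... but in that case they could have used IReadOnlyList<T>... I guess this interface is not used widely enough (so the cost of testing whether the source is an IReadOnlyList<T> would be too high)?
@Bidou, that is my guess as well. For a catch-all Skip() implementation, checking for that rarely used interface was probably deemed not worth it.
For functional purposes, an IList is basically a cached collection. Modifying an IList between enumerator iterations will usually throw an exception anyway, so I don't see why the optimization shouldn't be made. In fact, if you accept the possibility of random parallel side effects modifying the list, your results are random whether or not you skip directly.
I was tasked with optimizing some code that took hours or days to finish. When I ran the Visual Studio Profiler I found a few places to optimize, but the profiler did not show any Skip/Take performance problem. When I switched to indexing and removed Skip/Take, performance improved by a factor of 1700: the code used to run for about 9.5 hours and now takes only 20 seconds. So if you need good performance, don't use Skip/Take on an IList.
New home of Jon Skeet's blog post: codeblog.jonskeet.uk/2011/01/02/…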
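The comment above about replacing Skip/Take with indexing does not show its code. As a purely hypothetical illustration of the kind of pattern that can behave this way, the sketch below processes a list in chunks both ways; ChunkDemo and its method names are assumptions made for this example. On runtimes where Skip walks the source from the start, the first version re-enumerates an ever longer prefix for each chunk, while the second touches every element once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Chunking with Skip/Take: each chunk may have to step over everything
    // that precedes it, so the total work can grow quadratically.
    public static long SumChunksWithSkipTake(List<int> items, int chunkSize)
    {
        long total = 0;
        for (int offset = 0; offset < items.Count; offset += chunkSize)
        {
            foreach (int value in items.Skip(offset).Take(chunkSize))
                total += value;
        }
        return total;
    }

    // Chunking with the indexer: jumps straight to the start of each chunk.
    public static long SumChunksWithIndexing(List<int> items, int chunkSize)
    {
        long total = 0;
        for (int offset = 0; offset < items.Count; offset += chunkSize)
        {
            int end = Math.Min(offset + chunkSize, items.Count);
            for (int i = offset; i < end; i++)
                total += items[i];
        }
        return total;
    }

    public static void Main()
    {
        List<int> items = Enumerable.Range(0, 10000).ToList();
        Console.WriteLine(SumChunksWithSkipTake(items, 100));
        Console.WriteLine(SumChunksWithIndexing(items, 100));
    }
}
```

Both methods return the same sum; they differ only in how many elements are stepped over along the way, and, as discussed above, the indexed version gives up the enumerator's modification check.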

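【Solution 3】:

As ledbutter mentioned, when Jon Skeet reimplemented LINQ he noted that an optimization like your Skip "wouldn't spot the case where the source was modified between iterations". You can change your code to the following to check for that case. It does so by calling MoveNext() on the collection's enumerator, even though it never uses e.Current, so that the method throws if the collection changes.

Granted, this removes a significant part of the optimization: an enumerator still has to be created, partially stepped through, and disposed. It does keep the benefit that you don't pointlessly step through the first count objects. The unused e.Current may be confusing, because it points to list[i - count] rather than list[i].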

public static IEnumerable<T> Skip<T>(this IEnumerable<T> source, int count)
{
    using (IEnumerator<T> e = source.GetEnumerator())
    {
        if (source is IList<T>)
        {
            IList<T> list = (IList<T>)source;
            for (int i = count; i < list.Count; i++)
            {
                e.MoveNext();
                yield return list[i];
            }
        }
        else if (source is IList)
        {
            IList list = (IList)source;
            for (int i = count; i < list.Count; i++)
            {
                e.MoveNext();
                yield return (T)list[i];
            }
        }
        else
        {
            // .NET framework
            while (count > 0 && e.MoveNext()) count--;
            if (count <= 0)
            {
                while (e.MoveNext()) yield return e.Current;
            }
        }
    }
}

【Comments】: