Scala Seq vs List vs MutableList performance (foldLeft): answers

【Question title】: Scala Seq vs List vs MutableList performance (foldLeft)
【Posted】: 2021-01-13 09:15:33
【Question】:

I want to optimize some in-memory functions that use foldLeft over various collections. Consider the following code:


import scala.collection.mutable
import scala.util.Random

val buffer = List.fill(10000)(Random.nextInt(10))

// Appends to an immutable List with :+ on every step
def `with list appending` = buffer.foldLeft((List[Int](), List[Int]())) {
  case ((even, odd), currNum) =>
    if (currNum % 2 == 0) (even :+ currNum, odd)
    else (even, odd :+ currNum)
}

// Prepends with :: on every step, then reverses once at the end
def `with list pre appending` = {
  val (evenList, oddList) = buffer.foldLeft((List[Int](), List[Int]())) {
    case ((even, odd), currNum) =>
      if (currNum % 2 == 0) (currNum :: even, odd)
      else (even, currNum :: odd)
  }
  (evenList.reverse, oddList.reverse)
}

// Same as the first version, but with the accumulators typed as Seq
def `with seq appending` = buffer.foldLeft((Seq[Int](), Seq[Int]())) {
  case ((even, odd), currNum) =>
    if (currNum % 2 == 0) (even :+ currNum, odd)
    else (even, odd :+ currNum)
}

// Uses :+ on mutable.MutableList (a Scala 2.12 collection)
def `with mutable list appending` = buffer.foldLeft((mutable.MutableList[Int](), mutable.MutableList[Int]())) {
  case ((even, odd), currNum) =>
    if (currNum % 2 == 0) (even :+ currNum, odd)
    else (even, odd :+ currNum)
}

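Is the whole collection copied every time the result is accumulated into a List?
Does :+ on a Seq copy everything into a new Seq, or does it only append the last element? Maybe O(1)?
Does :+ on a List copy everything into a new List, or does it only add the element at the end? Maybe O(n)?
Is MutableList faster than List inside foldLeft because nothing is copied on each accumulation step?
Would you recommend a different way to write the foldLeft for better performance? Thanks!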

【Question comments】:

Seq is an abstract type, so performance will depend on the concrete type.
Scala: Mutable vs. Immutable Object Performance - OutOfMemoryError
Thanks a lot. So rather than List vs. MutableList, is there another, faster way?
If the problem is that the list is large, consider a streaming solution.

Tags: scala performance

【Solution 1】:

.foldLeft is implemented with a while loop or tail recursion, so its performance depends on how fast the collection can be iterated.

List copies everything on append (:+), which is O(n), and creates a single cons instance on prepend (::), which is O(1).
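The difference between the two operations can be observed directly. The following is a minimal sketch (the object name ListCosts is made up for illustration) that uses reference equality to show that prepending reuses the old list as the tail, while appending has to build a fresh list:

```scala
object ListCosts {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)

    // Prepending allocates one new cons cell whose tail is the old list itself.
    val prepended = 0 :: xs
    println(prepended.tail eq xs) // true: same object, nothing was copied

    // Appending cannot reuse xs as a prefix, because the last cell of xs
    // points to Nil; every cell is rebuilt, hence O(n) per append.
    val appended = xs :+ 4
    println(appended) // List(1, 2, 3, 4)
  }
}
```

Inside a foldLeft over n elements, an O(n) append on every step adds up to O(n^2) work overall, whereas prepending followed by a single reverse stays O(n).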

Seq can be anything, so you have no performance guarantees.
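To see what is hiding behind a Seq, print its runtime class. This is a small sketch, assuming the Scala 2.12 or 2.13 standard library, where the Seq(...) factory happens to build a List; the object name SeqConcreteType is made up for illustration:

```scala
object SeqConcreteType {
  def main(args: Array[String]): Unit = {
    // Both values have the static type Seq[Int], but different implementations.
    val default: Seq[Int] = Seq(1, 2, 3)
    val vector: Seq[Int]  = Vector(1, 2, 3)

    // The cost of :+ is decided by the runtime class, not by the static type.
    println(default.getClass.getName)
    println(vector.getClass.getName)

    // With the default factory, :+ on a Seq is therefore a List append.
    println(default.isInstanceOf[List[_]])
  }
}
```

This would explain why a "seq appending" variant behaves like the "list appending" one unless a different concrete collection is passed in.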

For building a List, ListBuffer is a good choice; it should behave much like a mutable list.
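A minimal sketch of that approach for the even/odd split from the question follows. The names ListBufferSplit and split are hypothetical; the point is that += mutates the buffer in place and toList is called only once at the end:

```scala
import scala.collection.mutable.ListBuffer

object ListBufferSplit {
  // Splits xs into (evens, odds), preserving the input order.
  def split(xs: List[Int]): (List[Int], List[Int]) = {
    val even = ListBuffer.empty[Int]
    val odd  = ListBuffer.empty[Int]
    xs.foreach { n =>
      // += appends in place in constant time; no list is copied here.
      if (n % 2 == 0) even += n else odd += n
    }
    (even.toList, odd.toList)
  }

  def main(args: Array[String]): Unit =
    println(split(List(1, 2, 3, 4, 5))) // (List(2, 4),List(1, 3, 5))
}
```

Note that this uses +=, not :+. On a mutable collection :+ still returns a new collection, so it does not benefit from the mutability.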

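Building any kind of List with .foldLeft is almost always a bad idea. It can nearly always be replaced with .map, .filter, .groupBy, or .flatMap, and if you want to avoid intermediate representations, .view comes in handy too.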
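As an illustration of the .view remark, here is a small sketch (ViewPipeline and doubledEvens are made-up names) that chains filter and map without materializing the intermediate filtered list:

```scala
object ViewPipeline {
  // filter and map are applied lazily on the view; only toList
  // allocates a result collection.
  def doubledEvens(xs: List[Int]): List[Int] =
    xs.view.filter(_ % 2 == 0).map(_ * 2).toList

  def main(args: Array[String]): Unit =
    println(doubledEvens(List(1, 2, 3, 4))) // List(4, 8)
}
```

Whether the view is actually faster than the strict version depends on the pipeline and the data size, so it is worth measuring before committing to it.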

All of your examples can be replaced with

buffer.partition(_ % 2 == 0)

I would expect it to perform, in the worst case, comparably to the hand-written function.

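If the example were more complex, I would suggest:

- using a specific collection (never Seq)
- checking whether some built-in function already does what you need, since it has probably been optimized to do exactly that quickly
- using the Java Microbenchmark Harness (JMH)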
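As an example of the second suggestion, when there are more than two buckets, .groupBy may already cover the case. This is a hedged sketch only: GroupByBuckets and byRemainder are hypothetical names, and it assumes non-negative inputs and that a Map result is acceptable instead of a tuple:

```scala
object GroupByBuckets {
  // Groups the numbers by their remainder modulo k.
  // Elements keep their original relative order inside each bucket.
  def byRemainder(xs: List[Int], k: Int): Map[Int, List[Int]] =
    xs.groupBy(_ % k)

  def main(args: Array[String]): Unit = {
    val buckets = byRemainder(List(0, 1, 2, 3, 4, 5, 6), 3)
    // Missing keys simply do not appear, so look them up defensively.
    println(buckets.getOrElse(0, Nil))
    println(buckets.getOrElse(1, Nil))
    println(buckets.getOrElse(2, Nil))
  }
}
```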

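【Comments】:

Regarding partition: in my other application the case is a bit harder, with more tuples, so it is not a good solution for my product. I posted the ListBuffer timing in an update, and it is still slower. What am I missing?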

【Solution 2】:

First, you need to read the documentation, which will tell you that appending to a list takes linear time and is therefore slow.

Then you need to remember that the JVM has a JIT compiler, so you will not get useful performance numbers unless you warm the code up first.
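A rough sketch of what warming up means in practice is shown below. The helper timeWarm and its parameters are hypothetical, and this is not a substitute for a real benchmark harness; it only runs the code several times untimed before averaging a few timed runs:

```scala
import scala.util.Random

object WarmTiming {
  // Runs `block` `warmups` times without timing it, so the JIT has a chance
  // to compile the hot path, then returns the mean time of `runs` timed
  // executions in milliseconds.
  def timeWarm[T](warmups: Int, runs: Int)(block: => T): Double = {
    var i = 0
    while (i < warmups) { block; i += 1 }
    val t0 = System.nanoTime()
    i = 0
    while (i < runs) { block; i += 1 }
    (System.nanoTime() - t0).toDouble / runs / 1e6
  }

  def main(args: Array[String]): Unit = {
    val buffer = List.fill(10000)(Random.nextInt(10))
    val ms = timeWarm(warmups = 20, runs = 20)(buffer.partition(_ % 2 == 0))
    println(f"partition: $ms%.3f ms per run")
  }
}
```

Even with a warm-up, the JIT may optimize away results that are never used, and garbage collection can distort individual runs, so the numbers remain indicative at best.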

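Finally, if performance really is that critical, write your own recursive routine for this instead of using library methods, which avoids the overhead.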

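def recursive(buffer: List[Int]): (List[Int], List[Int]) = {
  // Tail-recursive loop: prepend in O(1) per element, reverse once at the end
  @annotation.tailrec
  def loop(rem: List[Int], even: List[Int], odd: List[Int]): (List[Int], List[Int]) =
    rem match {
      case Nil =>
        // The accumulators were built back to front, so restore the input order
        (even.reverse, odd.reverse)
      case i :: tail =>
        if (i % 2 == 0) {
          loop(tail, i :: even, odd)
        } else {
          loop(tail, even, i :: odd)
        }
    }

  loop(buffer, Nil, Nil)
}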

【Comments】:

【Solution 3】:

Update: this is what I got:

import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import scala.util.Random

// Naive wall-clock timer: a single run, with no JIT warm-up
def measureTime[T](block: => T): Unit = {
  val t0 = System.currentTimeMillis
  block
  val t1 = System.currentTimeMillis
  println(s"Operation took ${t1 - t0} mills")
}

val buffer = List.fill(10000)(Random.nextInt(10))

// Appends to an immutable List with :+ on every step
def `with list appending` = buffer.foldLeft((List[Int](), List[Int]())) {
  case ((even, odd), currNum) =>
    if (currNum % 2 == 0) (even :+ currNum, odd)
    else (even, odd :+ currNum)
}

// Prepends with :: on every step, then reverses once at the end
def `with list pre appending` = {
  val (evenList, oddList) = buffer.foldLeft((List[Int](), List[Int]())) {
    case ((even, odd), currNum) =>
      if (currNum % 2 == 0) (currNum :: even, odd)
      else (even, currNum :: odd)
  }
  (evenList.reverse, oddList.reverse)
}

// Uses :+ on mutable.MutableList (a Scala 2.12 collection)
def `with mutable list appending` = buffer.foldLeft((mutable.MutableList[Int](), mutable.MutableList[Int]())) {
  case ((even, odd), currNum) =>
    if (currNum % 2 == 0) (even :+ currNum, odd)
    else (even, odd :+ currNum)
}

// Appends in place with += on a ListBuffer
def `with list buffer` = buffer.foldLeft((ListBuffer[Int](), ListBuffer[Int]())) {
  case ((even, odd), currNum) =>
    if (currNum % 2 == 0) (even += currNum, odd)
    else (even, odd += currNum)
}

// Built-in alternative
def `with partition` = buffer.partition(_ % 2 == 0)

measureTime(`with mutable list appending`)
measureTime(`with list appending`)
measureTime(`with list pre appending`)
measureTime(`with list buffer`)
measureTime(`with partition`)

Output:

Operation took 4356 mills
Operation took 2375 mills
Operation took 7 mills
Operation took 5 mills
Operation took 9 mills

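So is the conclusion to use list prepending? How much faster is it than the mutable list or the list buffer? Question: does the list not copy the whole list on every foldLeft step?

【Comments】:

If you care about performance, you need to read the performance documentation. This article gives the relative performance of the different collections and operations, from which you can see why you get these numbers.
measureTime will not give you reliable results; benchmarking on the JVM is not easy. If you care whether the numbers you get mean anything, you should use JMH.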