Profiling F# for performance of recursive functions

【Question title】: Profiling F# for performance of recursive functions
【Posted】: 2018-12-01 23:03:05
【Question】:

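I decided to use F# to solve the second problem of day one of Advent of Code 2018 (perform a cyclic summation and find the first repeated sum), but the performance is lacking and I can't find the cause of the slowdown.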

Problem solved in Python 3

For a given input with about 140,000 summations, this code executes in a few seconds.

data = list(map(int, '''
+1
-1
'''.strip().splitlines()))
from itertools import cycle, accumulate
class superset(set):
    def add(self, other):
        super().add(other)
        return other

def mapwhile(func, pred, iterable):
    for i in iterable:
        if not pred(i):
            yield func(i)
            return
        yield func(i)

def last(iterable):
    return list(iterable)[-1]

s = superset([0])
print(last(mapwhile(s.add, lambda x: x not in s, accumulate(cycle(data)))))

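Problem solved in F#

I added a conditional breakpoint on the match expression to time every thousandth i. It seems this code performs about 100 sums per second and would not reach a solution even after an hour: a slowdown of several orders of magnitude.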

let input = @"
+1
-1
"
let cycle xs = seq { while true do yield! xs }
let accumusum xs = Seq.scan(fun acc elem -> acc + elem) 0 xs

let rec findfreqcycle i (s:int Set) (data:int seq) = 
    let head, tail = Seq.head data, Seq.tail data
    match s.Contains(head) with
    | false -> findfreqcycle (i+1) (s.Add(head)) (tail)
    | true ->  head


let data = input.Trim().Split('\n') |> Seq.map int |> Seq.toList |> cycle
accumusum data |> findfreqcycle 0 Set.empty

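As far as I can tell, the core idea behind each code sample is more or less the same. The input is parsed eagerly, once only, and a generator function/sequence lazily repeats each number.

The only difference is that in the F# example the function that actually finds the first repeated sum is recursive. Memory profiling indicates nearly constant memory usage, and tail recursion is active.

What could I be doing wrong, and how can I better profile these recursive and generative functions for performance?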

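【Comments】:

This is most likely entirely down to the use of Seq.tail. It does not work the way one might initially expect :( See here, here and here for more information.
It sounds like my code sample is a hotspot for garbage collection: yield!, Seq.head and Seq.tail all produce new IEnumerables. I don't even know how I'm going to fix this. At least yield! isn't in the hot loop. Hopefully the set isn't causing problems as well.

Tags: f# profiling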

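【Answer 1】:

As mentioned in the comments, Seq.tail is very inefficient, especially if you use it in a loop the way you do. The reason is that it creates a new sequence that iterates over the original sequence and skips the first element (so, after 1000 iterations, you have to go through over 1000 nested sequences, each skipping one element).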
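To get a feel for how quickly this adds up, here is a rough Python model of the same access pattern. It is only a sketch: `CountingSource`, `Tail`, `head` and `pulls_for` are hypothetical names made up for the illustration, and `Tail` is a stand-in for the behaviour described above, not the actual Seq.tail implementation. It counts how many elements the underlying source has to produce when head/tail is applied repeatedly.

```python
from itertools import count, islice

class CountingSource:
    """Iterable over 0, 1, 2, ... that counts every element it produces."""
    def __init__(self):
        self.pulled = 0

    def __iter__(self):
        for n in count():
            self.pulled += 1
            yield n

class Tail:
    """Lazy view that re-walks its source from the start and skips one element."""
    def __init__(self, source):
        self.source = source

    def __iter__(self):
        return islice(iter(self.source), 1, None)

def head(iterable):
    """First element of a fresh enumeration."""
    return next(iter(iterable))

def pulls_for(steps):
    """Apply head/tail `steps` times; return how many elements the source produced."""
    source = CountingSource()
    seq = source
    for _ in range(steps):
        head(seq)        # restarts the whole chain of nested Tail views
        seq = Tail(seq)  # wraps one more skipping layer around it
    return source.pulled

print(pulls_for(10), pulls_for(100))
```

In this model step k has to pull k + 1 elements from the source, so n steps cost n(n + 1)/2 pulls in total. That quadratic growth is consistent with the loop getting slower and slower the longer it runs.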

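The head-and-tail pattern works much better if you use a list, because functional lists are designed for exactly this kind of processing. In your case, you could do something like the following (which follows the same pattern as your original function):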

let rec findfreqcycle sum (s:int Set) input data = 
    match data with 
    | x::xs when s.Contains (sum + x) -> (sum + x)
    | x::xs -> findfreqcycle (sum + x) (s.Add (sum + x)) input xs
    | [] ->  findfreqcycle sum s input input

let data = input.Trim().Split('\n') |> Seq.map int |> Seq.toList 
// Seed the set with the starting total 0 so that a return to 0 counts as a repeat.
findfreqcycle 0 (Set.singleton 0) data data

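I changed it so that it uses pattern matching (on lists). I also changed the code so that it takes a finite list and starts again from the beginning when it reaches the end. As a consequence, it also sums the numbers on the fly (rather than using Seq.scan, which would not work here because I'm not using infinite lists).

On the input from Pastebin, I get the result 448 in about 0.17 seconds.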
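For comparison with the Python version in the question, the same idea (a finite list that is restarted when it runs out, with the sum kept on the fly) might look like the sketch below. The name `first_repeated_sum` is made up for the illustration, and the sketch assumes the starting total 0 counts as already seen, as the puzzle statement describes.

```python
def first_repeated_sum(changes):
    """Return the first running total that is seen twice, cycling through `changes`.

    Note: this loops forever if no total ever repeats (for example [1]).
    """
    if not changes:
        raise ValueError("changes must not be empty")
    total = 0
    seen = {0}  # assumption: the starting total counts as already seen
    while True:
        # Reaching the end of the finite list simply starts the inner loop again.
        for change in changes:
            total += change
            if total in seen:
                return total
            seen.add(total)

print(first_repeated_sum([3, 3, 4, -2, -4]))
```

No infinite sequence is needed here: the outer `while True` plays the role of the `[]` case in the F# function, which restarts from the full input.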

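【Comments】:

It's a little counter-intuitive and disappointing, since you lose a bit of expressiveness when you move away from infinite sequences, but the alternative is unforgivably slow.
@I'llEatMyHat You could do the same thing with infinite sequences if you implemented findfreqcycle using scan rather than a recursive function. That would work too, but I thought I'd follow your initial direction. You could use Seq.scan and Seq.pick instead, but the state handling is not that elegant (at least in my opinion).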

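【Answer 2】:

I decided to try an implementation using Seq.scan and Seq.pick, following Tomas's answer, and got this result. He's right, it's not great. On the plus side, it executes in ~0.3 seconds.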

let cycle xs = seq { while true do yield! xs }    
let accumusum xs = Seq.scan(fun acc elem -> acc + elem) 0 xs

let tryfind (sum, s:int Set) =
    match s.Contains(sum) with
    | true -> Some(sum)
    | false -> None

let scanstate (sum, s:int Set) el =
    el, s.Add(sum)

let findfreqcycle (data:int seq) =
    let seen = Seq.scan scanstate (Seq.head data, Set.empty) (Seq.tail data)
    Seq.pick tryfind seen

let data = cycle <| (input.Trim().Split('\n') |> Seq.map int |> Seq.toList)
accumusum data |> findfreqcycle

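【Comments】:

Nice! Good to see a complete solution using this approach :-).

【Answer 3】:

OP already has an accepted answer, but I thought I'd propose some variations.

The task asks for running an aggregate (a Set) over the input values while still allowing an early exit once the Set reaches a state where we can't add a number to it because we have already seen that number.

Normally we fold to aggregate a state, but fold doesn't allow us to exit early. That is why the suggestion was to use scan, which is a streaming fold, together with pick, which allows early exit.

An alternative is to write a fold that allows short-cutting once a certain state has been reached: val foldAndCheck: ('a -> 'b -> CheckResult<'a, 'c>) -> 'a -> 'b seq -> 'c option. fold is like a for loop that aggregates all values; foldAndCheck is like a for loop that aggregates values up to a point and then returns a result.

It could look like this: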

type [<Struct>] CheckResult<'T, 'U> =
  | Continue of c:'T
  | Done     of d:'U

// val foldAndCheck: ('a -> 'b -> CheckResult<'a, 'c>) -> 'a -> 'b seq -> 'c option
let foldAndCheck f z (s : _ seq) =
  let f = OptimizedClosures.FSharpFunc<_, _, _>.Adapt f
  use e = s.GetEnumerator ()
  let rec loop s =
    if e.MoveNext () then
      match f.Invoke (s, e.Current) with
      | Continue ss -> loop ss
      | Done     rr -> Some rr 
    else
      None
  loop z

let cycle xs = seq { while true do yield! xs }

let run (input : string) =
  let folder s v = if Set.contains v s then Done v else Continue (Set.add v s)
  input.Trim().Split('\n') 
  |> Seq.map int 
  |> cycle
  |> Seq.scan (+) 0
  |> foldAndCheck folder Set.empty

When running it on my machine I get numbers like these:

Result: Some 448
Took  : 280 ms
CC    : (31, 2, 1)

(CC is the number of garbage collections in gen 0, 1 and 2)
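For readers coming from the Python version in the question, the short-cutting fold can be sketched roughly as follows. This is an illustration under stated assumptions rather than a translation of the F# code: the names `fold_and_check`, `folder` and `run` are chosen for the sketch, the Continue/Done cases are modelled as plain tagged tuples, and, unlike the F# Set, the Python set is mutated in place. The `initial=` argument of `accumulate` needs Python 3.8 or later.

```python
from itertools import accumulate, cycle

def fold_and_check(step, state, items):
    """Fold `step` over `items`, stopping as soon as `step` reports a result.

    `step(state, item)` returns ("continue", new_state) or ("done", result).
    Returns the result, or None if `items` is exhausted first.
    """
    for item in items:
        tag, value = step(state, item)
        if tag == "done":
            return value
        state = value
    return None

def folder(seen, total):
    """Stop at the first total that is already in `seen`; otherwise record it."""
    if total in seen:
        return ("done", total)
    seen.add(total)  # mutates the set; the F# version builds a new immutable Set
    return ("continue", seen)

def run(changes):
    # Running totals over the endlessly repeated input, starting from 0.
    totals = accumulate(cycle(changes), initial=0)
    return fold_and_check(folder, set(), totals)

print(run([3, 3, 4, -2, -4]))
```

As with the F# version, a finite input with no repeat yields no result: `fold_and_check(folder, set(), [1, 2, 3])` returns None.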

I then created an F# program that I think is equivalent to the Python program, as it uses a mutable set and a mapWhile function:

open System.Collections.Generic // HashSet

let addAndReturn (set : HashSet<_>) =
  fun v ->
    set.Add v |> ignore
    v

let mapWhile func pred (s : _ seq) =
  seq {
    // `for v in s` cannot stop early, so the enumerator is
    //  driven by hand with a while loop instead
    use e = s.GetEnumerator ()
    let mutable cont = true
    while cont && e.MoveNext () do
      let v = e.Current
      if not (pred v) then
        cont <- false
        yield func v
      else
        yield func v
  }

let cycle xs = seq { while true do yield! xs }

let accumulate xs = Seq.scan (+) 0 xs

let last xs = Seq.last xs

let run (input : string) =
  let data = input.Trim().Split('\n') |> Seq.map int 
  let s = HashSet<int> ()

  data
  |> cycle
  |> accumulate
  |> mapWhile (addAndReturn s) (fun x -> s.Contains x |> not)
  |> last

Performance numbers:

Result: 448
Took  : 50 ms
CC    : (1, 1, 1)

If we allow ourselves mutation + seq, a solution could look like this:

open System.Collections.Generic // HashSet

let cycle xs = seq { while true do yield! xs }

let run (input : string) =
  let s = HashSet<int> ()

  input.Trim().Split('\n')
  |> Seq.map int 
  |> cycle
  |> Seq.scan (+) 0
  |> Seq.find (fun v -> s.Add v |> not)

Which runs like so:

Result: 448
Took  : 40 ms
CC    : (1, 1, 1)

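There are other cool tricks one could apply to improve the search performance further, but they are not worth the effort, because at this point most of the cost is in parsing the integers.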
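The last F# version relies on HashSet.Add returning false when the value is already present, so a single call both records the value and tests for a repeat. Python's `set.add` returns None, so a sketch of the same idea needs a small helper. The names `first_repeat` and `already_seen` are made up for the illustration, and `initial=` again assumes Python 3.8 or later.

```python
from itertools import accumulate, cycle

def first_repeat(changes):
    """Return the first running total (starting from 0) that occurs twice."""
    seen = set()

    def already_seen(total):
        # Plays the role of `s.Add v |> not`: record the total and report
        # whether it had been recorded before.
        if total in seen:
            return True
        seen.add(total)
        return False

    totals = accumulate(cycle(changes), initial=0)
    # next(...) stops at the first match, like Seq.find.
    return next(total for total in totals if already_seen(total))

print(first_repeat([7, 7, -2, -7, -4]))
```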

【Comments】: