Splitting a list into a list of lists based on a predicate: answers

【Question title】: Splitting a list into list of lists based on predicate
【Posted】: 2011-01-05 11:26:33
【Question】:

(I am aware of this question, but it deals with sequences, which is not my problem here.)

Given this input (for example):

let testlist = 
    [  
       "*text1";
       "*text2";
       "text3";
       "text4";
       "*text5";
       "*text6";
       "*text7"
    ]

let pred (s:string) = s.StartsWith("*")

I would like to be able to call MyFunc pred testlist and get the following output:

[
    ["*text1";"*text2"];
    ["*text5";"*text6";"*text7"]
]

Here is my current solution, but I don't really like the nested List.rev calls (ignore the fact that it takes a Seq as input):

let shunt pred sq =
    let shunter (prevpick, acc) (pick, a) = 
        match pick, prevpick with
        | (true, true)  -> (true, (a :: (List.head acc)) :: (List.tail acc))
        | (false, _)    -> (false, acc)
        | (true, _)     -> (true, [a] :: acc)

    sq 
        |> Seq.map (fun a -> (pred a, a))
        |> Seq.fold shunter (false, []) 
        |> snd
        |> List.map List.rev
        |> List.rev

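【Comments】:

Belated postscript: the title of this question was poorly worded, so some of the answers are a better fit for this question.

Tags: f#

【Solution 1】: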

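The F# core library has a List.partition function (in case you just want something that works, rather than learning how to write the recursive function yourself). Using this function, you can write: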

> testlist |> List.partition (fun s -> s.StartsWith("*"))
val it : string list * string list =
  (["*text1"; "*text2"; "*text5"; "*text6"; "*text7"], ["text3"; "text4"])

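Note that this function returns a tuple rather than a list of lists. That differs slightly from what you asked for, but if the predicate only returns true or false, it makes more sense.

The implementation of a partition function that returns a tuple is also a bit simpler, so it may be useful for learning: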

let partition pred list = 
  // Helper function, which keeps results collected so
  // far in 'accumulator' arguments outTrue and outFalse
  let rec partitionAux list outTrue outFalse =
    match list with 
    | [] -> 
        // We need to reverse the results (as we collected
        // them in the opposite order!)
        List.rev outTrue, List.rev outFalse
    // Append element to one of the lists, depending on 'pred'
    | x::xs when pred x -> partitionAux xs (x::outTrue) outFalse
    | x::xs -> partitionAux xs outTrue (x::outFalse)

  // Run the helper function
  partitionAux list [] []

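【Comments】:

No, sorry, I want to separate each group of lines that matches pred and discard the other lines.

【Solution 2】:

Edit: added a rev-free version using foldBack below.

Here is some code that uses lists and tail recursion: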

//divides a list L into chunks for which all elements match pred
let divide pred L =
    let rec aux buf acc L =
        match L,buf with
        //no more input and an empty buffer -> return acc
        | [],[] -> List.rev acc 
        //no more input and a non-empty buffer -> return acc + rest of buffer
        | [],buf -> List.rev (List.rev buf :: acc) 
        //found something that matches pred: put it in the buffer and go to next in list
        | h::t,buf when pred h -> aux (h::buf) acc t
        //found something that doesn't match pred. Continue but don't add an empty buffer to acc
        | h::t,[] -> aux [] acc t
        //found input that doesn't match pred. Add buffer to acc and continue with an empty buffer
        | h::t,buf -> aux [] (List.rev buf :: acc) t
    aux [] [] L

Usage:

> divide pred testlist;;
val it : string list list =
  [["*text1"; "*text2"]; ["*text5"; "*text6"; "*text7"]]

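Using a list as the buffer data structure means it always has to be reversed when its contents are emitted. If the individual chunks are moderately sized, this is probably not a problem. If speed/efficiency becomes an issue, you could use a Queue<'a> or a List<'a> as the buffer, since appending to them is fast. But using these data structures instead of lists also means you lose the powerful list pattern matching. In my opinion, being able to pattern match on lists outweighs a few List.rev calls.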
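As a rough sketch of that trade-off, the variant below uses ResizeArray<'a> (F#'s alias for System.Collections.Generic.List<'a>) as the buffer. The name divideWithBuffer is made up for illustration and is not part of the original answer. It avoids every List.rev, but the logic becomes a loop with a mutable buffer instead of a pattern match.

```fsharp
// Hypothetical variant (not from the original answer): append-friendly buffers
// replace the cons-and-reverse lists used by divide.
let divideWithBuffer pred (items: 'a list) : 'a list list =
    let chunks = ResizeArray<'a list>()   // completed chunks, in order
    let buffer = ResizeArray<'a>()        // current run of matching elements

    // Move the current run (if any) into the result and start a new one
    let flush () =
        if buffer.Count > 0 then
            chunks.Add(List.ofSeq buffer)
            buffer.Clear()

    for item in items do
        if pred item then buffer.Add item
        else flush ()

    // The input may end in the middle of a matching run
    flush ()
    List.ofSeq chunks

let testlist = ["*text1"; "*text2"; "text3"; "text4"; "*text5"; "*text6"; "*text7"]
let pred (s: string) = s.StartsWith("*")

let chunks = divideWithBuffer pred testlist
```

With the question's testlist, chunks should hold the same two groups that divide returns.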

Here is a streaming version that emits the result one chunk at a time. It avoids the List.rev on the accumulator in the example above:

let dividestream pred L =
    let rec aux buf L =
        seq { match L, buf with
              | [],[] -> ()
              | [],buf -> yield List.rev buf
              | h::t,buf when pred h -> yield! aux (h::buf) t
              | h::t,[] -> yield! aux [] t
              | h::t,buf -> yield List.rev buf
                            yield! aux [] t }
    aux [] L
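To get a feel for what "streaming" means here, the sketch below restates dividestream (with lowercase parameter names) so it runs on its own, and wraps the predicate in a counter. The countingPred helper and the calls counter are assumptions added for illustration. Asking only for the first chunk should stop the traversal well before the end of the list.

```fsharp
// dividestream restated from the answer so this snippet is self-contained
let dividestream pred items =
    let rec aux buf rest =
        seq { match rest, buf with
              | [], [] -> ()
              | [], buf -> yield List.rev buf
              | h::t, buf when pred h -> yield! aux (h::buf) t
              | h::t, [] -> yield! aux [] t
              | h::t, buf ->
                  yield List.rev buf
                  yield! aux [] t }
    aux [] items

let testlist = ["*text1"; "*text2"; "text3"; "text4"; "*text5"; "*text6"; "*text7"]

// Hypothetical helper: counts how many times the predicate is evaluated
let calls = ref 0
let countingPred (s: string) =
    calls.Value <- calls.Value + 1
    s.StartsWith("*")

// Only the first chunk is requested, so the rest of the list is not examined
let firstChunk = dividestream countingPred testlist |> Seq.head
let callsForFirstChunk = calls.Value

// Forcing the whole sequence produces every chunk
let allChunks = dividestream countingPred testlist |> List.ofSeq
```

Here callsForFirstChunk stays below the length of testlist, whereas building allChunks walks the entire list.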

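This streaming version avoids the List.rev on the accumulator. List.foldBack can also be used to avoid reversing the accumulated chunks.

Update: here is a version that uses foldBack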

//divides a list L into chunks for which all elements match pred
let divide2 pred L =
    let f x (acc,buf) =
        match pred x,buf with
        | true,buf -> (acc,x::buf)
        | false,[] -> (acc,[])
        | false,buf -> (buf::acc,[])

    let rest,remainingBuffer = List.foldBack f L ([],[])
    match remainingBuffer with
    | [] -> rest
    | buf -> buf :: rest

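【Comments】:

Sorry, my mistake, it should actually just be a list of strings. Does that simplify your answer?
It removes the extra flattening step I had. The splitting (streaming) algorithm stays the same. I'll edit my answer.
Nice, two answers in one; both are "faster" (based on my minimal side-by-side comparison), functional, and "straight-through". I think I prefer divide2: short, sweet, and no risk of stack overflow (I believe).
I can't help praising divide2. I was on the train and couldn't help laughing, it's so beautiful. Thanks, cfern, for the delight.

【Solution 3】:

Just reverse the list once, and then it is easy to build the structure in order: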

let Shunt p l =
    let mutable r = List.rev l
    let mutable result = []
    while not r.IsEmpty do
        let mutable thisBatch = []
        while not r.IsEmpty && not(p r.Head) do
            r <- r.Tail 
        while not r.IsEmpty && p r.Head do
            thisBatch <- r.Head :: thisBatch
            r <- r.Tail
        if not thisBatch.IsEmpty then
            result <- thisBatch :: result
    result

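The outer while handles each "batch". The first inner while skips anything that does not match the predicate, and the next while grabs everything that does match and stores it in the current batch. If the batch contains anything (the last one may be empty), it is added to the final result.

This is an example where I think local imperative code is better than purely functional code. The code above is easy to write and to reason about.

【Comments】:

I'm new to functional code, so I'd like to ask: how is this imperative code better than Petricek's functional version?
Well, my solution does what you asked for, and his does not.
I think it must be the way I phrased the question; I got several similar "group by" answers that did not answer my question. I was surprised by your imperative solution, not because you're wrong, but because I've become so used to trying the "F# way" that it didn't even occur to me to look in that direction.
By the way, since comments are disabled there, I'll use this space to give a thumbs-up to the monadic parser combinator series on your blog (lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!123.entry); I've just "finished" the C# part. You say LukeH's version is better, but yours is easier to understand. You got the pitch right, once again :)

【Solution 4】:

Another version of shunt: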

let shunt pred lst =
    let rec tWhile pred lst = 
        match lst with
        | []                    -> [], []
        | hd :: tl when pred hd -> let taken, rest = tWhile pred tl
                                   (hd :: taken), rest
        | lst                   -> [], lst
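    let rec collect = function
        | []  -> []
        | lst -> let taken, rest = tWhile pred lst
                 // Drop the non-matching run that follows the chunk
                 let remaining = snd (tWhile (fun x -> not (pred x)) rest)
                 // Skip the empty chunk produced when the input starts with a non-match
                 if List.isEmpty taken then collect remaining
                 else taken :: collect remaining
    collect lst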

This one avoids List.rev, but it is not tail-recursive, so it is only suitable for small lists.

【Comments】:

【Solution 5】:

One more...

let partition pred lst = 
    let rec trec xs cont =
        match xs with
        | []               -> ([],[]) |> cont
        | h::t when pred h -> (fun (y,n) -> h::y,n) >> cont |> trec t
        | h::t             -> (fun (y,n) -> y,h::n) >> cont |> trec t
    trec lst id

Then we can define shunt:

let shunt pred lst = lst |> partition pred |> (fun (x,y) -> [x;y])

【Comments】:

No, sorry, I want to separate each group of lines that matches pred and discard the other lines.