Parsing int or float with FParsec

【Question title】: Parsing int or float with FParsec
【Posted】: 2016-02-03 10:11:45
【Question】:

I'm trying to parse a file with FParsec. The file consists of float or int values, and I'm facing two problems that I can't find a good solution for.

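Both pint32 and pfloat will successfully parse the same string but give different answers. For example, pint32 returns 3 when parsing the string "3.0", and pfloat returns 3.0 when parsing the same string. Is it possible to try parsing a floating point value with pint32 and have it fail if the string is "3.0"?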

In other words, is there a way to make the following code work:

let parseFloatOrInt lines =
    let rec loop intvalues floatvalues lines =
        match lines with
        | [] -> floatvalues, intvalues
        | line::rest ->
            match run floatWs line with
            | Success (r, _, _) -> loop intvalues (r::floatvalues) rest
            | Failure _ -> 
                match run intWs line with
                | Success (r, _, _) -> loop (r::intvalues) floatvalues rest
                | Failure _ -> loop intvalues floatvalues rest

    loop [] [] lines

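This code correctly places all floating point values in the floatvalues list, but because pfloat returns 3.0 when parsing the string "3", all integer values end up in the floatvalues list as well.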

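The code above also feels a bit clumsy to me, so I'm guessing there must be a better way to do it. I considered combining the parsers with choice, but both parsers must return the same type for that to work. I suppose I could create a discriminated union with one case for float and one for int, and convert the output of pint32 and pfloat with the |>> operator. Still, I'm wondering whether there is a better solution.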

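【Question comments】:

Tags: f# fparsec

【Answer 1】:

You're on the right track in thinking about defining domain data and in separating the definition of the parsers from their usage on the source data. This seems like a good approach, because as your real project grows you will probably need more data types.

Here's how I would write it: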

open FParsec

/// The resulting type, or DSL
type MyData =
    | IntValue of int
    | FloatValue of float
    | Error  // special case for all parse failures

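// Whitespace skipper used by the combined parser below.
// Assumption: `ws` is not defined in the original post; `spaces` is used here.
let ws : Parser<unit, unit> = spaces

// Then, let's define individual parsers: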
let pMyInt =
    pint32
    |>> IntValue

// this is an alternative version of float parser.
// it ensures that the value has non-zero fractional part.
// caveat: the naive approach would treat values like 42.0 as integer
let pMyFloat =
    pfloat
    // float operands are required here: `x % 1 = 0` does not type-check for a float
    >>= (fun x -> if x % 1.0 = 0.0 then fail "Not a float" else preturn (FloatValue x))

let pError =
    // this parser must consume some input,
    // otherwise combined with `many` it would hang in a dead loop
    skipAnyChar
    >>. preturn Error

// Now, the combined parser:
let pCombined =
    [ pMyFloat; pMyInt; pError ]    // note, future parsers will be added here;
                                    // mind the order as float supersedes the int,
                                    // and Error must be the last
    |> List.map (fun p -> p .>> ws) // I'm too lazy to add whitespace skipping
                                    // into each individual parser
    |> List.map attempt             // each parser is optional
    |> choice                       // on each iteration, one of the parsers must succeed
    |> many                         // a loop

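Note that the code above can work with any source: strings, streams, or anything else. Your real application may need to work with files, but unit testing can be simplified by using just a string list.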

// Now, applying the parser somewhere in the code:
let maybeParseResult =
    match run pCombined myStringData with
    | Success(result, _, _) -> Some result
    | Failure(_, _, _)      -> None // or anything that indicates general parse failure
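
For line-oriented input like the string list in the question, a minimal sketch of another option is to require each parser to consume the whole line and to try the int parser first. The names intLine and floatLine are hypothetical helpers, and the sketch assumes one number per line. Note that it returns (ints, floats) in input order, unlike the question's loop.

```fsharp
open FParsec

// Hypothetical helpers: each parser must consume the whole line
// (the number, optional trailing whitespace, then end of input).
let intLine : Parser<int32, unit> = pint32 .>> spaces .>> eof
let floatLine : Parser<float, unit> = pfloat .>> spaces .>> eof

/// Splits lines into (ints, floats), keeping input order.
/// Lines that are neither an int nor a float are dropped.
let parseFloatOrInt (lines: string list) =
    let step line (ints, floats) =
        // "3.0" fails here: pint32 stops after "3" and eof then rejects ".0"
        match run intLine line with
        | Success (i, _, _) -> i :: ints, floats
        | Failure _ ->
            match run floatLine line with
            | Success (f, _, _) -> ints, f :: floats
            | Failure _ -> ints, floats
    List.foldBack step lines ([], [])
```

With this ordering, "3" and "2 " are collected as ints, while "3.0" and "1.5" are collected as floats.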

UPD. I have edited the code according to the comments. pMyFloat has been updated to ensure that the parsed value has a non-zero fractional part.

【Comments】:

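By the way @bytebuster, I had some trouble running pCombined on a test string. It gives the error message "The combinator 'many' was applied to a parser that succeeds without consuming input and without changing the parser state in any other way." Does eof need to be handled?
Great, thanks @bytebuster! One more thing: after removing the many combinator I tried running the parser on the string "2 ", and I ran into the same issue I described above, i.e. the value is returned as a float because the float parser comes before the int parser. Any ideas on how to remedy that?
@Chepe, please check the updated version. Hopefully it resolves both issues.
Works like a charm! Thanks a lot!
FYI: backtracking (i.e. attempt) can be an expensive operation, which is why I believe FParsec opts for explicit backtracking.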
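
On the backtracking cost raised in the last comment: a sketch that avoids attempt altogether is to read the literal once with FParsec's numberLiteral and then decide from the literal whether it is an int or a float. This is not the answer's method. The type Number and the names numberFormat, pNumber and pNumbers are chosen here for illustration, and the conversion via int assumes the value fits in 32 bits (it throws otherwise).

```fsharp
open FParsec

type Number =
    | IntValue of int
    | FloatValue of float

// Accept an optional minus sign, a fraction and an exponent.
let numberFormat =
    NumberLiteralOptions.AllowMinusSign
    ||| NumberLiteralOptions.AllowFraction
    ||| NumberLiteralOptions.AllowExponent

// Parse the literal once, then classify it by its written form:
// "3" becomes IntValue 3, while "3.0" becomes FloatValue 3.0.
let pNumber : Parser<Number, unit> =
    numberLiteral numberFormat "number"
    |>> fun nl ->
        if nl.IsInteger then IntValue (int nl.String)
        else FloatValue (float nl.String)

// Whitespace-separated numbers up to the end of the input.
let pNumbers : Parser<Number list, unit> =
    spaces >>. many (pNumber .>> spaces) .>> eof
```

Unlike pMyFloat above, this classifies by how the number is written rather than by its value, so 42.0 stays a float.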