Haskell read variable name: answers

【Question title】: Haskell read variable name
【Posted】: 2015-09-24 06:31:53
【Question】:

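I need to write code that parses some language. I am stuck on parsing variable names: a name can be anything at least 1 character long that starts with a lowercase letter, and it may contain the underscore '_' character. I think I made a good start with the following code: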

identToken :: Parser String
identToken = do 
                       c <- letter
                       cs <- letdigs
                       return (c:cs)
             where letter = satisfy isLetter
                   letdigs = munch isLetter +++ munch isDigit +++ munch underscore
                   num = satisfy isDigit
                   underscore = \x -> x == '_'
                   lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do 
          _ <- skipSpaces
          s <- identToken
          skipSpaces; return $ s

idents :: Parser Command
idents = do 
          skipSpaces; ids <- many1 ident
          ...

However, this gives me a strange result. If I call my test function

test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p = 
  case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left InvalidParse
    multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
      result <- many ident
      eof
      return result

like this:

test_parseIdents  "test"

I get:

Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
    (["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])

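Note that Parser a is just a synonym for ReadP a.

I would also like to encode in the parser that variable names must start with a lowercase character.

Thank you for your help.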

【Question discussion】:

Tags: parsing haskell

【Solution 1】:

Part of the problem is your use of the +++ operator. The following code works for me:

import Data.Char
import Text.ParserCombinators.ReadP

type Parser a = ReadP a
type Ident = String

identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)

ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s

test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")]   -> Right j
    []          -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result

main :: IO ()
main = print $ test_parseIdents "test_1349_zefz"

So what went wrong:

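+++ is symmetric choice: every alternative is tried and every successful result is kept. Since munch always succeeds, even when it consumes nothing, all three alternatives in letdigs succeed, which is where the pile of ambiguous parses comes from. <++ is left-biased, so the right-hand side is used only when the left-hand side produces no result. Switching to <++ would remove the ambiguity in the parse, but it still leaves the next problem.
Your parser accepts a run of letters, or a run of digits, or a run of underscores after the first character, but never a mix of them. For example, a digit that follows an underscore fails. The parser has to be changed to munch any character that is a letter, a digit or an underscore.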
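The difference between the two choice operators can be seen in isolation. The following is only a minimal sketch, not part of the original answer; the names symmetric, leftBiased and countParses are made up for illustration.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- Symmetric choice: both alternatives are tried and every result is kept.
symmetric :: ReadP String
symmetric = munch isLetter +++ munch isDigit

-- Left-biased choice: the right side is used only if the left yields no result.
leftBiased :: ReadP String
leftBiased = munch isLetter <++ munch isDigit

-- Hypothetical helper: how many (result, remaining input) pairs a parser produces.
countParses :: ReadP a -> String -> Int
countParses p = length . readP_to_S p
```

On the input "ab1", symmetric produces two results (the letters "ab", and the empty run of digits), while leftBiased produces only the first. In the question, that extra empty result is multiplied by every call made through many, which explains the long list of duplicates.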

I also removed some unused functions and made an educated guess at the definitions of your data types.
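To check the rewritten token parser against the stated requirements (lowercase first character, then any mix of letters, digits and underscores), a small sketch along these lines may help. It assumes isAsciiLower from Data.Char as a drop-in for the 'a'..'z' membership test, and parseIdent is a hypothetical helper that does not appear in the original code.

```haskell
import Data.Char (isAsciiLower, isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- One lowercase ASCII letter, then a single greedy run of
-- letters, digits and underscores in any order.
identToken :: ReadP String
identToken = do
  c  <- satisfy isAsciiLower
  cs <- munch (\x -> isLetter x || isDigit x || x == '_')
  return (c : cs)

-- Hypothetical helper: succeeds only if the whole input is exactly
-- one identifier with a single, unambiguous parse.
parseIdent :: String -> Maybe String
parseIdent s = case readP_to_S (identToken <* eof) s of
  [(name, "")] -> Just name
  _            -> Nothing
```

With this sketch, parseIdent "test_1349_zefz" and parseIdent "x_1" return the input wrapped in Just, while parseIdent "Test", parseIdent "_x" and parseIdent "" return Nothing. Because munch is greedy and yields exactly one result, there is no way to split "test" into several shorter identifiers.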

【Discussion】: