【Question Title】: Haskell read variable name
【Posted】: 2015-09-24 06:31:53
【Question】:

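I need to write a parser for some language. I am stuck on parsing variable names: a name can be anything at least one character long that starts with a lowercase letter, and it may contain the underscore character '_'. I think I made a good start with the following code: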

identToken :: Parser String
identToken = do 
                       c <- letter
                       cs <- letdigs
                       return (c:cs)
             where letter = satisfy isLetter
                   letdigs = munch isLetter +++ munch isDigit +++ munch underscore
                   num = satisfy isDigit
                   underscore = \x -> x == '_'
                   lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do 
          _ <- skipSpaces
          s <- identToken
          skipSpaces; return $ s

idents :: Parser Command
idents = do 
          skipSpaces; ids <- many1 ident
          ...

However, this function gives me a strange result. If I call my test function

test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p = 
  case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left InvalidParse
    multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
      result <- many ident
      eof
      return result

like this:

test_parseIdents  "test"

I get:

Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
    (["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])

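Note that Parser is just a synonym for ReadP a.

I would also like to encode in the parser that variable names must start with a lowercase character.

Thanks for your help.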

【Question Discussion】:

    Tags: parsing haskell


    【Answer 1】:

    Part of the problem is your use of the +++ operator. The following code works for me:

    import Data.Char
    import Text.ParserCombinators.ReadP
    
    type Parser a = ReadP a
    type Ident = String
    
    identToken :: Parser String
    identToken = do c <- satisfy lowerCase
                    cs <- letdigs
                    return (c:cs)
      where lowerCase = \x -> x `elem` ['a'..'z']
            underscore = \x -> x == '_'
            letdigs = munch (\c -> isLetter c || isDigit c || underscore c)
    
    ident :: Parser Ident
    ident = do _ <- skipSpaces
               s <- identToken
               skipSpaces
               return s
    
    test_parseIdents :: String -> Either String [Ident]
    test_parseIdents p = case readP_to_S prog p of
        [(j, "")]   -> Right j
        []          -> Left "Invalid parse"
        multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
      where prog :: Parser [Ident]
            prog = do result <- many ident
                      eof
                      return result
    
    main = print $ test_parseIdents "test_1349_zefz"
    

    So what went wrong:

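    • +++ is symmetric choice: both alternatives are tried and every one that succeeds contributes a result. Because munch also succeeds when it consumes nothing, munch isDigit and munch underscore each add an empty match, which is where the many ambiguous parses come from. <++ is left-biased, so only the left-most successful option is kept. That would remove the ambiguity in the parse, but still leaves the next problem.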

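    • Your letdigs accepts a run of letters, or a run of digits, or a run of underscores, but never a mixture of them. A digit after an underscore, for example, fails. The parser has to be changed to munch every character that is a letter, a digit, or an underscore.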
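
The difference between the two choice operators can be seen on a small input. The following is only a sketch; the names symmetric, leftBiased and combined are made up for illustration and do not appear in the question.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- Symmetric choice: every alternative that succeeds contributes a result.
-- munch never fails, so both sides always succeed.
symmetric :: ReadP String
symmetric = munch isLetter +++ munch isDigit

-- Left-biased choice: the right side is used only if the left side
-- produces no result at all.
leftBiased :: ReadP String
leftBiased = munch isLetter <++ munch isDigit

-- One munch over a combined predicate: a single greedy result.
combined :: ReadP String
combined = munch (\c -> isLetter c || isDigit c || c == '_')

main :: IO ()
main = do
  print (length (readP_to_S symmetric "ab1"))  -- 2 results: "ab" and ""
  print (readP_to_S leftBiased "ab1")          -- [("ab","1")]
  print (readP_to_S combined "ab_1")           -- [("ab_1","")]
```

Note that with <++ the digit alternative is never reached here, because munch isLetter always succeeds, even on an empty match. That is why switching operators alone does not fix the parser.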

    I also removed some unused functions and made an educated guess at the definitions of your data types.
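
To check the lowercase-first rule on a few inputs, a compact variant along these lines could be used. It is a sketch under assumptions: parseIdents is a hypothetical helper, and isAsciiLower from Data.Char stands in for the hand-written lowerCase predicate, on the assumption that only ASCII 'a'..'z' may start a name.

```haskell
import Data.Char (isAsciiLower, isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- First character must be an ASCII lowercase letter; the rest may be
-- any mix of letters, digits and underscores.
identToken :: ReadP String
identToken = do
  c  <- satisfy isAsciiLower
  cs <- munch (\x -> isLetter x || isDigit x || x == '_')
  return (c : cs)

-- Hypothetical helper: Just the identifiers when there is exactly one
-- complete parse, Nothing otherwise.
parseIdents :: String -> Maybe [String]
parseIdents input =
  case readP_to_S (many (skipSpaces *> identToken <* skipSpaces) <* eof) input of
    [(ids, "")] -> Just ids
    _           -> Nothing

main :: IO ()
main = mapM_ (print . parseIdents) ["test", "foo bar_1", "Test", "_x"]
```

With these definitions, "test" and "foo bar_1" each give a single parse, while "Test" and "_x" are rejected because their first character is not a lowercase letter.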
