【Question title】: xml-conduit parse xml attributes
【Posted】: 2015-12-14 12:06:36
【Question description】:

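While parsing XML with xml-conduit I stumbled upon the following problem: when an element has multiple attributes with the same base name but different prefixes, only the first one in (lexical) order is kept.

How do I get the prefixed value if both the prefixed and the non-prefixed version of an attribute exist?

Minimal non-working example: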

Main.hs

{-# LANGUAGE OverloadedStrings #-}

module Main where

import           Data.Text.Lazy (Text)
import qualified Data.Text.Lazy as T
import           Text.XML (parseText, def, elementAttributes, documentRoot)
import           Data.List (splitAt, drop)

main :: IO ()
main = do
  putStrLn "Example1: only the first element is parsed"
  putStrLn "========\n"
  print $ elementAttributes . documentRoot <$> parseText def (T.unlines test)
  putStrLn "Example2: this behaviour is independent of both having a prefix"
  putStrLn "========\n"
  print $ elementAttributes . documentRoot <$> parseText def (T.unlines $ dropAt 1 test)
  putStrLn "Example3: also no difference if there is just one attribute with prefix"
  putStrLn "========\n"
  print $ elementAttributes . documentRoot <$> parseText def (T.unlines $ dropAt 2 test)
  putStrLn "Example4: on its own the last element can be parsed"
  putStrLn "========\n"
  print $ elementAttributes . documentRoot <$> parseText def (T.unlines $ dropAt 1 $ dropAt 1 test)
  putStrLn "==============="
  putStrLn "Example1: it is always the first element parsed"
  putStrLn "========\n"
  print $ elementAttributes . documentRoot <$> parseText def (T.unlines test2)
  putStrLn "Example2: really just the first"
  putStrLn "========\n"
  print $ elementAttributes . documentRoot <$> parseText def (T.unlines $ dropAt 1 test2)


test :: [Text]
test =["<Root"
      ,  "here    = \"ok\""
      ,  "is:here = \"ok\""
      ,  "not:here=\"nok\">"
      ,"</Root>"]

test2 :: [Text]
test2 =["<Root"
       ,  "is:here = \"ok\""
       ,  "here    = \"ok\""
       ,  "not:here=\"nok\">"
       ,"</Root>"]

dropAt :: Int -> [a] -> [a]
dropAt i xs = let (hd,tl) = splitAt i xs
              in hd ++ drop 1 tl

attr.cabal

build-depends: base >= 4.7 && < 5
             , xml-conduit
             , text

> stack exec attr
Example1: only the first element is parsed
========

Right (fromList [(Name {nameLocalName = "here", nameNamespace = Nothing, namePrefix = Nothing},"ok")])
Example2: this behaviour is independent of both having a prefix
========

Right (fromList [(Name {nameLocalName = "here", nameNamespace = Nothing, namePrefix = Just "is"},"ok")])
Example3: also no difference if there is just one attribute with prefix
========

Right (fromList [(Name {nameLocalName = "here", nameNamespace = Nothing, namePrefix = Nothing},"ok")])
Example4: on its own the last element can be parsed
========

Right (fromList [(Name {nameLocalName = "here", nameNamespace = Nothing, namePrefix = Just "not"},"nok")])
===============
Example1: it is always the first element parsed
========

Right (fromList [(Name {nameLocalName = "here", nameNamespace = Nothing, namePrefix = Just "is"},"ok")])
Example2: really just the first
========

Right (fromList [(Name {nameLocalName = "here", nameNamespace = Nothing, namePrefix = Nothing},"ok")])

【Question discussion】:

    Tags: xml haskell xml-conduit


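    【Solution 1】:

    Quoting the documentation of Name in Text.XML:

    Prefixes are not semantically important; they are included only to simplify pass-through parsing. When comparing names with Eq or Ord methods, prefixes are ignored.

    The semantic distinction lies in the namespace, so the following solves your problem: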
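The quoted behaviour can be checked directly on Name values. The following is a minimal sketch, assuming the xml-conduit, containers and text packages are available; the names plain, prefixed and namespaced are hypothetical and exist only for this illustration. Since elementAttributes is a Map keyed by Name, two keys that compare equal collapse into a single entry, which matches the output shown in the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map  as Map
import           Data.Text (Text)
import           Text.XML  (Name (..))

-- Hypothetical example names: same local name, differing only in
-- prefix (first two) or in namespace (third).
plain, prefixed, namespaced :: Name
plain      = Name "here" Nothing Nothing
prefixed   = Name "here" Nothing (Just "is")
namespaced = Name "here" (Just "http://example.com") (Just "is")

-- A Map keyed by Name, like the one returned by elementAttributes.
collapsed :: Map.Map Name Text
collapsed = Map.fromList [(plain, "ok"), (prefixed, "ok")]

main :: IO ()
main = do
  print (plain == prefixed)    -- True: the prefix is ignored by Eq
  print (plain == namespaced)  -- False: the namespaces differ
  print (Map.size collapsed)   -- 1: both keys count as the same Name
```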

    test :: [Text]
    test =["<Root xmlns:is=\"http://example.com\" xmlns:not=\"http://example.com/2\""
          ,  "here    = \"ok\""
          ,  "is:here = \"ok\""
          ,  "not:here=\"nok\">"
          ,"</Root>"]
    

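    This also makes sense, because the same namespace can be bound to different prefixes in different places and should still be treated as the same namespace. I also think that using prefixes without associating a namespace with them is not valid XML.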
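With the namespaces declared, each attribute can be looked up by its namespace-qualified Name rather than by its prefix. This is a sketch under the same assumptions as the question's setup (xml-conduit, containers and text installed); the value doc is a hypothetical name for the corrected document above. It relies on the IsString instance of Name, which accepts the "{namespace}local" form when OverloadedStrings is enabled.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map       as Map
import qualified Data.Text.Lazy as TL
import           Text.XML       (Name (..), def, documentRoot,
                                 elementAttributes, parseText)

-- The corrected document: both prefixes are bound to a namespace.
doc :: TL.Text
doc = TL.unlines
  [ "<Root xmlns:is=\"http://example.com\" xmlns:not=\"http://example.com/2\""
  , "  here    = \"ok\""
  , "  is:here = \"ok\""
  , "  not:here=\"nok\">"
  , "</Root>"
  ]

main :: IO ()
main = case parseText def doc of
  Left err -> print err
  Right d  -> do
    let attrs = elementAttributes (documentRoot d)
    -- Number of distinct attributes that survived parsing.
    print (Map.size attrs)
    -- Unqualified attribute: no namespace.
    print (Map.lookup "here" attrs)
    -- Qualified attribute, written in "{namespace}local" form.
    print (Map.lookup "{http://example.com}here" attrs)
    -- The same kind of lookup with the Name built explicitly; the
    -- prefix field can stay Nothing because comparison ignores it.
    print (Map.lookup (Name "here" (Just "http://example.com/2") Nothing) attrs)
```

Because the lookups are keyed on the namespace URI, they keep working if the document later binds the same URI to a different prefix.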

    【Discussion】:
