【问题标题】:haskell: xml filtering a subtreehaskell:xml过滤子树
【发布时间】:2016-09-23 12:44:16
【问题描述】:

我正在努力使用 haskell 删除一个元素及其所有子元素。 任务是从给定的 xml 文档中删除所有表格标签(也许我还没有理解游标的概念,或者它是我缺少的其他东西)。

我尝试了三种不同的方法:

  • 具有遍历/过滤功能并使用新元素设置过滤值的镜头 - 此处仅替换标签,而不替换内容
  • 使用光标访问表格元素 - 重置那里的内容并通过将光标向上遍历到文档根再次获取文档根 - 没有过滤
  • 递归过滤文档根目录的子项 - 没有过滤

工具

  • xml-conduit
  • xml-lens
  • ghc-8.0.1

输入 (test.xml)/输出

 INPUT                                     EXPECTED OUTPUT (for the filtered cases)
<?xml version="1.0"?>                 |  <?xml version="1.0"?>                   
<root>                                |  <root>                                  
    <a>                               |      <a>                                 
        ...                           |          ...                             
    </a>                              |      </a>                                
    <b>                               |      <b>                                 
        <table>                       |          <bb>                            
            <!--table entries-->      |              ...                         
        </table>                      |          </bb>                           
        <bb>                          |      </b>                                
            ...                       |      <c>                                 
        </bb>                         |          <cc>                            
    </b>                              |              ...                         
    <c>                               |          </cc>                           
        <cc>                          |      </c>                                
            ...                       |  </root>                                 
        </cc>
    </c>
</root>

最小的非工作示例

{-# LANGUAGE OverloadedStrings #-}

module Minimal where

import           Control.Lens
import           Data.Conduit.Text as CT
import           Data.Default
import qualified Data.Text.Lazy.IO as TIO
import           Text.XML
import           Text.XML.Cursor
import qualified Text.XML.Lens     as L
import           Data.Maybe (isNothing, isJust)

main :: IO ()
main = do test <- Text.XML.readFile def "./test.xml"
          pput $ filterDocument test

          let cursor = fromDocument test

          pput $ docUp $ elemUp $ getRoot ((head $ cursor $// checkName (== "table")) {child = []} )

          pput $ docUp $ elemUp $ filterChildren (checkName (/= "table")) cursor
          return ()


filterChildren :: Axis -> Cursor -> Cursor
filterChildren pred c = c {child = map (filterChildren pred) (c $/ pred) }

filterDocument :: Document -> Document
filterDocument doc = doc & (L.root.L.entire.filtered (\e -> isJust $ e^?L.named "table") .~ emptyElemt)
  where emptyElemt = Element "empty" mempty []

-- helper functions --

docUp :: Element -> Document
docUp e = Document {documentRoot = e, documentPrologue = Prologue [] Nothing [], documentEpilogue = [] }

elemUp :: Cursor -> Element
elemUp cursor = Element {elementName = "DOC", elementAttributes = mempty , elementNodes = [node cursor]}

elemUp' :: [Cursor] -> Element
elemUp' cursors = Element {elementName = "DOC", elementAttributes = mempty , elementNodes = map node cursors}

getRoot :: Cursor -> Cursor
getRoot c = let p = (c $| parent)
            in if null p then c else getRoot $ head p

pput :: Document -> IO ()
pput = TIO.putStrLn . renderText pretty
  where pretty = def {rsPretty = True}

输出

> stack ghci
. . .
Ok, modules loaded: Minimal.
λ > main
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <a>
        ...
    </a>
    <b>
        <empty>
            <!-- table entries -->
        </empty>
        <bb>
            ...
        </bb>
    </b>
    <c>
        <cc>
            ...
        </cc>
    </c>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<DOC>
    <root>
        <a>
            ...
        </a>
        <b>
            <table>
                <!-- table entries -->
            </table>
            <bb>
                ...
            </bb>
        </b>
        <c>
            <cc>
                ...
            </cc>
        </c>
    </root>
</DOC>

<?xml version="1.0" encoding="UTF-8"?>
<DOC>
    <root>
        <a>
            ...
        </a>
        <b>
            <table>
                <!-- table entries -->
            </table>
            <bb>
                ...
            </bb>
        </b>
        <c>
            <cc>
                ...
            </cc>
        </c>
    </root>
</DOC>

【问题讨论】:

    标签: xml haskell filter xml-conduit


    【解决方案1】:

    此代码似乎可以根据 xml-conduit 执行您想要的操作。我从yesod网书example开始,通过一个简单的递归函数实现了转换。

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Map        as M
    import           Prelude         hiding (readFile, writeFile)
    import           Text.XML
    
    main :: IO ()
    main = do
        Document prologue root epilogue <- readFile def "test.xml"
    
        let root' = transform root
    
        writeFile def
            { rsPretty = True
            } "output.html" $ Document prologue root' epilogue
    
    transform :: Element -> Element
    transform (Element _name attrs children) = 
      Element _name attrs (filterChildren children)
    
    filterChildren :: [Node] -> [Node]
    filterChildren = concatMap kickTable
      where
        kickTable :: Node -> [Node]
        kickTable (NodeElement (Element "table" attrs children)) = -- Drop it
          [  ]
        kickTable (NodeElement (Element n attrs children)) = -- Recurse on
          [ NodeElement (Element n attrs (filterChildren children)) ]
        kickTable n = -- ok - whatever
          [ n ]
    

    我的 lens-foo 不够强大,无法说明为什么您的解决方案不起作用,但从文档中 - 您必须小心 filtered,不要违反遍历法则,尽管我不知道什么时候会发生你违反了他们。

    希望对您有所帮助。

    【讨论】:

    • 从这本书和黑线鳕的一点点看来,对于这个用例来说,像上面这样的简单递归函数似乎是解决问题的最简单方法。游标似乎主要是关于从 xml 中提取数据,而不是关于修改 xml - 至少在我的理解中。
    【解决方案2】:

    我不知道Text.XML,但这是Text.XML.Light 的解决方案:

    module Minimal where
    
    import Data.Maybe(catMaybes)
    import Text.XML.Light.Input
    import Text.XML.Light.Output
    import Text.XML.Light.Types
    
    main :: IO ()
    main = do
      test <- parseXML <$> readFile "./test.xml"
      mapM_ (putStrLn . ppContent) . catMaybes $ map cutTables test
    
    cutTables :: Content -> Maybe Content
    cutTables (Elem e) = if qName (elName e) == "table" then Nothing else
      Just . Elem $ e { elContent = catMaybes . map cutTables $ elContent e }
    cutTables x = Just x
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2014-07-04
      • 1970-01-01
      • 2010-10-11
      • 2015-06-29
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多