【Question title】: Groovy script (NiFi) - dynamically select the attributes of an XML child node
【Posted】: 2021-12-30 19:00:30
【Question】:

I have an XML structure like this:

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <Document>
 <ExportData>
    <Site name="name" f="">
        <Kapta id1="id1">
            <Infos>
                <Info>
                    <EndPoint foo="value-name" />
                </Info>
            </Infos>
            <Samples>
                <Sample date="date" attribute1="5.44" attribute2="234" attribute3="8.45"/>
                <Sample date="date" attribute1="7.45" attribute5="8.45"/>
            </Samples>
        </Kapta>
        <Kapta id2="id2">
            <Infos>
                <Info>
                    <EndPoint foo="value-name" />
                </Info>
            </Infos>
            <Samples>
                <Sample date="date" attribute1="5.44" attribute2="234" attribute3="8.45"/>
                <Sample date="date" attribute1="7.45" attribute5="8.45" attribute6="7.45" attribute7="8.45"/>
            </Samples>
        </Kapta>
    </Site>
 </ExportData>
 </Document>

The desired output looks like this:

 {"time":"date1","name":"id1_attribute1","value":5.44}
 {"time":"date1","name":"id1_attribute2","value":234}
 {"time":"date1","name":"id1_attribute3","value":8.45}
 {"time":"date2","name":"id1_attribute4","value":7.45}
 {"time":"date2","name":"id1_attribute5","value":8.45}
 {"time":"date3","name":"id2_attribute1","value":5.44}
 .
 .
 .

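I fetch the file with the list and fetch FTP processors in NiFi, but I cannot produce the output I want.

I am trying to get the desired output with the code from a related question, but I am not sure how to change it so that it works correctly.

The code is as follows: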

import org.apache.nifi.flowfile.FlowFile;
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import groovy.xml.dom.DOMCategory
import groovy.json.JsonGenerator

def flowFile

try {
    flowFile = session.get()

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance()
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder()
    Document doc = null

    session.read(flowFile, { inputStream ->
        doc = dBuilder.parse(inputStream)
    } as InputStreamCallback)

    def root = doc.documentElement
    def sb = new StringBuilder()
    def jsonGenerator = new JsonGenerator.Options().disableUnicodeEscaping().build()

    // get a specific attribute
    use(DOMCategory) {
        root['ExportData']['Site']['*'].findAll { node ->
            def data = new LinkedHashMap()
            data.id = node['@id1']
            sb.append(jsonGenerator.toJson(data))
            sb.append('\n')
        }
    }

    // get all attributes of Sample under Samples
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Samples']['*'].findAll { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            def attributesMap = node.attributes()
            for (int x = 0; x < attributesMap.getLength(); x++) {
                data.AttrName = attributesMap.item(x).getNodeName()
                data.AttrValue = attributesMap.item(x).getNodeValue()
                sb.append(jsonGenerator.toJson(data))
                sb.append('\n')
            }
        }
    }

    flowFile = session.write(flowFile, { inputStream, outputStream ->
        outputStream.write(sb.toString().getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('', e)
    session.transfer(flowFile, REL_FAILURE)
}

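This code outputs the id attribute once and then dynamically outputs all of the Sample attributes. What I want instead is to print, for each id, the attributes of its own Sample elements, as described above.

Thank you very much for your time and effort!

【Comments】:

  • Please add the code you tried and how it failed (e.g. errors, stack traces, logs) so that we can improve on it. "But I cannot print my desired output" - what is the problem? Does it print nothing at all? Does it print the wrong thing? Why is it wrong?
  • Thanks for your note, I edited my question, please check it again!
  • Could you please add the output you get and point out why it is wrong?

Tags: xml groovy xml-parsing apache-nifi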


【Solution 1】:

Code for the ExecuteGroovyScript processor:

import groovy.json.JsonBuilder

def ff = session.get()
if(!ff) return

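// ff.write with a closure is an ExecuteGroovyScript convenience: it streams the content in and out
ff.write{streamIn, streamOut->
    def xml = new XmlParser().parse(streamIn)
    // collectMany flattens the result: every Sample yields one entry per non-date attribute
    def json = xml.ExportData.Site.Kapta.Samples.Sample.collectMany{sample->
        def attr = sample.attributes()
        // take `date` out of the map so that only the measurement attributes remain
        def date = attr.remove('date')
        //use regexp to find id attribute by prefix `id`
        def id = sample.parent().parent().attributes().find{ k,v-> k =~ "^id.*" }.value
        attr.collect{k,v->
            [
                time: date,
                name: "${id}_${k}",
                value: new BigDecimal(v),
            ]
        }
    }
    streamOut.withWriter("UTF-8"){w-> new JsonBuilder(json).writeTo(w) }
}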
ff."mime.type" = "application/json"
REL_SUCCESS<<ff

Output:

[
    {
        "time": "date1",
        "name": "id1_attribute1",
        "value": 5.44
    },
    {
        "time": "date1",
        "name": "id1_attribute2",
        "value": 234
    },
    {
        "time": "date1",
        "name": "id1_attribute3",
        "value": 8.45
    },
    {
        "time": "date2",
        "name": "id1_attribute1",
        "value": 7.45
    },
    {
        "time": "date2",
        "name": "id1_attribute5",
        "value": 8.45
    },
    {
        "time": "date3",
        "name": "id2_attribute1",
        "value": 5.44
    },
    {
        "time": "date3",
        "name": "id2_attribute2",
        "value": 234
    },
    {
        "time": "date3",
        "name": "id2_attribute3",
        "value": 8.45
    },
    {
        "time": "date4",
        "name": "id2_attribute1",
        "value": 7.45
    },
    {
        "time": "date4",
        "name": "id2_attribute5",
        "value": 8.45
    },
    {
        "time": "date4",
        "name": "id2_attribute6",
        "value": 7.45
    },
    {
        "time": "date4",
        "name": "id2_attribute7",
        "value": 8.45
    }
]
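
The answer above writes a single JSON array, while the question asked for one JSON object per line. A minimal sketch of that variant follows; it runs as a plain Groovy script outside NiFi so the transformation can be tried in isolation. The helper name `toJsonLines` and the shortened sample XML are assumptions made for illustration, and the `groovy.xml.XmlParser` import assumes Groovy 3 or later (on Groovy 2.x, drop the import and rely on the auto-imported `groovy.util.XmlParser`).

```groovy
import groovy.json.JsonOutput
import groovy.xml.XmlParser   // assumption: Groovy 3+; on 2.x remove this import

// Hypothetical helper: converts the XML text into newline-delimited JSON,
// one object per Sample attribute (the `date` attribute becomes `time`).
String toJsonLines(String xmlText) {
    def root = new XmlParser().parseText(xmlText)
    def lines = []
    root.ExportData.Site.Kapta.each { kapta ->
        // The id attribute is named differently on each Kapta (id1, id2, ...),
        // so it is looked up by its `id` prefix rather than by a fixed name.
        def id = kapta.attributes().find { k, v -> k.toString().startsWith('id') }?.value
        kapta.Samples.Sample.each { sample ->
            def date = sample.attribute('date')
            sample.attributes().each { k, v ->
                if (k != 'date') {
                    lines << JsonOutput.toJson([
                        time : date,
                        name : "${id}_${k}".toString(),
                        value: new BigDecimal(v)
                    ])
                }
            }
        }
    }
    lines.join('\n')
}

// Usage with a shortened version of the document from the question
def sampleXml = '''<Document>
  <ExportData>
    <Site name="name" f="">
      <Kapta id1="id1">
        <Samples>
          <Sample date="date1" attribute1="5.44" attribute2="234"/>
        </Samples>
      </Kapta>
      <Kapta id2="id2">
        <Samples>
          <Sample date="date3" attribute1="7.45" attribute5="8.45"/>
        </Samples>
      </Kapta>
    </Site>
  </ExportData>
</Document>'''

println toJsonLines(sampleXml)
```

Walking the `Kapta` elements first and their `Sample` children second keeps the id in scope without calling `parent().parent()`, and it leaves the parsed attribute map unmodified. Inside NiFi the returned string would be written to the output stream instead of printed.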

【Comments】:

  • Thank you for your answer! I tried the script, but I got this error:
  • ExecuteScript[id=be8ea6ad-017d-1000-f152-21ac2f1f011e] failed to process session due to javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingMethodException: No signature of method: org.apache.nifi.controller.repository.StandardFlowFileRecord.write() is applicable for argument types: (Script5795$_run_closure1) values:
  • [Script5795$_run_closure1@1bca0a07] Possible solutions: wait(), wait(long), with(groovy.lang.Closure), wait(long, int), print(java.lang.Object), print(java.io.PrintWriter) ↳ causes
  • You have to use the ExecuteGroovyScript processor.
  • Thank you, it works as expected!
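
The MissingMethodException in the comments above arises because `ff.write{...}`, `ff."mime.type" = ...` and `REL_SUCCESS << ff` are conveniences that ExecuteGroovyScript adds to the flow file; in the plain ExecuteScript processor the flow file has no `write` method. If switching processors is not an option, the same logic can probably be expressed through the standard `ProcessSession` calls that the question's code already uses. The following is only a sketch under that assumption: it relies on the `session`, `log`, `REL_SUCCESS` and `REL_FAILURE` variables that ExecuteScript binds, so it runs only inside NiFi, and the `groovy.xml.XmlParser` import assumes Groovy 3 or later.

```groovy
import groovy.json.JsonBuilder
import groovy.xml.XmlParser   // assumption: Groovy 3+; on 2.x remove this import
import org.apache.nifi.processor.io.StreamCallback

def ff = session.get()
if (!ff) return

try {
    // session.write replaces the ff.write{...} shortcut of ExecuteGroovyScript
    ff = session.write(ff, { streamIn, streamOut ->
        def xml = new XmlParser().parse(streamIn)
        def json = xml.ExportData.Site.Kapta.Samples.Sample.collectMany { sample ->
            def attr = sample.attributes()
            def date = attr.remove('date')
            // find the id attribute of the enclosing Kapta by its `id` prefix
            def id = sample.parent().parent().attributes().find { k, v -> k =~ "^id.*" }.value
            attr.collect { k, v ->
                [time: date, name: "${id}_${k}", value: new BigDecimal(v)]
            }
        }
        streamOut.withWriter('UTF-8') { w -> new JsonBuilder(json).writeTo(w) }
    } as StreamCallback)

    // explicit calls replace ff."mime.type" = ... and REL_SUCCESS << ff
    ff = session.putAttribute(ff, 'mime.type', 'application/json')
    session.transfer(ff, REL_SUCCESS)
} catch (Exception e) {
    log.error('Failed to convert XML to JSON', e)
    session.transfer(ff, REL_FAILURE)
}
```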