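[Posted]: 2023-03-26 06:23:02
[Question]:
I'm trying to convert an XML file to a CSV file (basically, getting the data from the XML into tabular form), but I need an XSL file written for this specific XML to drive the conversion. The XML is complex, and I'm a beginner here.
Here is a sample XML file. I need to extract all the relevant fields, for example: 1. MetaversionOID 2. StudyOID 3. LocationOID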
<ODM
xmlns="http://www.cdisc.org/ns/odm/v1.3"
xmlns:p1="https://www.protocolfirst.com/ns/odm/v1.3.2" CreationDateTime="2019-03-25T06:33:43.806Z" FileOID="9c94b49a-0110-418b-a8e9-adb5d557b106" ODMVersion="1.3.2" FileType="Snapshot" AsOfDateTime="2019-03-25T06:33:43.806Z" SourceSystem="ProtocolFirst EDC">
<ClinicalData MetaDataVersionOID="1.0" StudyOID="BAML-S16 AGI-IDH1">
<SubjectData SubjectKey="101-155-16">
<StudyEventData StudyEventOID="1.01" p1:Name="Screening (Master)" p1:CreationDateTime="2018-11-01T14:45:12.997Z" p1:Branch="1.0" p1:NotDone="N" p1:VisitDate="2018-10-18T04:00:00.000Z">
<FormData FormOID="demo" p1:Name="Demographics" p1:Started="Y" p1:NotDone="N">
<ItemGroupData ItemGroupOID="demo">
<ItemData ItemOID="2a48d0b6-de96-4da9-8b90-c9d555ccbc45" p1:FieldName="Date of Birth" p1:EntryType="Transcription" Value="1950-08-24" p1:TimezoneOffset="-04:00">
<AuditRecord>
<UserRef UserOID="molly.vittorio@osumc.edu"/>
<DateTimeStamp>2018-11-05T16:30:42.220Z</DateTimeStamp>
</AuditRecord>
</ItemData>
<ItemData ItemOID="73bce803-1540-479f-8022-1a814f5bfa8e" p1:FieldName="Sex" p1:EntryType="Transcription" Value="M" p1:DisplayValue="Male">
<AuditRecord>
<UserRef UserOID="molly.vittorio@osumc.edu"/>
<DateTimeStamp>2018-11-05T16:30:43.007Z</DateTimeStamp>
</AuditRecord>
</ItemData>
<ItemData ItemOID="bc160779-263c-40ca-97ce-72c8f07f907c" p1:FieldName="Ethnicity" p1:EntryType="Transcription" Value="NOT HISPANIC OR LATINO" p1:DisplayValue="Not Hispanic or Latino">
<AuditRecord>
<UserRef UserOID="molly.vittorio@osumc.edu"/>
<DateTimeStamp>2018-11-05T16:30:46.151Z</DateTimeStamp>
</AuditRecord>
</ItemData>
<ItemData ItemOID="8f064011-8e2b-486b-8b60-c2f744ca5235" p1:FieldName="Race" p1:EntryType="Transcription" Value="CAUCASIAN" p1:DisplayValue="Caucasian">
<AuditRecord>
<UserRef UserOID="molly.vittorio@osumc.edu"/>
<DateTimeStamp>2018-11-05T16:30:45.366Z</DateTimeStamp>
</AuditRecord>
</ItemData>
</ItemGroupData>
<AuditRecord EditPoint="Monitoring">
<p1:Review DateTimeStamp="2019-03-12T16:59:47.139Z" UserOID="lia.zevallos@syneoshealth.com" Action="query" Comment="Birth recorded in the SD 24 August 1950. Please verify and correct the CRF page, thanks."/>
<p1:Review DateTimeStamp="2018-11-05T16:30:51.928Z" UserOID="molly.vittorio@osumc.edu" Action="submitted"/>
<p1:Review DateTimeStamp="2018-11-01T14:45:12.997Z" UserOID="molly.vittorio@osumc.edu" Action="open"/>
</AuditRecord>
</FormData>
</StudyEventData>
</SubjectData>
</ClinicalData>
</ODM>
This is the code I use to convert the XML to CSV.
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
public class Temp {
public static void main(String args[]) throws Exception {
Document document;
File stylesheet = new File("C:/Users/mmahajan/Desktop/Input/style.xsl");
File xmlSource = new File("C:/Users/mmahajan/Desktop/Input/subject-beataml-BAML-S8 AST-FLT3-20190325114820225683361888824.xml");
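DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// DOM parsing is not namespace-aware by default; XSLT needs it to match the ODM namespace.
factory.setNamespaceAware(true);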
DocumentBuilder builder;
try {
builder = factory.newDocumentBuilder();
document = builder.parse(xmlSource);
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesource);
Source source = new DOMSource(document);
Result outputTarget = new StreamResult(new File("C:/Users/mmahajan/Desktop/Input/x.csv"));
transformer.transform(source, outputTarget);
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}
}
So far, I have written the following XSL file.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
Metaversion,StudyOID,SubjectKey,CreationDateTime,VisitDate,NotDone,Branch,Name,StudyEventOID,Name,Started,FormOID,ItemGroupOID,TimezoneOffset,Value,EntryType,FieldName,ItemOID,UserOID,DateTimeStamp
<xsl:for-each select="//AuditRecord">
<xsl:value-of select="concat(//ClinicalData/@MetaDataVersionOID,',',//ClinicalData/@StudyOID,',',//ClinicalData/SubjectData/@SubjectKey,',',//ClinicalData/SubjectData/StudyEventData/@CreationDateTime,',',//ClinicalData/SubjectData/StudyEventData/@CreationDateTime,',',//ClinicalData/SubjectData/StudyEventData/@VisitDate,',',//ClinicalData/SubjectData/StudyEventData/@NotDone,',',//ClinicalData/SubjectData/StudyEventData/@Branch,',',//ClinicalData/SubjectData/StudyEventData/@Name,',',//ClinicalData/SubjectData/StudyEventData/@StudyEventOID,',',//ClinicalData/SubjectData/StudyEventData/FormData/',',//ClinicalData/SubjectData/StudyEventData/FormData/@Started,',',//ClinicalData/SubjectData/StudyEventData/FormData/@FormOID,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/@ItemGroupOID,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/ItemData/@TimezoneOffset,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/ItemData/@Value,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/ItemData/@EntryType,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/ItemData/@FieldName,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/ItemData/@ItemOID,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/ItemData/AuditRecord/UserRef/@UserOID,',',//ClinicalData/SubjectData/StudyEventData/FormData/ItemGroupData/ItemData/AuditRecord/UserRef/@DateTimeStamp,
'
')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
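But this is wrong. Any help with a correct XSL would be much appreciated.
Update: I have modified the XML file and the XSL file, but I still cannot produce a correct XSL file for it.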
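One way to approach this, as a minimal sketch rather than a definitive stylesheet: bind a prefix to the ODM default namespace, loop over each `ItemData` element, and read the parent-level fields through the `ancestor::` axis so every row gets its own values. The class name `OdmToCsv`, the `toCsv` helper, the `odm` prefix, the trimmed `SAMPLE` document, and the choice of columns are all assumptions made for illustration. The stylesheet is embedded as a Java string only so the example is self-contained; the same markup could live in `style.xsl`. Values are written without CSV quoting, so a field containing a comma would need extra handling.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OdmToCsv {
    // Hypothetical stylesheet: one CSV row per ItemData element.
    // "odm" is an arbitrary prefix bound to the document's default namespace.
    static final String XSL = String.join("",
        "<xsl:stylesheet version='1.0'",
        " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        " xmlns:odm='http://www.cdisc.org/ns/odm/v1.3'",
        " xmlns:p1='https://www.protocolfirst.com/ns/odm/v1.3.2'>",
        "<xsl:output method='text'/>",
        "<xsl:template match='/'>",
        "<xsl:text>StudyOID,SubjectKey,StudyEventOID,FormOID,ItemOID,",
        "FieldName,Value,UserOID,DateTimeStamp&#10;</xsl:text>",
        "<xsl:for-each select='//odm:ItemData'>",
        // Paths are relative to the current ItemData, not absolute.
        "<xsl:value-of select='ancestor::odm:ClinicalData/@StudyOID'/>,",
        "<xsl:value-of select='ancestor::odm:SubjectData/@SubjectKey'/>,",
        "<xsl:value-of select='ancestor::odm:StudyEventData/@StudyEventOID'/>,",
        "<xsl:value-of select='ancestor::odm:FormData/@FormOID'/>,",
        "<xsl:value-of select='@ItemOID'/>,",
        "<xsl:value-of select='@p1:FieldName'/>,",
        "<xsl:value-of select='@Value'/>,",
        "<xsl:value-of select='odm:AuditRecord/odm:UserRef/@UserOID'/>,",
        "<xsl:value-of select='odm:AuditRecord/odm:DateTimeStamp'/>",
        "<xsl:text>&#10;</xsl:text>",
        "</xsl:for-each>",
        "</xsl:template>",
        "</xsl:stylesheet>");

    // Trimmed-down version of the sample document from the question.
    static final String SAMPLE = String.join("",
        "<ODM xmlns='http://www.cdisc.org/ns/odm/v1.3'",
        " xmlns:p1='https://www.protocolfirst.com/ns/odm/v1.3.2'>",
        "<ClinicalData MetaDataVersionOID='1.0' StudyOID='BAML-S16 AGI-IDH1'>",
        "<SubjectData SubjectKey='101-155-16'>",
        "<StudyEventData StudyEventOID='1.01'>",
        "<FormData FormOID='demo'><ItemGroupData ItemGroupOID='demo'>",
        "<ItemData ItemOID='73bce803-1540-479f-8022-1a814f5bfa8e'",
        " p1:FieldName='Sex' Value='M'>",
        "<AuditRecord><UserRef UserOID='molly.vittorio@osumc.edu'/>",
        "<DateTimeStamp>2018-11-05T16:30:43.007Z</DateTimeStamp></AuditRecord>",
        "</ItemData></ItemGroupData></FormData>",
        "</StudyEventData></SubjectData></ClinicalData></ODM>");

    // Applies the stylesheet to an XML string and returns the CSV text.
    static String toCsv(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(toCsv(SAMPLE));
    }
}
```

Running `main` should print the header line followed by one row for the single `ItemData` in the sample. Looping over `ItemData` instead of `AuditRecord` is a design choice: it is the element that carries `Value` and `FieldName`, and it also avoids picking up the form-level `AuditRecord`, which has no `UserRef` child.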
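[Comments]:
-
Expected output?
-
I need to extract the information in the XML file into a CSV file. I am trying this approach. [link] stackoverflow.com/a/21415506/10884684
-
The XML sample in your question appears to be somewhat malformed, but it suggests that you actually have a root element with a default namespace (the `xmlns="...."` part shown in your fragment). If that is the case, you should look at this question: stackoverflow.com/questions/1344158/… . Thanks!
-
Your template match may also be a problem; `match="/ClinicalData"` will only match anything if the root element is `ClinicalData`, but in your case it is (probably) `ODM` (or possibly something above that). Thanks!
-
If you want CSV output, why is your XSLT written to output `<html>` and other HTML elements?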
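
The two points raised in the comments (the default namespace, and a path that starts at the wrong root element) can be checked outside XSLT with a few XPath 1.0 queries, which follow the same name-matching rules as XSLT 1.0. This is a small sketch under stated assumptions: `NamespaceCheck`, `eval`, the `odm` prefix, and the two-element `SAMPLE` document are hypothetical names and data introduced for the demonstration.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceCheck {
    static final String ODM_NS = "http://www.cdisc.org/ns/odm/v1.3";

    // Minimal stand-in for the question's document: ODM root, default namespace.
    static final String SAMPLE =
        "<ODM xmlns='" + ODM_NS + "'>"
      + "<ClinicalData StudyOID='BAML-S16 AGI-IDH1'/>"
      + "</ODM>";

    // Evaluates an XPath 1.0 expression against SAMPLE and returns its string value.
    static String eval(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default for DOM parsing
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the (arbitrary) prefix "odm" to the ODM namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "odm".equals(prefix) ? ODM_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return null; // not needed for evaluation
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return null; // not needed for evaluation
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Un-prefixed names select elements in no namespace: matches nothing.
        System.out.println("[" + eval("/ODM/ClinicalData/@StudyOID") + "]");
        // ClinicalData is not the root element: matches nothing.
        System.out.println("[" + eval("/odm:ClinicalData/@StudyOID") + "]");
        // Prefixed names, starting from the real root: finds the attribute.
        System.out.println("[" + eval("/odm:ODM/odm:ClinicalData/@StudyOID") + "]");
    }
}
```

Only the third expression finds the attribute. The same reasoning applies to `match` and `select` attributes in an XSLT 1.0 stylesheet, where the prefix is declared on `xsl:stylesheet` instead of through a `NamespaceContext`.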