【Question title】: XSLT to find repeating values while transforming
【Posted】: 2017-06-09 16:09:19
【Question】:

I have the following XML:

<?xml version="1.0" encoding="utf-8"?>
<NewDataSet>
    <GUID>
        <Active>true</Active>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <DateOfBirth>16/01/1988</DateOfBirth>
        <FirstName>Fred</FirstName>
        <Notes>some notes</Notes>
        <PlaceOfResidence>United Kingdom</PlaceOfResidence>
        <RowNumber>1</RowNumber>
        <TableName>PersonDetails</TableName>
    </GUID>
    <GUID>
        <Active>true</Active>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <DateOfBirth>01/01/1960</DateOfBirth>
        <FirstName>Harold</FirstName>
        <Notes>some notes</Notes>
        <PlaceOfResidence>United Kingdom</PlaceOfResidence>
        <RowNumber>2</RowNumber>
        <TableName>PersonDetails</TableName>
    </GUID>
    <GUID>
        <Active>true</Active>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <DateOfBirth>05/05/1955</DateOfBirth>
        <FirstName>Mary</FirstName>
        <Notes>some notes</Notes>
        <PlaceOfResidence>United States</PlaceOfResidence>
        <RowNumber>3</RowNumber>
        <TableName>PersonDetails</TableName>
    </GUID>
    <GUID>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <CoverType>Property</CoverType>
        <DateAdded>01/06/2017</DateAdded>
        <Notes>some notes</Notes>
        <RowNumber>1</RowNumber>
        <TableName>Covers</TableName>
    </GUID>
    <GUID>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <CoverType>Motor</CoverType>
        <DateAdded>01/06/2017</DateAdded>
        <Notes>some notes</Notes>
        <RowNumber>2</RowNumber>
        <TableName>Covers</TableName>
    </GUID>
    <GUID>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <CoverType>Liability</CoverType>
        <DateAdded>01/06/2017</DateAdded>
        <Notes>some notes</Notes>
        <RowNumber>3</RowNumber>
        <TableName>Covers</TableName>
    </GUID>
</NewDataSet>

I need to transform it into the following:

<data>
    <ContractName>Contract Name</ContractName>
    <ContractNumber>Auto</ContractNumber>
    <Table>
        <TableRow RowNumber="1" TableName="PersonDetails">
            <FirstName>Fred</FirstName>
            <PlaceOfResidence>United Kingdom</PlaceOfResidence>
            <DateOfBirth>16/01/1988</DateOfBirth>
            <Active>true</Active>
        </TableRow>
        <TableRow RowNumber="2" TableName="PersonDetails">
            <FirstName>Harold</FirstName>
            <PlaceOfResidence>United Kingdom</PlaceOfResidence>
            <DateOfBirth>01/01/1960</DateOfBirth>
            <Active>true</Active>
        </TableRow>
        <TableRow RowNumber="3" TableName="PersonDetails">
            <FirstName>Mary</FirstName>
            <PlaceOfResidence>United States</PlaceOfResidence>
            <DateOfBirth>05/05/1955</DateOfBirth>
            <Active>true</Active>
        </TableRow>
    </Table>
    <Table>
        <TableRow RowNumber="1" TableName="Covers">
            <CoverType>Property</CoverType>
            <DateAdded>01/06/2017</DateAdded>
        </TableRow>
        <TableRow RowNumber="2" TableName="Covers">
            <CoverType>Motor</CoverType>
            <DateAdded>01/06/2017</DateAdded>
        </TableRow>
        <TableRow RowNumber="3" TableName="Covers">
            <CoverType>Liability</CoverType>
            <DateAdded>01/06/2017</DateAdded>
        </TableRow>
    </Table>
    <Notes>some notes</Notes>
</data>

I can only use XSLT 1.0.

So far I have:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-16"/>
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="*[(*)]">
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="/">
<data>
    <xsl:apply-templates select="@* | node()"/>
</data>
</xsl:template>
</xsl:stylesheet>

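This strips the <NewDataSet> and <GUID> tags and wraps the remaining elements in <data>.

However, I'm not sure how to generate the two table groupings, or how to automatically* identify the repeating values: ContractName, ContractNumber and Notes.

*Other repeating values may appear later.

Any help or pointers would be much appreciated.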

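【Comments】:

  • So if PlaceOfResidence were also exactly the same for every row in a table, would you want to omit it as well?
  • Hi, PlaceOfResidence should not be repeating in every table.

Tags: xml xslt xslt-1.0 grouping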


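【Answer 1】:

Part of the challenge here is that you have a flat list of GUID elements that you need to group by their TableName values. This looks like a good fit for Muenchian grouping, a technique described in detail on Jeni Tennison's site: http://www.jenitennison.com/xslt/grouping/muenchian.html

Here is some XSL code that produces output almost identical to your desired XML format; the only difference is the order of the elements within <TableRow>. However, several aspects of this transformation are ambiguous. I have pointed these out in the code comments.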

<xsl:output method="xml" version="1.0" encoding="utf-16" indent="yes"/>

<!-- Avoids newline whitespace where source elements are excluded from output -->
<xsl:strip-space elements="*"/>

<!-- The input XML appears to be a flat-ish list of `GUID` elements,
    each of which represents a single row of data in any of various tables. 
    In reorganizing this data, we want to group by `GUID/TableName` values.
    Muenchian grouping is probably the best way to do this in XSLT 1.0:
    see more at http://www.jenitennison.com/xslt/grouping/muenchian.html.
    This requires a key, so before we try to process the `GUID`s, we set 
    up the key. -->
<xsl:key name="table" match="GUID" use="TableName"/>

<!-- Begin at the beginning: the root and topmost element. -->
<xsl:template match="/NewDataSet">
    <data>
        <!-- I see in your desired output XML that you've put
            `ContractName` and `ContractNumber` just under the
            top-level `data` element.  This appears to assume
            that ALL of these have the same value for ALL of
            the individual `GUID` recordsets.
            In your input XML, these two are elements under `GUID`,
            so these data fields are included in the individual data
            records.  NOTE: If there is *any* chance that the values
            in these fields might differ between records, these
            should be kept within the table rows, and *not* moved
            to the same level as the output `Table` elements. -->
        <!-- This just naively copies these two elements from the first 
            `GUID` that has them.
            Again, if these values have any chance of differing between 
            `GUID`s, this whole approach is flawed. -->
        <xsl:copy-of select="GUID[ContractName][1]/ContractName"/>
        <xsl:copy-of select="GUID[ContractNumber][1]/ContractNumber"/>

        <!-- We want to process `GUID`s after grouping by `TableName` values.
            This `for-each` is part of the Muenchian grouping technique.  See
            Jeni Tennison's page (linked above) for a detailed explanation. -->
        <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
            <!-- If you wanted to sort alphabetically by TableName, you'd use:
                <xsl:sort select="TableName" /> -->
            <Table>
                <!-- Now, within each `table`, we want to process all those
                    `GUID`s with this same corresponding `TableName`. -->
                <xsl:for-each select="key('table', TableName)">
                    <!-- We select "this", since we want to process the matching
                        `GUID`, not just its children. -->
                    <xsl:apply-templates select="."/>
                </xsl:for-each>
            </Table>
        </xsl:for-each>

        <!-- This just copies the `Notes` element from the first `GUID` that has a 
            `Notes` child.
            Similar to `ContractName` and `ContractNumber`, this naively assumes that
            all `Notes` elements have identical content. This approach is flawed if
            there is *any* possibility of different values. -->
        <xsl:copy-of select="GUID[Notes][1]/Notes"/>
    </data>

</xsl:template>

<!-- List up the elements we don't want to copy verbatim into each table row -->
<xsl:variable name="nocopy">
    <item>ContractName</item>
    <item>ContractNumber</item>
    <item>Notes</item>
    <item>RowNumber</item>
    <item>TableName</item>
</xsl:variable>

<xsl:template match="GUID">
    <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
        <!-- Copy over child data, but _only_ if it's not in `$nocopy` -->
        <xsl:copy-of select="*[not(name() = $nocopy/item)]"/>
    </TableRow>
</xsl:template>
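
A caveat on portability, offered as a hedged note rather than a correction: XSLT 1.0 treats a variable with element content, such as `$nocopy`, as a result tree fragment, and the 1.0 specification does not permit the `/` or `[]` operators on one. The path `$nocopy/item` therefore works in an XSLT 2.0 processor such as Saxon 9.x, but a strict 1.0 processor is likely to reject it unless the fragment is first converted with an extension such as `exsl:node-set()`. The sketch below is one 1.0-only alternative, not the code above: it drops the `$nocopy` variable and names the excluded elements directly in the predicate. It is wrapped in its own `xsl:stylesheet` element only so that the fragment is well-formed.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Sketch: a GUID template that needs no variable lookup.
         Each self::Name test is true only for a child with that name,
         so the predicate keeps every child NOT in the exclusion list. -->
    <xsl:template match="GUID">
        <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
            <xsl:copy-of select="*[not(self::ContractName
                                    or self::ContractNumber
                                    or self::Notes
                                    or self::RowNumber
                                    or self::TableName)]"/>
        </TableRow>
    </xsl:template>

</xsl:stylesheet>
```

In the full stylesheet this template would take the place of the `nocopy` variable and the existing `GUID` template; everything else stays as it is. The trade-off is that the excluded names are hard-coded in the XPath expression instead of being listed as data.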

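Update

On re-reading the text of your post (and not just your code :)), I see that you are asking how to identify the repeated elements. However, your desired XML output already seems to assume that the ContractName, ContractNumber, and Notes elements must be identical across all GUID structures.

This is confusing. Your desired output already assumes the answer to your question.

Do you mean to ask, "How can I identify any GUID child elements that are common to all GUID structures, and create a single top-level copy of each of them, while removing them from the output GUID structures?"

Update 2

In XSL, it is easy to determine whether a given element exists anywhere within a set of XML structures.

Determining whether a given element exists in every one of a set of XML structures is not so easy. It is possible, though ugly. :)

Replace the XSL above with the following.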

<xsl:output method="xml" version="1.0" encoding="utf-16" indent="yes"/>

<!-- Avoids newline whitespace where source elements are excluded from output -->
<xsl:strip-space elements="*"/>

<!-- The input XML appears to be a flat-ish list of `GUID` elements,
each of which represents a single row of data in any of various tables. 
In reorganizing this data, we want to group by `GUID/TableName` values.
Muenchian grouping is probably the best way to do this in XSLT 1.0:
see more at http://www.jenitennison.com/xslt/grouping/muenchian.html.
This requires a key, so before we try to process the `GUID`s, we set 
up the key. -->
<xsl:key name="table" match="GUID" use="TableName"/>

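This $kids variable is the key part, where we determine which GUID children are common to all GUID structures. There may be a more elegant and more efficient way to do this in XSL 1.0; running this against your small dataset in Oxygen XML (using the Saxon-HE 9.6.0.7 processor) takes 0.8 seconds.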

<!-- Build list of unique GUID children that appear in all GUID structures -->
<xsl:variable name="kids">
    <xsl:for-each select="/NewDataSet/GUID/*">
        <xsl:variable name="this" select="."/>
        <!-- Intermediate variable used to collect results of whether the
            given child is in each GUID -->
        <xsl:variable name="in_all">
            <xsl:for-each select="/NewDataSet/GUID/*">
                <xsl:if test="name($this) = name(.) and $this = .">
                    <result><xsl:value-of select="true()"/></result>
                </xsl:if>
            </xsl:for-each>
        </xsl:variable>
        <!-- If we have the same number of `result`s as we have number of GUIDs,
                then output the first of each such child (there are dupes otherwise). -->
        <xsl:if test="count($in_all/result) = count(/NewDataSet/GUID) and 
         not(.=preceding::*)">
            <xsl:copy-of select="$this"/>
        </xsl:if>
    </xsl:for-each>
</xsl:variable>

This part is essentially the same as before, except for the bit about $kids.

<!-- Begin at the beginning: the root and topmost element. -->
<xsl:template match="/NewDataSet">
    <data>
        <!-- We'll put common elements at the top of the `data` structure. -->
        <xsl:copy-of select="$kids"/>

        <!-- We want to process `GUID`s after grouping by `TableName` values.
        This `for-each` is part of the Muenchian grouping technique.  See
        Jeni Tennison's page (linked above) for a detailed explanation. -->
        <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
            <!-- If you wanted to sort alphabetically by TableName, you'd use:
            <xsl:sort select="TableName" /> -->
            <Table>
                <!-- Now, within each `table`, we want to process all those
                `GUID`s with this same corresponding `TableName`. -->
                <xsl:for-each select="key('table', TableName)">
                    <!-- We select "this", since we want to process the matching
                    `GUID`, not just its children. -->
                    <xsl:apply-templates select="."/>
                </xsl:for-each>
            </Table>
        </xsl:for-each>
    </data>

</xsl:template>

$nocopy is updated to include the names of the elements identified by $kids.

<!-- List up the elements we don't want to copy verbatim into each table row -->
<xsl:variable name="nocopy">
    <item>RowNumber</item>
    <item>TableName</item>
    <!-- Copy in the bits from $kids -->
    <xsl:for-each select="$kids/*">
        <item><xsl:value-of select="name(.)"/></item>
    </xsl:for-each>
</xsl:variable>

<xsl:template match="GUID">
    <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
        <!-- Copy over child data, but _only_ if it's not in `$nocopy` -->
        <xsl:copy-of select="*[not(name() = $nocopy/item)]"/>
    </TableRow>
</xsl:template>

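This produces output that is functionally the same as your desired output XML. The only difference is the order of the elements: the TableRow children are in a different order, and Notes is at the top of data, alongside ContractName and ContractNumber, rather than at the bottom of data.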
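
As a possible alternative to the `$kids` variable, here is a minimal sketch that stays within plain XSLT 1.0 (no path expressions into variables, no extension functions) and replaces the nested loops with a second key. The key name `field`, the `'|'` separator, and the `guid-count` variable are illustrative choices, not part of the stylesheet above. It assumes each `GUID` holds at most one child with a given name, so a name and value pair that occurs exactly once per `GUID` must be present in all of them.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Same grouping key as before: GUIDs by table name. -->
    <xsl:key name="table" match="GUID" use="TableName"/>

    <!-- Assumed helper key: every GUID child, indexed by "name|value".
         '|' cannot occur in an element name, so the pair is unambiguous. -->
    <xsl:key name="field" match="GUID/*" use="concat(name(), '|', .)"/>

    <xsl:variable name="guid-count" select="count(/NewDataSet/GUID)"/>

    <xsl:template match="/NewDataSet">
        <data>
            <!-- A child is common when its name|value occurs as often as
                 there are GUIDs. Anything common must be in the first GUID,
                 so only that one needs to be inspected. -->
            <xsl:copy-of select="GUID[1]/*[count(key('field',
                                 concat(name(), '|', .))) = $guid-count]"/>

            <!-- Muenchian grouping by TableName, unchanged in spirit. -->
            <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
                <Table>
                    <xsl:apply-templates select="key('table', TableName)"/>
                </Table>
            </xsl:for-each>
        </data>
    </xsl:template>

    <xsl:template match="GUID">
        <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
            <!-- Keep children that are neither row metadata nor common. -->
            <xsl:copy-of select="*[not(self::RowNumber or self::TableName)]
                                  [count(key('field',
                                   concat(name(), '|', .))) != $guid-count]"/>
        </TableRow>
    </xsl:template>
</xsl:stylesheet>
```

On the sample input, ContractName, ContractNumber, and Notes each occur six times with the same value across six GUID elements, so they are the ones hoisted to the top of data; as with the `$kids` version, Notes ends up at the top instead of the bottom. If a GUID could contain the same child twice, the count test would no longer be reliable and the assumption above would need revisiting.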

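A note on the output XML data format

Including the table name as an attribute on every table row seems a bit odd. It would make more sense as an attribute of the Table element itself.

Likewise, having a RowNumber attribute on every row seems redundant. This information can be gleaned simply by looking at the position() of each TableRow within its parent Table.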
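
For illustration only, the output shape suggested here could be sketched as follows. This is not what the question asked for: it moves `TableName` onto `Table`, omits `RowNumber` altogether, and for brevity leaves the common elements (ContractName and so on) inside each row.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="table" match="GUID" use="TableName"/>

    <xsl:template match="/NewDataSet">
        <data>
            <!-- One Table per distinct TableName (Muenchian grouping). -->
            <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
                <!-- The table name is stated once, on the Table itself. -->
                <Table TableName="{TableName}">
                    <xsl:for-each select="key('table', TableName)">
                        <!-- No RowNumber attribute: a consumer can derive it
                             from the row's position among its siblings. -->
                        <TableRow>
                            <xsl:copy-of select="*[not(self::TableName
                                                    or self::RowNumber)]"/>
                        </TableRow>
                    </xsl:for-each>
                </Table>
            </xsl:for-each>
        </data>
    </xsl:template>
</xsl:stylesheet>
```

Dropping `RowNumber` is only safe if the source rows already arrive in row order, which holds for the sample data but is an assumption about real inputs.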

That said, you know your own requirements. It's just a matter of getting things working. :)

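【Comments】:

  • Hi, first of all, thank you for the quick reply! To answer your question, "How can I identify any GUID child elements that are common to all GUID structures, and create a single top-level copy of each of them, while removing them from the output GUID structures?": yes, that is exactly what I am after. These items will always repeat with the same properties. This is in addition to what you have already provided for generating the tables.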