【Question title】: Multiple goal search on XML nodes in SQL Server
【Posted】: 2016-08-23 08:09:03
【Question】:

I have a process table in SQL Server like this:

The workflowXML column holds values like the following:

sample1 (ProcessID=1)

workflowXML of sample1:

<process>
    <Event type="start" id="StartEvent_1" name="Start">
      <outgoing>SequenceFlow_0z7u86p</outgoing>
      <outgoing>SequenceFlow_1onkt3z</outgoing>
    </Event>
    <task type="" id="Task_0a7vu1x" name="D">
      <incoming>SequenceFlow_108ajnm</incoming>
      <incoming>SequenceFlow_1onkt3z</incoming>
      <outgoing>SequenceFlow_01clcmz</outgoing>
    </task>
    <task type="goal" id="Task_00ijt4n" name="B">
      <incoming>SequenceFlow_17q1ecq</incoming>
      <incoming>SequenceFlow_0q9j3et</incoming>
      <outgoing>SequenceFlow_1ygvv8b</outgoing>
      <outgoing>SequenceFlow_02glv1g</outgoing>
    </task>
    <task type="" id="Task_1rnuz4y" name="A">
      <incoming>SequenceFlow_1ygvv8b</incoming>
      <incoming>SequenceFlow_0z7u86p</incoming>
      <outgoing>SequenceFlow_108ajnm</outgoing>
      <outgoing>SequenceFlow_17q1ecq</outgoing>
      <outgoing>SequenceFlow_075iuj9</outgoing>
    </task>                
    <task type="goal" id="Task_1d4ykor" name="E">
      <incoming>SequenceFlow_01clcmz</incoming>
      <incoming>SequenceFlow_075iuj9</incoming>
      <incoming>SequenceFlow_1djp3tu</incoming>
      <outgoing>SequenceFlow_0q9j3et</outgoing>
    </task>        
    <task type="goal" id="Task_1sembw4" name="C">
      <incoming>SequenceFlow_02glv1g</incoming>
      <outgoing>SequenceFlow_1djp3tu</outgoing>
    </task>    
</process>

sample2 (ProcessID=2)

workflowXML of sample2:

<process id="Process_1" isExecutable="false">
    <Event type="start" id="StartEvent_0bivq0x" name="Start">
      <outgoing>SequenceFlow_0q5ik20</outgoing>
      <outgoing>SequenceFlow_147xk2x</outgoing>
    </Event>
    <task type="" id="Task_141buye" name="A">
      <incoming>SequenceFlow_0q5ik20</incoming>
      <incoming>SequenceFlow_0wg37hn</incoming>
      <outgoing>SequenceFlow_1pvpyhe</outgoing>
      <outgoing>SequenceFlow_10is4pe</outgoing>
    </task>
    <task type="" id="Task_1n3p00i" name="C">
      <incoming>SequenceFlow_147xk2x</incoming>
      <incoming>SequenceFlow_10is4pe</incoming>
      <outgoing>SequenceFlow_18ks1jr</outgoing>
      <outgoing>SequenceFlow_08gxini</outgoing>
    </task>
    <task type="goal" id="Task_0olxqpp" name="B">
      <incoming>SequenceFlow_1pvpyhe</incoming>
      <outgoing>SequenceFlow_03eekq0</outgoing>
    </task>
    <task type="goal" id="Task_0zjgfkf" name="D">
      <incoming>SequenceFlow_18ks1jr</incoming>
      <incoming>SequenceFlow_03eekq0</incoming>
      <outgoing>SequenceFlow_0wg37hn</outgoing>
    </task>
    <task type="" id="Task_1q71efy" name="E">
      <incoming>SequenceFlow_08gxini</incoming>
    </task>
</process>

Edit 1 (added sample 3)

sample3 (ProcessID=3)

workflowXML of sample3:

<process>
  <Event type="start" id="StartEvent_1" name="Start">
    <outgoing>SequenceFlow_01rkkhj</outgoing>
  </Event>
  <task type="" id="Task_1jixk79" name="A">
    <incoming>SequenceFlow_01rkkhj</incoming>
    <incoming>SequenceFlow_1tszkq8</incoming>
    <outgoing>SequenceFlow_0v8wuqu</outgoing>
    <outgoing>SequenceFlow_14u6fh7</outgoing>
    <outgoing>SequenceFlow_1q4991g</outgoing>
  </task>
  <task type="" id="Task_0xwvhuo" name="B">
    <incoming>SequenceFlow_0v8wuqu</incoming>
    <outgoing>SequenceFlow_15fmkbq</outgoing>
    <outgoing>SequenceFlow_0x4ykgp</outgoing>
    <outgoing>SequenceFlow_0f4gpf1</outgoing>
  </task>
  <task type="goal" id="Task_0qsvlob" name="G">
    <incoming>SequenceFlow_0qse1xk</incoming>
    <incoming>SequenceFlow_16a0qvv</incoming>
  </task>
  <task type="goal" id="Task_0wtjftd" name="E">
    <incoming>SequenceFlow_14u6fh7</incoming>
    <incoming>SequenceFlow_0z3qle8</incoming>
    <outgoing>SequenceFlow_0vg7sax</outgoing>
    <outgoing>SequenceFlow_0qse1xk</outgoing>
  </task>
  <task type="" id="Task_0c85e6p" name="F">
    <incoming>SequenceFlow_0x4ykgp</incoming>
    <incoming>SequenceFlow_17k5zfg</incoming>
    <outgoing>SequenceFlow_16a0qvv</outgoing>
    <outgoing>SequenceFlow_0z3qle8</outgoing>
  </task>
  <task type="" id="Task_164ihwt" name="D">
    <incoming>SequenceFlow_0q9hqs6</incoming>
    <incoming>SequenceFlow_1q4991g</incoming>
    <outgoing>SequenceFlow_17k5zfg</outgoing>
  </task>
  <task type="goal" id="Task_032o8jx" name="C">
    <incoming>SequenceFlow_15fmkbq</incoming>
    <incoming>SequenceFlow_0vg7sax</incoming>
    <outgoing>SequenceFlow_0q9hqs6</outgoing>
    <outgoing>SequenceFlow_1tszkq8</outgoing>
  </task>
  <task type="goal" id="Task_0fsibap" name="H">
    <incoming>SequenceFlow_0f4gpf1</incoming>
  </task>
</process>

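I need to find the goal nodes that can be reached from the start node:

There must be a path from the start node to the goal node, and that path must not contain any other goal node.

For sample1 and sample2, the result of the query on the process table should look like this: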

+-------+----------+-------------+
|  ID   | nodeName |    nodeID   |
+-------+----------+-------------+
|   1   |    B     |Task_00ijt4n |
+-------+----------+-------------+
|   1   |    E     |Task_1d4ykor |
+-------+----------+-------------+
|   2   |    B     |Task_0olxqpp |
+-------+----------+-------------+
|   2   |    D     |Task_0zjgfkf |
+-------+----------+-------------+

It would be very helpful if someone could explain a solution for this query.

Thanks.

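【Comments】:

  • Just to get this clear: unlike in your related question, you have one Event element as the start and nothing but task elements besides it. Where are the Flow elements from your last question? And, another very important difference, the elements' ids are not used anywhere. The complete flow is built from the incoming and outgoing elements. Is this correct?
  • @Shnugo I removed one Event element and the Flows from the XML (we don't need them), I added the id column to the result, and finally yes, that is correct.
  • This is really tricky to solve... It is a kind of route-planning, travelling-salesman dilemma. There are dead ends and circles... At the moment I don't have enough time to dive into it, and I doubt that T-SQL is the best tool for this. I'll come back later... maybe someone else has a good idea.
  • Anyway, I think this is one of the rare moments where a WHILE loop or a CURSOR might be the right choice...
  • @Shnugo I think this is a kind of breadth-first traversal. Travelling salesman is an NP-hard problem in which every node can be the start, but in this case I have a single start node.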
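
The WHILE-loop idea from these comments could look roughly like the sketch below. It is not taken from any answer in this thread. It assumes a hypothetical miniature workflow (the node ids S and T1 to T4 and the flow ids f1 to f5 are made up) and three table variables, @node, @edge and @visited, that do not appear in the original. The loop walks the graph breadth-first, one level per iteration, never expands a goal node and never revisits a node, so a cycle cannot keep it running.

```sql
-- Hypothetical sample: Start -> A -> B(goal) -> C(goal), and A -> D -> A (a cycle)
DECLARE @xml XML = N'<process>
  <Event type="start" id="S" name="Start"><outgoing>f1</outgoing></Event>
  <task type="" id="T1" name="A"><incoming>f1</incoming><incoming>f5</incoming><outgoing>f2</outgoing><outgoing>f3</outgoing></task>
  <task type="goal" id="T2" name="B"><incoming>f2</incoming><outgoing>f4</outgoing></task>
  <task type="goal" id="T3" name="C"><incoming>f4</incoming></task>
  <task type="" id="T4" name="D"><incoming>f3</incoming><outgoing>f5</outgoing></task>
</process>';

-- One row per node (the Event and every task)
DECLARE @node TABLE (Id NVARCHAR(100) PRIMARY KEY, [Name] NVARCHAR(100), [Type] NVARCHAR(20));
INSERT INTO @node (Id, [Name], [Type])
SELECT nd.value('@id','nvarchar(100)')
      ,nd.value('@name','nvarchar(100)')
      ,nd.value('@type','nvarchar(20)')
FROM @xml.nodes('/process/*') AS A(nd);

-- One row per edge: a flow id that is outgoing on one node and incoming on another
DECLARE @edge TABLE (FromId NVARCHAR(100), ToId NVARCHAR(100));
INSERT INTO @edge (FromId, ToId)
SELECT o.NodeId, i.NodeId
FROM (SELECT nd.value('@id','nvarchar(100)') AS NodeId, f.value('.','nvarchar(100)') AS Flow
      FROM @xml.nodes('/process/*') AS A(nd)
      CROSS APPLY nd.nodes('outgoing') AS B(f)) AS o
INNER JOIN (SELECT nd.value('@id','nvarchar(100)') AS NodeId, f.value('.','nvarchar(100)') AS Flow
            FROM @xml.nodes('/process/*') AS A(nd)
            CROSS APPLY nd.nodes('incoming') AS B(f)) AS i ON i.Flow = o.Flow;

-- Breadth-first walk: @visited remembers each node and the level it was first reached on
DECLARE @visited TABLE (Id NVARCHAR(100) PRIMARY KEY, Step INT);
INSERT INTO @visited (Id, Step)
SELECT Id, 1 FROM @node WHERE [Type] = 'start';

DECLARE @step INT = 1;
WHILE 1 = 1
BEGIN
    INSERT INTO @visited (Id, Step)
    SELECT DISTINCT e.ToId, @step + 1
    FROM @visited AS v
    INNER JOIN @node AS n ON n.Id = v.Id
    INNER JOIN @edge AS e ON e.FromId = v.Id
    WHERE v.Step = @step                          -- expand only the current level
      AND ISNULL(n.[Type], '') <> 'goal'          -- never walk past a goal node
      AND NOT EXISTS (SELECT 1 FROM @visited AS seen WHERE seen.Id = e.ToId); -- no revisits

    IF @@ROWCOUNT = 0 BREAK;                      -- nothing new reached: done
    SET @step += 1;
END;

-- Goal nodes that were reached without passing through another goal
SELECT n.[Name] AS nodeName, n.Id AS nodeID, v.Step
FROM @visited AS v
INNER JOIN @node AS n ON n.Id = v.Id
WHERE n.[Type] = 'goal'
ORDER BY v.Step, n.[Name];
```

With this made-up sample I would expect a single row for B at step 3: C can only be reached through goal B, and the loop D -> A stops because A has already been visited. To run it against the real process table, the same statements would have to be repeated per row (for example inside a cursor) or extended with a process id column in each table variable.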

Tags: sql sql-server xml traversal


【Solution 1】:

I was tempted to solve this with a set-based, recursive approach, because I don't like procedural coding in T-SQL. Hope you like it :-)

DECLARE @process TABLE(ID INT IDENTITY, workflowXML XML);
INSERT INTO @process(workflowXML) VALUES
('<process>
    <Event type="start" id="StartEvent_1" name="Start">
      <outgoing>SequenceFlow_0z7u86p</outgoing>
      <outgoing>SequenceFlow_1onkt3z</outgoing>
    </Event>
    <task type="" id="Task_0a7vu1x" name="D">
      <incoming>SequenceFlow_108ajnm</incoming>
      <incoming>SequenceFlow_1onkt3z</incoming>
      <outgoing>SequenceFlow_01clcmz</outgoing>
    </task>
    <task type="goal" id="Task_00ijt4n" name="B">
      <incoming>SequenceFlow_17q1ecq</incoming>
      <incoming>SequenceFlow_0q9j3et</incoming>
      <outgoing>SequenceFlow_1ygvv8b</outgoing>
      <outgoing>SequenceFlow_02glv1g</outgoing>
    </task>
    <task type="" id="Task_1rnuz4y" name="A">
      <incoming>SequenceFlow_1ygvv8b</incoming>
      <incoming>SequenceFlow_0z7u86p</incoming>
      <outgoing>SequenceFlow_108ajnm</outgoing>
      <outgoing>SequenceFlow_17q1ecq</outgoing>
      <outgoing>SequenceFlow_075iuj9</outgoing>
    </task>                
    <task type="goal" id="Task_1d4ykor" name="E">
      <incoming>SequenceFlow_01clcmz</incoming>
      <incoming>SequenceFlow_075iuj9</incoming>
      <incoming>SequenceFlow_1djp3tu</incoming>
      <outgoing>SequenceFlow_0q9j3et</outgoing>
    </task>        
    <task type="goal" id="Task_1sembw4" name="C">
      <incoming>SequenceFlow_02glv1g</incoming>
      <outgoing>SequenceFlow_1djp3tu</outgoing>
    </task>    
</process>')
,('<process id="Process_1" isExecutable="false">
    <Event type="start" id="StartEvent_0bivq0x" name="Start">
      <outgoing>SequenceFlow_0q5ik20</outgoing>
      <outgoing>SequenceFlow_147xk2x</outgoing>
    </Event>
    <task type="" id="Task_141buye" name="A">
      <incoming>SequenceFlow_0q5ik20</incoming>
      <incoming>SequenceFlow_0wg37hn</incoming>
      <outgoing>SequenceFlow_1pvpyhe</outgoing>
      <outgoing>SequenceFlow_10is4pe</outgoing>
    </task>
    <task type="" id="Task_1n3p00i" name="C">
      <incoming>SequenceFlow_147xk2x</incoming>
      <incoming>SequenceFlow_10is4pe</incoming>
      <outgoing>SequenceFlow_18ks1jr</outgoing>
      <outgoing>SequenceFlow_08gxini</outgoing>
    </task>
    <task type="goal" id="Task_0olxqpp" name="B">
      <incoming>SequenceFlow_1pvpyhe</incoming>
      <outgoing>SequenceFlow_03eekq0</outgoing>
    </task>
    <task type="goal" id="Task_0zjgfkf" name="D">
      <incoming>SequenceFlow_18ks1jr</incoming>
      <incoming>SequenceFlow_03eekq0</incoming>
      <outgoing>SequenceFlow_0wg37hn</outgoing>
    </task>
    <task type="" id="Task_1q71efy" name="E">
      <incoming>SequenceFlow_08gxini</incoming>
    </task>
</process>');

--The query

--DerivedTable: one row per direct child of <process> (the Event and every task)
WITH DerivedTable AS
(
    SELECT prTbl.ID AS tblID
          ,nd.value('local-name(.)','nvarchar(max)') AS NodeName
          ,nd.value('@type','nvarchar(max)') AS [Type]
          ,nd.value('@id','nvarchar(max)') AS Id
          ,nd.value('@name','nvarchar(max)') AS [Name]
          ,nd.query('.') AS Task
    FROM @process AS prTbl
    CROSS APPLY prTbl.workflowXML.nodes('process') AS A(pr)
    CROSS APPLY pr.nodes('*') AS B(nd)
)
--AllIncoming: one row per <incoming> flow of every task; [Target] holds the flow id
,AllIncoming AS
(
    SELECT tblId
          ,NodeName 
          ,[Type]
          ,Id 
          ,[Name]
          ,i.value('.','nvarchar(max)') AS [Target] 
    FROM DerivedTable
    CROSS APPLY Task.nodes('task/incoming') AS A(i)
    WHERE NodeName='task'
)
--recCTE: start at the start event and follow each outgoing flow to the task that lists it as incoming
,recCTE AS
(
    SELECT tblID,NodeName,[Type],Id,[Name],Task,1 AS Step,' | ' +CAST(Id AS NVARCHAR(MAX)) AS NodePath
    FROM DerivedTable 
    WHERE [Type]='start'

    UNION ALL

    SELECT nxt.tblID,nxt.NodeName,nxt.[Type],nxt.Id,nxt.[Name],nxt.Task,r.Step+1,r.NodePath + ' | ' + nxt.Id
    FROM recCTE AS r
    INNER JOIN DerivedTable AS nxt ON nxt.tblID=r.tblID --stay within the same process row
                                  AND nxt.Id IN(SELECT x.Id 
                                                FROM AllIncoming AS x 
                                                WHERE x.tblId=r.tblID
                                                  AND x.[Target] IN (SELECT o.value('.','nvarchar(max)')
                                                                     FROM r.Task.nodes('*/outgoing') AS A(o)
                                                                    )
                                                )
    WHERE r.[Type]<>'goal' --do not walk past a goal node
      AND r.NodePath NOT LIKE '%| ' + nxt.Id + '%' --do not revisit a node that is already on the path
      AND r.Step<=10 --add an appropriate depth limit to avoid recursion-depth error
)
SELECT t.tblID 
      ,t.[Name] 
      ,t.NodePath
      ,t.Step
      ,t.Id
FROM recCTE AS t
WHERE t.[Type]='goal'
  AND t.Step<=ISNULL((SELECT MIN(x.Step) FROM recCTE AS x WHERE x.tblID=t.tblID AND x.[Type]='goal' AND x.NodeName='task'),999)
ORDER BY t.tblID,t.Step

Result

tblID   Name    NodePath                                          Step  Id
  1     B      | StartEvent_1 | Task_1rnuz4y | Task_00ijt4n        3    Task_00ijt4n
  1     E      | StartEvent_1 | Task_1rnuz4y | Task_1d4ykor        3    Task_1d4ykor
  1     E      | StartEvent_1 | Task_0a7vu1x | Task_1d4ykor        3    Task_1d4ykor
  2     D      | StartEvent_0bivq0x | Task_1n3p00i | Task_0zjgfkf  3    Task_0zjgfkf
  2     B      | StartEvent_0bivq0x | Task_141buye | Task_0olxqpp  3    Task_0olxqpp

You will see more than two results for tblID=1, because different paths lead to the same goal node.
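
If only the distinct goal nodes are wanted, as in the table in the question, the path rows can be collapsed with a GROUP BY. The sketch below is self-contained for illustration: it copies the five result rows shown above into a table variable named @paths (a name invented here), whereas in the real query the same GROUP BY would go on the final SELECT over recCTE.

```sql
-- The five path rows from the result above, copied into a hypothetical table variable
DECLARE @paths TABLE (tblID INT, [Name] NVARCHAR(100), NodePath NVARCHAR(MAX), Step INT, Id NVARCHAR(100));
INSERT INTO @paths (tblID, [Name], NodePath, Step, Id) VALUES
 (1, N'B', N' | StartEvent_1 | Task_1rnuz4y | Task_00ijt4n',       3, N'Task_00ijt4n')
,(1, N'E', N' | StartEvent_1 | Task_1rnuz4y | Task_1d4ykor',       3, N'Task_1d4ykor')
,(1, N'E', N' | StartEvent_1 | Task_0a7vu1x | Task_1d4ykor',       3, N'Task_1d4ykor')
,(2, N'D', N' | StartEvent_0bivq0x | Task_1n3p00i | Task_0zjgfkf', 3, N'Task_0zjgfkf')
,(2, N'B', N' | StartEvent_0bivq0x | Task_141buye | Task_0olxqpp', 3, N'Task_0olxqpp');

-- One row per process and goal node, regardless of how many paths lead there
SELECT tblID   AS ID
      ,[Name]  AS nodeName
      ,Id      AS nodeID
FROM @paths
GROUP BY tblID, [Name], Id
ORDER BY tblID, [Name];
```

This yields the four rows from the question: B and E for ID 1, B and D for ID 2.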

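UPDATE for your sample 3

My first attempt found the shortest path to any goal. Every goal that was only reached via a longer path was filtered out. This is easy to change:

Let the final WHERE look for the shortest path to each specific node by adding the Id: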

WHERE t.[Type]='goal'
  AND t.Step<=ISNULL((SELECT MIN(x.Step) 
                      FROM recCTE AS x 
                      WHERE x.tblID=t.tblID
                        AND x.Id=t.Id 
                        AND x.[Type]='goal' AND x.NodeName='task'),999)

For all three samples this returns:

+-------+------+----------------------------------------------------------------------------+------+--------------+
| tblID | Name | NodePath                                                                   | Step | Id           |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 1     | B    | | StartEvent_1 | Task_1rnuz4y | Task_00ijt4n                               | 3    | Task_00ijt4n |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 1     | E    | | StartEvent_1 | Task_1rnuz4y | Task_1d4ykor                               | 3    | Task_1d4ykor |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 1     | E    | | StartEvent_1 | Task_0a7vu1x | Task_1d4ykor                               | 3    | Task_1d4ykor |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 2     | B    | | StartEvent_0bivq0x | Task_141buye | Task_0olxqpp                         | 3    | Task_0olxqpp |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 2     | D    | | StartEvent_0bivq0x | Task_1n3p00i | Task_0zjgfkf                         | 3    | Task_0zjgfkf |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 3     | E    | | StartEvent_1 | Task_1jixk79 | Task_0wtjftd                               | 3    | Task_0wtjftd |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 3     | C    | | StartEvent_1 | Task_1jixk79 | Task_0xwvhuo | Task_032o8jx                | 4    | Task_032o8jx |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 3     | H    | | StartEvent_1 | Task_1jixk79 | Task_0xwvhuo | Task_0fsibap                | 4    | Task_0fsibap |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 3     | G    | | StartEvent_1 | Task_1jixk79 | Task_0xwvhuo | Task_0c85e6p | Task_0qsvlob | 5    | Task_0qsvlob |
+-------+------+----------------------------------------------------------------------------+------+--------------+
| 3     | G    | | StartEvent_1 | Task_1jixk79 | Task_164ihwt | Task_0c85e6p | Task_0qsvlob | 5    | Task_0qsvlob |
+-------+------+----------------------------------------------------------------------------+------+--------------+
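
The difference between the two filters can be seen in isolation. The sketch below copies the sample 3 rows from the table above into a table variable named @found (an invented name, and the NodePath column is left out for brevity) and applies both conditions: the first keeps only rows at the overall minimum step of the process, the second keeps the minimum step per goal node.

```sql
-- Sample 3 rows from the result table above (NodePath omitted)
DECLARE @found TABLE (tblID INT, [Name] NVARCHAR(10), Step INT, Id NVARCHAR(50));
INSERT INTO @found (tblID, [Name], Step, Id) VALUES
 (3, N'E', 3, N'Task_0wtjftd')
,(3, N'C', 4, N'Task_032o8jx')
,(3, N'H', 4, N'Task_0fsibap')
,(3, N'G', 5, N'Task_0qsvlob')
,(3, N'G', 5, N'Task_0qsvlob');

-- First attempt: one minimum per process, so only E (step 3) survives
SELECT t.tblID, t.[Name], t.Step, t.Id
FROM @found AS t
WHERE t.Step <= (SELECT MIN(x.Step)
                 FROM @found AS x
                 WHERE x.tblID = t.tblID);

-- Updated filter: one minimum per process and node, so every goal keeps its shortest path(s)
SELECT t.tblID, t.[Name], t.Step, t.Id
FROM @found AS t
WHERE t.Step <= (SELECT MIN(x.Step)
                 FROM @found AS x
                 WHERE x.tblID = t.tblID
                   AND x.Id = t.Id);
```

The first SELECT returns only E, the second returns all five rows. G appears twice because two different paths reach it with the same number of steps.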

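【Comments】:

  • The problem with applying a set-based approach to graph-theory problems is that the algorithm first expands the solution space into all possible combinations and only then filters out those that do not qualify. This works for small sets but exhausts resources on larger graphs. Still, an impressive answer :)
  • @RemusRusanu, thanks :-) I was aware of the recursive explosion... but then I had an idea: I added AND r.NodePath NOT LIKE '%| ' + nxt.Id + '%', which stops the walk as soon as a node would be revisited. Voila!
  • Procedural thinking in a declarative environment :) The optimizer will punish you one way or another. One day it will change some execution order somewhere in the execution tree and bam! the filter is applied higher up the stack. I once spent 2+ months on a similar problem in the implementation of the RECEIVE statement, only to be told by the optimizer team to go away, they do not make any order guarantees :)
  • @RemusRusanu, yes, that is something to keep in mind... I would really like to have a TREAT_AS_TABLE keyword for use with CTEs. I am pretty sure the order of execution would not matter much if you used declared (in-memory) table variables or even indexed temp tables instead of CTEs. From this point of view CTEs are great, but dangerous indeed...
  • Hi @RemusRusanu, your linked article is from 1993 :-) I once had a very expensive scalar operation that should have been executed just once per row on a very small set. Instead it ran for the rows of a very large set after several joins. Using a table variable instead of the CTE cut the query time enormously...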
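
The revisit guard discussed in these comments can be tried in isolation. The sketch below uses a hypothetical four-node edge list (nodes A, B, C, D in a table variable named @edge, none of which appears in the original) containing the cycle A -> B -> C -> A. As a small variation on the answer's pattern, it delimits every id on both sides, so that an id that happens to be a prefix of another id is not mistaken for a visit. Since, as noted above, the optimizer makes no formal promise about when such a filter is evaluated, a MAXRECURSION hint is kept as a safety net.

```sql
-- Hypothetical edge list with a cycle: A -> B -> C -> A, plus C -> D
DECLARE @edge TABLE (FromId NVARCHAR(20), ToId NVARCHAR(20));
INSERT INTO @edge (FromId, ToId) VALUES
 (N'A', N'B'), (N'B', N'C'), (N'C', N'A'), (N'C', N'D');

WITH walk AS
(
    -- Anchor: start at A; the path keeps a delimiter before and after every id
    SELECT CAST(N'A' AS NVARCHAR(20))           AS Id
          ,1                                    AS Step
          ,CAST(N' | A | ' AS NVARCHAR(MAX))    AS NodePath

    UNION ALL

    -- Recursion: follow an edge only if its target is not on the path yet
    SELECT e.ToId
          ,w.Step + 1
          ,w.NodePath + e.ToId + N' | '
    FROM walk AS w
    INNER JOIN @edge AS e ON e.FromId = w.Id
    WHERE w.NodePath NOT LIKE N'%| ' + e.ToId + N' |%'
)
SELECT Id, Step, NodePath
FROM walk
ORDER BY Step
OPTION (MAXRECURSION 50); -- hard stop in case the guard ever fails to cut a cycle
```

The walk visits A, B, C and D once each and ends there, because the edge C -> A is rejected by the path check.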