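【Question title】: FileTable Delimited String Split
【Posted】: 2017-12-08 03:07:35
【Question】:

Edit: for clarity, I'm putting the goal at the top. The point of this test, and my question, is whether there is a way to get the same performance as the temp table without using a temp table.

I feel like this should be a simple problem, but I'm stuck. I'm experimenting with FileTables in SQL Server 2014. I know of several workable alternatives, but the goal here is to determine how feasible it is to extract text substrings from a FileTable.

The test has 35,000 text files, each containing a single line of text like the one below, averaging 100 bytes of non-Unicode text per file.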

Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg

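The desired output is one row per file, with the delimited string split into seven columns.

I found a fast string-parsing function, but running it against the file stream has a significant performance cost compared with running it against a varchar column.

This query takes 18 seconds to run. I tried to make the conversion from file stream to varchar happen only once, but I suspect that calling the UDF may cause it to happen for every row (file).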

Create View vAddresses As
Select file_type, Convert(Varchar(8000),file_stream) TextData /* Into #Temp */ From HumanaInputFiles Where file_type = 'adr'
Go
Select  --TextData,
        dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
        dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
        dbo.udf_StringSplit(TextData, 7, '|')--, TextData
    From vAddresses

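I have tried this as a view, a CTE, and a subquery. The only thing that seems to help is creating a temp table. Creating the temp table takes 1 second and the query against it takes 1 second. So for 35k rows, the total is 2 seconds with the temp table versus 18 seconds for the single query.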

Drop Table #Temp
(Select file_type, Convert(Varchar(8000),file_stream) TextData Into #Temp From HumanaInputFiles Where file_type = 'adr')
Select  --TextData,
        dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
        dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
        dbo.udf_StringSplit(TextData, 7, '|')--, TextData
    From #Temp

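I have read many articles and blog posts on FileTables and on temp table versus single-query performance, but I can't seem to figure this out. Could it be related to sargability or statistics? Any suggestions are greatly appreciated.

Here is the UDF. I found it on an MSDN blog/forum, and it is the best-performing one I have found so far.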

ALTER FUNCTION [dbo].[udf_StringSplit](
 @TEXT      varchar(8000)
,@COLUMN    tinyint
,@SEPARATOR char(1)
)RETURNS varchar(8000)
AS
  BEGIN
       DECLARE @POS_START  int = 1
       DECLARE @POS_END    int = CHARINDEX(@SEPARATOR, @TEXT, @POS_START)

       WHILE (@COLUMN >1 AND @POS_END> 0)
         BEGIN
             SET @POS_START = @POS_END + 1
             SET @POS_END = CHARINDEX(@SEPARATOR, @TEXT, @POS_START)
             SET @COLUMN = @COLUMN - 1
         END 

       IF @COLUMN > 1  SET @POS_START = LEN(@TEXT) + 1
       IF @POS_END = 0 SET @POS_END = LEN(@TEXT) + 1 

       RETURN SUBSTRING (@TEXT, @POS_START, @POS_END - @POS_START)
  END
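
As a quick sanity check of the function above, the following sketch calls it against the sample line from the question. The variable name and column aliases are illustrative only; the values in the comments come from tracing the loop by hand.

```sql
-- Illustrative calls against the sample line from the question.
DECLARE @line varchar(8000) = 'Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg';

SELECT  dbo.udf_StringSplit(@line, 1, '|') AS FirstItem,   -- 'Aaa'
        dbo.udf_StringSplit(@line, 3, '|') AS ThirdItem,   -- 'Ccc'
        dbo.udf_StringSplit(@line, 7, '|') AS LastItem,    -- 'Ggg'
        dbo.udf_StringSplit(@line, 9, '|') AS PastTheEnd;  -- '' (empty string, not NULL)
```

Each call starts scanning at position 1 again, so seven calls per row walk the same text seven times.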

Here is the execution plan for the temp table.

<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.4100.1" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="17486" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.166487" StatementText="Select --TextData,&#xD;&#xA;       dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),&#xD;&#xA;      dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),&#xD;&#xA;      dbo.udf_StringSplit(TextData, 7, '|')--, TextData&#xD;&#xA; From #Temp" StatementType="SELECT" QueryHash="0xC4D6F0215D332F3D" QueryPlanHash="0xC50CFAF9494B5DBE" RetrievedFromCache="true">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="CouldNotGenerateValidParallelPlan" CachedPlanSize="24" CompileTime="1" CompileCPU="1" CompileMemory="168">
            <MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="838735" EstimatedPagesCached="419367" EstimatedAvailableDegreeOfParallelism="4" />
            <RelOp AvgRowSize="28023" EstimateCPU="0.0017486" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.166487">
              <OutputList>
                <ColumnReference Column="Expr1003" />
                <ColumnReference Column="Expr1004" />
                <ColumnReference Column="Expr1005" />
                <ColumnReference Column="Expr1006" />
                <ColumnReference Column="Expr1007" />
                <ColumnReference Column="Expr1008" />
                <ColumnReference Column="Expr1009" />
              </OutputList>
              <RunTimeInformation>
                <RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
              </RunTimeInformation>
              <ComputeScalar>
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Column="Expr1003" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(1),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(1)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1004" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(2),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(2)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1005" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(3),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(3)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1006" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(4),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(4)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1007" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(5),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(5)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1008" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(6),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(6)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1009" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(7),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(7)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                </DefinedValues>
                <RelOp AvgRowSize="4011" EstimateCPU="0.0193131" EstimateIO="0.145426" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Table Scan" NodeId="1" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.164739" TableCardinality="17486">
                  <OutputList>
                    <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                  </OutputList>
                  <RunTimeInformation>
                    <RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
                  </RunTimeInformation>
                  <TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                    <DefinedValues>
                      <DefinedValue>
                        <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
                      </DefinedValue>
                    </DefinedValues>
                    <Object Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" IndexKind="Heap" Storage="RowStore" />
                  </TableScan>
                </RelOp>
              </ComputeScalar>
            </RelOp>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>

Here is the plan for the view.

<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.4100.1" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="17486" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.905265" StatementText="Select    --TextData,&#xD;&#xA;       dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),&#xD;&#xA;      dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),&#xD;&#xA;      dbo.udf_StringSplit(TextData, 7, '|')--, TextData&#xD;&#xA; From vAddresses" StatementType="SELECT" QueryHash="0xB4F8A0B288802C4E" QueryPlanHash="0x28DA02D774B1AF53" RetrievedFromCache="true">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="CouldNotGenerateValidParallelPlan" CachedPlanSize="32" CompileTime="3" CompileCPU="3" CompileMemory="520">
            <Warnings>
              <PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT(varchar(8000),[DmProd01].[dbo].[HumanaInputFiles].[file_stream],0)" />
            </Warnings>
            <MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="838735" EstimatedPagesCached="419367" EstimatedAvailableDegreeOfParallelism="4" />
            <RelOp AvgRowSize="28023" EstimateCPU="0.0017486" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.905265">
              <OutputList>
                <ColumnReference Column="Expr1004" />
                <ColumnReference Column="Expr1005" />
                <ColumnReference Column="Expr1006" />
                <ColumnReference Column="Expr1007" />
                <ColumnReference Column="Expr1008" />
                <ColumnReference Column="Expr1009" />
                <ColumnReference Column="Expr1010" />
              </OutputList>
              <RunTimeInformation>
                <RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
              </RunTimeInformation>
              <ComputeScalar>
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Column="Expr1004" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(1),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1011" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(1)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1005" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(2),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1011" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(2)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1006" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(3),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1011" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(3)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1007" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(4),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1011" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(4)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1008" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(5),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1011" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(5)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1009" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(6),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1011" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(6)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1010" />
                    <ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(7),'|')">
                      <UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1011" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(7)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="'|'" />
                        </ScalarOperator>
                      </UserDefinedFunction>
                    </ScalarOperator>
                  </DefinedValue>
                </DefinedValues>
                <RelOp AvgRowSize="4019" EstimateCPU="0.0034972" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="1" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.88673">
                  <OutputList>
                    <ColumnReference Column="Expr1011" />
                  </OutputList>
                  <ComputeScalar>
                    <DefinedValues>
                      <DefinedValue>
                        <ColumnReference Column="Expr1011" />
                        <ScalarOperator ScalarString="CONVERT(varchar(8000),[DmProd01].[dbo].[HumanaInputFiles].[file_stream],0)">
                          <Convert DataType="varchar" Length="8000" Style="0" Implicit="false">
                            <ScalarOperator>
                              <Identifier>
                                <ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
                              </Identifier>
                            </ScalarOperator>
                          </Convert>
                        </ScalarOperator>
                      </DefinedValue>
                    </DefinedValues>
                    <RelOp AvgRowSize="4043" EstimateCPU="0.0386262" EstimateIO="0.844606" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.883233" TableCardinality="34972">
                      <OutputList>
                        <ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
                      </OutputList>
                      <RunTimeInformation>
                        <RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
                      </RunTimeInformation>
                      <TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                        <DefinedValues>
                          <DefinedValue>
                            <ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
                          </DefinedValue>
                        </DefinedValues>
                        <Object Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" IndexKind="Heap" Storage="RowStore" />
                        <Predicate>
                          <ScalarOperator ScalarString="[DmProd01].[dbo].[HumanaInputFiles].[file_type]=N'adr'">
                            <Compare CompareOp="EQ">
                              <ScalarOperator>
                                <Identifier>
                                  <ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_type" ComputedColumn="true" />
                                </Identifier>
                              </ScalarOperator>
                              <ScalarOperator>
                                <Const ConstValue="N'adr'" />
                              </ScalarOperator>
                            </Compare>
                          </ScalarOperator>
                        </Predicate>
                      </TableScan>
                    </RelOp>
                  </ComputeScalar>
                </RelOp>
              </ComputeScalar>
            </RelOp>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>
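
The two plans show estimates and row counts but not where the time goes. One way to compare the approaches at run time is to capture I/O and elapsed time for each, as in the sketch below. It assumes the view and UDF from the question and, for brevity, calls the function for only two of the seven columns. Timings for queries that call scalar UDFs can be inflated by the measurement itself, so the numbers are best read as relative.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Single-query form: the CONVERT on file_stream and the UDF calls share one plan.
SELECT dbo.udf_StringSplit(TextData, 1, '|') AS Col1,
       dbo.udf_StringSplit(TextData, 7, '|') AS Col7
FROM   vAddresses;

-- Two-step form: the converted text is materialized once in tempdb first.
IF OBJECT_ID('tempdb..#Temp', 'U') IS NOT NULL DROP TABLE #Temp;

SELECT TextData
INTO   #Temp
FROM   vAddresses;

SELECT dbo.udf_StringSplit(TextData, 1, '|') AS Col1,
       dbo.udf_StringSplit(TextData, 7, '|') AS Col7
FROM   #Temp;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```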

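Edit: after searching on this differently, I found that the answer is to use TOP and ORDER BY. That brought it down to 4 seconds. It seems a bit contrived, and it still doesn't explain how looking at the query plans could have helped solve this, so I won't answer this myself and will leave the question open.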
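
The post does not show the query that used TOP and ORDER BY. The sketch below is one plausible form, offered as an assumption rather than the author's actual statement. The idea is that the sort needed for ORDER BY may force the converted value to be produced once per row before the UDF calls run; whether the optimizer does so should be checked in the actual plan.

```sql
-- Hypothetical reconstruction: the original post does not show this query.
SELECT  dbo.udf_StringSplit(t.TextData, 1, '|') AS Col1,
        dbo.udf_StringSplit(t.TextData, 2, '|') AS Col2,
        dbo.udf_StringSplit(t.TextData, 3, '|') AS Col3,
        dbo.udf_StringSplit(t.TextData, 4, '|') AS Col4,
        dbo.udf_StringSplit(t.TextData, 5, '|') AS Col5,
        dbo.udf_StringSplit(t.TextData, 6, '|') AS Col6,
        dbo.udf_StringSplit(t.TextData, 7, '|') AS Col7
FROM (
        -- TOP with a very large row count makes ORDER BY legal inside a derived table.
        SELECT TOP (2147483647) TextData
        FROM   vAddresses
        ORDER BY TextData
     ) AS t;
```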

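【Question discussion】:

  • Try using a table-valued function like the ones at sqlperformance.com/2016/03/sql-server-2016/string-split instead of a multi-statement scalar UDF.
  • Thanks for the suggestion, but I tried those, along with several others from blogs and posts recommending that approach. There was even one using DelimitedSplit8K and PIVOT. None of the table-valued tests completed in under 40 seconds with this test data.
  • Here is a list of several string-splitting functions. The best non-native performer, CLR, uses nvarchars, which means you may need to modify the code.
  • Finally, the idea that you want to use FILESTREAM for text files of about 100 bytes, when those files contain delimited data, just screams at me: "You are designing your tables to work against the relational model. Stop breaking first normal form and expecting the database to perform reasonably. Don't be lazy. Parse the data and store it properly. If you need the original file, that's fine, but that doesn't mean you should be parsing that FILESTREAM."
  • You can emulate a persisted column: create another table and populate it with a trigger.

Tags: sql-server tsql user-defined-functions filetable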


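【Solution 1】:

MSDN advises against using FileTable in a case like yours:

The FileTable feature builds on top of SQL Server FILESTREAM technology.

FILESTREAM does not perform well for small objects. It is designed for files of roughly 1 MB or larger, whereas yours are only about 100 bytes (https://docs.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server):

When to Use FILESTREAM: In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determines whether you should use database storage or file system storage. If the following conditions are true, you should consider using FILESTREAM:

  • Objects that are being stored are, on average, larger than 1 MB.
  • Fast read access is important.
  • You are developing applications that use a middle tier for application logic.

For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.

You can emulate a persisted column: create another table and populate it with a trigger. That way you get the benefits of both approaches.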
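
A minimal sketch of the trigger idea follows. The side table dbo.InputFilesText and the trigger name are hypothetical, and dbo.HumanaInputFiles is assumed to be the FileTable from the question. How the trigger behaves for files written through the Windows share (for example, whether the stream is complete when it fires) is not covered here and would need to be verified.

```sql
-- Hypothetical side table holding the converted text, keyed by the FileTable's stream_id.
CREATE TABLE dbo.InputFilesText (
    stream_id uniqueidentifier NOT NULL PRIMARY KEY,
    TextData  varchar(8000)    NULL
);
GO

-- Hypothetical AFTER trigger; assumes dbo.HumanaInputFiles is the FileTable from the question.
CREATE TRIGGER dbo.trg_HumanaInputFiles_Text
ON dbo.HumanaInputFiles
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove stale copies for rows that changed or were deleted.
    DELETE t
    FROM   dbo.InputFilesText AS t
    WHERE  EXISTS (SELECT 1 FROM deleted  AS d WHERE d.stream_id = t.stream_id)
       OR  EXISTS (SELECT 1 FROM inserted AS i WHERE i.stream_id = t.stream_id);

    -- Re-read the current rows and convert the stream once per file.
    INSERT dbo.InputFilesText (stream_id, TextData)
    SELECT f.stream_id, CONVERT(varchar(8000), f.file_stream)
    FROM   dbo.HumanaInputFiles AS f
    WHERE  f.file_type = 'adr'
      AND  EXISTS (SELECT 1 FROM inserted AS i WHERE i.stream_id = f.stream_id);
END;
GO
```

Queries would then call the split function against dbo.InputFilesText.TextData, which is an ordinary varchar column like the one in the temp table.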

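P.S. You can use an inline TVF + UNION ALL to implement @TT's comment.

【Discussion】:

  • Thanks for the information. However, I have already read that, and the advantages in this situation make it an ideal solution. Using a temp table gives excellent performance. The point of the test, and my question, is whether there is a way to get the same performance without the temp table.
  • P.S. I don't understand how adding UNION ALL to the mix would help performance. If you can post an example, I'd love to try it.
  • @JoeC UNION ALL doesn't improve performance; I mentioned it because I thought you were having trouble unpivoting the data with an inline function.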
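【Solution 2】:

Edit: a faster approach

Rather than calling the parse/split function 7 times, perhaps this TVF would be more efficient. The following processes 35,000 unique records in 0.773 seconds.

Example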

-- Create Some Sample/UNIQUE Data
Select N,TextData =concat(N,TextData )
Into #Temp
From  (values ('Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg') ) A (TextData )
Cross Apply (Select Top 35000 N=Row_Number() Over (Order By (Select NULL)) From master..spt_values n1,master..spt_values n2) B

Select B.*
 From  #Temp A
 Cross Apply (
                Select Pos1=max(case when RetSeq=1 then RetVal end)
                      ,Pos2=max(case when RetSeq=2 then RetVal end)
                      ,Pos3=max(case when RetSeq=3 then RetVal end)
                      ,Pos4=max(case when RetSeq=4 then RetVal end)
                      ,Pos5=max(case when RetSeq=5 then RetVal end)
                      ,Pos6=max(case when RetSeq=6 then RetVal end)
                      ,Pos7=max(case when RetSeq=7 then RetVal end)
                 From [dbo].[udf-Str-Parse-8K](A.TextData,'|') B1
             ) B

The UDF, if you're interested

CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(25))
Returns Table 
As
Return (  
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)

    Select RetSeq = Row_Number() over (Order By A.N)
          ,RetVal = LTrim(RTrim(Substring(@String, A.N, A.L)))
    From   cte4 A
);
--Original Source http://www.sqlservercentral.com/articles/Tally+Table/72993/

Just to help visualize

As a standalone TVF, it returns an item sequence that can be used for conditional aggregation.

Select * from [dbo].[udf-Str-Parse-8K]('Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg','|')

Returns

RetSeq  RetVal
1       Aaa
2       Bbb
3       Ccc
4       Ddd
5       Eee
6       Fff
7       Ggg

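【Discussion】:

  • Thanks for the thorough answer. However, the query ran for 13 minutes before I stopped it. I have tried many split functions that normally perform very well. As I said in my question, the best one I found finishes in 18 seconds. The goal is to find a way to get close to the temp table's 2-second performance.
  • @JoeC I can't imagine why you'd see such poor performance. 7 calls to a looping function versus 1 tally-based parse? Just for fun, I parsed 91,861 branch addresses (using a pipe-delimited string), and it took 4.031 seconds. I'll add a bit more.
  • @JoeC Just curious... how does the following perform... Select min(TextData),max(TextData),count(*) from vAddresses
  • I think the reason for the slow performance is that it retrieves the file stream every time the function is called. Doing the conversion to varchar once via the temp table performs far better. I can't tell from the execution plan whether that is actually what happens. The aggregate query you posted takes 5 seconds to run. Thanks again for your help.
  • @JoeC Fascinating. If the aggregate query took 5 seconds, then the parse I provided should have taken only 5 seconds or so. Now I'm really curious. Later today I'll see whether I can recreate your structure and results.
【Solution 3】:

[1] I have used the following solution many times: (1.1) convert the source string to XML, then (1.2) extract each item from the XML with the value method: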

USE tempdb
GO
IF OBJECT_ID('dbo.SourceTable') IS NOT NULL
    DROP TABLE dbo.SourceTable
GO
CREATE TABLE dbo.SourceTable (
    ID      INT IDENTITY PRIMARY KEY,
    Col1    VARCHAR(100) NOT NULL
);
INSERT  dbo.SourceTable (Col1) VALUES ('Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg')
INSERT  dbo.SourceTable (Col1) VALUES ('hhh|iii|JJJ|kkk')

SELECT  b.ID, c.XmlCol.value('.', 'VARCHAR(100)') AS ItemVal--, ROW_NUMBER() OVER(PARTITION BY b.ID ORDER BY c.XmlCol) AS RowNum
FROM (
    SELECT  a.ID, CONVERT(XML, '<root><i>' + REPLACE(a.Col1, '|', '</i><i>') + '</i></root>') AS Col1AsXML
    FROM    dbo.SourceTable a
) b OUTER APPLY b.Col1AsXML.nodes('root/i') c(XmlCol)
--OPTION(FORCE ORDER)
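
The query above returns one row per item. Since the question asks for one row per source string with seven columns, a variation that reads each position directly with the value method might look like the sketch below. It reuses dbo.SourceTable from this answer; the column aliases are illustrative. Note that the CONVERT to XML fails if the text contains characters such as & or <, so those would need escaping first.

```sql
-- Variation on the query above: one row per ID with a fixed set of columns.
-- Positions that do not exist in a given string come back as NULL.
SELECT  b.ID,
        b.Col1AsXML.value('(/root/i)[1]', 'VARCHAR(100)') AS Col1,
        b.Col1AsXML.value('(/root/i)[2]', 'VARCHAR(100)') AS Col2,
        b.Col1AsXML.value('(/root/i)[3]', 'VARCHAR(100)') AS Col3,
        b.Col1AsXML.value('(/root/i)[4]', 'VARCHAR(100)') AS Col4,
        b.Col1AsXML.value('(/root/i)[5]', 'VARCHAR(100)') AS Col5,
        b.Col1AsXML.value('(/root/i)[6]', 'VARCHAR(100)') AS Col6,
        b.Col1AsXML.value('(/root/i)[7]', 'VARCHAR(100)') AS Col7
FROM (
    SELECT  a.ID, CONVERT(XML, '<root><i>' + REPLACE(a.Col1, '|', '</i><i>') + '</i></root>') AS Col1AsXML
    FROM    dbo.SourceTable a
) b;
```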

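[2] If you provide the DDL and DML statements, I will run some performance tests.

[3] It would help if you could provide the following details:

  1. Is there a maximum number of items per string?
  2. Do all items have the same length (e.g. 3 characters)?

【Discussion】:

    【Solution 4】:

    I've played around with this, and the fastest solution, apart from writing a CLR assembly (which isn't dramatically faster), seems to be some variant of the following: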

    CREATE FUNCTION [UDF_Split] (
        @InputStr NVARCHAR(Max),
        @Delimiter NVARCHAR(255)
    )
    RETURNS TABLE
    AS RETURN(
    
       WITH lv0 AS (SELECT 0 g UNION ALL SELECT 0)
        ,lv1 AS (SELECT 0 g FROM lv0 a CROSS JOIN lv0 b) -- 4
        ,lv2 AS (SELECT 0 g FROM lv1 a CROSS JOIN lv1 b) -- 16
        ,lv3 AS (SELECT 0 g FROM lv2 a CROSS JOIN lv2 b) -- 256
        ,lv4 AS (SELECT 0 g FROM lv3 a CROSS JOIN lv3 b) -- 65,536
        ,lv5 AS (SELECT 0 g FROM lv4 a CROSS JOIN lv4 b) -- 4,294,967,296
        ,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM lv5)
    
     SELECT SUBSTRING(@InputStr, N, CHARINDEX(@Delimiter, @InputStr + @Delimiter, N) - N) AS TextLine
     FROM   Tally
     WHERE  N BETWEEN 1 AND LEN(@InputStr) + LEN(@InputStr)
            AND SUBSTRING(@Delimiter + @InputStr, N, LEN(@Delimiter)) = @Delimiter);
    

    As of SQL Server 2016 there is a built-in split function: STRING_SPLIT ( string , separator )
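
For reference, a minimal call to the built-in function against the sample line from the question is sketched below. It is not available on the SQL Server 2014 instance used in the question.

```sql
-- Requires SQL Server 2016 or later and database compatibility level 130 or higher.
-- STRING_SPLIT returns a single column named value, one row per item.
SELECT value
FROM   STRING_SPLIT('Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg', '|');
```

In that version the function returns no position column and does not guarantee output order, so turning the rows back into seven positional columns needs extra care.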

    【Discussion】:

      【Solution 5】:

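      Since you have a fixed, predetermined number of columns to split into, you don't actually need a string-split function at all.

      On a fairly underpowered dev server, the following was able to get through 100K rows in about 2 seconds.

      Note: this solution incorporates Adam Machanic's MakeParallel function to force a parallel execution plan.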

      IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL 
      DROP TABLE #TestData;
      
      WITH 
          cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), 
          cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
          cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
          cte_Tally (n) AS (
              SELECT TOP 100000
                  ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
              FROM
                  cte_n3 a CROSS JOIN cte_n3 b
              )
      SELECT 
          ID = ISNULL(CAST(t.n AS INT), 0),
          FilePath = CAST(fp.FilePath AS VARCHAR(1000))
          INTO #TestData
      FROM
          cte_Tally t
          CROSS APPLY ( VALUES (CONCAT(
                                      ABS(CHECKSUM(NEWID())) % 9999 + 1000, '|',
                                      ABS(CHECKSUM(NEWID())) % 9999 + 1000, '|',
                                      ABS(CHECKSUM(NEWID())) % 9999 + 1000, '|',
                                      ABS(CHECKSUM(NEWID())) % 9999 + 1000, '|',
                                      ABS(CHECKSUM(NEWID())) % 9999 + 1000, '|',
                                      ABS(CHECKSUM(NEWID())) % 9999 + 1000, '|',
                                      ABS(CHECKSUM(NEWID())) % 9999 + 1000
                                      )
                                  ) ) fp (FilePath);
      
      ALTER TABLE #TestData ADD PRIMARY KEY CLUSTERED (ID);
      
      --=============================================================================
      DECLARE     -- Dump values into variables to eliminate display rendering from execution time.
          @id INT,
          @Col_1 VARCHAR(5),
          @Col_2 VARCHAR(5),
          @Col_3 VARCHAR(5),
          @Col_4 VARCHAR(5),
          @Col_5 VARCHAR(5),
          @Col_6 VARCHAR(5),
          @Col_7 VARCHAR(5);
      
      SELECT 
          @ID = td.ID,
          @Col_1 = SUBSTRING(td.FilePath, 1, ABS(d1.DelimLocation - 1)),
          @Col_2 = SUBSTRING(td.FilePath, d1.DelimLocation + 1, ABS(d2.DelimLocation - d1.DelimLocation - 1)),
          @Col_3 = SUBSTRING(td.FilePath, d2.DelimLocation + 1, ABS(d3.DelimLocation - d2.DelimLocation - 1)),
          @Col_4 = SUBSTRING(td.FilePath, d3.DelimLocation + 1, ABS(d4.DelimLocation - d3.DelimLocation - 1)),
          @Col_5 = SUBSTRING(td.FilePath, d4.DelimLocation + 1, ABS(d5.DelimLocation - d4.DelimLocation - 1)),
          @Col_6 = SUBSTRING(td.FilePath, d5.DelimLocation + 1, ABS(d6.DelimLocation - d5.DelimLocation - 1)),
          @Col_7 = SUBSTRING(td.FilePath, d6.DelimLocation + 1, 1000)
      FROM
          #TestData td
          CROSS APPLY ( VALUES (LEN(td.FilePath) - LEN(REPLACE(td.FilePath, '|', ''))) ) dc (DelimiterCount)
          CROSS APPLY ( VALUES (IIF(dc.DelimiterCount < 1, 1000,  CHARINDEX('|', td.FilePath, 1))) ) d1 (DelimLocation)
          CROSS APPLY ( VALUES (IIF(dc.DelimiterCount < 2, 1000,  CHARINDEX('|', td.FilePath, d1.DelimLocation + 1))) ) d2 (DelimLocation)
          CROSS APPLY ( VALUES (IIF(dc.DelimiterCount < 3, 1000,  CHARINDEX('|', td.FilePath, d2.DelimLocation + 1))) ) d3 (DelimLocation)
          CROSS APPLY ( VALUES (IIF(dc.DelimiterCount < 4, 1000,  CHARINDEX('|', td.FilePath, d3.DelimLocation + 1))) ) d4 (DelimLocation)
          CROSS APPLY ( VALUES (IIF(dc.DelimiterCount < 5, 1000,  CHARINDEX('|', td.FilePath, d4.DelimLocation + 1))) ) d5 (DelimLocation)
          CROSS APPLY ( VALUES (IIF(dc.DelimiterCount < 6, 1000,  CHARINDEX('|', td.FilePath, d5.DelimLocation + 1))) ) d6 (DelimLocation)
          CROSS APPLY dbo.MakeParallel() mp;  -- Forces a parallel execution plan.
                                              -- http://dataeducation.com/next-level-parallel-plan-forcing-an-alternative-to-8649/
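
      To make the cascading CROSS APPLY idea easier to follow, here is a stripped-down sketch of the same technique on a single three-column literal, without the delimiter-count guard and without MakeParallel. The alias names (s, TextData, Pos) are made up for illustration.

```sql
-- Sketch only: each CROSS APPLY finds the next delimiter, starting just past the previous one,
-- and the SELECT cuts the string between consecutive delimiter positions.
SELECT
    Col_1 = SUBSTRING(s.TextData, 1, d1.Pos - 1),                    -- 'Aaa'
    Col_2 = SUBSTRING(s.TextData, d1.Pos + 1, d2.Pos - d1.Pos - 1),  -- 'Bbb'
    Col_3 = SUBSTRING(s.TextData, d2.Pos + 1, 8000)                  -- 'Ccc'
FROM
    (VALUES ('Aaa|Bbb|Ccc')) s (TextData)
    CROSS APPLY ( VALUES (CHARINDEX('|', s.TextData)) ) d1 (Pos)
    CROSS APPLY ( VALUES (CHARINDEX('|', s.TextData, d1.Pos + 1)) ) d2 (Pos);
```

      The full version above adds the DelimiterCount check and the ABS() wrappers so that rows with fewer delimiters than expected don't pass a negative length to SUBSTRING.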
      

      Of course, the MakeParallel function can also be combined with a splitter function. In this case, using Jeff Moden's DelimitedSplit8K function:

      --=============================================================================
      DECLARE     -- Dump values into variables to eliminate display rendering from execution time.
          @id INT,
          @Col_1 VARCHAR(5),
          @Col_2 VARCHAR(5),
          @Col_3 VARCHAR(5),
          @Col_4 VARCHAR(5),
          @Col_5 VARCHAR(5),
          @Col_6 VARCHAR(5),
          @Col_7 VARCHAR(5);
      
      SELECT 
          @ID = td.ID,
          @Col_1 = MAX(CASE WHEN sc.ItemNumber = 1 THEN sc.Item END),
          @Col_2 = MAX(CASE WHEN sc.ItemNumber = 2 THEN sc.Item END),
          @Col_3 = MAX(CASE WHEN sc.ItemNumber = 3 THEN sc.Item END),
          @Col_4 = MAX(CASE WHEN sc.ItemNumber = 4 THEN sc.Item END),
          @Col_5 = MAX(CASE WHEN sc.ItemNumber = 5 THEN sc.Item END),
          @Col_6 = MAX(CASE WHEN sc.ItemNumber = 6 THEN sc.Item END),
          @Col_7 = MAX(CASE WHEN sc.ItemNumber = 7 THEN sc.Item END)
      FROM
          #TestData td
          CROSS APPLY dbo.DelimitedSplit8K(td.FilePath, '|') sc
          CROSS APPLY dbo.MakeParallel() mp       -- Forces a parallel execution plan.
                                                  -- http://dataeducation.com/next-level-parallel-plan-forcing-an-alternative-to-8649/
      GROUP BY
          td.ID;
      

      HTH, Jason

      【Discussion】:
