【Question title】: T-SQL: How can I query a column with incorrect JSON?
【Posted】: 2021-10-16 07:08:51
【Question】:

I've been asked to create a VIEW from a table that has a varchar(MAX) column holding a JSON string. Unfortunately, some of the entries contain unescaped double quotes.

Example (the Notes value is invalid):

{"Eligible":"true","Reason":"","Notes":"Left message for employee to "call me"","EDate":"08/16/2021"}

I can't correct this anywhere on the insert side, so I have to work with the data as-is.

So it seems to me that I need a way to escape those double quotes.

I'm pulling the data out like this:

JSON_VALUE(JsonData, '$.Notes') as Notes

However, I get the following error:

JSON text is not properly formatted. Unexpected character '"' is found at position 102.

I can't do a simple REPLACE over the whole field, because that would also produce invalid JSON.
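A quick way to see why a blanket replace fails is the following Python sketch. It uses Python only because that runs without SQL Server. The `parses` helper is hypothetical and plays roughly the role of `ISJSON`. Escaping every quote also escapes the structural quotes around keys and values. The target only needs the two quotes inside the Notes value escaped.

```python
import json

# Sample row from the question: the quotes around call me are not escaped.
raw = '{"Eligible":"true","Reason":"","Notes":"Left message for employee to "call me"","EDate":"08/16/2021"}'

# What the column should have contained: only the inner quotes escaped.
target = r'{"Eligible":"true","Reason":"","Notes":"Left message for employee to \"call me\"","EDate":"08/16/2021"}'


def parses(text: str) -> bool:
    """Hypothetical helper: True if text is valid JSON (roughly what ISJSON reports)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A blanket replace also escapes the key/value delimiters, giving {\"Eligible\":...
blanket = raw.replace('"', '\\"')

print(parses(raw))      # False: the quote before call me ends the Notes string early
print(parses(blanket))  # False: a key cannot start with \"
print(parses(target))   # True
print(json.loads(target)["Notes"])  # Left message for employee to "call me"
```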

I tried JSON_MODIFY, but ran into trouble getting the Notes field to replace itself:

JSON_MODIFY(JsonData, '$.Notes', REPLACE(JSON_VALUE(JsonData, '$.Notes'), '"', '\"'))

Maybe I'm missing something obvious, but I can't see how to approach this. Is there a way to escape those double quotes within my query?

【Comments】:

  • When I run your code on your sample JSON, I get Left message for employee to as the output. Can you provide a sample that returns the same error you're seeing?

Tags: json tsql ssms


【Answer 1】:

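This is extremely hacky, and there are probably several inputs that would break it as written. Still, if you absolutely cannot fix the source data, or simply flag the bad JSON for manual correction, this may be the route you need to take and flesh out further.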

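The approach uses your example plus a few extra cases I added. With the help of a custom string-splitting table-valued function that preserves the original order, you can get output like this: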

Query

declare @t table (JsonData nvarchar(max));
insert into @t values('{"Eligible":true,"Reason":"","Notes":"Left message for employee to "call me"","EDate":"08/16/2021","Test":     "999","Another Test":"Value with " character"}');

with q as
(
    select t.JsonData
          ,s.rn
          ,case when right(trim(lag(s.item,1) over (order by s.rn)),1) in('{',':',',')
                then '"'
                else ''
                end    -- Do we need a starting double quote?
           + s.item    -- Value from the split text
           + case when right(trim(lead(s.item,1) over (order by s.rn)),1) not in('}',':',',')
                         and right(trim(s.item),1) not in('{','}',':',',')
                  then '\"'
                  else ''
                  end  -- Do we need an escaped double quote?
           + case when left(trim(lead(s.item,1) over (order by s.rn)),1) in('}',':',',')
                  then '"'
                  else ''
                  end  -- Do we need an ending double quote?
           as Quoted
    from @t as t
        cross apply dbo.fn_StringSplit4k(t.JsonData,'"',null) as s  -- By splitting on " characters, we know where they all are even though they are removed, so we can add them back in as required based on the remaining text
)
,j as
(
    select JsonData
          ,string_agg(Quoted,'') within group (order by rn) as JsonFixed
    from q
    group by JsonData
)
select json_value(JsonFixed, '$.Eligible') as Eligible
      ,json_value(JsonFixed, '$.Reason') as Reason
      ,json_value(JsonFixed, '$.Notes') as Notes
      ,json_value(JsonFixed, '$.EDate') as EDate
      ,json_value(JsonFixed, '$.Test') as Test
      ,json_value(JsonFixed, '$."Another Test"') as AnotherTest
from j;

Output

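| Eligible | Reason | Notes                                  | EDate      | Test | AnotherTest            |
|----------|--------|----------------------------------------|------------|------|------------------------|
| true     |        | Left message for employee to "call me" | 08/16/2021 | 999  | Value with " character |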
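The Python sketch below ports the same split-and-reassemble heuristic so the quoting rules can be tried outside SQL Server. It is a sketch and not part of the answer. `fix_quotes` is a hypothetical name, `str.split('"')` stands in for `dbo.fn_StringSplit4k`, and `strip(" ")` stands in for TRIM. Each quote is restored as opening, escaped or closing purely from the neighbouring fragments, mirroring the three CASE expressions with LAG/LEAD.

```python
import json

OPENERS = ("{", ":", ",")          # previous fragment ends with one of these -> opening quote
CLOSERS = ("}", ":", ",")          # next fragment starts with one of these -> closing quote
STRUCTURAL = ("{", "}", ":", ",")


def fix_quotes(raw: str) -> str:
    """Sketch of Answer 1's heuristic: split on '"', then re-add each quote as
    opening, escaped (\\"), or closing depending on the neighbouring fragments."""
    items = raw.split('"')  # stands in for dbo.fn_StringSplit4k(raw, '"', NULL)
    out = []
    for i, item in enumerate(items):
        cur = item.strip(" ")                                           # TRIM(s.item)
        prev = items[i - 1].strip(" ") if i > 0 else None               # LAG(...), NULL on first row
        nxt = items[i + 1].strip(" ") if i + 1 < len(items) else None   # LEAD(...), NULL on last row
        piece = ""
        # Starting quote: the previous fragment ends with {, : or ,
        if prev is not None and prev[-1:] in OPENERS:
            piece += '"'
        piece += item
        # Escaped quote: the next fragment does not end with } : , and this one
        # does not end with a structural character
        if nxt is not None and nxt[-1:] not in CLOSERS and cur[-1:] not in STRUCTURAL:
            piece += '\\"'
        # Ending quote: the next fragment starts with }, : or ,
        if nxt is not None and nxt[:1] in CLOSERS:
            piece += '"'
        out.append(piece)
    return "".join(out)


raw = ('{"Eligible":true,"Reason":"","Notes":"Left message for employee to "call me"",'
       '"EDate":"08/16/2021","Test":     "999","Another Test":"Value with " character"}')
fixed = json.loads(fix_quotes(raw))
print(fixed["Notes"])         # Left message for employee to "call me"
print(fixed["Another Test"])  # Value with " character
```

Because it only looks at neighbours, it shares the query's blind spots. For example, `{"N":"a: "x" b"}` comes back as `{"N":"a: "x\" b"}`, which is still invalid: the inner quote follows a colon, so it is treated as an opening quote.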

String splitter

create function [dbo].[fn_StringSplit4k]
(
     @str nvarchar(4000) = ' '              -- String to split.
    ,@delimiter as nvarchar(1) = ','        -- Delimiting value to split on.
    ,@num as int = null                     -- Which value to return.
)
returns table
as
return
                    -- Start tally table with 10 rows.
    with n(n)   as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)

                    -- Select the same number of rows as characters in @str as incremental row numbers.
                    -- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
        ,t(t)   as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)

                    -- Return the position of every value that follows the specified delimiter.
        ,s(s)   as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)

                    -- Return the start and length of every value, to use in the SUBSTRING function.
                    -- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
        ,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
    
    select rn
          ,item
    from(select row_number() over(order by s) as rn
               ,substring(@str,s,l) as item
         from l
        ) a
    where rn = @num
       or @num is null;
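The rough Python equivalent of `fn_StringSplit4k` below makes the tally-table steps concrete. The name `string_split_4k` is hypothetical. It follows the same three CTEs: find each item's start position, measure up to the next delimiter as CHARINDEX does, then number the pieces in order. It ignores T-SQL details such as LEN dropping trailing spaces and the nvarchar(4000) parameter size. When splitting on `"`, it keeps the empty pieces between adjacent quotes, which the query above relies on.

```python
def string_split_4k(s, delimiter=",", num=None):
    """Sketch of dbo.fn_StringSplit4k: returns (rn, item) pairs in original order."""
    s = s or ""  # ISNULL(@str, '')
    # CTE s: item start positions (1-based): position 1, plus every position right after a delimiter
    starts = [1] + [t + 1 for t in range(1, len(s) + 1) if s[t - 1] == delimiter]
    rows = []
    for rn, start in enumerate(starts, start=1):
        # CTE l: CHARINDEX(@delimiter, @str, start) - start, or 4000 when no delimiter follows
        pos = s.find(delimiter, start - 1)
        length = pos + 1 - start if pos != -1 else 4000
        rows.append((rn, s[start - 1:start - 1 + length]))  # SUBSTRING(@str, start, length)
    # WHERE rn = @num OR @num IS NULL
    return [row for row in rows if num is None or row[0] == num]


print(string_split_4k('a""b', '"'))      # [(1, 'a'), (2, ''), (3, 'b')]
print(string_split_4k("x,y,z", ",", 2))  # [(2, 'y')]
```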

【Comments】:

  • Yes, I was hoping to avoid something like this, but I think you're right that it may be the only way. Thanks, I'll give it a try!
【Answer 2】:

I'd suggest a user-defined function:

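CREATE FUNCTION dbo.clearJSon(@v nvarchar(max)) RETURNS nvarchar(max)
AS
BEGIN
  DECLARE @i AS int
  DECLARE @security int
  -- Find a double quote that is neither preceded by { : , nor followed by , : }
  -- (i.e. a quote inside a string value); @i is the position of the character before it
  SET @i=PATINDEX('%[^{:,]"[^,:}]%',@v)
  SET @security=0 -- just to prevent an endless loop (at most 100 replacements)
  WHILE @i>0 and @security<100
  BEGIN
    -- Keep everything up to @i, replace the quote at @i+1 with an apostrophe, keep the rest
    SET @v = LEFT(@v,@i)+''''+SUBSTRING(@v,@i+2,len(@v))
    SET @i=PATINDEX('%[^{:,]"[^,:}]%',@v)
    SET @security = @security+1
  END
  RETURN @v
END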

This returns {"Eligible":"true","Reason":"","Notes":"Left message for employee to 'call me'","EDate":"08/16/2021"} as the result of dbo.clearJSon(JsonData).

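But I have to admit that it fails if an unescaped quote is followed by a comma, colon or closing brace, or preceded by an opening brace, colon or comma.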
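The Python sketch below reproduces the same loop, which is handy for testing the pattern before deploying the function. `clear_json` is a hypothetical name, and `re.search` stands in for PATINDEX; the `[^...]` bracket syntax means the same here. The first call reproduces the result shown above. The second call shows the limitation: in `He said "yes", then left` the closing quote is followed by a comma, so it is left alone and the JSON stays invalid.

```python
import re

# Same pattern as PATINDEX('%[^{:,]"[^,:}]%', @v): a quote with no structural character on either side
INNER_QUOTE = re.compile(r'[^{:,]"[^,:}]')


def clear_json(v: str, max_iterations: int = 100) -> str:
    """Sketch of dbo.clearJSon: turn quotes inside values into apostrophes."""
    for _ in range(max_iterations):  # mirrors the @security guard against an endless loop
        m = INNER_QUOTE.search(v)
        if m is None:
            break
        q = m.start() + 1  # the quote is the second character of the 3-character match
        v = v[:q] + "'" + v[q + 1:]
    return v


raw = '{"Eligible":"true","Reason":"","Notes":"Left message for employee to "call me"","EDate":"08/16/2021"}'
print(clear_json(raw))
# {"Eligible":"true","Reason":"","Notes":"Left message for employee to 'call me'","EDate":"08/16/2021"}

print(clear_json('{"Notes":"He said "yes", then left"}'))
# {"Notes":"He said 'yes", then left"}  (still invalid JSON)
```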

【Comments】:

  • That might work. I'll give it a try! Thanks.