还有另一种方法:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
xmlcast(
xmlquery(
'count(string-to-codepoints(.)[. > 127])'
passing t.str
returning content)
as int) cnt_over_127
from t;
结果:
STR RES CNT_OVER_127
---------- ------ ------------
why Ä ? Ok 1
why - ? Not ok 0
why - ? Ä Ok 1
why Not ok 0
如您所见,我将xmlquery() 与string-to-codepoints xpath 函数一起使用,然后过滤掉>127 的代码点并返回它们的count()。
您也可以使用dump 或utl_raw.cast_to_raw() 函数,但它有点复杂,我有点懒得使用它们编写完整的解决方案。
但只是草稿:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
dump(str,1016) dmp,
dump(str,1015) dmp,
utl_raw.cast_to_raw(str) as_row,
regexp_count(dump(str,1016)||',', '[89a-f][0-9a-f],') xs
from t;
结果:
STR RES DMP DMP AS_ROW XS
---------- ------ ------------------------------------------------------------------- ----------------------------------------------------------------------- -------------------- --
why Ä ? Ok Typ=1 Len=8 CharacterSet=AL32UTF8: 77,68,79,20,c3,84,20,3f Typ=1 Len=8 CharacterSet=AL32UTF8: 119,104,121,32,195,132,32,63 77687920C384203F 2
why - ? Not ok Typ=1 Len=7 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f Typ=1 Len=7 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63 776879202D203F 0
why - ? Ä Ok Typ=1 Len=10 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f,20,c3,84 Typ=1 Len=10 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63,32,195,132 776879202D203F20C384 2
why Not ok Typ=1 Len=3 CharacterSet=AL32UTF8: 77,68,79 Typ=1 Len=3 CharacterSet=AL32UTF8: 119,104,121 776879 0
注意:因为那是 unicode,所以第一个字节 >127 表示它是一个多字节字符,所以它计算 'Ä' 两次 - c3,84, - 两个字节都高于 127。