RegEx to find string between occurrences of character

【Question title】: RegEx to find string between occurrences of character
【Posted】: 2023-02-25 04:04:08
【Question description】:

I have a pipe-delimited file, something like this:

col1|col2|col3||col5|col6||||col10

(some columns might be blank as you can see above)

I want to fetch string between 5th and 6th occurrence of pipe. It would be 'col6' in this example.

How to do that with RegEx?

I wanted to put such file in Oracle db and then do this by using REGEXP_SUBSTR, but I could also do it via different tools (e.g. Notepad++), just need to know RegEx pattern.

Tags: regex oracle

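【Solution 1】:

I'm not an Oracle expert, so there may be a better way, but you should be able to use this expression:

(\w*)\|

It matches every group of word characters (\w; the * also captures empty groups) followed by a pipe (\|, escaped because the pipe character has a special meaning in regular expressions). You can then simply take the 6th match and return its first capture group.

Working fiddle:

select
  regexp_substr('col1|col2|col3||col5|col6||||col10', '(\w*)\|', 1, 6, NULL, 1)
from dual;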
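
For anyone who wants to try the pattern outside Oracle, here is a minimal sketch using Python's re module (Python is only an illustration here, not part of the answer above). It takes the 6th match from the question's sample line and also shows two limits of \w*: the last field has no trailing pipe, so it is never matched, and a field containing a non-word character such as a space is only partly captured. The input 'a|b c|d|' is a made-up example.

```python
import re

# word characters (possibly none) followed by a literal pipe
PATTERN = re.compile(r"(\w*)\|")

line = "col1|col2|col3||col5|col6||||col10"
fields = PATTERN.findall(line)  # one captured group per match

sixth = fields[5]               # 6th match, zero-based index 5
print(sixth)

# col10 has no pipe after it, so it never shows up in the matches
print("col10" in fields)

# made-up input: a space is not a word character, so "b c" is captured as "c"
print(PATTERN.findall("a|b c|d|"))
```

If the columns can contain spaces or punctuation, a pattern that accepts any character up to the delimiter is the safer choice.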

【Discussion】:

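【Solution 2】:

You can use the pattern '(.*?)(\||$)' to look for any characters (.*), non-greedily (?), followed by either a pipe symbol, which has to be escaped as \|, or (via the unescaped |) the end of the string ($). If you leave out the end-of-string alternative, it still works for position 6, but it won't find the last element if you need it, because there is no pipe delimiter after col10.

You can then use that as:

select regexp_substr('col1|col2|col3||col5|col6||||col10',
  '(.*?)(\||$)', 1, 6, null, 1) as col6
from dual;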

COL6
col6

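The 6 says that you want the sixth occurrence of the match.

Using a CTE to simplify it slightly, you can see what it extracts for all of the elements, including the nulls, by changing the occurrence number: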

-- cte for sample data
with your_table (str) as (
  select 'col1|col2|col3||col5|col6||||col10' from dual
)
  -- actual query
select
  regexp_substr(str, '(.*?)(\||$)', 1, 1, null, 1) as col1,
  regexp_substr(str, '(.*?)(\||$)', 1, 2, null, 1) as col2,
  regexp_substr(str, '(.*?)(\||$)', 1, 3, null, 1) as col3,
  regexp_substr(str, '(.*?)(\||$)', 1, 4, null, 1) as col4,
  regexp_substr(str, '(.*?)(\||$)', 1, 5, null, 1) as col5,
  regexp_substr(str, '(.*?)(\||$)', 1, 6, null, 1) as col6,
  regexp_substr(str, '(.*?)(\||$)', 1, 7, null, 1) as col7,
  regexp_substr(str, '(.*?)(\||$)', 1, 8, null, 1) as col8,
  regexp_substr(str, '(.*?)(\||$)', 1, 9, null, 1) as col9,
  regexp_substr(str, '(.*?)(\||$)', 1, 10, null, 1) as col10
from your_table;

COL1	COL2	COL3	COL4	COL5	COL6	COL7	COL8	COL9	COL10
col1	col2	col3	null	col5	col6	null	null	null	col10


This pattern is also often used to split delimited strings into multiple rows.
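
The same pattern can be checked outside Oracle. The sketch below uses Python's re module purely as an illustration; nth_field is a hypothetical helper name, not something from the answer. It picks the n-th match and compares the first ten fields against a plain split on the delimiter. Where Oracle reports NULL for an empty column, Python returns an empty string.

```python
import re
from itertools import islice

# lazy field text, followed by a literal pipe or the end of the string
FIELD = re.compile(r"(.*?)(\||$)")

def nth_field(line, n):
    """Return the text of the n-th match (1-based), or None if there is no such match."""
    match = next(islice(FIELD.finditer(line), n - 1, None), None)
    return match.group(1) if match else None

line = "col1|col2|col3||col5|col6||||col10"
print(nth_field(line, 6))
print(nth_field(line, 10))
print(repr(nth_field(line, 4)))  # empty column: empty string in Python

# cross-check the ten fields against a plain split on the delimiter
all_fields = [nth_field(line, i) for i in range(1, 11)]
print(all_fields == line.split("|"))
```

Recent Python versions also report one extra empty match at the very end of the string, so keep n within the actual number of fields.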

【Discussion】:

【Solution 3】:

If it doesn't have to be a regular expression, I'd suggest the old-fashioned substr + instr approach:

SQL> with test (col) as
  2    (select 'col1|col2|col3||col5|col6||||col10' from dual)
  3  select substr(col, instr(col, '|', 1, 5) + 1,
  4                     instr(col, '|', 1, 6) - instr(col, '|', 1, 5) - 1
  5               ) result
  6  from test;

RESULT
----------
col6

SQL>
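
The same position-based idea, sketched in Python for anyone not working in Oracle. The helper names nth_index and between are made up for this illustration, and the sketch assumes a single-character delimiter. It locates the 5th and 6th pipe and slices out the text between them, which mirrors the instr/substr arithmetic above.

```python
def nth_index(text, sep, n):
    """Index of the n-th occurrence of sep (1-based), or -1 if there are fewer than n."""
    pos = -1
    for _ in range(n):
        pos = text.find(sep, pos + 1)
        if pos == -1:
            return -1
    return pos

def between(text, sep, start_n, end_n):
    """Text between the start_n-th and end_n-th occurrence of sep, or None if either is missing."""
    start = nth_index(text, sep, start_n)
    end = nth_index(text, sep, end_n)
    if start == -1 or end == -1:
        return None
    return text[start + 1:end]

line = "col1|col2|col3||col5|col6||||col10"
print(between(line, "|", 5, 6))
```

Like the SQL version, this only works for fields that have a delimiter on both sides, so the first and last columns need separate handling.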

【Discussion】: