【Question title】: RegEx to find string between occurrences of character
【Posted】: 2023-02-25 04:04:08
【Question】:

I have a pipe-delimited file, something like this:

col1|col2|col3||col5|col6||||col10

(some columns might be blank as you can see above)

I want to fetch the string between the 5th and 6th occurrence of the pipe. That would be 'col6' in this example.

How can I do that with a RegEx?

I wanted to load the file into an Oracle database and then use REGEXP_SUBSTR, but I could also do it with other tools (e.g. Notepad++); I just need to know the RegEx pattern.

    Tags: regex oracle


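    【Solution 1】:

    I'm not an Oracle expert, so there may be a better way, but you should be able to use this expression:

    (\w*)\|


    It matches every group of word characters (\w* also captures empty groups) followed by a pipe (\|, escaped because the pipe character has a special meaning in regular expressions). You can then ask for the sixth occurrence and return its first capture group.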

    Working fiddle:

    select
      regexp_substr('col1|col2|col3||col5|col6||||col10', '(\w*)\|', 1, 6, NULL, 1)
    from dual;
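
Since the question says the pattern may also be used outside Oracle, here is a minimal Python sketch for checking the expression. It assumes Python's `re` module behaves like Oracle's engine for this simple pattern; the variable names and the `messy` sample are made up for illustration. It also shows a limitation: `\w` does not match spaces or punctuation, so fields containing them come back truncated.

```python
import re

line = "col1|col2|col3||col5|col6||||col10"

# Each match is a (possibly empty) run of word characters followed by a literal pipe.
fields = [m.group(1) for m in re.finditer(r"(\w*)\|", line)]

# The sixth match is the text between the 5th and 6th pipe.
sixth = fields[5]
print(sixth)  # col6

# Caveat: \w covers only letters, digits and underscore, so a field such as
# "a b" or "c-d" is cut down to the word characters right before the pipe.
messy = "a b|c-d|e"
truncated = [m.group(1) for m in re.finditer(r"(\w*)\|", messy)]
print(truncated)  # ['b', 'd']
```

If the data can contain anything other than word characters, a pattern that accepts any character up to the delimiter is the safer choice.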
    


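      【Solution 2】:

      You can use the pattern '(.*?)(\||$)' to look for any characters (.*), non-greedily (?), followed by either a pipe symbol, which has to be escaped as \|, or (via the unescaped | alternation) the end of the string ($). If you leave out the end-of-string alternative, it still works for position 6, but it will not find the last element should you need it, because col10 is not followed by a pipe delimiter.

      You can then use it like this:

      select regexp_substr('col1|col2|col3||col5|col6||||col10',
        '(.*?)(\||$)', 1, 6, null, 1) as col6
      from dual;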
      
      COL6
      col6

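      The 6 says you want the sixth occurrence of the match; the final 1 returns the first capture group, i.e. the field without its delimiter.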
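
To see why the end-of-string alternative matters, the following Python sketch compares the pattern with and without `$`. It is only an illustration under the assumption that Python's `re` engine is an acceptable stand-in for Oracle's here; `nth_match` is a hypothetical helper, not an Oracle or Python built-in.

```python
import re

line = "col1|col2|col3||col5|col6||||col10"

def nth_match(pattern, text, n):
    """Return group 1 of the n-th (1-based) match, or None if there is no such match."""
    for i, m in enumerate(re.finditer(pattern, text), start=1):
        if i == n:
            return m.group(1)
    return None

with_end = r"(.*?)(\||$)"   # field followed by a pipe OR the end of the string
pipe_only = r"(.*?)\|"      # field followed by a pipe only

print(nth_match(with_end, line, 6))    # col6
print(nth_match(pipe_only, line, 6))   # col6
print(nth_match(with_end, line, 10))   # col10
print(nth_match(pipe_only, line, 10))  # None
```

Note that Python returns an empty string for an empty field, whereas Oracle reports it as NULL.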

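      Using a CTE to simplify things slightly, you can see that it extracts all of the elements (including the nulls) by changing the occurrence number: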

      -- cte for sample data
      with your_table (str) as (
        select 'col1|col2|col3||col5|col6||||col10' from dual
      )
      -- actual query
      select
        regexp_substr(str, '(.*?)(\||$)', 1, 1, null, 1) as col1,
        regexp_substr(str, '(.*?)(\||$)', 1, 2, null, 1) as col2,
        regexp_substr(str, '(.*?)(\||$)', 1, 3, null, 1) as col3,
        regexp_substr(str, '(.*?)(\||$)', 1, 4, null, 1) as col4,
        regexp_substr(str, '(.*?)(\||$)', 1, 5, null, 1) as col5,
        regexp_substr(str, '(.*?)(\||$)', 1, 6, null, 1) as col6,
        regexp_substr(str, '(.*?)(\||$)', 1, 7, null, 1) as col7,
        regexp_substr(str, '(.*?)(\||$)', 1, 8, null, 1) as col8,
        regexp_substr(str, '(.*?)(\||$)', 1, 9, null, 1) as col9,
        regexp_substr(str, '(.*?)(\||$)', 1, 10, null, 1) as col10
      from your_table;
      
      COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10
      col1 col2 col3 null col5 col6 null null null col10


      This pattern is also commonly used to split delimited strings into multiple rows.
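
As a rough illustration of the row-splitting idea, the Python sketch below turns one delimited line into (position, value) rows with the same pattern. It is not the Oracle technique itself, and `to_rows` is a hypothetical helper; empty fields are mapped to None to mimic Oracle's NULLs.

```python
import re
from itertools import islice

FIELD = re.compile(r"(.*?)(\||$)")

def to_rows(line):
    """Return (position, value) rows; empty fields become None, like Oracle NULLs."""
    # A line with k pipes has k + 1 fields. Stopping there skips the extra
    # empty match that Python's engine can report at the very end of the string.
    field_count = line.count("|") + 1
    matches = islice(FIELD.finditer(line), field_count)
    return [(pos, m.group(1) or None) for pos, m in enumerate(matches, start=1)]

rows = to_rows("col1|col2|col3||col5|col6||||col10")
for pos, value in rows:
    print(pos, value)
```

Capping the number of matches at the field count is a design choice for Python; in Oracle the usual approach is to bound the occurrence number in the query instead.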


        【Solution 3】:

        If it doesn't have to be a regular expression, I'd suggest the old-fashioned substr + instr approach:

        with test (col) as
          (select 'col1|col2|col3||col5|col6||||col10' from dual)
        -- start just after the 5th pipe; length = gap between the 5th and 6th pipes
        select substr(col, instr(col, '|', 1, 5) + 1,
                           instr(col, '|', 1, 6) - instr(col, '|', 1, 5) - 1
                     ) result
        from test;

        RESULT
        ----------
        col6
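
The same position-based idea can be sketched outside the database. The Python below is an assumption-laden analogue, not Oracle code: `nth_index` plays the role of instr with an occurrence argument, slicing plays the role of substr, and both function names are hypothetical.

```python
def nth_index(text, sep, n):
    """Return the 0-based index of the n-th (1-based) occurrence of sep, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(sep, pos + 1)
        if pos == -1:
            return -1
    return pos

def field_between(text, sep, n):
    """Return the text between the n-th and (n+1)-th occurrence of sep."""
    start = nth_index(text, sep, n)
    end = nth_index(text, sep, n + 1)
    if start == -1 or end == -1:
        raise ValueError("not enough separators in text")
    return text[start + len(sep):end]

line = "col1|col2|col3||col5|col6||||col10"
print(field_between(line, "|", 5))  # col6
```

No regular expression engine is involved, so special characters in the delimiter need no escaping.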
        
