【Question Title】: C#: split a string using another string as the delimiter and keep the delimiter in each split part
【Posted】: 2017-11-14 14:14:50
【Question】:

I need to split an input string using a C# regular expression, and I need to know how to keep the delimiters in the output, as shown below.

Input:

string content="heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";

string[] delimters = new string[] {"heading1:","heading2:","heading3:"};

Expected output:

outputArray[0] = heading1: contents with respect to heading1
outputArray[1] = heading2: heading2 contents
outputArray[2] = heading3: heading 3 related contents sample strings

What I tried:

var result = content.Split(delimters,StringSplitOptions.RemoveEmptyEntries);

The output I get:

result [0]: " contents with respect to heading1 "
result [1]: " heading2 contents "
result [2]: " heading 3 related contents sample strings"

I could not find an API in string.Split or Regex that produces the expected result.

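【Comments】:

  • Please post the regex that failed; otherwise the question is off-topic and should be closed.
  • Start by looking into the IndexOf(), string.Split(), and string.Join() functions. Good luck.
  • This \S+:.*?(?=\S+:|$) may work in your case.
  • @MethodMan: Thanks. I am reading more about regular expressions to find the best solution. I will post an answer once I find one.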
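
The IndexOf() suggestion in the comments can be sketched without any regex, roughly as below. This is only an illustration: the class name IndexOfSplitter and the method SplitKeepingDelimiters are hypothetical, and the sketch assumes that delimiters are matched ordinally, that they do not overlap each other, and that trimming each part is acceptable (the expected output in the question has no surrounding spaces).

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfSplitter
{
    // Hypothetical helper: cuts `content` at every occurrence of any delimiter
    // and keeps the delimiter at the start of the part it opens.
    public static List<string> SplitKeepingDelimiters(string content, string[] delimiters)
    {
        // Collect the start index of every delimiter occurrence, in ascending order.
        var starts = new SortedSet<int>();
        foreach (var delimiter in delimiters)
        {
            if (string.IsNullOrEmpty(delimiter)) continue;
            int index = content.IndexOf(delimiter, StringComparison.Ordinal);
            while (index >= 0)
            {
                starts.Add(index);
                index = content.IndexOf(delimiter, index + delimiter.Length, StringComparison.Ordinal);
            }
        }

        var parts = new List<string>();
        int from = 0;
        foreach (int start in starts)
        {
            // Emit the text collected so far, skipping empty or whitespace-only parts.
            AddIfNotEmpty(parts, content.Substring(from, start - from));
            from = start;
        }
        AddIfNotEmpty(parts, content.Substring(from));
        return parts;
    }

    private static void AddIfNotEmpty(List<string> parts, string part)
    {
        string trimmed = part.Trim();
        if (trimmed.Length > 0) parts.Add(trimmed);
    }

    public static void Main()
    {
        string content = "heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
        string[] delimiters = { "heading1:", "heading2:", "heading3:" };
        foreach (string part in SplitKeepingDelimiters(content, delimiters))
            Console.WriteLine(part);
    }
}
```

Any text that precedes the first delimiter is returned as a part of its own; whether that is desirable depends on the input.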

Tags: c# regex split


【Solution 1】:

You can use a solution based on a positive lookahead:

// Requires: using System.Linq; using System.Text.RegularExpressions;
var result = Regex.Split(content, $@"(?={string.Join("|", delimiters.Select(m => Regex.Escape(m)))})")
                  .Where(x => !string.IsNullOrEmpty(x));

C# demo

var content="heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
var delimiters = new string[] {"heading1:","heading2:","heading3:"};
Console.WriteLine(
    string.Join("\n", 
        Regex.Split(content, $@"(?={string.Join("|", delimiters.Select(m => Regex.Escape(m)))})")
             .Where(x => !string.IsNullOrEmpty(x))
    )
);

Output:

heading1: contents with respect to heading1 
heading2: heading2 contents 
heading3: heading 3 related contents sample strings

(?={string.Join("|", delimiters.Select(m => Regex.Escape(m)))}) dynamically builds a regex that looks like

(?=heading1:|heading2:|heading3:)

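See the regex demo. The pattern matches any position in the string that is immediately followed by heading1:, heading2:, or heading3:, without consuming those substrings, so they remain in the output.

Note that delimiters.Select(m => Regex.Escape(m)) ensures that any special regex metacharacters in the delimiters are escaped, so the regex engine treats them as literal characters.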
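
To illustrate why the escaping matters, here is a minimal sketch that wraps the lookahead split in a helper and feeds it delimiters containing metacharacters. The names LookaheadSplitter and SplitKeepingDelimiters are hypothetical, the trimming step is an addition not present in the answer, and the delimiters array is assumed to be non-empty.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSplitter
{
    // Hypothetical helper: splits before each delimiter so the delimiter stays
    // at the start of its part. Assumes `delimiters` is not empty.
    public static string[] SplitKeepingDelimiters(string content, string[] delimiters)
    {
        // Escape every delimiter so "[A]" means the literal text "[A]",
        // not a character class that matches the single letter A.
        string alternation = string.Join("|", delimiters.Select(d => Regex.Escape(d)));

        return Regex.Split(content, "(?=" + alternation + ")")
                    .Select(part => part.Trim())        // drop the spaces left before each delimiter
                    .Where(part => part.Length > 0)     // drop the empty part before a leading delimiter
                    .ToArray();
    }

    public static void Main()
    {
        string[] parts = SplitKeepingDelimiters("[A] first (x) [B] second", new[] { "[A]", "[B]" });
        foreach (string part in parts)
            Console.WriteLine(part);
    }
}
```

Without Regex.Escape, the pattern would be (?=[A]|[B]), which splits before every bare A or B character instead of before the bracketed markers.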

【Discussion】:

    【Solution 2】:

    I suggest matching instead of splitting; the matches can then be ordered by position.

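    // Requires: using System; using System.Collections.Generic;
    //           using System.Linq; using System.Text.RegularExpressions;
    private static IEnumerable<string> Solution(string source, string[] delimiters) {
      int from = 0;    // start of the part currently being collected
      int length = 0;  // length of the delimiter match that opened that part
    
      // Points at which we can split (each delimiter is used as a regex pattern)
      var points = delimiters
          .SelectMany(delimiter => Regex
            .Matches(source, delimiter)
            .OfType<Match>()
            .Select(match => new {
              index = match.Index,
              matchLength = match.Length,
              delimiter = delimiter,
            }))
          .OrderBy(item => item.index)
          .ThenBy(item => Array.IndexOf(delimiters, item.delimiter)); // tie break
    
      foreach (var point in points) {
        // Skip matches that overlap the delimiter that opened the current part
        if (point.index >= from + length) {
          // Condition: we don't want the very first empty part
          if (from != 0 || point.index - from != 0)
            yield return source.Substring(from, point.index - from);
    
          from = point.index;
          length = point.matchLength; // length of the matched text, not of the pattern
        }
      }
    
      yield return source.Substring(from);
    }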
    

    Test:

    string content = 
      "heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
    
    string[] delimiters = new string[] { 
      "heading1:", "heading2:", "heading3:" };
    
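    // Join the parts; passing the sequence itself would print its type name
    Console.WriteLine(string.Join(Environment.NewLine, Solution(content, delimiters)));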
    

    Result:

    heading1: contents with respect to heading1 
    heading2: heading2 contents 
    heading3: heading 3 related contents sample strings
    

    If we split on digits (second test)

    Console.WriteLine(string.Join(Environment.NewLine, Solution(content, new string[] {"[0-9]+"})));
    

    we get

    heading
    1: contents with respect to heading
    1 heading
    2: heading
    2 contents heading
    3: heading 
    3 related contents sample strings
    

    【Discussion】:

      【Solution 3】:
      string content = "heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
      string[] delimters = new string[] { "heading1:", "heading2:", "heading3:" };
      
      var dels = string.Join("|", delimters);
      var pattern = "(" + dels + ").*?(?=" + dels + "|\\Z)";
      
      var outputArray = Regex.Matches(content, pattern);
      
      foreach (Match match in outputArray)
          Console.WriteLine(match);
      

      The resulting pattern is:

      (heading1:|heading2:|heading3:).*?(?=heading1:|heading2:|heading3:|\Z)
      

      This is similar to Wiktor Stribiżew's answer.
      Of course, we should use Regex.Escape, as he showed.
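
A variant of this matching approach with the escaping applied might look like the sketch below. The names EscapedMatcher and MatchSections are hypothetical. RegexOptions.Singleline (so that sections may span line breaks), the non-capturing group, and the trimming are assumptions added here; they are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedMatcher
{
    // Hypothetical helper: matches each delimiter together with the text that
    // follows it, up to the next delimiter or the end of the input.
    public static string[] MatchSections(string content, string[] delimiters)
    {
        // Escape the delimiters so metacharacters such as '.' are taken literally.
        string dels = string.Join("|", delimiters.Select(d => Regex.Escape(d)));
        string pattern = "(?:" + dels + ").*?(?=" + dels + "|\\Z)";

        return Regex.Matches(content, pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value.Trim())
                    .ToArray();
    }

    public static void Main()
    {
        // "1." and "2." contain '.', which would match any character if left unescaped.
        foreach (string section in MatchSections("1. one 2. two", new[] { "1.", "2." }))
            Console.WriteLine(section);
    }
}
```

Unlike the split-based approach, matching drops any text that appears before the first delimiter.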

      【Discussion】:
