How to replace only the first n matching instances in a Tcl regexp? Answers

【Question title】: How to replace only first n matching instances in TCL regexp?
【Posted】: 2015-08-07 16:11:46
【Question description】:

I need to replace the first 50 occurrences of abc with bcd. I tried the following, but it does not work.

set a "1 abc 2 abc 3 abc 4 abc......... 100 abc"
regsub -all "(.*?(abc).*)(50)" $a "bcd \1" b
puts $b

The numbers in the string are only there for demonstration. The string can be arbitrary:

set a "hh abc cc abc hh abc cc abc dd abc hh abc......... hh abc"

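【Question comments】:

@strib: Not a duplicate. The solution over there is not ideal. Let's leave this question open.
Tcl's regsub cannot replace exactly n occurrences. It replaces either one match or, with -all, every match, which is why a regular expression alone cannot do this.

Tags: regex tcl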

【Solution 1】:

You can use this custom procedure, which relies on a replacement function:

set a "1 abc 2 abc 3 abc 4 abc......... 100 abc"

proc rangeSub {a first last string sub} {
  # This variable keeps the count of matches
  set count 0
  proc re_sub {str first last rep} {
    upvar count count
    incr count
    # If match number within wanted range, replace with rep, else return original string
    if {$count >= $first && $count <= $last} {
      return $rep
    } else {
      return $str
    }
  }

  set cmd {[re_sub "\0" $first $last $sub]}
  set b [subst [regsub -all "\\y$string\\y" $a $cmd]]

  return $b
}

# Here replacing the 1st to 3rd occurrences of abc
puts [rangeSub $a 1 3 "abc" "bcd"]
# => 1 bcd 2 bcd 3 bcd 4 abc......... 100 abc
puts [rangeSub $a 2 3 "abc" "bcd"]
# => 1 abc 2 bcd 3 bcd 4 abc......... 100 abc

Change the call to rangeSub $a 1 50 "abc" "bcd" to replace the first 50 matches.


An alternative that uses match indices and string range:

set a "1 abc 2 abc 3 abc 4 abc......... 100 abc"

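proc rangeSub {a first last string sub} {
  # Index pairs {start end} of every whole-word match of $string
  set idx [regexp -all -inline -indices -- "\\y$string\\y" $a]
  # Start of the first wanted match and end of the last wanted match
  set start [lindex $idx $first-1 0]
  set end [lindex $idx $last-1 1]
  # Replace every match inside that slice, then stitch the string back together
  regsub -all -- "\\y$string\\y" [string range $a $start $end] $sub result
  return [string range $a 0 $start-1]$result[string range $a $end+1 end]
}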

puts [rangeSub $a 1 3 abc bcd]
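
The same idea can also be written as a plain loop that never passes the input through subst, so brackets or dollar signs in the text are not evaluated. This is only a minimal sketch, not part of the original answer: replaceFirstN is a hypothetical name, it assumes Tcl 8.5 or later (for lassign), and it assumes the search word contains no regex metacharacters.

```tcl
# Hypothetical helper (name and signature are illustrative only).
# Replaces the first n whole-word occurrences of $word in $text with $sub.
proc replaceFirstN {text word sub n} {
    # Assumption: $word contains no regex metacharacters
    set pattern "\\y$word\\y"
    set pos 0
    for {set i 0} {$i < $n} {incr i} {
        # Find the next match at or after $pos; stop early if none is left
        if {![regexp -indices -start $pos -- $pattern $text range]} break
        lassign $range from to
        set text [string replace $text $from $to $sub]
        # Resume searching just past the text that was inserted
        set pos [expr {$from + [string length $sub]}]
    }
    return $text
}

puts [replaceFirstN "1 abc 2 abc 3 abc 4 abc" abc bcd 2]
# => 1 bcd 2 bcd 3 abc 4 abc
```

Calling replaceFirstN $a abc bcd 50 would cover the case from the question. If the string holds fewer than n matches, the loop simply stops after the last one.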

【Comments】:

Can't we do this with regsub alone?
@Bharathi No, that is not possible. re_syntax offers no way to limit replacements by match count.
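
To illustrate the point about regsub, here is a small sketch of its two modes. The sample string and variable names are made up for the demonstration; when a result variable is given, regsub returns the number of replacements it made.

```tcl
set s "abc abc abc"

# Without -all, only the first match is replaced
set n1 [regsub {\yabc\y} $s bcd out1]
puts "$n1: $out1"
# => 1: bcd abc abc

# With -all, every match is replaced; there is no switch for "first n"
set n2 [regsub -all {\yabc\y} $s bcd out2]
puts "$n2: $out2"
# => 3: bcd bcd bcd
```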

【Solution 2】:

I expect this can be done by using regexp and regsub together.

%
% set count 0
0
% # Don't bother about this 'for' loop. It is just for input generation
% for { set i 65} {$i < 123} {incr i} {
        if {$count == 101} break
        if { $i >= 90 && $i <=96} {
                continue
        }
        for { set j 65 } {$j < 123} {incr j} {
                if {$count == 101} break
                if { $j >= 90 && $j <=96} {
                        continue
                }
                incr count
                append input "[format %c%c $i $j] abc "


        }
}
%
% # The 'input' value below is what gets processed
% # So concentrate only on what follows :D
% set input
AA abc AB abc AC abc AD abc AE abc AF abc AG abc AH abc AI abc AJ abc AK abc AL abc AM abc AN abc AO abc AP abc AQ abc AR abc AS abc AT abc AU abc AV abc AW abc AX abc AY abc Aa abc Ab abc Ac abc Ad abc Ae abc Af abc Ag abc Ah abc Ai abc Aj abc Ak abc Al abc Am abc An abc Ao abc Ap abc Aq abc Ar abc As abc At abc Au abc Av abc Aw abc Ax abc Ay abc Az abc BA abc BB abc BC abc BD abc BE abc BF abc BG abc BH abc BI abc BJ abc BK abc BL abc BM abc BN abc BO abc BP abc BQ abc BR abc BS abc BT abc BU abc BV abc BW abc BX abc BY abc Ba abc Bb abc Bc abc Bd abc Be abc Bf abc Bg abc Bh abc Bi abc Bj abc Bk abc Bl abc Bm abc Bn abc Bo abc Bp abc Bq abc Br abc Bs abc Bt abc Bu abc Bv abc Bw abc Bx abc By abc
%
% regexp "(.*?abc.*?){50}" $input match; # First, match up to the 50th occurrence
1
% regsub -all "((.*?)abc.*?)" $match "\\2bcd" replaceText; # Replace 'abc' with 'bcd' inside the matched part
50
% set replaceText
% regsub $match $input $replaceText output; # Finally, substitute the rewritten part back into the main input
1
1
% 
% set output
AA bcd AB bcd AC bcd AD bcd AE bcd AF bcd AG bcd AH bcd AI bcd AJ bcd AK bcd AL bcd AM bcd AN bcd AO bcd AP bcd AQ bcd AR bcd AS bcd AT bcd AU bcd AV bcd AW bcd AX bcd AY bcd Aa bcd Ab bcd Ac bcd Ad bcd Ae bcd Af bcd Ag bcd Ah bcd Ai bcd Aj bcd Ak bcd Al bcd Am bcd An bcd Ao bcd Ap bcd Aq bcd Ar bcd As bcd At bcd Au bcd Av bcd Aw bcd Ax bcd Ay bcd Az abc BA abc BB abc BC abc BD abc BE abc BF abc BG abc BH abc BI abc BJ abc BK abc BL abc BM abc BN abc BO abc BP abc BQ abc BR abc BS abc BT abc BU abc BV abc BW abc BX abc BY abc Ba abc Bb abc Bc abc Bd abc Be abc Bf abc Bg abc Bh abc Bi abc Bj abc Bk abc Bl abc Bm abc Bn abc Bo abc Bp abc Bq abc Br abc Bs abc Bt abc Bu abc Bv abc Bw abc Bx abc By abc

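Note: I noticed that you used \1 to refer to the first capture group. Using it inside double quotes is wrong, though. Inside braces it would be fine, but inside double quotes the backslash has to be escaped, as in \\1.

【Comments】: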
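
A short sketch of the quoting difference described in the note. The sample strings and variable names are made up; the point is that inside double quotes Tcl's own backslash substitution consumes \1 (as an octal escape) before regsub ever sees it.

```tcl
# Double quotes: \1 becomes a single control character
set lenQuoted [string length "\1"]
# Braces: the backslash and the digit both survive
set lenBraced [string length {\1}]
puts "$lenQuoted $lenBraced"
# => 1 2

# Both of these hand regsub a real back-reference
regsub {(abc)} "x abc y" {<\1>} r1
regsub {(abc)} "x abc y" "<\\1>" r2
puts $r1
# => x <abc> y
puts $r2
# => x <abc> y
```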