If you are mixing non-HTML content with HTML, a regex is your best option.
Here is one way to do the replacement.
Links:
(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(['"])/mycms/~/link\.aspx\?_id=)([a-f0-9]{32})(&amp;_z=z\3(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with: $1$2 + key{$4} + $5
where key{$4} is the new link ID value looked up in your dictionary.
https://regex101.com/r/xRf1xN/1
# https://regex101.com/r/ieEBj8/1
(?i) # Case insensitive modifier
( < a ) # (1), The a tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the ID num
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s href \s* = \s* # href attribute
( ['"] ) # (3), Quote
/mycms/~/link\.aspx\?_id= # Prefix link static text
) # (2 end)
( [a-f0-9]{32} ) # (4), hex link ID
( # (5 start), All past the ID num
&amp;_z=z # Postfix link static text
\3 # End quote
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (5 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
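As a rough sketch of how the replacement and the dictionary lookup could be wired together, here is a Python version using `re.sub` with a callback. Python spells backreferences `\g<1>` rather than `$1`, and a callback is the simplest way to express `key{$4}`. The `new_ids` dictionary, the sample IDs, and the choice to leave tags with unknown IDs untouched are assumptions for illustration only.

```python
import re

# Compact link pattern from above. Group 4 is the 32-hex-digit link ID.
LINK_RE = re.compile(
    r"""(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(['"])/mycms/~/link\.aspx\?_id=)([a-f0-9]{32})(&amp;_z=z\3(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)


def replace_link_ids(html: str, new_ids: dict[str, str]) -> str:
    """Rebuild each matching <a> tag as $1 + $2 + key{$4} + $5."""

    def repl(m: re.Match) -> str:
        old_id = m.group(4)
        # Assumption: tags whose ID is not in the dictionary are left as-is.
        if old_id not in new_ids:
            return m.group(0)
        return m.group(1) + m.group(2) + new_ids[old_id] + m.group(5)

    return LINK_RE.sub(repl, html)


old = "0123456789abcdef0123456789abcdef"  # hypothetical old link ID
new = "fedcba9876543210fedcba9876543210"  # hypothetical new link ID
html = f'<p><a class="x" href="/mycms/~/link.aspx?_id={old}&amp;_z=z">here</a></p>'
print(replace_link_ids(html, {old: new}))
```

Only the ID inside the href changes; the `class` attribute, the `&amp;_z=z` suffix and the link text are carried over through groups 2 and 5.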
Media:
(?i)(<img)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(['"])/mycms/~/media/)([a-f0-9]{32})(\.ashx\3(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with: $1$2 + key{$4} + $5
where key{$4} is the new media ID value looked up in your dictionary.
https://regex101.com/r/pwyjoK/1
# https://regex101.com/r/ieEBj8/1
(?i) # Case insensitive modifier
( < img ) # (1), The img tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the ID num
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s src \s* = \s* # src attribute
( ['"] ) # (3), Quote
/mycms/~/media/ # Prefix media static text
) # (2 end)
( [a-f0-9]{32} ) # (4), hex media ID
( # (5 start), All past the ID num
\.ashx # Postfix media static text
\3 # End quote
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (5 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
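The commented (expanded) listing only works in free-spacing mode, where whitespace in the pattern is ignored and `#` starts a comment. A minimal Python sketch, assuming `re.VERBOSE` as the free-spacing switch, compiles the media pattern in that annotated form and applies the same kind of hypothetical dictionary lookup:

```python
import re

# Free-spacing version of the media pattern: re.VERBOSE ignores whitespace
# and treats '#' (outside character classes) as the start of a comment.
MEDIA_RE = re.compile(
    r"""(?i)                     # case-insensitive modifier
    ( < img )                    # (1) the img tag
    (?=                          # assertion (a pseudo atomic group)
      (                          # (2 start) up to the ID
        (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
        \s src \s* = \s*         # src attribute
        ( ['"] )                 # (3) quote
        /mycms/~/media/          # prefix media static text
      )                          # (2 end)
      ( [a-f0-9]{32} )           # (4) hex media ID
      (                          # (5 start) everything past the ID
        \.ashx                   # postfix media static text
        \3                       # end quote
        (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
        >
      )                          # (5 end)
    )
    \s+                          # real match: walk to the tag's closing bracket
    (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
    >
    """,
    re.VERBOSE,
)


def replace_media_ids(html: str, new_ids: dict[str, str]) -> str:
    """Rebuild each matching <img> tag as $1 + $2 + key{$4} + $5."""

    def repl(m: re.Match) -> str:
        old_id = m.group(4)
        # Assumption: tags whose ID is not in the dictionary are left as-is.
        if old_id not in new_ids:
            return m.group(0)
        return m.group(1) + m.group(2) + new_ids[old_id] + m.group(5)

    return MEDIA_RE.sub(repl, html)


old = "a" * 32  # hypothetical old media ID
new = "b" * 32  # hypothetical new media ID
html = f'<img alt="logo" src="/mycms/~/media/{old}.ashx" width="10">'
print(replace_media_ids(html, {old: new}))
```

Without the verbose flag the spaces and `#` comments would become part of the pattern, so use the one-line form wherever free-spacing mode is not available.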
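If I want to a) extract the ID from the link/src tag and b) replace the entire href=".." or src=".." value (instead of just the ID part), what would that look like as a regex?
To do that, just rearrange the capture groups.
Links: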
(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(href\s*=\s*(['"])/mycms/~/link\.aspx\?_id=([a-f0-9]{32})&amp;_z=z\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with: $1$2href='NEWID:key{$5}'$6
where key{$5} is the new link ID value looked up in your dictionary.
https://regex101.com/r/FxpJVl/1
(?i) # Case insensitive modifier
( < a ) # (1), The a tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the href attribute
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
) # (2 end)
( # (3 start), href attribute
href \s* = \s*
( ['"] ) # (4), Quote
/mycms/~/link\.aspx\?_id= # Prefix link static text
( [a-f0-9]{32} ) # (5), hex link ID
&amp;_z=z # Postfix link static text
\4 # End quote
) # (3 end)
( # (6 start), remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (6 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
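A Python sketch covering both parts of the question, assuming the same kind of hypothetical `new_ids` dictionary: group 5 gives the ID for extraction (part a), and the callback rebuilds the tag as `$1$2href='NEWID:key{$5}'$6`, dropping group 3 (the old `href="..."`) entirely (part b). Leaving tags with unknown IDs unchanged is again an illustrative assumption.

```python
import re

# Rearranged link pattern: group 3 is the whole href="..." attribute,
# group 5 the 32-hex-digit ID, group 6 the rest of the tag.
LINK_RE = re.compile(
    r"""(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(href\s*=\s*(['"])/mycms/~/link\.aspx\?_id=([a-f0-9]{32})&amp;_z=z\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)


def extract_link_ids(html: str) -> list[str]:
    """Part a): collect the ID (group 5) from every matching <a> tag."""
    return [m.group(5) for m in LINK_RE.finditer(html)]


def replace_link_hrefs(html: str, new_ids: dict[str, str]) -> str:
    """Part b): replace the whole href value, i.e. $1$2href='NEWID:key{$5}'$6."""

    def repl(m: re.Match) -> str:
        old_id = m.group(5)
        # Assumption: tags whose ID is not in the dictionary are left as-is.
        if old_id not in new_ids:
            return m.group(0)
        # Group 3 (the old href="...") is deliberately not copied.
        return f"{m.group(1)}{m.group(2)}href='NEWID:{new_ids[old_id]}'{m.group(6)}"

    return LINK_RE.sub(repl, html)


old = "0123456789abcdef0123456789abcdef"  # hypothetical old link ID
new = "f" * 32                            # hypothetical new link ID
html = f'<a class="x" href="/mycms/~/link.aspx?_id={old}&amp;_z=z">here</a>'
print(extract_link_ids(html))
print(replace_link_hrefs(html, {old: new}))
```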
Media:
(?i)(<img)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(src\s*=\s*(['"])/mycms/~/media/([a-f0-9]{32})\.ashx\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with: $1$2src='NEWID:key{$5}'$6
where key{$5} is the new media ID value looked up in your dictionary.
https://regex101.com/r/EqKYjM/1
(?i) # Case insensitive modifier
( < img ) # (1), The img tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the src attribute
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
) # (2 end)
( # (3 start), src attribute
src \s* = \s*
( ['"] ) # (4), Quote
/mycms/~/media/ # Prefix media static text
( [a-f0-9]{32} ) # (5), hex media ID
\.ashx # Postfix media static text
\4 # End quote
) # (3 end)
( # (6 start), remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (6 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
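The media variant works the same way. In this Python sketch (hypothetical IDs and dictionary), the sample input mixes a matching tag, a second matching tag written in uppercase with an unquoted attribute and single-quoted `src` whose ID is not in the dictionary, and an unrelated image. Extraction reports both matching IDs, while only the tag whose ID has a dictionary entry is rewritten.

```python
import re

# Rearranged media pattern: group 3 is the whole src="..." attribute,
# group 5 the 32-hex-digit ID, group 6 the rest of the tag.
MEDIA_RE = re.compile(
    r"""(?i)(<img)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(src\s*=\s*(['"])/mycms/~/media/([a-f0-9]{32})\.ashx\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)


def extract_media_ids(html: str) -> list[str]:
    """Part a): collect the ID (group 5) from every matching <img> tag."""
    return [m.group(5) for m in MEDIA_RE.finditer(html)]


def replace_media_srcs(html: str, new_ids: dict[str, str]) -> str:
    """Part b): replace the whole src value, i.e. $1$2src='NEWID:key{$5}'$6."""

    def repl(m: re.Match) -> str:
        old_id = m.group(5)
        # Assumption: tags whose ID is not in the dictionary are left as-is.
        if old_id not in new_ids:
            return m.group(0)
        return f"{m.group(1)}{m.group(2)}src='NEWID:{new_ids[old_id]}'{m.group(6)}"

    return MEDIA_RE.sub(repl, html)


a_id = "a" * 32  # hypothetical media ID present in the dictionary
b_id = "b" * 32  # hypothetical media ID missing from the dictionary
new = "c" * 32   # hypothetical replacement ID
html = (
    f'<img src="/mycms/~/media/{a_id}.ashx" alt="A">'
    f"<IMG width=5 src='/mycms/~/media/{b_id}.ashx'>"
    '<img src="/other/pic.png">'
)
print(extract_media_ids(html))
print(replace_media_srcs(html, {a_id: new}))
```

Because of `(?i)`, the `[a-f0-9]` class also accepts uppercase hex digits, so normalize the case of dictionary keys if your IDs can appear in either case.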