【Posted】: 2021-11-03 23:34:38
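【Question】:
I have an HTML file that contains a large number of links.
They have the format
http:/oldsite/showFile.asp?doc=1234&lib=lib1
and I want to replace them with
http://newsite/?lib=lib1&doc=1234
(1234 and lib1 are variable).
Do you know how to do this?
Thanks, P
【Question comments】:
Tags: powershell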
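I don't think your examples are correct.
http:/oldsite/showFile.asp?doc=1234&lib=lib1 should be http:/oldsite/showFile.asp?doc=1234&lib=lib1
and
http://newsite/?lib=lib1&doc=1234 should be http://newsite?lib=lib1&doc=1234
To do the replacement on a single string, you can use the -replace operator: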
'http:/oldsite/showFile.asp?doc=1234&lib=lib1' -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1'
which returns http://newsite?lib=lib1&doc=1234
To replace these inside a file, you can use:
(Get-Content -Path 'X:\TheHtmlFile.html' -Raw) -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1' |
Set-Content -Path 'X:\TheNewHtmlFile.html'
Regex details:
http:/oldsite/showFile    Match the characters “http:/oldsite/showFile” literally
\.                        Match the character “.” literally
asp                       Match the characters “asp” literally
\?                        Match the character “?” literally
(                         Match the regular expression below and capture its match into backreference number 1
   doc=                   Match the characters “doc=” literally
   \d                     Match a single digit 0..9
      +                   Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
&                         Match the character “&” literally
(                         Match the regular expression below and capture its match into backreference number 2
   lib=                   Match the characters “lib=” literally
   \w                     Match a single character that is a “word character” (letters, digits, etc.)
      +                   Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
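To see that the pattern handles varying doc and lib values and rewrites every link in one pass, here is a minimal sketch. The sample HTML and the variable names ($html, $pattern, $replacement, $result) are made up for illustration; only the pattern and replacement strings come from the answer above.

```powershell
# Sample HTML with two links; the doc and lib values are invented for this sketch.
$html = @'
<a href="http:/oldsite/showFile.asp?doc=1234&lib=lib1">First</a>
<a href="http:/oldsite/showFile.asp?doc=42&lib=archive">Second</a>
'@

# Same pattern and replacement as in the answer. Single quotes stop
# PowerShell from expanding $1 and $2 as variables, so the regex engine
# receives them as references to capture groups 1 and 2.
$pattern     = 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)'
$replacement = 'http://newsite?$2&$1'

# -replace substitutes every match in the string, not just the first.
$result = $html -replace $pattern, $replacement
$result
```

Because \w+ stops at the closing double quote, the rest of each tag is left untouched.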
【Comments】:
Read the file, loop over each line, replace the old value with the new one, and send the output to a new file:
gc file.html | % { $_.Replace('oldsite...','newsite...') } | out-file new-file.html
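The .Replace() method only substitutes literal text, so with variable doc and lib values the line-by-line approach would probably need the regex-based -replace operator instead. The following is a sketch under that assumption; the temp file names and the sample lines are hypothetical, and the pattern is borrowed from the other answer.

```powershell
# Hypothetical file names in the temp folder, so the sketch is self-contained.
$inFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'links-in.html'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'links-out.html'

# Create a small input file with sample lines (invented values).
@(
    '<a href="http:/oldsite/showFile.asp?doc=1234&lib=lib1">A</a>'
    '<p>no link on this line</p>'
    '<a href="http:/oldsite/showFile.asp?doc=7&lib=lib2">B</a>'
) | Set-Content -Path $inFile

# Stream the file line by line; lines without a match pass through unchanged.
Get-Content -Path $inFile |
    ForEach-Object { $_ -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1' } |
    Set-Content -Path $outFile

# Show the rewritten file.
Get-Content -Path $outFile
```

Writing to a separate output file avoids reading and writing the same file within one pipeline.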
【Comments】: