【Posted】: 2016-07-27 09:45:37
【Question】:
This is my HTML code:
<ul class="hide menuSearchType">
<li><a href="../../dynamic/city_select.aspx">Search by city</a></li>
<li><a href="../../searchbyphone.aspx">Search by phone</a></li>
<li><a href="../searchbyaddress.aspx">Search by address</a></li>
<li><a href="../searchbybrand.aspx">Search by brand</a></li>
<li><a href="/advertisement-center/">Advertise with us</a></li>
<li><a href="/advertisement-center/">Advertise with us</a></li>
<li><a href="//fonts.googleapis.com/css?family=Open+Sans">Find a Person</a></li>
<li><a href="//fonts.googleapis.com/css?family=Open+Sans">Find a Person</a></li>
<li><a href="dynamic/city_select.aspx">Search by city</a></li>
<li><a href="searchbybrand.aspx">Search by brand</a></li>
</ul>
This is my Python code:
import re, os
from urllib.parse import urlparse
url = "http://www.phonebook.com.pk/dynamic/search.aspx?searchtype=cat&class_id=2566"
path = urlparse(url)
lpath = os.path.dirname(path.path)
html = u"<ul class=\"hide menuSearchType\">\n <li><a href=\"../../dynamic/city_select.aspx\">Search by city</a></li>\n <li><a href=\"../../searchbyphone.aspx\">Search by phone</a></li>\n <li><a href=\"../searchbyaddress.aspx\">Search by address</a></li>\n <li><a href=\"../searchbybrand.aspx\">Search by brand</a></li>\n <li><a href=\"/advertisement-center/\">Advertise with us</a></li>\n <li><a href=\"/advertisement-center/\">Advertise with us</a></li>\n <li><a href=\"//fonts.googleapis.com/css?family=Open+Sans\">Find a Person</a></li>\n <li><a href=\"//fonts.googleapis.com/css?family=Open+Sans\">Find a Person</a></li>\n <li><a href=\"dynamic/city_select.aspx\">Search by city</a></li>\n <li><a href=\"searchbybrand.aspx\">Search by brand</a></li>\n</ul>"
linkList1 = re.findall(re.compile(u'(?<=href=")../.*?(?=")'), str(html))
for link1 in linkList1:
    html = re.sub(link1, path.scheme + "://" + os.path.normpath(path.netloc + os.path.abspath(lpath + "/" + link1)), str(html))
print(html)
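The problem is that it detects links starting with "../" as expected, but links starting with "../../" get changed as well. Is there any way to restrict my regex so that it only selects URLs with a single "../"?
Expected output: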
<ul class="hide menuSearchType">
<li><a href="../../dynamic/city_select.aspx">Search by city</a></li>
<li><a href="../../searchbyphone.aspx">Search by phone</a></li>
<li><a href="http://www.phonebook.com.pk/searchbyaddress.aspx">Search by address</a></li>
<li><a href="http://www.phonebook.com.pk/searchbybrand.aspx">Search by brand</a></li>
<li><a href="/advertisement-center/">Advertise with us</a></li>
<li><a href="/advertisement-center/">Advertise with us</a></li>
<li><a href="//fonts.googleapis.com/css?family=Open+Sans">Find a Person</a></li>
<li><a href="//fonts.googleapis.com/css?family=Open+Sans">Find a Person</a></li>
<li><a href="dynamic/city_select.aspx">Search by city</a></li>
<li><a href="searchbybrand.aspx">Search by brand</a></li>
</ul>
【Comments】:
-
Please use a parser instead of regex...
-
@ThomasAyoub Dear sir, I am not allowed to use anything other than regex. It is a restriction at my company.
-
That doesn't apply here, when you hear your boss say he has the right to do whatever he wants.
-
Can you post your expected output, so I can give you a better regex solution?
-
@akashkarothiya Added the expected output.
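
For reference, here is a minimal sketch of one regex-only way to get the expected output. It is not taken from the thread. The pattern in the question matches any href that starts with "../", which includes "../../..." links. The sketch escapes the dots and adds a negative lookahead so that a second "../" is rejected. The names SINGLE_PARENT_HREF and absolutize_single_parent_links are hypothetical, and using urljoin from urllib.parse in place of the os.path calls is an assumption made to keep the path handling platform-independent.

```python
import re
from urllib.parse import urljoin

# Matches an href value that starts with exactly one "../":
# the dots are escaped so they only match literal dots, and the
# negative lookahead (?!\.\./) rejects values that continue with a second "../".
SINGLE_PARENT_HREF = re.compile(r'(?<=href=")\.\./(?!\.\./)[^"]*(?=")')


def absolutize_single_parent_links(html, base_url):
    """Replace href values that start with a single "../" by absolute URLs."""
    # urljoin resolves the relative reference against the page URL.
    return SINGLE_PARENT_HREF.sub(lambda m: urljoin(base_url, m.group(0)), html)


base_url = "http://www.phonebook.com.pk/dynamic/search.aspx?searchtype=cat&class_id=2566"
sample = (
    '<li><a href="../../searchbyphone.aspx">Search by phone</a></li>\n'
    '<li><a href="../searchbyaddress.aspx">Search by address</a></li>\n'
    '<li><a href="searchbybrand.aspx">Search by brand</a></li>'
)
result = absolutize_single_parent_links(sample, base_url)
print(result)
```

Passing a function to sub avoids a second pass with re.sub(link1, ...), where the matched link text would itself be interpreted as a pattern.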