【Posted】: 2014-12-29 13:59:28
【Question】:
This Python script fails to print the email address example@email.com for the case below.
This is a follow-up to my previous post.
#!/usr/bin/env python
from bs4 import BeautifulSoup
import re
soup = '''
<script LANGUAGE="JavaScript">
function something()
{
var ptr;
ptr = "";
ptr += "<table><td class=france></td></table>";
ptr += "<table><td class=france><a href=mail";
ptr += "to:example@email.com>email</a></td></table>";
document.all.something.innerHTML = ptr;
}
</script>
'''
soup = BeautifulSoup(soup)
for script in soup.find_all('script'):
    # An address-shaped token, optionally wrapped in <...>
    reg = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
    # Everything from a literal "mailto:" to the end of that line
    reg2 = r'mailto:.*'
    secondHalf = re.search(reg, script.text)
    firstHalf = re.search(reg2, script.text)
    secondHalfEmail = secondHalf.group()
    firstHalfEmail = firstHalf.group()
    firstHalfEmail = firstHalfEmail.replace('mailto:', '')
    firstHalfEmail = firstHalfEmail.replace('";', '')
    if firstHalfEmail == secondHalfEmail:
        email = secondHalfEmail
    else:
        if '>' not in firstHalfEmail:
            if '>' not in secondHalfEmail:
                if firstHalfEmail != secondHalfEmail:
                    email = firstHalfEmail + secondHalfEmail
                else:
                    email = firstHalfEmail
            else:
                email = secondHalfEmail
    print(email)
It would be great if someone could help me with this.
Thanks.
【Discussion】:
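One likely cause, offered as a sketch rather than a definitive fix: in this sample the JavaScript splits `mailto:` across two string literals (`"...href=mail"` and `"to:example@email.com..."`), so the raw script text never contains the contiguous token `mailto:`. `re.search` then returns `None` for that pattern, and calling `.group()` on `None` raises `AttributeError`. The code below rebuilds the markup by joining the double-quoted literals before searching. The names `JS_SOURCE`, `join_js_string_literals` and `extract_emails` are made up for illustration, and the approach assumes simple double-quoted literals with no escaped quotes, as in the sample.

```python
import re

# The lines from the question's <script> body that build `ptr`.
JS_SOURCE = '''
ptr = "";
ptr += "<table><td class=france></td></table>";
ptr += "<table><td class=france><a href=mail";
ptr += "to:example@email.com>email</a></td></table>";
'''


def join_js_string_literals(js_source):
    """Concatenate every double-quoted literal in order of appearance.

    Assumption: literals contain no escaped double quotes.
    """
    return ''.join(re.findall(r'"([^"]*)"', js_source))


def extract_emails(js_source):
    """Return the addresses that follow `mailto:` in the rebuilt markup."""
    markup = join_js_string_literals(js_source)
    return re.findall(r'mailto:([\w.+-]+@\w+(?:\.\w+)+)', markup)


# The raw source never contains the contiguous token.
print('mailto:' in JS_SOURCE)      # False
print(extract_emails(JS_SOURCE))   # ['example@email.com']
```

With BeautifulSoup, the same helper could be applied to `script.text` for each tag returned by `soup.find_all('script')`. Scripts that build strings in other ways (single quotes, escaped quotes, variables) would need a more careful approach.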
Tags: javascript python regex email beautifulsoup