【Posted】: 2019-08-28 12:14:49
【Question】:
I am trying to extract the data that follows the EXPERIENCE heading. I am using BeautifulSoup to extract it. Here is my HTML:
<div><span>EXPERIENCE
<br/></span></div><div><span>
<br/></span></div><div><span>
<br/></span></div><div><span></span><span> </span><span>I worked in XYZ company from 2016 - 2018
<br/></span></div><div><span> I worked on JAVA platform
<br/></span></div><div><span>From then i worked in ABC company
</br>2018- Till date
</br></span></div><div><span>I got handson on Python Language
</br></span></div><div><span>PROJECTS
</br></span></div><div><span>Developed and optimized many application, etc...
What I have so far:
from bs4 import BeautifulSoup

with open('E:/cvparser/test.html', 'rb') as h:
    dh = h.read().splitlines()
    out = str(dh)
soup = BeautifulSoup(out, 'html.parser')
for tag in soup.select('div:has(span:contains("EXPERIENCE"))'):
    final = tag.get_text(strip=True, separator='\n')
    print(final)
Expected output:
I worked in XYZ company from 2016 - 2018
I worked on JAVA platform
From then i worked in ABC company
2018- Till date
I got handson on Python Language
My code returns null. Can anyone help me?
【Comments】:
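- Just to clarify, EXPERIENCE is not a tag. The tag you are interested in is the <span> tag. So you are looking for the data under the <span> tag whose text/content is EXPERIENCE.
- This is almost certainly a duplicate. I have seen the same question three times recently, each in a slightly different form.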
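Following the first comment, here is a minimal sketch of one possible way to get the expected output. It rests on assumptions that are not in the question: the file is parsed as text rather than as `str()` of a list of byte lines, the section ends at the PROJECTS heading, and both headings appear as standalone lines of text. The `section_lines` helper and the `HTML` constant are hypothetical names used only for illustration. Because the selector in the question can only match the `<div>` that holds the heading itself, the sketch instead walks every text node in document order and keeps the lines between the two headings.

```python
from bs4 import BeautifulSoup

# Copy of the HTML from the question, including its stray </br> tags
# and its truncated ending.
HTML = """<div><span>EXPERIENCE
<br/></span></div><div><span>
<br/></span></div><div><span>
<br/></span></div><div><span></span><span> </span><span>I worked in XYZ company from 2016 - 2018
<br/></span></div><div><span> I worked on JAVA platform
<br/></span></div><div><span>From then i worked in ABC company
</br>2018- Till date
</br></span></div><div><span>I got handson on Python Language
</br></span></div><div><span>PROJECTS
</br></span></div><div><span>Developed and optimized many application, etc..."""


def section_lines(html, start="EXPERIENCE", stop="PROJECTS"):
    """Return the non-empty text lines found between two heading lines.

    `start` and `stop` are assumed to appear as standalone lines of text.
    """
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    collecting = False
    # stripped_strings yields every non-blank text node in document order,
    # so the result does not depend on how the stray </br> tags are nested.
    for text in soup.stripped_strings:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line == start:
                collecting = True
            elif collecting and line == stop:
                return lines
            elif collecting:
                lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(section_lines(HTML)))
```

To run this against the original file, the text could be read with something like `open(path, encoding="utf-8").read()` and passed to `section_lines`; the encoding is an assumption, since the question does not state it.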
Tags: python beautifulsoup