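[Posted]: 2018-02-04 15:19:25
[Question]:
I am trying to use BeautifulSoup to extract links and text from a website (I have permission to do so).
I ran the following code to get the links and text: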
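import requests
from bs4 import BeautifulSoup

url = "http://implementconsultinggroup.com/career/#/6257"
r = requests.get(url)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(r.content, "html.parser")
# href=True skips <a> tags without an href, for which link.get("href") would return None
links = soup.find_all("a", href=True)
for link in links:
    if "career" in link["href"]:
        print("<a href='%s'>%s</a>" % (link["href"], link.text))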
This gave me the following output:
View Position
</a>
<a href='/career/business-analyst-within-human-capital-management/'>
Business analyst within human capital management
COPENHAGEN • We are looking for an ambitious student with an interest in HR
who is passionate about working in the cross-field of people management,
business and technology
View Position
</a>
<a href='/career/management-consultants-within-strategic-workforce-planning/'>
Management consultants within strategic workforce planning
COPENHAGEN • We are looking for consultants with profound experience from
other consultancies
View Position
</a>
<a href='/career/management-consultants-within-supply-chain-strategy-
production-and-process-management/'>
Management consultants within supply chain strategy, production and process
management
MALMÖ • We are looking for talented graduates who want a career in management
consulting
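This is almost right, but I only want the positions whose text contains the name COPENHAGEN to be returned (i.e. the MALMÖ position above should not be returned).
The HTML of the site looks like this: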
<div class="small-12 medium-9 columns top-lined">
<a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
<h2 class="article__title--tiny" data-searchable-text="">Management consultants within supply chain management</h2>
<p class="article__longDescription" data-searchable-text="">COPENHAGEN • We are looking for bright graduates with a passion for supply chain management and supply chain planning for our planning and execution excellence team.</p>
<div class="styled-link styled-icon">
<span class="icon icon-icon">
<i class="fa fa-chevron-right"></i>
</span>
<span class="icon-text">View Position</span>
</div>
</a>
</div>
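One possible way to do this filtering, sketched against the markup above rather than the live site: select each `a.box-link`, look at its `p.article__longDescription`, and keep the link only if that paragraph contains the city name. The `sample_html` string is an abridged copy of the snippet above plus a MALMÖ entry that is assumed to use the same structure, and `positions_in` is a hypothetical helper name, not something from the site or from BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Abridged sample modelled on the HTML in the question. The second entry is an
# assumption: it reuses the same structure for the MALMÖ position.
sample_html = """
<div class="small-12 medium-9 columns top-lined">
<a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
<h2 class="article__title--tiny">Management consultants within supply chain management</h2>
<p class="article__longDescription">COPENHAGEN • We are looking for bright graduates.</p>
<span class="icon-text">View Position</span>
</a>
</div>
<div class="small-12 medium-9 columns top-lined">
<a href="/career/management-consultants-within-supply-chain-strategy-production-and-process-management/" class="box-link">
<h2 class="article__title--tiny">Management consultants within supply chain strategy</h2>
<p class="article__longDescription">MALMÖ • We are looking for talented graduates.</p>
<span class="icon-text">View Position</span>
</a>
</div>
"""


def positions_in(markup, city):
    """Return (href, title) pairs for links whose description mentions `city`."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    # Only consider the job links, and only those that actually have an href.
    for link in soup.find_all("a", class_="box-link", href=True):
        description = link.find("p", class_="article__longDescription")
        if description is None:
            continue  # no description paragraph, so nothing to match against
        if city in description.get_text():
            title = link.find("h2")
            results.append((link["href"], title.get_text(strip=True) if title else ""))
    return results


for href, title in positions_in(sample_html, "COPENHAGEN"):
    print("<a href='%s'>%s</a>" % (href, title))
```

On the real page, `r.content` from the request in the question would take the place of `sample_html`, assuming the job list is present in the HTML that the server returns.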
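[Comments]:
-
Palle Broe, what happened? Did the answers given not meet your requirements? If they did, please accept one; otherwise, leave a comment explaining why not. People spend time preparing answers, so please don't ignore that. Thanks.
-
Sorry, I have now added that in the comments. Thank you very much for your help.
Tags: python web-scraping beautifulsoup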