【Posted】: 2021-08-09 22:46:54
【Question】:
I am trying to parse a line in the following format:
<span class="foobar">text_I_want</span>
How can I access only "text_I_want"?
Maybe there is an earlier step I should be taking when parsing with bs. Originally, I have the following:
<div class="commit_item">
<span class="commit_id"><a href="/commit/944bd962177fd1444b2e6282ec808402bb9e3fa6/">944bd962177f</a></span>
<span class="commit_summary">
<span class="commit_subject">mm/memory-failure: make sure wait for page writeback in memory_failure</span>
<span class="commit_date">2021-08-02</span>
<span class="commit_author">Rafael Aquini</span>
</span>
<span class="commit_link">
<a class="tree_link" href="/commit/e8675d291ac007e1c636870db880f837a9ea112a/"><img alt="" class="tree_icon" src="/static/gitrepo/tux.svg"/> <span class="tree_name">linux</span></a>
</span>
</div>
To parse this, I did the following:
for commit in soup.find_all('div', {"class": "commit_item"}):
    print(commit)
    # Each find() returns the matching <span> Tag, not the text inside it
    url = commit.find('span', {"class": "commit_id"})
    subject = commit.find('span', {"class": "commit_subject"})
    date = commit.find('span', {"class": "commit_date"})
    author = commit.find('span', {"class": "commit_author"})
    commit_link = commit.find('span', {"class": "commit_link"})
However, I am now struggling to get
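For reference, a minimal sketch of one way to read just the text, assuming Beautiful Soup 4 is installed and the markup looks like the snippet in the question. The `html` string and the `commits` list are illustrative names that do not appear in the original code. `find()` returns a Tag object, and calling `get_text()` on it gives the text it contains.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (shortened to the parts used here).
html = """
<div class="commit_item">
<span class="commit_id"><a href="/commit/944bd962177fd1444b2e6282ec808402bb9e3fa6/">944bd962177f</a></span>
<span class="commit_summary">
<span class="commit_subject">mm/memory-failure: make sure wait for page writeback in memory_failure</span>
<span class="commit_date">2021-08-02</span>
<span class="commit_author">Rafael Aquini</span>
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

commits = []  # hypothetical container for the extracted fields
for commit in soup.find_all("div", {"class": "commit_item"}):
    # get_text(strip=True) returns the tag's text with surrounding whitespace removed.
    subject = commit.find("span", {"class": "commit_subject"}).get_text(strip=True)
    date = commit.find("span", {"class": "commit_date"}).get_text(strip=True)
    author = commit.find("span", {"class": "commit_author"}).get_text(strip=True)
    # The commit URL is the href attribute of the <a> inside the commit_id span.
    url = commit.find("span", {"class": "commit_id"}).a["href"]
    commits.append({"subject": subject, "date": date, "author": author, "url": url})

print(commits)
```

If a span might be missing from some items, `find()` returns `None` for it, so a check before calling `get_text()` would be needed in that case.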
【Comments】:
Tags: python html parsing beautifulsoup python-requests