【Question title】: Python: BeautifulSoup get.tag to table
【Posted】: 2017-11-27 08:13:30
【Question】:

I have another problem in my code:

for tag in bs.find_all('a'):
   print(tag.get('rel')[0])
   print(tag.get('title'))

The printed values are correct, but how can I put the output into a table?

It has to look like this: {None: {"rel"}, "title": {"rel1", "rel2"}} and so on...

【Comments】:

  • Can you state your question precisely?
  • @glegoux I'm trying to save the values from "tag.get" into a dict
  • {tag.get('rel')[0] : tag.get('title') for tag in self.loadtree_parser.find_all('a')} Rough, but you get the idea.

Tags: python beautifulsoup tags


【Solution 1】:

Here is a solution:

from bs4 import BeautifulSoup
from collections import defaultdict

html = """
<a href="page1.html" title="1" rel="nofollow">link1</a>
<a href="page2.html" title="2" rel="author">link2</a>
<a href="page3.html" title="1">link3</a>
<a href="page4.html" title="3" rel="nofollow">link4</a>
<a href="page5.html" title="3" rel="bookmark">link5</a>
<a href="page6.html" title="1" rel="nofollow bookmark">link6</a>
<a href="page7.html" title="1" rel="">link7</a>
<a href="page8.html" title="1">link8</a>
<a href="page9.html" rel="unfollow">link9</a>
<a href="page10.html">link10</a>
"""

bs = BeautifulSoup(html, 'html.parser')

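# Map each title to the set of rel values seen with it
table = defaultdict(set)
for tag in bs.find_all('a'):
    title = tag.get('title')          # None when the attribute is missing
    rel = tag.get('rel') or [None]    # rel is a list; use [None] when missing or empty
    table[title].update(rel)          # add the values to the set in place
print(table)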

Output:

defaultdict(<class 'set'>, {
    '1': {'bookmark', '', None, 'nofollow'}, 
    None: {'unfollow', None}, 
    '3': {'bookmark', 'nofollow'},
    '2': {'author'}
})
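
If you want a plain dict like the one in the question rather than a defaultdict, the grouping step can be pulled into a small helper. This is only a sketch: `group_rels` and the `sample` list are hypothetical names made up for illustration, and the sample pairs are hand-written stand-ins for what `tag.get('title')` and `tag.get('rel')` would return, so the snippet runs without parsing any HTML.

```python
from collections import defaultdict


def group_rels(pairs):
    """Group rel values by title.

    `pairs` is any iterable of (title, rel) tuples, where rel is a list
    of strings or None (the shape tag.get('rel') returns).
    """
    table = defaultdict(set)
    for title, rel in pairs:
        # A missing or empty rel is recorded as None.
        table[title].update(rel or [None])
    # Return a plain dict so printing shows {...} instead of defaultdict(...).
    return dict(table)


# Hand-written stand-ins for (tag.get('title'), tag.get('rel')) results.
sample = [
    ("1", ["nofollow"]),
    ("2", ["author"]),
    ("1", None),
    ("1", ["nofollow", "bookmark"]),
    (None, ["unfollow"]),
    (None, None),
]

table = group_rels(sample)
print(table)
```

With the parsed document from the answer, the same helper could be called as `group_rels((tag.get('title'), tag.get('rel')) for tag in bs.find_all('a'))`. Sets are unordered, so the order of values in the printed output may vary between runs.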

【Comments】:
