BeautifulSoup with div tag and no attributes

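【Question title】: BeautifulSoup with div tag and no attributes
【Posted】: 2018-03-12 05:02:23
【Question】:

I'm trying to scrape data from a web page.

The HTML contains multiple results, and I'm looking for the most efficient way to use find_all to retrieve the items inside the div and span tags. The only thing that distinguishes one entry from another is /results?phoneno=999999999&amp;rid=0x0.

It will have rid=0x0, rid=0x1, and so on. I don't know how to get all of the elements listed below: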

<div class="card-summary" data-detail="/results?phoneno=999999999&amp;rid=0x0">
    <div class="row">
        <div class="col-md-8">
            <div class="h4">Kevin Johnson</div>
            <div>
                 <span class="content-label">Age </span>
                 <span class="content-value">54 </span>
            </div>
            <div>
                 <span class="content-label">Lives in </span>
                 <span class="content-value">Las Vegas, NV</span>
            </div>
        </div>
    </div>
</div>
<div class="card-summary" data-detail="/results?phoneno=6666666666&amp;rid=0x02">
    <div class="row">
        <div class="col-md-8">
            <div class="h4">Amy Smith</div>
            <div>
                <span class="content-label">Age </span>
                <span class="content-value">25 </span>
            </div>
            <div>
                <span class="content-label">Lives in </span>
                <span class="content-value">New York, NY</span>
            </div>
        </div>
    </div>
</div>

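i.e.: ["Kevin Johnson", "54", "Las Vegas, NV", "/results?phoneno=999999999&amp;rid=0x0"]

I want to put each person into a list and then print the result, like data = [["Name","Age","Location","URL"]]

【Question comments】:

So... what is the question here? The most efficient way to find_all all div tags is to find_all all div tags; there really isn't any alternative that meets that requirement. Also, is "the most efficient way" really the primary requirement?
Can you include the code or script you have written so far?
Sorry, the HTML didn't post correctly the first time; edited.

Tags: python beautifulsoup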

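【Solution 1】:

You can create a dictionary for each person with the keys contact, name, age, and location. Find those details for each person, then append each dictionary to a list.

Code: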

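from bs4 import BeautifulSoup

# `html` is assumed to hold the page source shown in the question.
soup = BeautifulSoup(html, 'lxml')
information = []
# Each result sits in its own <div class="card-summary">.
for person in soup.find_all('div', class_='card-summary'):
    person_info = {}
    person_info['contact'] = person['data-detail']
    person_info['name'] = person.find('div', class_='h4').text
    # Find the label span by its exact text, then take the span that follows it.
    # `string=` is the current name of the older `text=` argument (bs4 4.4.0+).
    person_info['age'] = person.find('span', string='Age ').find_next('span').text
    person_info['location'] = person.find('span', string='Lives in ').find_next('span').text
    information.append(person_info)

print(information)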

Output:

[{'age': '54 ',
  'contact': '/results?phoneno=999999999&rid=0x0',
  'location': 'Las Vegas, NV',
  'name': 'Kevin Johnson'},
 {'age': '25 ',
  'contact': '/results?phoneno=6666666666&rid=0x02',
  'location': 'New York, NY',
  'name': 'Amy Smith'}]

If you want the information in lists instead, you can use this code:

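from bs4 import BeautifulSoup

# `html` is assumed to hold the page source shown in the question.
soup = BeautifulSoup(html, 'lxml')
information = []
for person in soup.find_all('div', class_='card-summary'):
    contact = person['data-detail']
    name = person.find('div', class_='h4').text
    # Find the label span by its exact text, then take the span that follows it.
    # `string=` is the current name of the older `text=` argument (bs4 4.4.0+).
    age = person.find('span', string='Age ').find_next('span').text
    location = person.find('span', string='Lives in ').find_next('span').text
    # One row per person: [name, age, location, contact]
    information.append([name, age, location, contact])

print(information)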

Output:

[['Kevin Johnson', '54 ', 'Las Vegas, NV', '/results?phoneno=999999999&rid=0x0'], ['Amy Smith', '25 ', 'New York, NY', '/results?phoneno=6666666666&rid=0x02']]
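
The question asked for rows under a header, like data = [["Name","Age","Location","URL"]], and the values in the output above still carry trailing spaces ('54 '). The following is a minimal sketch of one way to get that shape, assuming the markup matches the sample in the question. SAMPLE_HTML and scrape_rows are illustrative names that do not appear in the original answer, and html.parser is used here only so the sketch runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup from the question (illustrative).
SAMPLE_HTML = """
<div class="card-summary" data-detail="/results?phoneno=999999999&amp;rid=0x0">
  <div class="h4">Kevin Johnson</div>
  <div>
    <span class="content-label">Age </span>
    <span class="content-value">54 </span>
  </div>
  <div>
    <span class="content-label">Lives in </span>
    <span class="content-value">Las Vegas, NV</span>
  </div>
</div>
<div class="card-summary" data-detail="/results?phoneno=6666666666&amp;rid=0x02">
  <div class="h4">Amy Smith</div>
  <div>
    <span class="content-label">Age </span>
    <span class="content-value">25 </span>
  </div>
  <div>
    <span class="content-label">Lives in </span>
    <span class="content-value">New York, NY</span>
  </div>
</div>
"""


def scrape_rows(html):
    """Return a header row followed by one [name, age, location, url] row per card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [["Name", "Age", "Location", "URL"]]
    for card in soup.find_all("div", class_="card-summary"):
        # Map each label ("Age", "Lives in") to the value span next to it.
        fields = {}
        for label in card.find_all("span", class_="content-label"):
            value = label.find_next_sibling("span", class_="content-value")
            fields[label.get_text(strip=True)] = value.get_text(strip=True) if value else ""
        name = card.find("div", class_="h4").get_text(strip=True)
        rows.append([name, fields.get("Age", ""), fields.get("Lives in", ""), card["data-detail"]])
    return rows


data = scrape_rows(SAMPLE_HTML)
for row in data:
    print(row)
```

Looking values up by label rather than by position means a card that lacks one of the labels yields an empty string for that column instead of raising an exception.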

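【Discussion】:

@inc0gnit0c0de, feel free to ask if you need any part explained.
Thanks for the quick response. When I try to print, it prints some data and then I get: ValueError: too many values to unpack. How do I extract the data from the HTML in order to print it? I'm new to Python and BeautifulSoup.
Those errors are raised if the HTML you run the code on is not consistent with the HTML you provided. Also, what do you mean by "how do I extract the data from the HTML in order to print it"?
@inc0gnit0c0de, please check the edited code. I made some changes. It shouldn't give any errors now.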
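
The code that raised the ValueError is not shown in the thread, so the following is only a guess at one common cause: Python raises "too many values to unpack" when a loop unpacks each row into fewer names than the row has items. The sketch below reproduces that with rows shaped like the answer's list output and then prints them with a matching number of names. The three-name loop is a hypothetical reconstruction, not the asker's actual code.

```python
# Rows shaped like the answer's list output (values copied from that output).
information = [
    ["Kevin Johnson", "54 ", "Las Vegas, NV", "/results?phoneno=999999999&rid=0x0"],
    ["Amy Smith", "25 ", "New York, NY", "/results?phoneno=6666666666&rid=0x02"],
]

# Hypothetical mistake: each row has four items but only three names are given.
error_message = None
try:
    for name, age, location in information:
        pass
except ValueError as exc:
    error_message = str(exc)

# Using one name per item in the row avoids the error.
lines = []
for name, age, location, url in information:
    lines.append(f"{name.strip()}, {age.strip()}, {location.strip()}, {url}")

print(error_message)
print("\n".join(lines))
```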