【Question title】: BeautifulSoup web scraping all 'li' text to dataframe
【Posted】: 2017-12-10 00:40:17
【Question description】:

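I am trying to use BeautifulSoup to scrape property listings from a real-estate website and load them into a data table. I am using Python 3.

The code below prints the data I need, but I need a way to get that data into a table. The li tags hold three items per listing: a property number (1-50), the tenant name, and the square footage. Ideally the output would be a DataFrame with the column headers Number, Tenant, and Square Footage.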

from bs4 import BeautifulSoup
import requests
import pandas as pd

page = requests.get("http://properties.kimcorealty.com/properties/0014/")
soup = BeautifulSoup(page.content, 'html.parser')

start = soup.find('div', {'id' : 'units_box_1'})
for litag in start.find_all('li'):
    print(litag.text)

start = soup.find('div', {'id' : 'units_box_2'})
for litag in start.find_all('li'):
    print(litag.text)

start = soup.find('div', {'id' : 'units_box_3'})
for litag in start.find_all('li'):
    print(litag.text)

【Question comments】:

    Tags: python web-scraping beautifulsoup


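    【Solution 1】:

    You can do it like this: fetch all the divs in one go, then, within each div, find the enclosing 'a' tag of every group of three 'li' tags that holds one set of data.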

    from bs4 import BeautifulSoup
    import requests
    import unicodedata
    from pandas import DataFrame
    
    page = requests.get("http://properties.kimcorealty.com/properties/0014/")
    soup = BeautifulSoup(page.content, 'html.parser')
    table = []
    # Find all the divs we need in one go.
    divs = soup.find_all('div', {'id':['units_box_1', 'units_box_2', 'units_box_3']})
    for div in divs:
        # find all the enclosing a tags.
        anchors = div.find_all('a')
        for anchor in anchors:
            # Now we have groups of 3 list items (li) tags
            lis = anchor.find_all('li')
            # we clean up the text from the group of 3 li tags and add them as a list to our table list.
            table.append([unicodedata.normalize("NFKD",lis[0].text).strip(), lis[1].text, lis[2].text.strip()])
    # We have all the data so we add it to a DataFrame.
    headers = ['Number', 'Tenant', 'Square Footage']
    df = DataFrame(table, columns=headers)
    print(df)
    

    Output:

       Number                          Tenant Square Footage
    0       1                  Nordstrom Rack         34,032
    1       2               Total Wine & More         29,981
    2       3           Thomasville Furniture         10,628
    ...
    47     49                  Jo-Ann Fabrics         45,940
    48     50                       Available         32,572
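
The answer above relies on each group of three 'li' tags being wrapped in its own 'a' tag. If that wrapper were missing, one possible fallback is to collect the 'li' text in document order and split it into chunks of three. The sketch below uses only the standard library and assumes every listing contributes exactly three entries; the function name `rows_from_flat_items` and the sample strings are hypothetical, shaped like the values in the output above rather than taken from the live page.

```python
import unicodedata


def rows_from_flat_items(items, width=3):
    """Group a flat list of li texts into rows of `width` cleaned strings."""
    # Normalize non-breaking spaces and trim surrounding whitespace.
    cleaned = [unicodedata.normalize("NFKD", text).strip() for text in items]
    # Fail loudly if the list cannot be split into complete rows.
    if len(cleaned) % width != 0:
        raise ValueError(
            f"expected a multiple of {width} items, got {len(cleaned)}"
        )
    return [cleaned[i:i + width] for i in range(0, len(cleaned), width)]


# Hypothetical li texts (an assumption, not scraped data).
sample_items = [
    "1\xa0", "Nordstrom Rack", " 34,032 ",
    "2\xa0", "Total Wine & More", " 29,981 ",
]

rows = rows_from_flat_items(sample_items)
print(rows)
```

The resulting `rows` list has the same shape as `table` in the answer, so it could be passed to `DataFrame(rows, columns=headers)` in the same way.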
    

