【Question title】: Pandas only printing last bs4 element to csv file
【Posted】: 2020-12-06 14:04:27
【Question】:

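I'm scraping house data from zoopla.co.uk.

The DataFrame seems to print correctly, but pandas only writes the last element (the last house) to the CSV file.

I also tried converting each value to a list inside the pd.DataFrame({}) call, but that didn't change the CSV output.

Code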

import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

my_url = 'https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine'
res = requests.get(my_url)
soup = BeautifulSoup(res.text, "html.parser")
lis = soup.find("ul", class_="listing-results clearfix js-gtm-list").find_all("li", class_="srp clearfix")

for li in lis:
    bedrooms = li.find("span", class_="num-beds")
    bathrooms = li.find("span", class_="num-baths")

    price = li.find("a", class_="text-price")
    house_price = re.findall(r'£(\d+)', str(price))

    style = li.find("h2", class_="listing-results-attr")
    house_type = re.findall('(?<=bed ).*(?= for)', str(style))

    distance = li.find("li", class_="clearfix")
    station_distance = re.findall(r'\d+\.?\d*', str(distance))

    if bedrooms:
        bedrooms = bedrooms.get_text(strip=True)
    if bathrooms:
        bathrooms = bathrooms.get_text(strip=True)
    if house_price:
        house_price = house_price
    if house_type:
        house_type = house_type
    if station_distance:
        station_distance = station_distance

    df = pd.DataFrame({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})
    print(df)

    df.to_csv('zoopla.csv')

Output

house_price house_type station_distance bedrooms bathrooms
0          90       flat              0.2        1         1
  house_price      house_type station_distance bedrooms bathrooms
0         210  detached house              0.6        3      None
  house_price         house_type station_distance bedrooms bathrooms
0         160  end terrace house              0.7        2         1
  house_price      house_type station_distance bedrooms bathrooms
0         325  detached house              1.2        4         1
  house_price           house_type station_distance bedrooms bathrooms
0         195  semi-detached house              1.1        3         1
  house_price      house_type station_distance bedrooms bathrooms
0          24  terraced house              0.9        3      None
  house_price house_type station_distance bedrooms bathrooms
0         115       flat              0.2        2         1

Excel output: pandas only writes the last element (house) from the site to the file.
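Separately from the CSV problem, the house_price values above (90, 210, 160, ...) look like thousands. Assuming the site renders prices like £210,000, the pattern '\£(\d+)' stops at the first comma and captures only the digits before it. A minimal sketch of a comma-aware parser follows; parse_price and PRICE_RE are hypothetical names, not part of the original code, and the real page markup may differ.

```python
import re
from typing import Optional

# Assumption: prices are rendered like "£210,000". The question's pattern
# stops at the first comma, so it captures only "210".
PRICE_RE = re.compile(r"£(\d[\d,]*)")


def parse_price(text: str) -> Optional[int]:
    """Return the first £ amount in text as an int, or None if there is none."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group(1).replace(",", ""))


print(re.findall(r"£(\d+)", "£210,000"))  # ['210']
print(parse_price("£210,000"))            # 210000
print(parse_price("POA"))                 # None
```

In the loop it could be applied to price.get_text() rather than str(price), so the regex only sees the visible text instead of the tag's HTML.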

【Question comments】:

    Tags: python regex web-scraping beautifulsoup


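    【Answer 1】:

    A new DataFrame is built on every iteration of the loop, and df.to_csv('zoopla.csv') rewrites the whole file each time, so only the last house survives.

    Instead, collect one dict per house in a list, then build the DataFrame and write the CSV once, after the loop: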

    result = []
    for li in lis:
        ...
    
        result.append({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})
        
    df = pd.DataFrame(result)
    print(df)
    
    df.to_csv('zoopla.csv')
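For reference, here is a self-contained sketch of the pattern above that runs without a network request. The HTML string is a hypothetical, heavily simplified stand-in for Zoopla's listing markup (the real page may differ), and only three of the five columns are extracted.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified HTML mimicking the listing markup from the question.
HTML = """
<ul class="listing-results clearfix js-gtm-list">
  <li class="srp clearfix">
    <span class="num-beds">1</span><span class="num-baths">1</span>
    <a class="text-price">£90,000</a>
  </li>
  <li class="srp clearfix">
    <span class="num-beds">3</span>
    <a class="text-price">£210,000</a>
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
# class_ matches if the element has that class among its classes.
lis = soup.find("ul", class_="listing-results").find_all("li", class_="srp")

rows = []  # one dict per listing, accumulated across iterations
for li in lis:
    beds = li.find("span", class_="num-beds")
    baths = li.find("span", class_="num-baths")
    price = li.find("a", class_="text-price")
    rows.append({
        "house_price": price.get_text(strip=True) if price else None,
        "bedrooms": beds.get_text(strip=True) if beds else None,
        "bathrooms": baths.get_text(strip=True) if baths else None,
    })

# Build the DataFrame and write the CSV once, after the loop has finished.
df = pd.DataFrame(rows)
df.to_csv("zoopla.csv", index=False)
print(df)
```

Because to_csv runs only once, zoopla.csv ends up with one row per listing; index=False just keeps the pandas row index out of the file.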
    

    【Comments】:

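    • TypeError: 'list' object is not callable
    • Sorry, I missed the .append.
    • My bad. I should have caught that, since the list was empty.
    • All good. I have a question about the df output: [90] [flat] [0.2] 1 1 [210] [detached house] [0.6] 3 None. How do I get rid of the list brackets? imgur.com/0E2tCbT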
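The brackets appear because re.findall always returns a list, even when there is a single match, so each of those cells holds a one-element list. A minimal sketch of one way to unwrap them; first_or_none is a hypothetical helper, not part of the answer above.

```python
import re
from typing import List, Optional


def first_or_none(matches: List[str]) -> Optional[str]:
    """Unwrap a re.findall result: the first match, or None if nothing matched."""
    return matches[0] if matches else None


house_price = re.findall(r"£(\d+)", "£90")                              # ['90']
house_type = re.findall(r"(?<=bed ).*(?= for)", "1 bed flat for sale")  # ['flat']
missing = re.findall(r"£(\d+)", "POA")                                  # []

print(first_or_none(house_price), first_or_none(house_type), first_or_none(missing))
# 90 flat None
```

Inside the loop this would look like house_price = first_or_none(re.findall(...)), so the dict appended to result holds plain strings instead of lists.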