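[Posted]: 2018-01-19 02:20:27
[Question]:
I want to scrape this website:
However, I have to scroll down before more data loads, and I don't know how to scroll down using Beautiful Soup or Python. Does anyone here know how to do this?
The code is a bit messy, but here it is.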
```python
import datetime
import re
import time

import scrapy
from scrapy.selector import Selector
from bs4 import BeautifulSoup
from html.parser import HTMLParser  # Python 3 location (Python 2: from HTMLParser import HTMLParser)
from selenium import webdriver

from testtest.items import TesttestItem


class MLStripper(HTMLParser):
    """Collects only the text content of the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()


class MySpider(scrapy.Spider):
    name = "A1Locker"
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['www.a1lockerrental.com']
    start_urls = ['http://www.a1lockerrental.com/self-storage/mo/st-louis/'
                  '4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all']

    def parse(self, response):
        # The unit list is rendered by JavaScript, so each category is loaded in Firefox
        url = ('http://www.a1lockerrental.com/self-storage/mo/st-louis/'
               '4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Small')
        driver = webdriver.Firefox()
        driver.get(url)
        html = driver.page_source
        soup = BeautifulSoup(html, 'html.parser')

        url2 = ('http://www.a1lockerrental.com/self-storage/mo/st-louis/'
                '4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Medium')
        driver2 = webdriver.Firefox()
        driver2.get(url2)
        html2 = driver2.page_source  # read from driver2; driver would return the Small page again
        soup2 = BeautifulSoup(html2, 'html.parser')
        # soup.append(soup2)
        # print(soup)

        items = []
        inside = "Indoor"
        outside = "Outdoor"
        inside_units = ["5 x 5", "5 x 10"]
        outside_units = ["10 x 15", "5 x 15", "8 x 10", "10 x 10",
                         "10 x 20", "10 x 25", "10 x 30"]

        sizeTagz = soup.find_all('span', {"class": "sss-unit-size"})
        sizeTagz2 = soup2.find_all('span', {"class": "sss-unit-size"})
        # print(soup.find_all('span', {"class": "sss-unit-size"}))
        rateTagz = soup.find_all('p', {"class": "unit-special-offer"})
        specialTagz = soup.find_all('span', {"class": "unit-special-offer"})
        typesTagz = soup.find_all('div', {"class": "unit-info"})
        rateTagz2 = soup2.find_all('p', {"class": "unit-special-offer"})
        specialTagz2 = soup2.find_all('span', {"class": "unit-special-offer"})
        typesTagz2 = soup2.find_all('div', {"class": "unit-info"})

        yield {'date': datetime.datetime.now().strftime("%m-%d-%y"),
               'name': "A1Locker"}

        size = []
        for n in range(len(sizeTagz)):
            print(len(rateTagz))
            print(len(typesTagz))
            if "Outside" in typesTagz[n].get_text():
                size.append(re.findall(r'\d+', sizeTagz[n].get_text()))
                if n < len(sizeTagz2):  # the Medium page may list fewer units
                    size.append(re.findall(r'\d+', sizeTagz2[n].get_text()))
                print("logic hit")

        for i in range(len(size)):
            yield {
                # soup.find_all('p', {"class": "icon-bg"})
                # 'name': soup.find('strong', {'class': 'high'}).text
                'size': size[i]
                # "special": specialTagz[n].get_text(),
                # "rate": re.findall(r'\d+', rateTagz[n].get_text()),
                # "size": i.css(".sss-unit-size::text").extract(),
                # "types": "Outside"
            }

        # quit() ends each browser session; close() only closes the current window
        driver.quit()
        driver2.quit()
```
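The `inside_units` and `outside_units` lists in `parse()` are defined but never used. Assuming they were meant to label each scraped size as indoor or outdoor, a minimal sketch of that lookup could look like this. The names `normalize_size` and `classify_size` are hypothetical, and the `10' x 20'` input format is only an assumed example of how the size text might appear on the page.

```python
import re

# Lists copied from the question's parse() method.
INSIDE_UNITS = ["5 x 5", "5 x 10"]
OUTSIDE_UNITS = ["10 x 15", "5 x 15", "8 x 10", "10 x 10",
                 "10 x 20", "10 x 25", "10 x 30"]


def normalize_size(text):
    """Reduce text such as "10' x 20'" or "10 x 20" to "10 x 20"."""
    numbers = re.findall(r'\d+', text)  # same regex the question uses
    return " x ".join(numbers)


def classify_size(text):
    """Return "Indoor", "Outdoor", or None for a scraped size string."""
    size = normalize_size(text)
    if size in INSIDE_UNITS:
        return "Indoor"
    if size in OUTSIDE_UNITS:
        return "Outdoor"
    return None  # the size is not listed in either group


for raw in ["5 x 10", "10' x 20'", "12 x 12"]:
    print(raw, "->", classify_size(raw))
```

Returning `None` for unknown sizes keeps new unit types from being silently mislabeled.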
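The desired output is for the code to display the data collected from this page: http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all
Getting all of it requires scrolling down so that the rest of the data loads. At least, that is my understanding.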
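BeautifulSoup only parses the HTML it is given; it cannot run JavaScript or scroll. Any scrolling therefore has to happen in the Selenium-controlled browser before `page_source` is read. A common pattern for lazily loaded pages is to scroll to the bottom repeatedly until the page height stops growing. The sketch below assumes the site loads more units on scroll; `scroll_to_bottom`, `pause` and `max_rounds` are hypothetical names, and `FakeDriver` is only a stand-in so the loop can be tried without a browser.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=30):
    """Scroll until the page height stops growing; return the number of scrolls.

    `driver` is expected to be a Selenium WebDriver; only execute_script() is used.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page's JavaScript time to append more units
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so the real bottom was reached
        last_height = new_height
    return rounds


class FakeDriver:
    """Stand-in for a WebDriver whose page grows on each scroll (demo only)."""

    def __init__(self, heights):
        self.heights = list(heights)
        self.scrolls = 0

    def execute_script(self, script):
        if script.startswith("return"):
            # report the height reached after the current number of scrolls
            return self.heights[min(self.scrolls, len(self.heights) - 1)]
        self.scrolls += 1  # any other script is treated as a scroll
        return None


print(scroll_to_bottom(FakeDriver([1000, 2000, 3000]), pause=0))  # → 3
```

With a real browser it would be called as `scroll_to_bottom(driver)` right after `driver.get(url)` and before `BeautifulSoup(driver.page_source, 'html.parser')`. A fixed `time.sleep` is the simplest choice; Selenium's explicit waits (`WebDriverWait`) are more robust when a specific element is known to appear.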
Thanks, DM123
[Comments]:
-
What have you tried? What is the current output, and what is the desired output?
-
You need to provide the code you tried, what it does and outputs, and what it fails to output. Also, I can't see a link to the website.
-
OK, one moment, I'll add the code.
-
I managed to get it working with this code: `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`
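Once the page has been scrolled, `driver.page_source` should contain the full unit list, which the question parses with BeautifulSoup via `find_all('span', {"class": "sss-unit-size"})`. For comparison, here is a stdlib-only sketch of the same extraction built on `html.parser.HTMLParser`, in the spirit of the question's `MLStripper`. `UnitSizeParser` and `extract_unit_sizes` are hypothetical names, the sample markup is invented, and the sketch assumes the size `<span>` contains no nested `<span>` tags.

```python
from html.parser import HTMLParser


class UnitSizeParser(HTMLParser):
    """Collect the text of every <span> whose class list contains sss-unit-size."""

    def __init__(self):
        super().__init__()
        self.sizes = []
        self._in_size = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "sss-unit-size" in classes:
            self._in_size = True
            self.sizes.append("")

    def handle_endtag(self, tag):
        if tag == "span":  # assumes no nested <span> inside the size span
            self._in_size = False

    def handle_data(self, data):
        if self._in_size:
            self.sizes[-1] += data


def extract_unit_sizes(html):
    parser = UnitSizeParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return [s.strip() for s in parser.sizes]


SAMPLE = ('<div class="unit-info"><span class="sss-unit-size">5 x 10</span>'
          '<span class="price">$40</span></div>'
          '<div class="unit-info"><span class="sss-unit-size big">10 x 20</span></div>')
print(extract_unit_sizes(SAMPLE))  # → ['5 x 10', '10 x 20']
```

In the spider this would be `extract_unit_sizes(driver.page_source)` after scrolling; BeautifulSoup remains the more convenient choice when the markup is messier than this sample.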
Tags: javascript python dynamic beautifulsoup bs4