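[Posted]: 2020-10-06 16:45:32
[Question]:
I am trying to use scrapy-splash to get data from this website, but I cannot extract anything. I want to get the data for each real estate listing, such as its href, price, and so on. Here is my code:
In settings.py: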
ROBOTSTXT_OBEY = False
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36"
SPLASH_ENABLED = True
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPLASH_URL = 'http://localhost:8050/'
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
My spider:
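import scrapy
from scrapy_splash import SplashRequest

class M2Spider(scrapy.Spider):
    name = "m2"
    allowed_domains = ['metrocuadrado.com']
    start_urls = [
        'https://www.metrocuadrado.com/bodega/arriendo'
    ]

    def start_requests(self):
        # Send each start URL through Splash so the page's JavaScript is rendered.
        for url in self.start_urls:
            yield SplashRequest(url=url, callback=self.parse,
                                endpoint='render.html',
                                args={'wait': 10})

    def parse(self, response):
        print("--------------------------------------------------------------")
        # Select listing anchors whose class attribute is exactly 'sc-bdVaJa ebNrSm'.
        real_states = response.selector.xpath(".//a[@class='sc-bdVaJa ebNrSm']").getall()
        # Print the extracted list itself, not the literal string "real_states".
        print(real_states)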
The printed output is an empty list, []. I am new to this. Any suggestions?
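One way to narrow this down is to check whether the rendered HTML contains any matching anchors at all, or whether the exact-match class selector is too strict. The sketch below uses only the standard library and matches a single class token instead of the full attribute string. The helper names (AnchorCollector, anchors_with_class) and the SAMPLE_HTML markup are hypothetical; in the spider you would pass response.text, and the real page structure may well differ.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, list of class tokens) for every <a> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A missing or valueless class attribute becomes an empty token list.
        classes = (attr_map.get("class") or "").split()
        self.anchors.append((attr_map.get("href"), classes))


def anchors_with_class(html, token):
    """Return the hrefs of anchors whose class attribute contains `token`."""
    collector = AnchorCollector()
    collector.feed(html)
    return [href for href, classes in collector.anchors if token in classes]


# Hypothetical stand-in for response.text; not the actual markup of the site.
SAMPLE_HTML = """
<div>
  <a class="sc-bdVaJa ebNrSm" href="/inmueble/1">Bodega 1</a>
  <a class="sc-bdVaJa other" href="/inmueble/2">Bodega 2</a>
  <a href="/contacto">Contacto</a>
</div>
"""
```

If anchors_with_class(response.text, "sc-bdVaJa") also comes back empty, the listings are probably not in the HTML that Splash returned, which would point at rendering (wait time, blocked requests) and not at the XPath. If it returns hrefs, the class string in the original selector likely does not match exactly; generated class names of this kind can change between builds of a site.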
[Discussion]:
Tags: python scrapy scrapy-splash