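【Posted】: 2020-04-09 13:30:18
【Question】:
With this code I only get the last record from the website. I want to get all of the records, but it does not work as expected. What am I doing wrong in this code?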
import scrapy


class YellowPages(scrapy.Spider):
    name = 'yp'
    start_urls = [
        "https://www.yellowpages.com/search?search_terms=agent&geo_location_terms=Los%20Angeles%2C%20CA&page=1",
    ]

    def parse(self, response):
        agent_name = response.xpath("//a[@class='business-name']/span/text()").extract()
        phone_number = response.xpath("//div[@class='phones phone primary']/text()").extract()
        address = response.xpath("//div[@class='street-address']/text()").extract()
        locality = response.xpath("//div[@class='locality']/text()").extract()
        data = zip(agent_name, phone_number, address, locality)
        for item in data:
            info = {
                'page': response.url,
                'Agent_name': item[0],
                'Phone_number': item[1],
                'address': item[2],
                'locality': item[3],
            }
        yield info
I also added this to the settings.py file:
FEED_FORMAT="csv"
FEED_URI="yp.csv"
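As a side note, from Scrapy 2.1 onward these two settings are deprecated in favor of a single FEEDS dictionary. A minimal sketch of the equivalent configuration, assuming the same output file name yp.csv and a Scrapy version that supports FEEDS:

```python
# settings.py (sketch; assumes Scrapy 2.1 or later)
# Each key is an output URI, each value holds the options for that feed.
FEEDS = {
    "yp.csv": {"format": "csv"},
}
```

This does not affect how many rows are written; it only changes how the export target is declared.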
The data I get in the CSV file:
page,Agent_name,Phone_number,address,locality
https://www.yellowpages.com/search?search_terms=agent&geo_location_terms=Los%20Angeles%2C%20CA&page=1,Pedro Maldonado - State Farm Insurance Agent,(323) 583-1111,1933 E Florence Ave,"Los Angeles, CA 90001"
The full data set should have 30 rows, but I only get 1 row, which is the last row of the data set I expect.
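A single row that matches the last expected record is the typical symptom of a `yield` that runs once after the `for` loop finishes rather than once per iteration. The sketch below reproduces that behavior in plain Python, without Scrapy; the function names `parse_buggy` and `parse_fixed` and the sample rows are hypothetical placeholders, not data from the site.

```python
def parse_buggy(rows):
    """Builds a dict per row but yields only after the loop has ended."""
    for row in rows:
        info = {"Agent_name": row[0], "Phone_number": row[1]}
    # Executed once, when the loop is done: only the last dict survives.
    yield info


def parse_fixed(rows):
    """Yields one dict per row by keeping the yield inside the loop."""
    for row in rows:
        yield {"Agent_name": row[0], "Phone_number": row[1]}


# Hypothetical sample data standing in for the zipped XPath results.
rows = list(zip(["Agent A", "Agent B", "Agent C"], ["111", "222", "333"]))

buggy = list(parse_buggy(rows))
fixed = list(parse_fixed(rows))

print(len(buggy), len(fixed))  # prints: 1 3
```

Applied to the spider's `parse` method, the equivalent change would be to indent `yield info` one level so that it sits inside the `for item in data:` loop, assuming that is where the original differs.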
【Discussion】:
Tags: python-3.x web-scraping scrapy