[Question title]: How to get all data into a CSV file with Scrapy?
[Posted]: 2020-04-09 13:30:18
[Question description]:

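With this code I only get the last record from the website. I want to get all of the data, but it is not working properly. What am I doing wrong in this code?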

import scrapy


class YellowPages(scrapy.Spider):
    name = 'yp'

    start_urls = [
           "https://www.yellowpages.com/search?search_terms=agent&geo_location_terms=Los%20Angeles%2C%20CA&page=1",
           ]

    def parse(self, response):
        agent_name = response.xpath("//a[@class='business-name']/span/text()").extract()
        phone_number = response.xpath("//div[@class='phones phone primary']/text()").extract()
        address = response.xpath("//div[@class='street-address']/text()").extract()
        locality = response.xpath("//div[@class='locality']/text()").extract()


        data = zip(agent_name, phone_number, address, locality)

        for item in data:
            info = {
                'page' : response.url,
                'Agent_name': item[0],
                'Phone_number': item[1],
                'address': item[2],
                'locality':item[3],
            }

        yield info 

I have also added this to my settings.py file.

FEED_FORMAT="csv"
FEED_URI="yp.csv"

The data I get in the CSV file:

page,Agent_name,Phone_number,address,locality
https://www.yellowpages.com/search?search_terms=agent&geo_location_terms=Los%20Angeles%2C%20CA&page=1,Pedro Maldonado - State Farm Insurance Agent,(323) 583-1111,1933 E Florence Ave,"Los Angeles, CA 90001"

There should be 30 rows in total, but I only get 1 row, which is the last row of the data set I expect.

[Comments on the question]:

    Tags: python-3.x web-scraping scrapy


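    [Solution 1]:

    You only need to indent the last line so that it sits inside the for loop, and your code will work as expected. Right now you call yield info outside the for loop, so only the value bound to info after the final iteration is written to your CSV. Indent it into the for loop and every result will be written to your CSV file. Your code will then look like this: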

    import scrapy


    class YellowPages(scrapy.Spider):
        name = 'yp'
    
        start_urls = [
               "https://www.yellowpages.com/search?search_terms=agent&geo_location_terms=Los%20Angeles%2C%20CA&page=1",
               ]
    
        def parse(self, response):
            agent_name = response.xpath("//a[@class='business-name']/span/text()").extract()
            phone_number = response.xpath("//div[@class='phones phone primary']/text()").extract()
            address = response.xpath("//div[@class='street-address']/text()").extract()
            locality = response.xpath("//div[@class='locality']/text()").extract()
    
    
            data = zip(agent_name, phone_number, address, locality)
    
            for item in data:
                info = {
                    'page' : response.url,
                    'Agent_name': item[0],
                    'Phone_number': item[1],
                    'address': item[2],
                    'locality': item[3],
                }
                yield info
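
The difference between the two indentation levels can be seen without Scrapy at all. The following is a minimal sketch in plain Python, not part of the original spider: the function names and the sample rows are made up for illustration, and the tuples stand in for the zipped XPath results.

```python
def yield_outside_loop(rows):
    """Mimics the original bug: the yield runs once, after the loop ends."""
    for row in rows:
        info = {"Agent_name": row[0], "Phone_number": row[1]}
    yield info  # only the last value bound to info is emitted


def yield_inside_loop(rows):
    """Mimics the fix: the yield runs once per iteration."""
    for row in rows:
        info = {"Agent_name": row[0], "Phone_number": row[1]}
        yield info  # every row is emitted


# Hypothetical sample rows standing in for zip(agent_name, phone_number, ...).
sample_rows = [
    ("Agent A", "(000) 000-0001"),
    ("Agent B", "(000) 000-0002"),
    ("Agent C", "(000) 000-0003"),
]

outside = list(yield_outside_loop(sample_rows))
inside = list(yield_inside_loop(sample_rows))

print(len(outside))  # 1
print(len(inside))   # 3
```

Scrapy treats parse as a generator and exports each yielded item as one CSV row, so the first variant produces a single row and the second produces one row per listing.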
    

    [Comments]:
