Scrapy Error: Couldn't bind: 24: Too many open files

【Question title】: Scrapy Error: Couldn't bind: 24: Too many open files
【Posted】: 2017-09-24 18:07:38
【Question description】:

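I'm running Scrapy on a list of domains, and a lot of the pages fail with this error: Couldn't bind: 24: Too many open files.

I did not get this error on my Linux machine, but I'm getting it now on my Mac. I'm not sure whether this has to do with running on Sierra, or whether I've left something out of my Scrapy configuration. I checked ulimit and it returns unlimited, so I don't think that's it.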
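One thing worth checking here: in bash, a bare `ulimit` reports the maximum file size (the same as `ulimit -f`), not the open-file limit, which is `ulimit -n`. Error 24 is EMFILE, and it is governed by the open-file limit, which also counts sockets. The following is a minimal sketch for reading that limit from Python with the standard library `resource` module; the helper name `open_file_limits` is made up for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    # RLIMIT_NOFILE caps open descriptors, which includes sockets as well as files.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == '__main__':
    soft, hard = open_file_limits()
    print('soft limit:', soft, 'hard limit:', hard)
```

If the soft limit turns out to be small, a crawl with many concurrent connections could plausibly exhaust it even when a bare `ulimit` prints unlimited.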

In case it's my spider, here it is:

class JakeSpider(CrawlSpider):
    name = 'jake'
    allowed_domains = allowedDomains
    start_urls = startUrls
    rules = (
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )


    def parse_page(self, response):
        page = response.url
        domain = urlparse(page).netloc
        domain = domain.replace('www.','')
        #print(domain, 'is domain and page is', page)
        linksToGet = getHotelUrlsForDomain(domain)
        #if(len(linksToGet) == 0):
        #    print('\n ... links to get was zero \n')
        #print('linksToGet = ', linksToGet)
        links = response.xpath('//a/@href').extract()
        for link in links:
            if link in linksToGet:
                print('\n\n\n   found one! ', link, 'is on', domain, ' and the page is', page,'\n\n\n')
                with open('hotelBacklinks.csv', 'a') as csvfile:
                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                    writer.writerow({'hotelURL':link, 'targetDomain': domain})

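EDIT: here is the full line for one of the errors. It doesn't cause Scrapy to crash, but there are a lot of these lines, so I don't think I'm getting as many pages as I otherwise would. The error line: 2017-09-24 14:21:29 [scrapy.core.scraper] ERROR: Error downloading <GET https://alabamatheatre.com/>: Couldn't bind: 24: Too many open files.

Thanks in advance for any tips.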

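【Comments on the question】:

You're making us guess where the error occurs. Edit your question to include the full error traceback, including the line of code that causes the error.
Also, it would be better to open the csv file once at the top of the function, rather than closing and reopening it for every link.
@JohnGordon, thanks, I've added one of them. It's an error logged by Scrapy rather than a fatal one, so I don't get a traceback pointing to the specific line of code that caused it. Also, thanks for the csv tip, I've fixed that.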

Tags: python web-scraping scrapy

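【Answer 1】:

You should use an item pipeline to save all of the scraped data.
You get this error because parse_page is called many times, and every call tries to open and write to the same file. Writing to a file is a blocking operation. See the Scrapy documentation on item pipelines: https://doc.scrapy.org/en/latest/topics/item-pipeline.html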
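As a rough illustration of the pipeline approach, the sketch below opens the CSV once when the spider starts and closes it once when the spider finishes, so the crawl holds a single file descriptor for output. The class name `HotelBacklinkPipeline` is hypothetical, and the column names are taken from the `writerow` call in the question; the `open_spider`, `close_spider` and `process_item` hooks are the ones described in the linked documentation.

```python
import csv


class HotelBacklinkPipeline:
    """Hypothetical item pipeline: one CSV handle per crawl instead of one per link."""

    # Column names copied from the question's writerow() call.
    fieldnames = ['hotelURL', 'targetDomain']
    path = 'hotelBacklinks.csv'

    def open_spider(self, spider):
        # Called once when the spider starts: open the file a single time.
        self.csvfile = open(self.path, 'a', newline='')
        self.writer = csv.DictWriter(self.csvfile, fieldnames=self.fieldnames)

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file descriptor.
        self.csvfile.close()

    def process_item(self, item, spider):
        # Called for every item the spider yields: write one row.
        self.writer.writerow({name: item[name] for name in self.fieldnames})
        return item
```

With something like this in place, parse_page would yield `{'hotelURL': link, 'targetDomain': domain}` instead of opening the file itself, and the pipeline would be enabled through the `ITEM_PIPELINES` setting, for example `{'myproject.pipelines.HotelBacklinkPipeline': 300}`, where `myproject` stands in for the actual project name.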
