【Question Title】: How can the start_urls for scrapy be imported from csv?
【Posted】: 2021-04-19 17:21:19
【Question】:

I am trying to scrape multiple URLs from a csv file (all in one column). However, the code does not return anything. Thanks, Nicole

import scrapy
from scrapy.http import HtmlResponse
from scrapy.http import Request
import csv

scrapurls = ""

def get_urls_from_csv():
    with open("produktlink_test.csv", 'rbU') as csv_file:
        data = csv.reader(csv_file)
        scrapurls = []
        for row in data:
            scrapurls.append(column)
            return scrapurls

class GetlinksgalaxusSpider(scrapy.Spider):
    name = 'getlinksgalaxus'
    allowed_domains = []
    
    # An dieser Stelle definieren wir unsere Zieldomains
    start_urls = scrapurls

    def parse(self, response):

    ....

【Discussion】:

    Tags: python csv scrapy geturl


    【Solution 1】:

    Previous answer: How to loop through multiple URLs to scrape from a CSV file in Scrapy?

    Also, it is best to keep all of the methods inside the Scrapy spider and to add an explicit start_requests method. As written, the code has several problems: get_urls_from_csv() is never actually called (so start_urls stays the empty string ""), column is undefined inside the loop (it should be something like row[0]), the return statement sits inside the for loop so at most one row would ever be collected, and 'rbU' is not a valid file mode in Python 3.
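    The points above can be sketched as a corrected spider. This is a sketch, not the answerer's exact code: the filename produktlink_test.csv and the spider name come from the question, and it assumes the URLs sit in the first column of the CSV.

    ```python
    import csv

    import scrapy


    def get_urls_from_csv(path):
        """Return the first column of every non-empty row as a list of URLs."""
        # Open in text mode: the question's 'rbU' mode is invalid in Python 3.
        with open(path, newline="") as csv_file:
            reader = csv.reader(csv_file)
            # Build the full list and return it *after* the loop has finished,
            # instead of returning inside the loop after the first row.
            return [row[0] for row in reader if row]


    class GetlinksgalaxusSpider(scrapy.Spider):
        name = "getlinksgalaxus"

        def start_requests(self):
            # Reading the CSV here means it happens when the crawl starts,
            # and the helper is actually called (unlike in the question).
            for url in get_urls_from_csv("produktlink_test.csv"):
                yield scrapy.Request(url, callback=self.parse)

        def parse(self, response):
            ...  # extraction logic from the question goes here
    ```

    Alternatively, the list returned by the helper can be assigned to start_urls in __init__; the key point is that the CSV must be read and the result actually wired into the spider.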

    【Comments】:

    • Awesome! Could you upvote the answer? Thanks! :)