[Question title]: Connecting a scrapy container to a mongo container
[Posted]: 2020-12-29 20:38:17
[Question]:

I am trying to spin up and connect two containers (mongo and a scrapy spider) using docker-compose. Being new to Docker, I am having a hard time troubleshooting network ports (inside and outside the containers). To respect your time, I will keep this short.

Problem:

The spider cannot connect to the mongo db container and fails with a timeout error. I suspect the IP address I am connecting to from inside the container is wrong. However, the spider works locally (the non-dockerized version) and can pass data to a running mongo container.

Small edits made to remove name and email from the code.

Error:

pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 5feb8bdcf912ec8797c25497, topology_type: Single

Pipeline code:

from scrapy.exceptions import DropItem
# scrapy.utils.log is deprecated; use the stdlib logging module instead
import logging
import scrapy
from itemadapter import ItemAdapter
import pymongo

class xkcdMongoDBStorage:
    """Pipeline that stores scraped items in a MongoDB collection."""

    def __init__(self):
        # MongoClient requires two arguments (address and port)
        #* connecting to the db
        self.conn = pymongo.MongoClient(
            '127.0.0.1', 27017) # works with spider local and container running
            # '0.0.0.0', 27017)
        # accessing the database; pymongo creates it lazily on first write,
        # so no need to check list_database_names() first
        self.db = self.conn['randallMunroe']
        #* accessing the collection (also created lazily on first write)
        self.collection = self.db['webComic']

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        # drop the item if any field is empty (iterate over field/value
        # pairs; iterating the item directly only yields field names)
        for field, value in adapter.items():
            if not value:
                raise DropItem(f"Missing {field}!")
        # insert() is deprecated in pymongo; use insert_one()
        self.collection.insert_one(adapter.asdict())
        logging.info("Question added to MongoDB database!")
        return item

Dockerfile for the spider:

# base image
FROM python:3
# metadata info
LABEL maintainer="first last name" email="something@gmail.com"
# expose the container port matching scrapy's default telnet console port
EXPOSE 6023
# set the work directory so that paths can be relative
WORKDIR /usr/src/app
# copy requirements first to take advantage of layer caching
COPY requirements.txt ./
# install dependencies
RUN pip3 install --no-cache-dir -r requirements.txt
# copy the code itself from the local folder into the image
COPY . .
CMD scrapy crawl xkcdDocker
docker-compose file:

version: '3'
services:
  db:
    image: mongo:latest
    container_name: NoSQLDB
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: password
    volumes:
      - ./data/bin:/data/db
    ports:
      - 27017:27017
    expose:
      - 27017

  xkcd-scraper:
    build: ./scraperDocker
    container_name: xkcd-scraper-container
    volumes: 
      - ./scraperDocker:/usr/src/app/scraper
    ports:
      - 5000:6023
    expose:
      - 6023
    depends_on:
      - db
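If the spider were changed to read the Mongo host from the environment (the MONGO_HOST/MONGO_PORT variable names are hypothetical, not part of the original setup), the compose file could inject the service name directly, e.g.:

```yaml
  xkcd-scraper:
    build: ./scraperDocker
    environment:
      MONGO_HOST: db        # Docker's embedded DNS resolves the service name
      MONGO_PORT: "27017"
    depends_on:
      - db
```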

Thanks for your help.

[Comments]:

    Tags: mongodb docker-compose scrapy


    [Solution 1]:

    Try:

    self.conn = pymongo.MongoClient('NoSQLDB',27017)
    

    In docker-compose, you can reach other containers by their service name (here `db`); the `container_name` (`NoSQLDB`) also resolves on the default compose network.
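A pattern that keeps both setups working (a sketch; the MONGO_HOST/MONGO_PORT variable names are assumptions, not from the original post) is to read the Mongo address from the environment, defaulting to localhost for local runs and letting docker-compose override it with the service name:

```python
import os

# Hypothetical env-var names: default to localhost for local runs;
# docker-compose can override MONGO_HOST with the service name "db",
# which Docker's embedded DNS resolves on the compose network.
MONGO_HOST = os.environ.get("MONGO_HOST", "127.0.0.1")
MONGO_PORT = int(os.environ.get("MONGO_PORT", "27017"))

# The pipeline would then connect with:
#   pymongo.MongoClient(MONGO_HOST, MONGO_PORT, username='root',
#                       password='password', authSource='admin')
uri = f"mongodb://{MONGO_HOST}:{MONGO_PORT}/"
print(uri)
```

This way the same pipeline code runs unmodified both inside and outside Docker.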

    [Discussion]:

    • Hey, that's close; it took me a few more pymongo tweaks to get it running: self.conn = pymongo.MongoClient(host='NoSQLDB',port=27017,username='root',password='password',authSource="admin") Another pymongo question helped me. I can see the WiredTiger files in the container's /data/db volume. I was hoping to use MongoDB Compass to access the data, but that did not work. Do I really need to expose both 27017 and 6023?
    • You probably do need to expose both ports, since they are used by different services.