【Question title】: Cannot deploy a MongoDB StatefulSet with volumes when replicas is greater than one
【Posted】: 2019-12-22 14:25:00
【Question】:

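Context

I am sharing the /data/db directory, mounted as a Network File System (NFS) volume, across all pods controlled by the StatefulSet.

Problem

With replicas: 1, the StatefulSet deploys mongodb correctly. The problem starts when I scale up (more than one replica, e.g. replicas: 2): every subsequent pod ends up in the CrashLoopBackOff state.

Question

I understand the error message (see the Debugging section below), but I do not understand why it is a problem. Basically, what I am trying to achieve is a stateful deployment of mongodb, so that the pods keep their data even after they are deleted. Somehow mongo prevents me from doing that, because "Another mongod instance is already running on the /data/db directory". My questions are: what am I doing wrong? How can I deploy mongodb so that it is stateful and persists data while the StatefulSet is scaled up?

Debugging

Cluster state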

$ kubectl get svc,sts,po,pv,pvc --output=wide
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE   SELECTOR
service/mongo   ClusterIP   None         <none>        27017/TCP   10h   run=mongo

NAME                     READY   AGE     CONTAINERS   IMAGES
statefulset.apps/mongo   1/2     8m50s   mongo        mongo:4.2.0-bionic

NAME          READY   STATUS             RESTARTS   AGE     IP          NODE        NOMINATED NODE   READINESS GATES
pod/mongo-0   1/1     Running            0          8m50s   10.44.0.2   web01       <none>           <none>
pod/mongo-1   0/1     CrashLoopBackOff   6          8m48s   10.36.0.3   compute01   <none>           <none>

NAME                                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE   VOLUMEMODE
persistentvolume/phenex-nfs-mongo   1Gi        RWX            Retain           Bound    phenex-nfs-mongo                           22m   Filesystem

NAME                                     STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE   VOLUMEMODE
persistentvolumeclaim/phenex-nfs-mongo   Bound    phenex-nfs-mongo   1Gi        RWX                           22m   Filesystem

Logs

$ kubectl logs -f mongo-1
2019-08-14T23:52:30.632+0000 I  CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongo-1
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] db version v4.2.0
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] git version: a4b751dcf51dd249c5865812b390cfd1c0129c30
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.1  11 Sep 2018
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] allocator: tcmalloc
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] modules: none
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] build environment:
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten]     distmod: ubuntu1804
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten]     distarch: x86_64
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten]     target_arch: x86_64
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] options: { net: { bindIp: "0.0.0.0" }, replication: { replSet: "rs0" } }
2019-08-14T23:52:30.642+0000 I  STORAGE  [initandlisten] exception in initAndListen: DBPathInUse: Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory, terminating
2019-08-14T23:52:30.643+0000 I  NETWORK  [initandlisten] shutdown: going to close listening sockets...
2019-08-14T23:52:30.643+0000 I  NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2019-08-14T23:52:30.643+0000 I  -        [initandlisten] Stopping further Flow Control ticket acquisitions.
2019-08-14T23:52:30.643+0000 I  CONTROL  [initandlisten] now exiting
2019-08-14T23:52:30.643+0000 I  CONTROL  [initandlisten] shutting down with code:100

Error

Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). 
Another mongod instance is already running on the /data/db directory, terminating

YAML files

# StatefulSet
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 2
  selector:
    matchLabels:
      run: mongo
      tier: backend
  template:
    metadata:
      labels:
        run: mongo
        tier: backend
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:4.2.0-bionic
          command:
            - mongod
          args:
            - "--replSet=rs0"
            - "--bind_ip=0.0.0.0"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: phenex-nfs-mongo
              mountPath: /data/db
      volumes:
      - name: phenex-nfs-mongo
        persistentVolumeClaim:
          claimName: phenex-nfs-mongo

# PersistentVolume
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    server: master
    path: /nfs/data/phenex/production/permastore/mongo
  claimRef:
    name: phenex-nfs-mongo
  persistentVolumeReclaimPolicy: Retain

# PersistentVolumeClaim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi

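【Question comments】:

    Tags: mongodb kubernetes


    【Answer 1】:

    Problem:

    You are deploying more than one pod using the same pvc and pv.

    Solution:

    Use volumeClaimTemplates, which create a separate PersistentVolumeClaim for each pod.

    Example: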

    # StatefulSet
    ---
    apiVersion: apps/v1  # apps/v1beta1 is no longer served as of Kubernetes 1.16
    kind: StatefulSet
    metadata:
      name: mongo
    spec:
      serviceName: mongo
      replicas: 2
      selector:
        matchLabels:
          run: mongo
          tier: backend
      template:
        metadata:
          labels:
            run: mongo
            tier: backend
        spec:
          terminationGracePeriodSeconds: 10
          containers:
            - name: mongo
              image: mongo:4.2.0-bionic
              command:
                - mongod
              args:
                - "--replSet=rs0"
                - "--bind_ip=0.0.0.0"
              ports:
                - containerPort: 27017
              volumeMounts:
                - name: phenex-nfs-mongo
                  mountPath: /data/db
      volumeClaimTemplates:
      - metadata:
          name: phenex-nfs-mongo
        spec:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 100Mi
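
With statically provisioned NFS storage, as in the question, each claim created from volumeClaimTemplates needs its own PersistentVolume to bind to. A StatefulSet names these claims <template name>-<StatefulSet name>-<ordinal>, so here they would be phenex-nfs-mongo-mongo-0 and phenex-nfs-mongo-mongo-1. A minimal sketch, assuming one NFS subdirectory per replica and the default namespace; the PV names and the .../mongo/0 and .../mongo/1 paths are hypothetical and would have to exist on the NFS server:

```yaml
# One PersistentVolume per replica (hypothetical names and NFS subdirectories).
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo-0          # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    server: master
    path: /nfs/data/phenex/production/permastore/mongo/0   # assumed subdirectory
  claimRef:                          # reserve this PV for the claim of pod mongo-0
    namespace: default               # assumption: the StatefulSet runs in "default"
    name: phenex-nfs-mongo-mongo-0
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo-1          # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    server: master
    path: /nfs/data/phenex/production/permastore/mongo/1   # assumed subdirectory
  claimRef:                          # reserve this PV for the claim of pod mongo-1
    namespace: default               # assumption: the StatefulSet runs in "default"
    name: phenex-nfs-mongo-mongo-1
  persistentVolumeReclaimPolicy: Retain
```

If the cluster has a default StorageClass, the claim template may also need storageClassName: "" so that the claims bind to these volumes instead of requesting dynamically provisioned ones.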
    

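    【Comments】:

    • "You are deploying more than one pod using the same pvc and pv." Don't get me wrong, but that is exactly what I want to achieve. I want to share the db directory between all pods, so that all pods in the StatefulSet have the same content and are therefore stateful. Why is that a problem? It is the same use case as sharing index.html between all pods of an nginx deployment. I would be much obliged for an explanation :)
    • This cannot be achieved with MongoDB, because if one pv is mounted into several pods it has to be read-only, and MongoDB needs write access to its storage.
    • Correct me if I'm wrong, but I gave my nodes write access by setting accessModes to ReadWriteMany on both the pv and the pvc. So in theory I should be able to read from and write to the shared volume.
    • In theory yes, but MongoDB creates some files in the /data/db directory and is able to detect that another instance is already using the same storage.
    • Could you point me to resources on how to properly deploy mongo as a StatefulSet with persistent data? Thanks!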
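
With a separate volume per pod, each mongod starts with its own data directory, and the pods only end up holding the same data once they are joined into the rs0 replica set named by --replSet. A sketch of a one-off Job that initiates it, assuming the pods are reachable as mongo-0.mongo and mongo-1.mongo through the headless mongo service in the same namespace; the Job and container names are hypothetical:

```yaml
# One-off Job that initiates the rs0 replica set (hypothetical names).
---
apiVersion: batch/v1
kind: Job
metadata:
  name: mongo-rs-init
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: rs-init
          image: mongo:4.2.0-bionic   # same image as the StatefulSet; includes the mongo shell
          command:
            - mongo
            - "--host"
            - "mongo-0.mongo"         # first pod, addressed via the headless service
            - "--eval"
            - |
              rs.initiate({
                _id: "rs0",
                members: [
                  { _id: 0, host: "mongo-0.mongo:27017" },
                  { _id: 1, host: "mongo-1.mongo:27017" }
                ]
              })
```

The Job does not check the result of rs.initiate(); running rs.status() in a mongo shell on mongo-0 afterwards shows whether both members joined.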