【Question title】: Kubernetes pod FileLockException: Lock file has been locked by another process - neo4j 4.2
【Posted】: 2021-11-11 19:14:17
【Question】:

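We are hitting this exception in our Kubernetes deployment of neo4j 4.2. We run only 1 pod, and it has now gone into CrashLoopBackOff. The backend is Google Filestore, but nothing else is accessing it at the moment. Any help is appreciated: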

2021-09-16 17:43:00.981+0000 INFO  Starting...
2021-09-16 17:43:02.873+0000 INFO  ======== Neo4j 4.2.9 ========
2021-09-16 17:43:08.511+0000 ERROR Failed to start Neo4j on dbms.connector.http.listen_address, a socket address. If missing port or hostname it is acquired from dbms.default_listen_address.
java.lang.RuntimeException: Error starting Neo4j database server at /data/databases
        at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:207) ~[neo4j-4.2.9.jar:4.2.9]
        at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.build(DatabaseManagementServiceFactory.java:163) ~[neo4j-4.2.9.jar:4.2.9]
        at org.neo4j.server.CommunityBootstrapper.createNeo(CommunityBootstrapper.java:36) ~[neo4j-4.2.9.jar:4.2.9]
        at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:134) [neo4j-4.2.9.jar:4.2.9]
        at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:90) [neo4j-4.2.9.jar:4.2.9]
        at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:35) [neo4j-4.2.9.jar:4.2.9]
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.internal.locker.LockerLifecycleAdapter@a8a8b75' was successfully initialized, but failed to start. Please see the attached cause exception "Lock file has been locked by another process: /data/databases/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)".
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:463) ~[neo4j-common-4.2.9.jar:4.2.9]
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:110) ~[neo4j-common-4.2.9.jar:4.2.9]
        at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:198) ~[neo4j-4.2.9.jar:4.2.9]
        ... 5 more
Caused by: org.neo4j.kernel.internal.locker.FileLockException: Lock file has been locked by another process: /data/databases/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)
        at org.neo4j.kernel.internal.locker.Locker.storeLockException(Locker.java:175) ~[neo4j-kernel-4.2.9.jar:4.2.9]
        at org.neo4j.kernel.internal.locker.Locker.checkLock(Locker.java:95) ~[neo4j-kernel-4.2.9.jar:4.2.9]
        at org.neo4j.kernel.internal.locker.GlobalFileLocker.checkLock(GlobalFileLocker.java:58) ~[neo4j-kernel-4.2.9.jar:4.2.9]
        at org.neo4j.kernel.internal.locker.GlobalLocker.checkLock(GlobalLocker.java:28) ~[neo4j-kernel-4.2.9.jar:4.2.9]
        at org.neo4j.kernel.internal.locker.LockerLifecycleAdapter.start(LockerLifecycleAdapter.java:36) ~[neo4j-kernel-4.2.9.jar:4.2.9]
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:442) ~[neo4j-common-4.2.9.jar:4.2.9]
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:110) ~[neo4j-common-4.2.9.jar:4.2.9]
        at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:198) ~[neo4j-4.2.9.jar:4.2.9]
        ... 5 more
2021-09-16 17:43:08.515+0000 INFO  Neo4j Server shutdown initiated by request
2021-09-16 17:43:08.515+0000 INFO  Stopped.

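【Comments】:

  • Maybe include an init container in your deployment that makes sure the folder is writable and also removes the lock file if it exists
  • What about removing the lock?
  • Can you provide a minimal reproducible example? (Are you trying to mount the same PersistentVolumeClaim into multiple Pods, perhaps as part of a Deployment rather than a StatefulSet?)
  • @Hackerman I already have the init container, and the whole data directory is writable. I run the same setup in two other environments, and this environment had been running for the past 10 days before it suddenly started throwing this error.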

Tags: java docker kubernetes neo4j


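【Solution 1】:

Using Google Filestore in place of a regular persistent volume claim is not recommended. For performance reasons Neo4j needs a fast local disk, and SSDs are what is typically recommended. A cloud abstraction like Filestore, if it works at all, will probably result in degraded performance.

As for the error itself, it generally means that a file on disk is locked by a different process. That may be because a previous (crashed) instance of Neo4j left it locked, or it may be an issue with the volume, in which case you would have to dig into how Filestore provides a POSIX-compliant drive on something like GKE.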
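To tell those two cases apart, it can help to probe the lock file directly from inside the pod. The following is only a minimal sketch: `StoreLockProbe` and `canLock` are hypothetical names, and it assumes the store lock is an ordinary OS-level file lock of the kind `java.nio` exposes. On a network file system the result also depends on how the server tracks locks held by clients that have crashed.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class StoreLockProbe {

    /**
     * Tries to take an exclusive lock on an existing file and releases it again.
     * Returns false if another process (or this JVM) already holds a lock.
     */
    public static boolean canLock(Path lockFile) throws IOException {
        // WRITE only: the file is not created if it is missing (NoSuchFileException).
        try (FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(); // null means another process holds it
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException e) {
            // A lock on this file is already held inside this same JVM.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the error message in the question.
        Path lockFile = Paths.get(args.length > 0 ? args[0] : "/data/databases/store_lock");
        System.out.println(lockFile + (canLock(lockFile) ? " is free" : " is locked"));
    }
}
```

If the probe reports the file as locked while no Neo4j process is running against the volume, a stale lock left behind by a crashed instance is the more likely explanation.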

【Comments】:

  • Thanks for the reply. Yes, we found that the problem was a previously crashed instance holding the lock.