【Question title】: WAL-E: Cannot restart postgresql after backup-fetch
【Posted】: 2015-07-11 21:02:25
【Question description】:

This is on Ubuntu 14.04 LTS.

Edit

WAL-E was installed with pip, following the instructions at https://gist.github.com/elithrar/8682235 and https://github.com/wal-e/wal-e#dependencies, with the keys managed by envdir.

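I am trying to restore a database with WAL-E. Initially everything seemed to go well: Postgres is installed and running, and I can easily create or restore databases and access them locally and remotely through pgadmin. When I try to perform a restore from an S3 backup with wal-e backup-fetch, it fails. Everything seems to go fine right up until postgres is started.

A number of errors appear that look like missing packages or permission problems, such as the following: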

* Starting PostgreSQL 9.3 database server 
    * The PostgreSQL server failed to start. Please check the log output:
    2015-07-11 00:41:11 EDT LOG:  database system was interrupted; last known up at 2015-06-30 05:00:02 EDT
        2015-07-11 00:41:11 EDT LOG:  starting archive recovery
        Traceback (most recent call last):
          File "/usr/local/bin/wal-e", line 5, in <module>
            from pkg_resources import load_entry_point
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3084, in <module>
            @_call_aside
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3070, in _call_aside
            f(*args, **kwargs)
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3097, in _initialize_master_working_set
            working_set = WorkingSet._build_master()
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 651, in _build_master
            ws.require(__requires__)
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 952, in require
            needed = self.resolve(parse_requirements(requirements))
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 847, in resolve
            new_requirements = dist.requires(req.extras)[::-1]
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2602, in requires
            dm = self._dep_map
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2803, in _dep_map
            self.__dep_map = self._compute_dependencies()
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2825, in _compute_dependencies
            for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2794, in _parsed_pkg_info
            metadata = self.get_metadata(self.PKG_INFO)
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1617, in get_metadata
            return self._get(self._fn(self.egg_info, name))
          File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1728, in _get
            with open(path, 'rb') as stream:
        IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/six-1.9.0.dist-info/METADATA'
        2015-07-11 00:41:11 EDT LOG:  invalid checkpoint record
        2015-07-11 00:41:11 EDT FATAL:  could not locate required checkpoint record
        2015-07-11 00:41:11 EDT HINT:  If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label".
        2015-07-11 00:41:11 EDT LOG:  startup process (PID 1693) exited with exit code 1
        2015-07-11 00:41:11 EDT LOG:  aborting startup due to startup process failure

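I hit several of these and was able to resolve them by changing the group and permissions of the reported files to match the other files in the directory, but I suspect the real problem has more to do with how and/or where these packages were installed. After resolving the above, postgres still fails to start and returns the following: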

* Starting PostgreSQL 9.3 database server
* The PostgreSQL server failed to start. Please check the log output:
2015-07-11 00:30:04 EDT LOG:  database system was interrupted; last known up at 2015-06-30 05:00:02 EDT
2015-07-11 00:30:04 EDT LOG:  starting archive recovery
Traceback (most recent call last):
  File "/usr/local/bin/wal-e", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3084, in <module>
    @_call_aside
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3070, in _call_aside
    f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3097, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 651, in _build_master
    ws.require(__requires__)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 952, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 839, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'wal-e==0.8.1' distribution was not found and is required by the application
2015-07-11 00:30:04 EDT LOG:  invalid checkpoint record
2015-07-11 00:30:04 EDT FATAL:  could not locate required checkpoint record
2015-07-11 00:30:04 EDT HINT:  If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label".
2015-07-11 00:30:04 EDT LOG:  startup process (PID 1495) exited with exit code 1
2015-07-11 00:30:04 EDT LOG:  aborting startup due to startup process failure

It complains that the 'wal-e==0.8.1' distribution was not found..., but it is clearly installed and executable:

ls -l /usr/local/lib/python2.7/dist-packages
total 608
drwxr-sr-x  2 root staff  4096 Jul 10 20:14 argparse-1.3.0.dist-info
-rw-r--r--  1 root staff 88400 Jul 10 20:14 argparse.py
-rw-r--r--  1 root staff 65659 Jul 10 20:14 argparse.pyc
drwxr-sr-x  6 root staff  4096 Jul  9 22:36 azure
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 azure-0.11.1.egg-info
drwxr-sr-x  5 root staff  4096 Jul  9 22:36 babel
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 Babel-1.3.egg-info
drwxr-sr-x 57 root staff  4096 Jul  9 22:36 boto
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 boto-2.38.0.dist-info
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 concurrent
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 dateutil
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 debtcollector
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 debtcollector-0.5.0.dist-info
-rw-r--r--  1 root staff   207 Jul 10 20:10 easy-install.pth
-rw-r--r--  1 root staff   126 Jul 10 20:33 easy_install.py
-rw-r--r--  1 root staff   315 Jul 10 20:33 easy_install.pyc
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 futures-3.0.3.dist-info
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 gevent
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 gevent-1.0.2.egg-info
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 greenlet-0.4.7.egg-info
-rwxr-xr-x  1 root staff 82869 Jul  9 22:36 greenlet.so
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 iso8601
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 iso8601-0.1.10.egg-info
drwxr-sr-x 15 root staff  4096 Jul  9 22:36 keystoneclient
drwxr-sr-x  2 root staff  4096 Jul 10 20:33 _markerlib
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 msgpack
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 msgpack_python-0.4.6.egg-info
drwxr-sr-x  5 root staff  4096 Jul  9 22:36 netaddr
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 netaddr-0.7.15.dist-info
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 netifaces-0.10.4.egg-info
-rwxr-xr-x  1 root staff 58386 Jul  9 22:36 netifaces.so
drwxr-sr-x  4 root staff  4096 Jul  9 22:36 oslo
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 oslo_config
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 oslo.config-1.14.0.dist-info
-rw-r--r--  1 root root    299 Jul  9 22:35 oslo.config-1.14.0-py2.7-nspkg.pth
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 oslo_i18n
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 oslo.i18n-2.1.0.dist-info
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 oslo_serialization
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 oslo.serialization-1.7.0.dist-info
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 oslo_utils
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 oslo.utils-1.8.0.dist-info
-rw-r--r--  1 root root    299 Jul  9 22:35 oslo.utils-1.8.0-py2.7-nspkg.pth
drwxr-sr-x  5 root staff  4096 Jul 10 20:14 pbr
drwxr-sr-x  2 root staff  4096 Jul 10 20:14 pbr-1.3.0.dist-info
drwxr-sr-x  4 root staff  4096 Jul 10 20:10 pip-7.1.0-py2.7.egg
drwxr-sr-x  3 root staff  4096 Jul 10 20:33 pkg_resources
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 python_dateutil-2.4.2.dist-info
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 python_keystoneclient-1.6.0.dist-info
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 python_swiftclient-2.4.0.dist-info
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 pytz
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 pytz-2015.4.dist-info
drwxr-sr--  3 root staff  4096 Jul  9 23:25 requests
drwxr-sr--  2 root staff  4096 Jul  9 23:25 requests-2.7.0.dist-info
drwxr-sr-x  3 root staff  4096 Jul 10 20:33 setuptools
drwxr-sr-x  2 root staff  4096 Jul 10 20:33 setuptools-18.0.1.dist-info
drwxr-sr-x  3 root staff  4096 Jul  9 22:36 simplejson
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 simplejson-3.7.3.egg-info
drwxr-sr--  2 root staff  4096 Jul  9 23:26 six-1.9.0.dist-info
-rw-r--r--  1 root root  29664 Jul  9 23:26 six.py
-rw-r--r--  1 root root  29006 Jul  9 23:26 six.pyc
drwxr-sr-x  4 root staff  4096 Jul  9 22:36 stevedore
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 stevedore-1.6.0.dist-info
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 swiftclient
drwxr-sr-x  7 root staff  4096 Jul  9 22:35 wal_e
drwxr-sr-x  2 root staff  4096 Jul  9 22:35 wal_e-0.8.1.egg-info
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 wrapt
drwxr-sr-x  2 root staff  4096 Jul  9 22:36 wrapt-1.10.5.egg-info
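
In the listing above, requests, requests-2.7.0.dist-info and six-1.9.0.dist-info have mode drwxr-sr--, that is, no execute bit for other users, which is consistent with the Permission denied error on the six METADATA file when wal-e runs as a user other than root. A minimal sketch for spotting such entries is shown below. The function name find_blocked_entries is hypothetical and is not part of WAL-E or pip, and the check only looks at the "other" permission bits, so it assumes the user running wal-e is neither the owner nor in the owning group.

```python
import os
import stat


def find_blocked_entries(root):
    """Return paths under root that users in the "other" class cannot use.

    A directory counts as blocked when it lacks o+r or o+x; a regular
    file counts as blocked when it lacks o+r. Symlinks are skipped.
    """
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # permissions of the link itself are not meaningful
            if stat.S_ISDIR(st.st_mode):
                needed = stat.S_IROTH | stat.S_IXOTH
            else:
                needed = stat.S_IROTH
            if (st.st_mode & needed) != needed:
                blocked.append(path)
    return sorted(blocked)
```

Calling find_blocked_entries("/usr/local/lib/python2.7/dist-packages") as root would list every directory and file that needs attention. Entries inside a blocked directory are reported only if their own bits are wrong, so the parent directory is the one to fix first.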

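I have done a lot of searching without finding anything useful. Any suggestions or pointers in the right direction are appreciated.

Solving this also matters for making sure my primary installation will work if I ever need to restore from it. A backup is pointless if it cannot be restored.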

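【Comments】:

  • This would be better asked on dba.stackexchange.com. Good luck.
  • At a guess, you installed WAL-E somewhere per-user rather than site-wide, so it cannot be found when it runs as a different operating system user. How exactly did you install it?
  • WAL-E was installed via pip: sudo pip install wal-e, following gist.github.com/elithrar/8682235 and github.com/wal-e/wal-e#dependencies

Tags: postgresql python-2.7 ubuntu-14.04 wal-e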


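【Solution 1】:

I'm not entirely sure what fixed the The 'wal-e==0.8.1' distribution was not found error, but I never saw it again after a restart.

Other than that, fixing this was fairly simple.

The execute bit for other users was not set on many of the Python distribution directories.

Changing these permissions fixes the errors: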

      chmod o+x /usr/local/lib/python2.7/dist-packages/requests-2.7.0.dist-info/
      chmod o+x /usr/local/lib/python2.7/dist-packages/requests
...
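
When many directories are affected, running chmod one path at a time gets tedious. The following is a minimal sketch of doing the same thing in bulk, under the assumption that every directory in the tree should be listable and traversable by other users and every file readable by them. The function name grant_other_access is hypothetical, and it must be run by a user allowed to change the modes (root for /usr/local/lib).

```python
import os
import stat


def grant_other_access(root):
    """Add o+rx to directories and o+r to files under root.

    Returns the list of paths whose mode was changed. Existing bits,
    including setgid on directories, are kept. Symlinks are skipped.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory is visited once as dirpath, together with its files.
        targets = [(dirpath, stat.S_IROTH | stat.S_IXOTH)]
        targets += [(os.path.join(dirpath, n), stat.S_IROTH) for n in filenames]
        for path, wanted in targets:
            if os.path.islink(path):
                continue
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            if (mode & wanted) != wanted:
                os.chmod(path, mode | wanted)
                changed.append(path)
    return changed
```

For example, grant_other_access("/usr/local/lib/python2.7/dist-packages/requests") would cover the requests package from the commands above. Opening up all of dist-packages this way is a judgment call; reinstalling the packages with a normal umask may be the cleaner fix.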

【Comments】:

    【Solution 2】:

    The cause is that the backup label was not created correctly.

    Run the following commands on your master:

    • su postgres

      psql -c "select pg_start_backup('initial_backup');"

      rsync -cva --inplace --exclude=pg_xlog /var/lib/postgresql/9.1/main/ slave_IP_address:/var/lib/postgresql/9.1/main/

      psql -c "select pg_stop_backup();"

    • cd /var/lib/postgresql/9.1/main

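    Create a file named recovery.conf and add the following lines:

    • standby_mode = 'on'
      primary_conninfo = 'host=master_IP_address port=5432 user=rep password=your_password'
      trigger_file = '/tmp/postgresql.trigger.5432'

    service postgresql restart

    【Comments】:

    • Thanks for the feedback, but I don't think that was the problem, since I was able to resolve my issue by fixing permissions. The master is also happily producing nightly base backups, and so far I have had no problems on that machine. I would upvote your answer, but I don't have the reputation for it yet.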