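How can I check on runtime that a python module is valid without importing it? Answers

【Question title】: How can I check on runtime that a python module is valid without importing it?
【Posted】: 2017-06-13 08:11:45
【Question】:

I have a package containing subpackages, only one of which I need to import at runtime - but I need to test that they are all valid. Here is my folder structure: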

game/
 __init__.py
 game1/
   __init__.py
   constants.py
   ...
 game2/
   __init__.py
   constants.py
   ...

The code that currently runs at startup is:

import pkgutil
import game as _game
# Detect the known games
for importer,modname,ispkg in pkgutil.iter_modules(_game.__path__):
    if not ispkg: continue # game support modules are packages
    # Equivalent of "from game import <modname>"
    try:
        module = __import__('game',globals(),locals(),[modname],-1)
    except ImportError:
        deprint(u'Error in game support module:', modname, traceback=True)
        continue
    submod = getattr(module,modname)
    if not hasattr(submod,'fsName') or not hasattr(submod,'exe'): continue
    _allGames[submod.fsName.lower()] = submod

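But this has the drawback that all the subpackages get imported, which in turn imports the other modules in each subpackage (such as constants.py etc.), and that amounts to a few megabytes of garbage. So I want to replace this code with a test that the submodules are valid (that they would import fine). I guess I should be using eval somehow - but how? Or what else should I do?

EDIT: tl;dr

I am looking for an equivalent to the core of the loop above: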

    try:
        probaly_eval(game, modname) # fails iff `from game import modname` fails
        # but does _not_ import the module
    except: # I'd rather have a more specific error here but methinks not possible
        deprint(u'Error in game support module:', modname, traceback=True)
        continue

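So I want a definite answer on whether an exact equivalent to the import statement, as far as error checking goes, exists - without importing the module. That is my question; a lot of answerers and commenters answered different questions.

【Comments】:

Something like: python -m py_compile script.py ?
I need to do this from within the running program, as described above
Or python -m compileall ?
Yes, but you should be able to use it from inside the program, loaded as a module.... docs.python.org/2/library/py_compile.html
@Mr_and_Mrs_D: Compiling is a good step, but since Python lacks many static (compile-time) checks, it validates the file only to a lesser extent. You can successfully compile a file that will bomb at import time with AttributeError, ArithmeticError, KeyError and so on. OTOH, merely importing does not guarantee that the imported functions won't crash at runtime anyway.

Tags: python python-2.7 eval python-import python-importlib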

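【Answer 1】:

Maybe you are looking for the py_compile or compileall modules.
Here is the documentation:
https://docs.python.org/2/library/py_compile.html
https://docs.python.org/2/library/compileall.html#module-compileall

You can import whichever one you need as a module and call it from within your program.
For example: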

import py_compile

try:
    py_compile.compile(your_py_file, doraise=True)
    module_ok = True
except py_compile.PyCompileError:
    module_ok = False

【Comments】:

【Answer 2】:

If you want to compile the file without importing it (in the current interpreter), you can use py_compile.compile as:

>>> import py_compile

# valid python file
>>> py_compile.compile('/path/to/valid/python/file.py')

# invalid python file
>>> py_compile.compile('/path/to/in-valid/python/file.txt')
Sorry: TypeError: compile() expected string without null bytes

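The code above writes the error to sys.stderr. If you want an exception to be raised instead, you have to set doraise to True (the default is False). Your code would then be: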

from py_compile import compile, PyCompileError

try:
    compile('/path/to/valid/python/file.py', doraise=True)
    valid_file = True
except PyCompileError:
    valid_file = False

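According to the py_compile.compile documentation:

Compile a source file to byte-code and write out the byte-code cache file. The source code is loaded from the file named file. The byte-code is written to cfile, which defaults to file + 'c' ('o' if optimization is enabled in the current interpreter). If dfile is specified, it is used as the name of the source file in error messages instead of file. If doraise is true, a PyCompileError is raised when an error is encountered while compiling file. If doraise is false (the default), an error string is written to sys.stderr, but no exception is raised.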
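If writing a bytecode cache file is unwanted, a syntax-only check can also be sketched with the built-in compile() function. This is a different technique from py_compile.compile, not what the answer proposes. It assumes Python 3 (the thread itself targets 2.7), and compiles_cleanly and _write_temp are hypothetical helper names made up for this illustration.

```python
import os
import tempfile


def compiles_cleanly(path):
    """Return True if the file at `path` parses as Python source.

    Uses the built-in compile(), so no bytecode file is written and
    nothing is executed or added to sys.modules.
    """
    with open(path, "rb") as fp:
        source = fp.read()
    try:
        compile(source, path, "exec")
    except (SyntaxError, ValueError):
        # ValueError covers null bytes in the source on older Pythons.
        return False
    return True


def _write_temp(text):
    # Hypothetical helper for the demo only: writes `text` to a temp .py file.
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as fp:
        fp.write(text)
    return path


good_path = _write_temp("x = 1\n")
bad_path = _write_temp("def broken(:\n")
good_ok = compiles_cleanly(good_path)
bad_ok = compiles_cleanly(bad_path)
os.remove(good_path)
os.remove(bad_path)
```

Like py_compile, this only proves that the file parses; it says nothing about whether the module body would run.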

A check to make sure that the compiled module was not imported (in the current interpreter):

>>> import py_compile, sys
>>> py_compile.compile('/path/to/main.py')

>>> print [key for key in locals().keys() if isinstance(locals()[key], type(sys)) and not key.startswith('__')]
['py_compile', 'sys']  # main not present
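The check above inspects locals(), whereas imported modules are registered in sys.modules. A minimal sketch of the same check against sys.modules could look like this. It assumes Python 3, and the module name demo_not_imported_module is a hypothetical one chosen so it cannot clash with a real module.

```python
import os
import py_compile
import sys
import tempfile

# Hypothetical module name used only for this demonstration.
MODULE_NAME = "demo_not_imported_module"

with tempfile.TemporaryDirectory() as tmp_dir:
    source_path = os.path.join(tmp_dir, MODULE_NAME + ".py")
    with open(source_path, "w") as fp:
        fp.write("VALUE = 42\n")

    # cfile puts the bytecode next to the temp source instead of __pycache__.
    py_compile.compile(source_path, cfile=source_path + "c", doraise=True)
    bytecode_written = os.path.exists(source_path + "c")

# Compiling produced bytecode but did not register the module.
module_was_imported = MODULE_NAME in sys.modules
```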

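【Comments】:

@Mr_and_Mrs_D Is this what you need?
Are you sure this does not add the module to sys.modules?
@Mr_and_Mrs_D Check the edit. I made a small test script to verify it, and you can see that the module is not imported when it is compiled
I will test it in my setup when I get back
This requires you to know the difference between 'foo.py' and 'foo/__init__', and to be 100% sure that the file will execute on the Python path with all import requirements satisfied, and that the module body executes. For example, that importing it won't hit a "raise ValueError()".

【Answer 3】:

You can't really do what you want efficiently. To see whether a package is "valid" you need to run it - not just check that it exists - because it may have errors or unmet dependencies.

Using py_compile and compileall only tests whether you can compile a Python file, not whether you can import the module. There is a big difference between the two.

That approach means you know the actual file structure of the modules - import foo can stand for /foo.py or /foo/__init__.py.
That approach does not guarantee that the module is on your interpreter's PYTHONPATH, or that it is the module your interpreter would load. Things get tricky if you have multiple versions in /site-packages/, or if Python looks in one of the many possible locations for a module.
Just because your file "compiles" does not mean it will "run". As a package, it may have unmet dependencies or even raise errors.

Imagine this is your Python file: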

 from makebelieve import nothing
 raise ValueError("ABORT")

The above will compile, but if you import it... it raises an ImportError if you do not have makebelieve installed, and a ValueError if you do.
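The gap between "compiles" and "imports" can be reproduced with a small sketch. It assumes Python 3, uses a hypothetical module name demo_abort_module, and leaves out the makebelieve import so the outcome does not depend on what is installed.

```python
import importlib.util
import os
import tempfile

# Same idea as the file above, reduced to the line that always fails.
SOURCE = 'raise ValueError("ABORT")\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_abort_module.py")
    with open(path, "w") as fp:
        fp.write(SOURCE)

    # Step 1: compiling only checks syntax, so this succeeds.
    try:
        compile(SOURCE, path, "exec")
        compiles = True
    except SyntaxError:
        compiles = False

    # Step 2: running the module body, which is what an import does, fails.
    spec = importlib.util.spec_from_file_location("demo_abort_module", path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
        import_failure = None
    except ValueError as exc:
        import_failure = str(exc)
```

Here exec_module runs the body without registering the module in sys.modules, but it still executes the code, which is exactly what the question wants to avoid.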

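My suggestions are:

Import the package, then unload the modules. To unload them, just iterate over what is in sys.modules.keys(). If you are worried about external modules being loaded, you can override import to log what your package loads. An example of this is in a terrible profiling package I wrote: https://github.com/jvanasco/import_logger [I forget where I got the idea of overriding import from. Maybe celery?] As some have noted, unloading modules is entirely interpreter dependent - but pretty much every option has many drawbacks.
Use a subprocess to start a new interpreter via popen, i.e. something like subprocess.Popen(['python', '-m', 'module_name']). This has a lot of overhead if you do it for every module you need (one interpreter plus one import each), but you could write a ".py" file that imports everything you need and just try to run that. In either case you have to analyze the output - because importing a "valid" package can still produce acceptable errors during execution. I can't recall whether the subprocess inherits your environment variables, but I believe it does. The subprocess is an entirely new operating system process/interpreter, so the modules are loaded into that short-lived process's memory.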
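The subprocess suggestion can be sketched roughly as follows, assuming Python 3.7+ and a hypothetical helper name imports_in_subprocess. By default subprocess passes the parent's environment on to the child unless an env argument is given, so PYTHONPATH and similar settings carry over.

```python
import subprocess
import sys

# One-line program run by the child: import the module named in argv[1].
CHILD_CODE = "import importlib, sys; importlib.import_module(sys.argv[1])"


def imports_in_subprocess(module_name):
    """Try to import `module_name` in a fresh interpreter.

    Returns (ok, stderr_text). The import happens in the child process,
    so the parent's sys.modules is left untouched.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, module_name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


ok_json, _ = imports_in_subprocess("json")
ok_missing, err_missing = imports_in_subprocess("module_that_should_not_exist_xyz")
```

The stderr text contains the child's traceback, which is what would need to be analyzed, as the answer notes. Each call pays the cost of starting an interpreter.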

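【Comments】:

Unloading modules is completely dependent on Python's compiler, and you cannot be sure when the compiler will do it. You may use delete_module (from a library I created), but there too you have to make sure that the imported module holds no references, and it is entirely up to the compiler when it frees that memory
Unloading packages is not an option - it is advised against everywhere. I am looking for a non-hackish way - the right way. If I wanted to go into the pain/hack of unloading packages I would have done so already - it is not easy. And loading an interpreter seems like a lot of overhead - plus I am not sure it would not leave the modules in the namespace
Well, you are asking to do something hackish. I advise against the "compile" options because they require you to access the modules as files - which means you need to know the difference between "foo.py" and "foo/__init__.py", and those files may be somewhere on your Python path. If you start a new interpreter with a subprocess, the modules will be loaded into that interpreter's process - not yours.
No I am not, and even if I were, my question is a different one. You set out to explain pitfalls that do not apply - I clearly stated that I know my folder structure and gave the loop that traverses it - so why do you add a whole paragraph on compile pitfalls (which is not the path I am considering; I believe something like eval is needed)?

【Answer 4】:

I believe imp.find_module satisfies at least some of your requirements: https://docs.python.org/2/library/imp.html#imp.find_module

A quick test suggests that it does not trigger an import: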

>>> import imp
>>> import sys
>>> len(sys.modules)
47
>>> imp.find_module('email')
(None, 'C:\\Python27\\lib\\email', ('', '', 5))
>>> len(sys.modules)
47
>>> import email
>>> len(sys.modules)
70

Here is an example usage from some code of mine (which attempts to classify modules): https://github.com/asottile/aspy.refactor_imports/blob/2b9bf8bd2cf22ef114bcc2eb3e157b99825204e0/aspy/refactor_imports/classify.py#L38-L44
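The imp module was deprecated in Python 3 and removed in 3.12, so on Python 3 the closest counterpart is importlib.util.find_spec. A minimal sketch follows, assuming Python 3 and a hypothetical helper name can_be_found. One caveat: for a dotted name such as game.game1, find_spec imports the parent package first, so it is only import-free for top-level names.

```python
import importlib.util
import sys


def can_be_found(module_name):
    """Return True if the import system can locate a top-level module.

    This only locates the module; it does not run its code, so it cannot
    tell whether the import would succeed.
    """
    return importlib.util.find_spec(module_name) is not None


was_loaded = "email" in sys.modules
found_email = can_be_found("email")
# The lookup does not change whether email is registered in sys.modules.
load_state_unchanged = ("email" in sys.modules) == was_loaded
found_missing = can_be_found("module_that_should_not_exist_xyz")
```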

【Comments】:

【Answer 5】:

We already had a custom importer (disclaimer: I did not write that code, I am just the current maintainer) whose load_module is:

def load_module(self,fullname):
    if fullname in sys.modules:
        return sys.modules[fullname]
    else: # set to avoid reimporting recursively
        sys.modules[fullname] = imp.new_module(fullname)
    if isinstance(fullname,unicode):
        filename = fullname.replace(u'.',u'\\')
        ext = u'.py'
        initfile = u'__init__'
    else:
        filename = fullname.replace('.','\\')
        ext = '.py'
        initfile = '__init__'
    try:
        if os.path.exists(filename+ext):
            with open(filename+ext,'U') as fp:
                mod = imp.load_source(fullname,filename+ext,fp)
                sys.modules[fullname] = mod
                mod.__loader__ = self
        else:
            mod = sys.modules[fullname]
            mod.__loader__ = self
            mod.__file__ = os.path.join(os.getcwd(),filename)
            mod.__path__ = [filename]
            #init file
            initfile = os.path.join(filename,initfile+ext)
            if os.path.exists(initfile):
                with open(initfile,'U') as fp:
                    code = fp.read()
                exec compile(code, initfile, 'exec') in mod.__dict__
        return mod
    except Exception as e: # wrap in ImportError a la python2 - will keep
        # the original traceback even if import errors nest
        print 'fail', filename+ext
        raise ImportError, u'caused by ' + repr(e), sys.exc_info()[2]

So I thought I could replace the parts that access the sys.modules cache with overridable methods, which in my override would leave that cache alone:

So:

@@ -48,2 +55,2 @@ class UnicodeImporter(object):
-        if fullname in sys.modules:
-            return sys.modules[fullname]
+        if self._check_imported(fullname):
+            return self._get_imported(fullname)
@@ -51 +58 @@ class UnicodeImporter(object):
-            sys.modules[fullname] = imp.new_module(fullname)
+            self._add_to_imported(fullname, imp.new_module(fullname))
@@ -64 +71 @@ class UnicodeImporter(object):
-                    sys.modules[fullname] = mod
+                    self._add_to_imported(fullname, mod)
@@ -67 +74 @@ class UnicodeImporter(object):
-                mod = sys.modules[fullname]
+                mod = self._get_imported(fullname)

and defined:

class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = {}

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _get_imported(self, fullname):
        try:
            return sys.modules[fullname]
        except KeyError:
            return self._modules_to_discard[fullname]

    def _add_to_imported(self, fullname, mod):
        self._modules_to_discard[fullname] = mod

    @classmethod
    def cleanup(cls):
        cls._modules_to_discard.clear()

Then I added the importer to sys.meta_path and all was well:

importer = sys.meta_path[0]
try:
    if not hasattr(sys,'frozen'):
        sys.meta_path = [fake_importer()]
    perform_the_imports() # see question
finally:
    fake_importer.cleanup()
    sys.meta_path = [importer]

Right? Wrong!

Traceback (most recent call last):
  File "bash\bush.py", line 74, in __supportedGames
    module = __import__('game',globals(),locals(),[modname],-1)
  File "Wrye Bash Launcher.pyw", line 83, in load_module
    exec compile(code, initfile, 'exec') in mod.__dict__
  File "bash\game\game1\__init__.py", line 29, in <module>
    from .constants import *
ImportError: caused by SystemError("Parent module 'bash.game.game1' not loaded, cannot perform relative import",)

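Huh? I am in the middle of importing that very module. So the answer is probably in import's docs

If the module is not found in the cache, then sys.meta_path is searched (the specification for sys.meta_path can be found in PEP 302).

That is not quite to the point, but my guess is that the statement from .constants import * looks up sys.modules to check that the parent module is there, and I see no way of bypassing that (note that our custom loader uses the builtin import mechanism for modules; mod.__loader__ = self is set after the fact).

So I updated my FakeImporter to use the sys.modules cache and then clean it up.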

class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = set()

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _add_to_imported(self, fullname, mod):
        super(FakeUnicodeImporter, self)._add_to_imported(fullname, mod)
        self._modules_to_discard.add(fullname)

    @classmethod
    def cleanup(cls):
        for m in cls._modules_to_discard: del sys.modules[m]
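Outside a custom importer, the same "import, then discard" idea can be sketched with a snapshot of sys.modules. This assumes Python 3 and the standard import machinery rather than the UnicodeImporter above; discard_new_modules and the demo_game_pkg package are hypothetical names created only for this example.

```python
import contextlib
import importlib
import os
import sys
import tempfile


@contextlib.contextmanager
def discard_new_modules():
    """Remove from sys.modules whatever was imported inside the block."""
    before = set(sys.modules)
    try:
        yield
    finally:
        for name in set(sys.modules) - before:
            del sys.modules[name]


with tempfile.TemporaryDirectory() as root:
    # Build a throwaway package shaped like the game/ tree in the question.
    pkg_dir = os.path.join(root, "demo_game_pkg", "game1")
    os.makedirs(pkg_dir)
    open(os.path.join(root, "demo_game_pkg", "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fp:
        fp.write("from .constants import fsName\n")
    with open(os.path.join(pkg_dir, "constants.py"), "w") as fp:
        fp.write("fsName = 'Game One'\n")

    sys.path.insert(0, root)
    try:
        with discard_new_modules():
            # The relative import works because the parent is in sys.modules
            # while the block runs.
            sub = importlib.import_module("demo_game_pkg.game1")
            fs_name = sub.fsName
    finally:
        sys.path.remove(root)

leftovers = [n for n in sys.modules if n.startswith("demo_game_pkg")]
```

In this sketch the parent package is itself new, so it is dropped along with its children. When the parent was loaded earlier, as with the bash package in this answer, the parent keeps references that a sys.modules cleanup alone does not remove.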

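However, this blew up in a new way - or rather in two ways:

A reference to the game/ package was held in the bash top-level package instance in sys.modules:
```
bash\
  __init__.py
  the_code_in_question_is_here.py
  game\
    ...
```
because game is imported as bash.game. That reference held references to all the game1, game2,... subpackages, so those were never garbage collected
A reference to another module (brec) was held as bash.brec by the same bash module instance. This reference was imported as from .. import brec in game\game1 without triggering an import, to update SomeClass. However, in yet another module, an import of the form from ...brec import SomeClass did trigger an import, and another instance of the brec module ended up in sys.modules. That instance had a non-updated SomeClass and blew up with an AttributeError.

Both were fixed by manually deleting those references - so gc collected all the modules (5 MB of RAM out of 75) and from .. import brec did trigger an import (this from ... import foo vs from ...foo import bar difference warrants a question).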
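The second failure mode can be reproduced in miniature. The sketch below assumes Python 3 and the standard import machinery (the answer describes Python 2 with a custom importer, so details may differ); demo_bash_pkg is a hypothetical stand-in for the bash package.

```python
import importlib
import os
import sys
import tempfile

PKG = "demo_bash_pkg"  # hypothetical stand-in for the bash package

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, PKG))
    open(os.path.join(root, PKG, "__init__.py"), "w").close()
    with open(os.path.join(root, PKG, "brec.py"), "w") as fp:
        fp.write("class SomeClass(object):\n    pass\n")

    sys.path.insert(0, root)
    try:
        parent = importlib.import_module(PKG)
        first = importlib.import_module(PKG + ".brec")

        # Drop only the sys.modules entry, as a cache cleanup would.
        del sys.modules[PKG + ".brec"]

        # "from package import submodule" finds the attribute on the parent
        # and does not import again, so it returns the stale object.
        from demo_bash_pkg import brec as via_attribute
        stale_reused = via_attribute is first

        # Importing by full dotted name executes brec.py a second time.
        second = importlib.import_module(PKG + ".brec")
        two_instances = second is not first

        # Removing the parent attribute too makes the from-import
        # trigger a real import again.
        del sys.modules[PKG + ".brec"]
        delattr(parent, "brec")
        from demo_bash_pkg import brec as fresh
        fresh_is_registered = fresh is sys.modules[PKG + ".brec"]
    finally:
        sys.path.remove(root)
        for name in [n for n in sys.modules if n.startswith(PKG)]:
            del sys.modules[name]
```

With two module objects alive, each has its own SomeClass, which is consistent with the non-updated class described above.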

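The moral of the story is that it is possible, but:

The package and subpackages should only reference each other
All references to external modules/packages should be deleted from the top-level package attributes
The package reference itself should be deleted from the top-level package attribute

If this sounds complicated and error prone, it is - at least now I have a much clearer view of the interdependencies and their perils - time to address that.

This post was sponsored by PyDev's debugger - I found the gc module very useful for grokking what was going on - tips from here. Of course there were a lot of variables that belonged to the debugger, which complicated things

【Comments】:

I have an open question about our loader: stackoverflow.com/q/41921098/281545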