Probing a python function

【Question title】: Probing a python function
【Posted】: 2014-12-03 04:22:36
【Question】:

I can do this in python, and it gives me the submodules/parameters available within a function.

In the interpreter, I can do this:

>>> from nltk import pos_tag
>>> dir(pos_tag)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']

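BTW, what is the dir(function) call called?

How do I know which parameters are needed to call the function? For example, in the case of pos_tag, the source code says it needs tokens, see https://github.com/nltk/nltk/blob/develop/nltk/tag/init.py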

def pos_tag(tokens):
    """
    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.
        >>> from nltk.tag import pos_tag # doctest: +SKIP
        >>> from nltk.tokenize import word_tokenize # doctest: +SKIP
        >>> pos_tag(word_tokenize("John's big idea isn't all that bad.")) # doctest: +SKIP
        [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
        'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
        ('.', '.')]
    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    tagger = load(_POS_TAGGER)
    return tagger.tag(tokens)

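If a docstring is available for the function, is there a way to know what type the function expects for a specific parameter? For example, in the pos_tag case above, that is :param tokens: Sequence of tokens to be tagged and :type tokens: list(str). Can this information be obtained while running the interpreter, without reading the code?

Lastly, is there a way to know what the return type is?

Just to be clear, I am not expecting a printout of the docstring. The questions above are so that I can later do some sort of type checking with isinstance(output_object, type).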

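【Comments】:

Tags: python function parameters return

【Answer 1】:

Below are answers to your four questions. I'm afraid some of the things you want to do are not possible with the standard library, unless you want to parse the docstrings yourself.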

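(1) BTW, what is the dir(function) call called?

If I understand the question correctly, I believe the documentation for dir() answers this:

If the object has a method named __dir__(), this method will be called and must return the list of attributes. This allows objects that implement a custom __getattr__() or __getattribute__() function to customize the way dir() reports their attributes.

If the object does not provide __dir__(), the function tries its best to gather information from the object's __dict__ attribute, if defined, and from its type object.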
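
A minimal sketch of the behaviour quoted above, written for Python 3. `Probe` and `plain` are hypothetical names made up for illustration and are not part of nltk. The first shows dir() deferring to a custom __dir__(), the second shows the default listing for an ordinary function:

```python
class Probe:
    """Hypothetical class used only to illustrate dir()."""

    def __init__(self):
        self.value = 1

    def __dir__(self):
        # dir() calls this hook, then sorts whatever it returns.
        return ["value", "extra_name"]


def plain(tokens):
    """An ordinary function; dir() falls back to its __dict__ and its type."""
    return tokens


custom_listing = dir(Probe())
default_listing = dir(plain)

print(custom_listing)                 # ['extra_name', 'value']
print("__name__" in default_listing)  # True
```

Note that the custom listing reports extra_name even though no such attribute exists, so dir() output is a hint rather than a guarantee.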

(2) How do I know which parameters are needed to call the function?

The best way is to use inspect:

>>> from nltk import pos_tag
>>> from inspect import getargspec
>>> getargspec(pos_tag)
ArgSpec(args=['tokens'], varargs=None, keywords=None, defaults=None)  # a named tuple
>>> getargspec(pos_tag).args
['tokens']
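
The session above is Python 2. getargspec was deprecated in Python 3 and removed in Python 3.11, where inspect.signature is the usual replacement. A rough Python 3 equivalent follows; `pos_tag_stub` is a hypothetical stand-in with the same signature as pos_tag, used so the example runs without nltk installed:

```python
import inspect


def pos_tag_stub(tokens):
    """Hypothetical stand-in with the same signature as nltk's pos_tag."""
    return [(token, "NN") for token in tokens]


sig = inspect.signature(pos_tag_stub)

# Parameter names, in declaration order.
param_names = list(sig.parameters)

# Parameters the caller must supply: no default, and not *args / **kwargs.
required = [
    name
    for name, p in sig.parameters.items()
    if p.default is inspect.Parameter.empty
    and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
]

print(sig)          # (tokens)
print(param_names)  # ['tokens']
print(required)     # ['tokens']
```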

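(3) If a docstring is available for the function, is there a way to know what type the function expects for a specific parameter?

Not in the standard library, unless you want to parse the docstring yourself. You probably already know that you can access the docstring like this: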

>>> from inspect import getdoc
>>> print getdoc(pos_tag)
Use NLTK's currently recommended part of speech tagger to
tag the given list of tokens.

    >>> from nltk.tag import pos_tag
    >>> from nltk.tokenize import word_tokenize
    >>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
    [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
    'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
    ('.', '.')]

:param tokens: Sequence of tokens to be tagged
:type tokens: list(str)
:return: The tagged tokens
:rtype: list(tuple(str, str))

Or like this:

>>> print pos_tag.func_code.co_consts[0]

    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.

        >>> from nltk.tag import pos_tag
        >>> from nltk.tokenize import word_tokenize
        >>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
        [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
        'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
        ('.', '.')]

    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))

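If you want to try parsing out the parameters and "types" yourself, you can start with a regex. Obviously, though, I am using the word "type" loosely. Moreover, this approach only works for docstrings that list their parameters and types in this specific way: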

>>> import re
>>> params = re.findall(r'(?<=:)type\s+([\w]+):\s*(.*?)(?=\n|$)', getdoc(pos_tag))
>>> for param, type_ in params:
...     print param, '=>', type_

tokens => list(str)
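
For Python 3, the same idea can be sketched as a small helper that returns a dict. `pos_tag_stub` and `declared_param_types` are hypothetical names, and the stub's docstring is a shortened copy of the pos_tag one. This still assumes the ":type name: text" field layout and returns plain strings, not type objects:

```python
import inspect
import re


def pos_tag_stub(tokens):
    """
    Tag the given list of tokens (hypothetical stand-in for pos_tag).

    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    return [(token, "NN") for token in tokens]


# One ":type <name>: <text>" field per line; getdoc() strips the indentation,
# so each field starts at the beginning of a line.
TYPE_FIELD = re.compile(r"^:type\s+(\w+):\s*(.*)$", re.MULTILINE)


def declared_param_types(func):
    """Map parameter names to the type text written in the docstring."""
    doc = inspect.getdoc(func) or ""
    return {name: text.strip() for name, text in TYPE_FIELD.findall(doc)}


param_types = declared_param_types(pos_tag_stub)
print(param_types)  # {'tokens': 'list(str)'}
```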

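The result of this approach, of course, gives you the parameters and their corresponding descriptions. You can also check each word in a description by splitting the string and keeping only the words that satisfy the following test: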

>>> isinstance(eval(word), type)
True
>>> isinstance(eval('list'), type)
True

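But this approach gets complicated quickly, especially when trying to parse the last parameter of pos_tag. Besides, docstrings often don't follow this format at all. So this probably only works for nltk, and even there not all the time.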
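
If you do go down this road, calling eval() on arbitrary docstring words is risky, since it executes whatever the text contains. A hedged alternative, sketched for Python 3, is to look each word up in the builtins module instead. `builtin_types_in` is a hypothetical helper; it only recognises built-in types and knows nothing about nesting, so "list(str)" merely tells you that list and str were mentioned:

```python
import builtins
import re


def builtin_types_in(description):
    """Return the built-in types named in a description, in order of appearance."""
    found = []
    for word in re.findall(r"[A-Za-z_]\w*", description):
        candidate = getattr(builtins, word, None)
        # Keep real classes only (list, str, ...), not functions such as print.
        if isinstance(candidate, type) and candidate not in found:
            found.append(candidate)
    return found


types_found = builtin_types_in("list(tuple(str, str))")
print(types_found)  # [<class 'list'>, <class 'tuple'>, <class 'str'>]

# Only the outermost type is usable for a simple isinstance check.
print(isinstance([("John", "NNP")], types_found[0]))  # True
```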

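(4) Lastly, is there a way to know what the return type is?

Again, I'm afraid not, unless you want to comb through the docstring with something like the regex example above. The return type may well vary with the type(s) of the argument(s). (Consider any function that works with any iterable.) If you want to try extracting this information from the docstring (again, only in the exact format of the pos_tag docstring), you can try another regex: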

>>> return_ = re.search(r'(?<=:)rtype:\s*(.*?)(?=\n|$)', getdoc(pos_tag))
>>> if return_:
...     print 'return "type" =', return_.group()

return "type" = rtype: list(tuple(str, str))
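
One caveat for newer code: in Python 3 a function may carry annotations, and those can be read without touching the docstring. The pos_tag source shown in the question has none, so this does not apply to it as written. The sketch below assumes Python 3.8+ and uses a hypothetical annotated function, `tag_stub`, to show how a declared return type could feed an isinstance check:

```python
from typing import List, Tuple, get_origin, get_type_hints


def tag_stub(tokens: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical annotated function; not nltk's pos_tag."""
    return [(token, "NN") for token in tokens]


hints = get_type_hints(tag_stub)
return_hint = hints["return"]

# isinstance() rejects parameterised hints, so fall back to the outer class.
outer_type = get_origin(return_hint) or return_hint

output = tag_stub(["big", "idea"])
print(outer_type)                      # <class 'list'>
print(isinstance(output, outer_type))  # True
```

This only checks the outer container. Verifying that every element is a (str, str) tuple would need an explicit loop or a third-party validator.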

Otherwise, the best we can do here is get the source code (which, again, is not what you want):

>>> import inspect
>>> print inspect.getsource(pos_tag)
def pos_tag(tokens):
    """
    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.

        >>> from nltk.tag import pos_tag
        >>> from nltk.tokenize import word_tokenize
        >>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
        [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
        'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
        ('.', '.')]

    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    tagger = load(_POS_TAGGER)
    return tagger.tag(tokens)

【Comments】:

@alvas You ask great questions! I like the way you dig deep into nltk.