Where in Python Pandas' source code are URLs handled by pd.read_csv? (Answers)

【Question title】: Where in Python Pandas' source code are URLs handled by pd.read_csv?
【Posted】: 2023-03-12 19:20:01
【Question description】:

The pandas.read_csv function is very flexible, and it recently started supporting URL input, as described here:

import pandas as pd
df = pd.read_csv('http://www.somefile.csv')

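I'm trying to find the place in the source code where this case is handled. Here is what I know so far:

1) read_csv is a fairly generic wrapper generated by _make_parser_function in io/parsers.py

2) The function generated by _make_parser_function delegates the reading of data to a function _read(filepath_or_buffer, kwds), which is defined elsewhere in io/parsers.py

3) This function _read(filepath_or_buffer, kwds) creates a TextFileReader and returns the result of TextFileReader.read(). However, TextFileReader appears to be responsible only for text files. It provides functionality to handle various kinds of compression, but I don't see anything that checks for URL input.

4) On the other hand, io/html.py contains a function _read(obj) that clearly does access a URL and returns the result of an HTTP query.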
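For readers unfamiliar with the pattern in steps 1 and 2, here is a rough sketch of a factory that generates thin reader wrappers, each delegating to one shared function. It is not the pandas code: every name ending in _sketch is made up for illustration, and the standard csv module stands in for the real parser.

```python
import csv
import io


def _read_sketch(filepath_or_buffer, kwds):
    """Stand-in for the shared reader: parse rows from an open text buffer."""
    return list(csv.reader(filepath_or_buffer, delimiter=kwds["sep"]))


def make_parser_function_sketch(name, default_sep):
    """Build a wrapper that only collects keyword arguments and delegates."""
    def parser_f(filepath_or_buffer, sep=default_sep):
        kwds = {"sep": sep}
        return _read_sketch(filepath_or_buffer, kwds)

    parser_f.__name__ = name
    return parser_f


# Two wrappers that differ only in their default separator (hypothetical names)
read_csv_sketch = make_parser_function_sketch("read_csv_sketch", ",")
read_table_sketch = make_parser_function_sketch("read_table_sketch", "\t")

rows = read_csv_sketch(io.StringIO("a,b\n1,2\n"))
```

Because each wrapper only packages arguments, any URL handling would have to live in the shared function or in something it calls.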

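It seems to me that the simple solution would be to check whether the input string is a URL and, if so, dispatch to the html module; however, tracing through read_csv, I can't find where that happens. Can anyone point me in the right direction?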

【Question discussion】:

Tags: python pandas http url

【Solution 1】:

You missed a step between 2 and 3.

2.5) _read calls get_filepath_or_buffer(), which is where URLs are recognized and read.

filepath_or_buffer, _, compression = get_filepath_or_buffer(
    filepath_or_buffer, encoding, compression)

get_filepath_or_buffer() is defined in pandas.io.common:

def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
                           compression=None):
    """
    If the filepath_or_buffer is a url, translate and return the buffer.
    Otherwise passthrough.
    Parameters
    ----------
    filepath_or_buffer : a url, filepath (str, py.path.local or pathlib.Path),
                         or buffer
    encoding : the encoding to use to decode py3 bytes, default is 'utf-8'
    Returns
    -------
    a filepath_or_buffer, the encoding, the compression
    """
    ...
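
To make the "translate and return the buffer, otherwise passthrough" behaviour in that docstring concrete, here is a minimal sketch using only the standard library. It is not the pandas implementation: resolve_source and URL_SCHEMES are hypothetical names, the scheme list is an assumption, and encoding and compression handling are left out entirely.

```python
import io
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumed scheme list for this sketch; the set pandas accepts may differ.
URL_SCHEMES = {"http", "https", "ftp", "file"}


def resolve_source(filepath_or_buffer):
    """Return an in-memory buffer for URLs; pass anything else through."""
    if isinstance(filepath_or_buffer, str):
        scheme = urlparse(filepath_or_buffer).scheme
        if scheme in URL_SCHEMES:
            # Fetch the whole payload and wrap it in a file-like object
            with urlopen(filepath_or_buffer) as response:
                return io.BytesIO(response.read())
    # Local paths and already-open buffers are returned unchanged
    return filepath_or_buffer
```

With something like this in front of it, the downstream reader only ever sees a local path or a file-like object, which would explain why TextFileReader itself contains no URL checks.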

【Discussion】:

Thanks - the relevant line is actually the second line of code in the function you pointed to, right after the docstring: if _is_url(filepath_or_buffer):
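
For anyone wondering what a check like _is_url might look like, one plausible way to write it is to parse the string and compare its scheme against a known set. This is a guess at the idea, not the pandas source: looks_like_url and KNOWN_SCHEMES are hypothetical names, and the scheme set is an assumption.

```python
from urllib.parse import urlparse

# Assumed set of schemes, for illustration only.
KNOWN_SCHEMES = {"http", "https", "ftp", "file"}


def looks_like_url(candidate):
    """Return True if candidate is a string whose scheme is in KNOWN_SCHEMES."""
    if not isinstance(candidate, str):
        # Open file objects and other buffers are never URLs
        return False
    return urlparse(candidate).scheme in KNOWN_SCHEMES
```

Under these assumptions, looks_like_url('http://www.somefile.csv') is True, while a plain relative path such as 'data/local.csv' has an empty scheme and gives False.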