Reading a multi-indexed CSV in pandas with multiple delimiters (answers)

【Question title】: Reading a multi-indexed CSV in pandas with multiple delimiters
【Posted】: 2018-05-01 12:32:57
【Question description】:

I'm trying to create a very human-readable file that will be read in with a MultiIndex. It looks like this:

A
    one    : some data
    two    : some other data

B
    one    : foo
    three  : bar

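I'd like to use pandas' read_csv to read it in automatically as a multi-indexed file, with \t and : used as delimiters, so that I can easily slice by section (i.e. A and B). I understand that something like header=[0,1] and tupleize_cols might be used for this, but I can't get it to work because it doesn't seem to read the tabs and colons properly. If I use sep='[\t:]', it picks up the leading tabs. If I don't use a regex and read with sep='\t', it handles the tabs correctly but not the colons. Is this possible with read_csv? I could do it line by line, but there has to be an easier way :)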

This is the output I have in mind. I added labels to the index and columns, which I hope can be applied while reading:

                  value      
index_1   index_2
A         one     some data
          two     some other data
B         one     foo
          three   bar

Edit: I used part of Ben.T's answer to get what I needed. I don't love my solution because it writes a temporary file, but it works:

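import re
import pandas as pd

filename = 'input.txt'  # hypothetical path; the original code used reader.filename

with open(filename) as infile, open('temp.csv', 'w') as outfile:
    for line in infile:
        if not line.strip():
            continue  # skip blank lines between sections
        if not line.startswith('\t'):
            index1 = line.strip()  # unindented line: first-level label (A, B, ...)
        else:
            # indented line: prefix it with the current first-level label
            outfile.write(index1 + ':' + re.sub(r'\t+', '', line))

# the \s*:\s* separator also strips the padding around each colon
df = pd.read_csv('temp.csv', sep=r'\s*:\s*', engine='python', header=None,
                 names=['index_1', 'index_2', 'Value']).set_index(['index_1', 'index_2'])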
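The same line-by-line idea can skip the temporary file by writing into an in-memory io.StringIO buffer, which read_csv accepts just like an open file. This is a minimal sketch, assuming tab-indented input as described in the question; the function name read_sectioned_in_memory and the sample lines are hypothetical, chosen only for illustration:

```python
import io
import re
import pandas as pd

def read_sectioned_in_memory(lines):
    """Hypothetical helper: the temp-file loop above, buffered in memory instead."""
    buf = io.StringIO()
    index1 = None
    for line in lines:
        if not line.strip():
            continue  # blank separator line between sections
        if not line.startswith('\t'):
            index1 = line.strip()  # section label such as A or B
        else:
            # "\tone    : foo\n" becomes "A:one    : foo\n"
            buf.write(index1 + ':' + re.sub(r'\t+', '', line))
    buf.seek(0)  # rewind so read_csv starts reading from the beginning
    return pd.read_csv(buf, sep=r'\s*:\s*', engine='python', header=None,
                       names=['index_1', 'index_2', 'Value']).set_index(['index_1', 'index_2'])

# Hypothetical sample mirroring the question's file, one string per line.
sample = ["A\n", "\tone    : some data\n", "\ttwo    : some other data\n", "\n",
          "B\n", "\tone    : foo\n", "\tthree  : bar\n"]
df = read_sectioned_in_memory(sample)
print(df)
```

Because any iterable of lines works, an open file handle can be passed in place of the sample list.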

【Question discussion】:

Could you post the expected output?
Ah, yes, I should have done that. Thanks! I've added a sample output section to the post.

Tags: python pandas

【Solution 1】:

You can use two delimiters in read_csv, for example:

pd.read_csv(path_file, sep=':|\t', engine='python')

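Note engine='python': a multi-character separator such as ':|\t' is interpreted as a regular expression, which only the Python engine supports, so passing it explicitly avoids the ParserWarning pandas emits when it falls back from the default C engine.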
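A quick way to see what the two-delimiter regex produces, using a hypothetical two-line sample in the layout this answer suggests (a tab after the section label, a colon before the value). header=None is added here, as an assumption, so that the first line is treated as data:

```python
import io
import pandas as pd

# Hypothetical sample: tab after the section label, colon before the value.
data = "A\tone    : some data\nB\tthree  : bar\n"

# ':|\t' means "colon OR tab"; engine='python' is required for regex separators.
df = pd.read_csv(io.StringIO(data), sep=':|\t', engine='python', header=None)
print(df)
```

Each line splits into three fields: label, key, and value. The spaces around the colon are kept, so the key and value fields still carry padding unless they are stripped afterwards.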

Edit: this seems difficult with your input format, but with input like the following:

A   one    : some data
A   two    : some other data
B   one    : foo
B   three  : bar

using \t as the delimiter after A or B, you then get a MultiIndex with:

pd.read_csv(path_file, sep=':|\t', header=None, engine='python',
            names=['index_1', 'index_2', 'Value']).set_index(['index_1', 'index_2'])
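For the original indented format, one possible workaround (not part of this answer, and offered only as a sketch) avoids both reformatting the input and writing a temporary file: split every line on the colon, then forward-fill the section labels down onto the rows beneath them. It assumes the section labels are the only lines without a colon and that no value is empty; the function name read_sectioned_ffill and the inline sample text are hypothetical:

```python
import io
import pandas as pd

def read_sectioned_ffill(buf):
    # Split on the colon plus any padding around it. Section lines (A, B, ...)
    # contain no colon, so they parse as a single field and their Value is NaN.
    raw = pd.read_csv(buf, sep=r'\s*:\s*', engine='python', header=None,
                      names=['index_2', 'Value'])
    raw['index_2'] = raw['index_2'].str.strip()  # remove indentation tabs
    raw['Value'] = raw['Value'].str.strip()      # NaN stays NaN
    is_section = raw['Value'].isna()
    # Keep the label only on section rows, then forward-fill it onto the rows below.
    raw['index_1'] = raw['index_2'].where(is_section).ffill()
    # Drop the section rows themselves and build the two-level index.
    return raw[~is_section].set_index(['index_1', 'index_2'])

# Hypothetical sample in the question's format (blank lines are skipped by read_csv).
text = "A\n\tone    : some data\n\ttwo    : some other data\n\nB\n\tone    : foo\n\tthree  : bar\n"
df = read_sectioned_ffill(io.StringIO(text))
print(df)
```

The design choice here is to let read_csv do all the splitting and use NaN in the Value column as the marker for section rows; a line such as "\tfour : " with an empty value would be mistaken for a section label under this assumption.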

【Discussion】: