Answers: Slicing MultiIndex data with Pandas

[Question title]: Slicing MultiIndex data with Pandas
[Posted]: 2025-12-10 18:50:01
[Question]:

I have imported a CSV as a MultiIndex DataFrame. Here is a mock-up of the data:

df = pd.read_csv("coursedata2.csv", index_col=[0,2])

print (df)

                                         COURSE
ID    Course List                              
12345 Interior Environments           DESN10000
      Rendering & Present Skills      DESN20065
      Lighting                        DESN20025
22345 Drawing Techniques              DESN10016
      Colour Theory                   DESN14049
      Finishes & Sustainable Issues   DESN12758
      Lighting                        DESN20025
32345 Window Treatments&Soft Furnish  DESN27370
42345 Introduction to CADD            INFO16859
      Principles of Drafting          DESN10065
      Drawing Techniques              DESN10016
      The Fundamentals of Design      DESN15436
      Colour Theory                   DESN14049
      Interior Environments           DESN10000
      Drafting                        DESN10123
      Textiles and Applications       DESN10199
      Finishes & Sustainable Issues   DESN12758

[17 rows x 1 columns]
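To make the examples reproducible without `coursedata2.csv`, here is a minimal sketch that rebuilds the mock data from an in-memory string. The CSV column order (ID, COURSE, Course List) is an assumption inferred from `index_col=[0,2]`; the real file may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical CSV contents: the column order (ID, COURSE, Course List) is an
# assumption inferred from index_col=[0, 2] in the question.
CSV_TEXT = """ID,COURSE,Course List
12345,DESN10000,Interior Environments
12345,DESN20065,Rendering & Present Skills
12345,DESN20025,Lighting
22345,DESN10016,Drawing Techniques
22345,DESN14049,Colour Theory
22345,DESN12758,Finishes & Sustainable Issues
22345,DESN20025,Lighting
32345,DESN27370,Window Treatments&Soft Furnish
42345,INFO16859,Introduction to CADD
42345,DESN10065,Principles of Drafting
42345,DESN10016,Drawing Techniques
42345,DESN15436,The Fundamentals of Design
42345,DESN14049,Colour Theory
42345,DESN10000,Interior Environments
42345,DESN10123,Drafting
42345,DESN10199,Textiles and Applications
42345,DESN12758,Finishes & Sustainable Issues
"""

# Columns 0 and 2 become the two index levels; column 1 (COURSE) stays as data.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=[0, 2])

print(df.shape)               # (17, 1)
print(list(df.index.names))   # ['ID', 'Course List']
```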

I can easily slice it by label using .xs, for example:

selected = df.xs(12345, level='ID')
print(selected)

                        COURSE
Course List                          
Interior Environments       DESN10000
Rendering & Present Skills  DESN20065
Lighting                    DESN20025

[3 rows x 1 columns]
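A sketch of how `.xs` behaves on this kind of index, using a hypothetical reduced copy of the data (first three IDs only). For the outermost level, `df.loc[12345]` should return the same frame as `df.xs(12345, level='ID')`; `.xs` can also select on the inner `Course List` level, which `.loc` with a single key cannot do directly.

```python
import pandas as pd

# Hypothetical reduced copy of the question's data (first three IDs only).
df = pd.DataFrame(
    {
        "ID": [12345, 12345, 12345, 22345, 22345, 22345, 22345, 32345],
        "Course List": ["Interior Environments", "Rendering & Present Skills",
                        "Lighting", "Drawing Techniques", "Colour Theory",
                        "Finishes & Sustainable Issues", "Lighting",
                        "Window Treatments&Soft Furnish"],
        "COURSE": ["DESN10000", "DESN20065", "DESN20025", "DESN10016",
                   "DESN14049", "DESN12758", "DESN20025", "DESN27370"],
    }
).set_index(["ID", "Course List"])

# Select one ID: xs drops the ID level, leaving 'Course List' as the index.
by_id = df.xs(12345, level="ID")

# For the outermost level, .loc with a single key gives the same frame.
same = by_id.equals(df.loc[12345])
print(same)

# xs also works on an inner level: every ID that offers 'Lighting'.
lighting = df.xs("Lighting", level="Course List")
print(lighting)
```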

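What I actually want to do is step through the DataFrame and perform an operation on each block of courses, one ID at a time. In the real data the ID values are fairly random integers, sorted in ascending order.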

df.index shows:

df.index
MultiIndex(levels=[[12345, 22345, 32345, 42345],
                   [u'Colour Theory', u'Colour Theory ', u'Drafting', u'Drawing Techniques', u'Finishes & Sustainable Issues', u'Interior Environments', u'Introduction to CADD', u'Lighting', u'Principles of Drafting', u'Rendering & Present Skills', u'Textiles and Applications', u'The Fundamentals of Design', u'Window Treatments&Soft Furnish']],
           labels=[[0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
                   [5, 9, 7, 3, 1, 4, 7, 12, 6, 8, 3, 11, 0, 5, 2, 10, 4]],
           names=[u'ID', u'Course List'])

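It seems to me that I should be able to step through the DataFrame using the labels of the first index level, i.e. get all the courses for label 0, then 1, then 2, then 3, and so on (the integers in the first list of `labels` above). But it looks like .xs will not slice by these labels.

Am I missing something?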
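The integers in `labels` are positions into `df.index.levels[0]`, not ID values, while `.xs` looks a key up by value, which is why 0, 1, 2, 3 don't work. In pandas 0.24 and later the same integers are exposed as `df.index.codes` (`labels` was deprecated in 0.24 and removed in 1.0). A sketch that translates each code back to its ID value before calling `.xs`, again on hypothetical reduced data:

```python
import pandas as pd

# Hypothetical reduced copy of the question's data (first three IDs only).
df = pd.DataFrame(
    {
        "ID": [12345, 12345, 12345, 22345, 22345, 22345, 22345, 32345],
        "Course List": ["Interior Environments", "Rendering & Present Skills",
                        "Lighting", "Drawing Techniques", "Colour Theory",
                        "Finishes & Sustainable Issues", "Lighting",
                        "Window Treatments&Soft Furnish"],
        "COURSE": ["DESN10000", "DESN20065", "DESN20025", "DESN10016",
                   "DESN14049", "DESN12758", "DESN20025", "DESN27370"],
    }
).set_index(["ID", "Course List"])

id_values = df.index.levels[0]   # distinct ID values, sorted
codes = df.index.codes[0]        # one integer position per row
print(codes.tolist())            # [0, 0, 0, 1, 1, 1, 1, 2]

sizes = []
for code in range(len(id_values)):
    id_value = id_values[code]              # translate position -> ID value
    block = df.xs(id_value, level="ID")     # .xs needs the value, not the code
    sizes.append(len(block))
    print(code, id_value, len(block))
```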

[Comments on the question]:

Try `df.groupby(level='ID').apply(func)`, see here: pandas.pydata.org/pandas-docs/stable/…
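A minimal sketch of the comment's suggestion. `summarise` is a hypothetical stand-in for `func`; each call receives one ID's rows (with both index levels still present), and because it returns a Series the combined result comes back as a DataFrame indexed by ID.

```python
import pandas as pd

# Hypothetical reduced copy of the question's data (first three IDs only).
df = pd.DataFrame(
    {
        "ID": [12345, 12345, 12345, 22345, 22345, 22345, 22345, 32345],
        "Course List": ["Interior Environments", "Rendering & Present Skills",
                        "Lighting", "Drawing Techniques", "Colour Theory",
                        "Finishes & Sustainable Issues", "Lighting",
                        "Window Treatments&Soft Furnish"],
        "COURSE": ["DESN10000", "DESN20065", "DESN20025", "DESN10016",
                   "DESN14049", "DESN12758", "DESN20025", "DESN27370"],
    }
).set_index(["ID", "Course List"])


def summarise(block: pd.DataFrame) -> pd.Series:
    """Hypothetical per-ID operation: count the courses and join their codes."""
    return pd.Series({"n_courses": len(block),
                      "codes": ", ".join(block["COURSE"])})


# One call per distinct ID value; results are stacked into a DataFrame.
summary = df.groupby(level="ID").apply(summarise)
print(summary)
```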

Tags: python csv pandas indexing slice

[Answer 1]:

There may be more efficient ways to do this, depending on what you are trying to do with the data. However, two approaches come to mind immediately:

for id_label in df.index.levels[0]:
    some_func(df.xs(id_label, level='ID'))

and

for id_label in df.index.levels[0]:
    df.xs(id_label, level='ID').apply(some_func, axis=1)

The choice depends on whether you want to operate on the whole group or on each row within it.
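A runnable version of both loops, with hypothetical `some_func` and `row_func` bodies filled in purely for illustration. One caveat not mentioned in the answer: `index.levels` is not pruned when rows are filtered out, so after filtering, `levels[0]` can still list IDs that have no rows, and `.xs` would fail on them. Iterating over `get_level_values('ID').unique()` only visits IDs that are actually present.

```python
import pandas as pd

# Hypothetical reduced copy of the question's data (first three IDs only).
df = pd.DataFrame(
    {
        "ID": [12345, 12345, 12345, 22345, 22345, 22345, 22345, 32345],
        "Course List": ["Interior Environments", "Rendering & Present Skills",
                        "Lighting", "Drawing Techniques", "Colour Theory",
                        "Finishes & Sustainable Issues", "Lighting",
                        "Window Treatments&Soft Furnish"],
        "COURSE": ["DESN10000", "DESN20065", "DESN20025", "DESN10016",
                   "DESN14049", "DESN12758", "DESN20025", "DESN27370"],
    }
).set_index(["ID", "Course List"])


def some_func(block):
    """Hypothetical whole-group operation: the set of course codes."""
    return set(block["COURSE"])


def row_func(row):
    """Hypothetical per-row operation: the department prefix of the code."""
    return row["COURSE"][:4]


# Approach 1: operate on each ID's block as a whole.
group_results = {id_value: some_func(df.xs(id_value, level="ID"))
                 for id_value in df.index.levels[0]}

# Approach 2: operate on every row inside each ID's block.
row_results = {id_value: df.xs(id_value, level="ID").apply(row_func, axis=1)
               for id_value in df.index.levels[0]}

# Caveat: filtering rows leaves index.levels untouched.
subset = df[df["COURSE"] != "DESN27370"]           # drops ID 32345's only row
present_ids = subset.index.get_level_values("ID").unique()
print(subset.index.levels[0].tolist())   # [12345, 22345, 32345]
print(present_ids.tolist())              # [12345, 22345]
```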

[Comments]:

I want to compare each slice against a static course list, so the first approach looks promising. Thanks very much.
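For the follow-up goal of comparing each slice with a static course list, a sketch using set differences. `REQUIRED` is a hypothetical list, and comparing on the `COURSE` codes rather than on course names is also an assumption. It iterates with `groupby` instead of `.xs`, which yields `(ID value, sub-DataFrame)` pairs only for IDs that actually have rows.

```python
import pandas as pd

# Hypothetical reduced copy of the question's data (first three IDs only).
df = pd.DataFrame(
    {
        "ID": [12345, 12345, 12345, 22345, 22345, 22345, 22345, 32345],
        "Course List": ["Interior Environments", "Rendering & Present Skills",
                        "Lighting", "Drawing Techniques", "Colour Theory",
                        "Finishes & Sustainable Issues", "Lighting",
                        "Window Treatments&Soft Furnish"],
        "COURSE": ["DESN10000", "DESN20065", "DESN20025", "DESN10016",
                   "DESN14049", "DESN12758", "DESN20025", "DESN27370"],
    }
).set_index(["ID", "Course List"])

# Hypothetical static course list (codes), not taken from the question.
REQUIRED = {"DESN10000", "DESN20025", "DESN10016"}

report = {}
for id_value, block in df.groupby(level="ID"):
    taken = set(block["COURSE"])
    report[id_value] = {
        "missing": sorted(REQUIRED - taken),   # required but not in this block
        "extra": sorted(taken - REQUIRED),     # in this block but not required
    }

for id_value, result in report.items():
    print(id_value, result)
```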