The difference between double bracket `[[...]]` and single bracket `[..]` indexing in Pandas: answers

【Question title】: The difference between double brace `[[...]]` and single brace `[..]` indexing in Pandas
【Posted】: 2017-07-19 21:12:48
【Question】:

I'm confused by the syntax of the following line of code:

x_values = dataframe[['Brains']]

The dataframe object consists of 2 columns (Brains and Bodies):

Brains Bodies
42     34
32     23

When I print x_values, I get something like this:

Brains
0  42
1  32

I know the pandas documentation as far as the attributes and methods of dataframe objects go, but the double bracket syntax confuses me.

【Question discussion】:

Tags: python pandas syntax

【Solution 1】:

Consider this:

Source DF:

In [79]: df
Out[79]:
   Brains  Bodies
0      42      34
1      32      23

Selecting one column produces a pandas.Series:

In [80]: df['Brains']
Out[80]:
0    42
1    32
Name: Brains, dtype: int64

In [81]: type(df['Brains'])
Out[81]: pandas.core.series.Series

Selecting a subset of the DataFrame produces a DataFrame:

In [82]: df[['Brains']]
Out[82]:
   Brains
0      42
1      32

In [83]: type(df[['Brains']])
Out[83]: pandas.core.frame.DataFrame

Conclusion: the second form lets us select multiple columns from a DataFrame. The first form only selects a single column.

Demo:
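A self-contained sketch of the comparison above, assuming a two-row frame with the same Brains/Bodies values as the question (the IPython session does not show how `df` was built):

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

single = df["Brains"]               # key is a str  -> Series
subset = df[["Brains"]]             # key is a list -> DataFrame
several = df[["Brains", "Bodies"]]  # list with several labels -> DataFrame

print(type(single).__name__, single.shape)    # Series (2,)
print(type(subset).__name__, subset.shape)    # DataFrame (2, 1)
print(type(several).__name__, several.shape)  # DataFrame (2, 2)
```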

In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))

In [85]: df
Out[85]:
          a         b         c         d         e         f
0  0.065196  0.257422  0.273534  0.831993  0.487693  0.660252
1  0.641677  0.462979  0.207757  0.597599  0.117029  0.429324
2  0.345314  0.053551  0.634602  0.143417  0.946373  0.770590
3  0.860276  0.223166  0.001615  0.212880  0.907163  0.437295
4  0.670969  0.218909  0.382810  0.275696  0.012626  0.347549

In [86]: df[['e','a','c']]
Out[86]:
          e         a         c
0  0.487693  0.065196  0.273534
1  0.117029  0.641677  0.207757
2  0.946373  0.345314  0.634602
3  0.907163  0.860276  0.001615
4  0.012626  0.670969  0.382810

If we put only one column in the list, we get a DataFrame with one column:

In [87]: df[['e']]
Out[87]:
          e
0  0.487693
1  0.117029
2  0.946373
3  0.907163
4  0.012626
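The demo uses random data and assumes `pd` and `np` were imported earlier in the session. A deterministic, self-contained variant (the values are arbitrary) also makes it visible that the returned columns follow the order of the list, not their order in the original frame:

```python
import numpy as np
import pandas as pd

# 5 rows x 6 columns of arbitrary but reproducible values
df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=list("abcdef"))

picked = df[["e", "a", "c"]]  # columns come back in the order listed
one = df[["e"]]               # a one-element list still gives a DataFrame

print(list(picked.columns))   # ['e', 'a', 'c']
print(one.shape)              # (5, 1)
```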

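【Discussion】:

Just to seal up any possible confusion: the first form is equivalent to column = 'Brains'; df[column], and the second is equivalent to subset = ['Brains']; df[subset]. The first passes a string, the second passes a list. It is not that [[ and ]] perform a special form of indexing; the objects being passed are of different types.
Thanks, that makes sense. Do you know whether the double bracket is Python syntax or specific to dataframe objects? I tried looking through the Python syntax for arrays and objects but couldn't find anything.
@MikeFellner See my comment. It answers your question exactly.
I've turned the comment into an answer, since there seem to be two things confusing the OP.
So simply put: if the index is an array, it returns a DataFrame of the matching columns; otherwise it returns a Series for the single matching column.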
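The comments distinguish a string key from a list key. As a sketch (assuming a reasonably recent pandas), other list-like keys such as a NumPy array or a `pd.Index` of labels are handled the same way as a list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

column = "Brains"    # a plain string
subset = ["Brains"]  # a list holding one string

print(type(df[column]).__name__)                # Series
print(type(df[subset]).__name__)                # DataFrame
print(type(df[np.array(["Brains"])]).__name__)  # DataFrame
print(type(df[pd.Index(["Brains"])]).__name__)  # DataFrame
```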

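【Solution 2】:

[[ and ]] are not special syntax in Python. Instead, a list is created, and that list is then passed as the argument to the DataFrame indexing function.

As @MaxU's answer shows, if you pass a single string to a DataFrame, you get back a Series representing that column. If you pass a list of strings, you get back a DataFrame containing the given columns.

So when you do the following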

# Print "Brains" column as Series
print(df['Brains'])
# Print a DataFrame with only one column called "Brains"
print(df[['Brains']])

it is equivalent to the following

# Print "Brains" column as Series
column_to_get = 'Brains'
print(df[column_to_get])
# Print a DataFrame with only one column called "Brains"
subset_of_columns_to_get = ['Brains']
print(df[subset_of_columns_to_get])

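In both cases, the DataFrame is indexed with the [] operator.

Python uses [] both for indexing and for constructing list literals, and I believe that is ultimately the source of your confusion. In df[['Brains']], the outer [ and ] perform the indexing, while the inner ones create the list.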
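To see that Python itself does nothing special with `[[`, here is a sketch using a hypothetical class, `KeyEcho` (made up for illustration), whose `__getitem__` simply returns whatever key it receives:

```python
class KeyEcho:
    """Hypothetical helper: __getitem__ returns the key it was given."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

print(repr(echo["Brains"]))    # 'Brains'   -> the key is a str
print(repr(echo[["Brains"]]))  # ['Brains'] -> the key is a list literal
print(type(echo[["Brains"]]))  # <class 'list'>
```

A pandas DataFrame's `__getitem__` receives its key in exactly the same way and then decides what to return based on the key's type.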

>>> some_list = ['Brains']
>>> some_list_of_lists = [['Brains']]
>>> ['Brains'] == [['Brains']][0]
True
>>> 'Brains' == [['Brains']][0][0] == [['Brains'][0]][0]
True

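What I illustrated above is that Python never sees [[ and interprets it specially. In the last, convoluted example ([['Brains'][0]][0]) there is no special ][ operator or ]][ operator. What happens is:

A single-element list is created (['Brains'])
The first element of that list is indexed (['Brains'][0] => 'Brains')
That is placed into another list ([['Brains'][0]] => ['Brains'])
Then the first element of that list is indexed ([['Brains'][0]][0] => 'Brains')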
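The four steps can be replayed one at a time with intermediate variables (the variable names are only for illustration):

```python
inner = ["Brains"]   # 1. a single-element list
first = inner[0]     # 2. index it -> 'Brains'
wrapped = [first]    # 3. put it in another list -> ['Brains']
result = wrapped[0]  # 4. index that list -> 'Brains'

print(first, wrapped, result)        # Brains ['Brains'] Brains
print(result == [["Brains"][0]][0])  # True
```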

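【Discussion】:

This should be the accepted answer. I was also confused by the syntax. Once I understood that the double [[]] is just two [], it all made sense. For example, df[ df['col'] == val ] returns a dataframe rather than a series because df['col'] == val is a series, so the result is a series of series, i.e. a dataframe.
Yes, this answer is the best. It clarifies something obvious that I (and probably most people) overlooked. [['Brains']][0][0][0] accesses the 'B' in the string 'Brains'. [['Brains']][0][0] accesses the string 'Brains'. [['Brains']][0] accesses the first (and only) list, ['Brains'], inside the single-element list [['Brains']]. If this were a multi-element list, [['Brains'], ['Bodies'], ['Souls']], the access operator could drill into any of the three lists: [['Brains'], ['Bodies'], ['Souls']][2][0][0] accesses the 'S' in the third list, and [['Brains'], ['Bodies'], ['Souls']][0] == ['Brains'].
Great illustration! Thanks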
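On the df[ df['col'] == val ] example from the first comment: the key there is a boolean Series, and pandas treats a boolean Series passed to [] as a row filter, so the result is a DataFrame holding only the rows where the mask is True. A small sketch with the question's data (the value 42 is chosen arbitrarily):

```python
import pandas as pd

df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

mask = df["Brains"] == 42  # boolean Series, one entry per row
filtered = df[mask]        # boolean key -> keep rows where mask is True

print(mask.tolist())                            # [True, False]
print(type(filtered).__name__, filtered.shape)  # DataFrame (1, 2)
```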

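【Solution 3】:

The other solutions show the difference between a Series and a DataFrame. If you are mathematically minded, you may want to think about the dimensions of the input and the output. In summary: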

Object                                Series          DataFrame
Dimensions (obj.ndim)                      1                  2
Indexing arg dim                           0                  1
Syntax                             df['col']        df[['col']]
Max indexing dim                           1                  2
Label indexing              df['col'].loc[x]   df.loc[x, 'col']
Label indexing (scalar)      df['col'].at[x]    df.at[x, 'col']
Integer indexing           df['col'].iloc[i]      df.iloc[i, j]
Integer indexing (scalar)   df['col'].iat[i]       df.iat[i, j]

Here x is a row label, and i and j are integer row and column positions.
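A sketch checking the label-based and position-based accessors on a small frame. String row labels are an assumption chosen so that label access and position access look visibly different:

```python
import pandas as pd

# Row labels are strings so label-based and position-based access differ
df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]}, index=["r0", "r1"])

s = df["Brains"]                # Series (ndim 1): one indexer
print(s.loc["r1"], s.at["r1"])  # 32 32 (by label)
print(s.iloc[1], s.iat[1])      # 32 32 (by position)

# DataFrame (ndim 2): two indexers, row then column
print(df.loc["r1", "Brains"], df.at["r1", "Brains"])  # 32 32 (labels)
print(df.iloc[1, 0], df.iat[1, 0])                    # 32 32 (positions)
```

`.iloc` and `.iat` on a DataFrame take integer positions on both axes, so the column is given by position (`0` for `Brains`), not by name.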

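When you pass a scalar or list argument to pd.DataFrame.__getitem__ (for which [] is syntactic sugar), your argument has one dimension fewer than the result. So a scalar (0-dimensional) gives a 1-dimensional Series, and a list (1-dimensional) gives a 2-dimensional DataFrame. This makes sense because the additional dimension is the dataframe index, i.e. the rows. It holds even if your dataframe happens to have no rows.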
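The "argument dimension plus one" rule can be checked with `ndim`, including on a frame with zero rows (built here by slicing, an arbitrary choice):

```python
import pandas as pd

df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

print(df["Brains"].ndim)    # 1: scalar key (0-d) -> Series
print(df[["Brains"]].ndim)  # 2: list key (1-d) -> DataFrame

empty = df.iloc[0:0]        # same columns, zero rows
print(empty[["Brains"]].ndim, empty[["Brains"]].shape)  # 2 (0, 1)
```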

【Discussion】:

【Solution 4】:

[ ] and [[ ]] are NumPy concepts.

Try to understand the basics of creating an np.array, using reshape and checking ndim, and it will make sense. See my answer here.

https://stackoverflow.com/a/70194733/7660981
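The answer does not spell out the NumPy analogy. One reading of it, sketched here as an assumption rather than the answerer's own code, is that nesting one more level of brackets in the data handed to `np.array` adds one dimension, and `reshape` moves between the two forms:

```python
import numpy as np

flat = np.array([42, 32])        # one level of brackets -> ndim 1
column = np.array([[42], [32]])  # two levels of brackets -> ndim 2

print(flat.ndim, flat.shape)      # 1 (2,)
print(column.ndim, column.shape)  # 2 (2, 1)

# reshape converts between the two without changing the data
print(np.array_equal(flat.reshape(2, 1), column))  # True
```

Here too the inner brackets are ordinary list literals; the shapes (2,) and (2, 1) mirror those of df['Brains'] and df[['Brains']] in the question.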

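【Discussion】:

Your answer could be improved with additional supporting information. Please edit it to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.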