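Selecting specific columns from a DataFrame

【Question title】: Selecting specific columns from a Data Frame
【Posted】: 2022-07-08 09:55:49
【Question】:

Hi everyone! I have a question. Imagine a DataFrame with the columns [a, b, c, e, f, g, h, i, j]. I want to create a second DataFrame containing only column a and columns c through g. How can I do this in a single command, without first creating a list of the columns? For example, this is what I wrote: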

columns = ['a', 'c', 'e', 'f', 'g']
df2 = df.loc[:, df.columns.isin(columns)]

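I would like to know whether there is something like:

df2 = df.loc[:,'a': 'g']

but excluding column 'b'.

With this second approach I need two commands: one to select a through g, and a second to drop b.

I would like to know whether I can select a through g and drop b at the same time.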

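【Question discussion】:

Tags: python pandas

【Solution 1】:

The simplest approach is to slice with .loc, as you demonstrated, and chain a call to .drop to remove any specific unwanted columns: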

Create the data

>>> import pandas as pd
>>> df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])
>>> df
   a  b  c  d  e  f  g  h  i  j
0  0  1  2  3  4  5  6  7  8  9
1  0  1  2  3  4  5  6  7  8  9
2  0  1  2  3  4  5  6  7  8  9
3  0  1  2  3  4  5  6  7  8  9
4  0  1  2  3  4  5  6  7  8  9

`.loc` and drop

Fairly straightforward: slice with .loc, then use drop to remove whatever you don't want.

>>> df.loc[:, 'a':'g'].drop(columns='b')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
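
One detail worth spelling out is why 'g' ends up in the result: label slices with .loc include both endpoints, whereas positional slices with .iloc exclude the stop. The sketch below compares the two on the same sample frame; the variable names by_label and by_position are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*"abcdefghij"])

# Label slice: both 'a' and 'g' are included.
by_label = df.loc[:, "a":"g"].drop(columns="b")

# Positional slice: the stop is excluded, so 7 is needed to reach 'g' (position 6).
by_position = df.iloc[:, 0:7].drop(columns="b")

print(list(by_label.columns))
print(by_label.equals(by_position))
```

Both expressions leave df itself unchanged, since drop returns a new DataFrame by default.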

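Working on the Index

If you want to be as efficient as possible, you can work on the column Index with Index.slice_indexer and .drop, so that you don't create a temporary subset of the data (as we did above):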

>>> columns = df.columns[df.columns.slice_indexer('a', 'g')].drop('b')
>>> df[columns]
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
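
To make the Index-based version less opaque, here is a step-by-step sketch of the same idea. The intermediate names positions and wanted are only for illustration; the point is that everything before the final selection operates on column labels rather than on the data.

```python
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*"abcdefghij"])

# slice_indexer turns the label bounds into a positional slice.
# The end label is included, so the stop is one past the position of 'g'.
positions = df.columns.slice_indexer("a", "g")

# Slice the Index itself and drop 'b'; no column data is copied here.
wanted = df.columns[positions].drop("b")

# Only this final step selects from the data.
df2 = df[wanted]
print(list(df2.columns))
```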

【Discussion】:

Thanks!! The loc and drop approach was really helpful!

【Solution 2】:

You can use

df2 = df[['a', 'c', 'd', 'e', 'f', 'g']].copy()

or

df2 = df.copy()
del df2['b']
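
Note that these two options do not produce the same frame when the source has columns beyond g: the first keeps only the listed columns, while the second keeps everything except b. A small sketch on the sample frame from Solution 1 (the names picked and without_b are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*"abcdefghij"])

# Option 1: keep only the explicitly listed columns.
picked = df[["a", "c", "d", "e", "f", "g"]].copy()

# Option 2: copy everything, then delete 'b'; 'h', 'i' and 'j' remain.
without_b = df.copy()
del without_b["b"]

print(list(picked.columns))
print(list(without_b.columns))
```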

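【Discussion】:

I would like to know whether there is a solution like your first one, but without having to specify every column one by one.

【Solution 3】:

If you don't want to write the columns into a list by hand, there are a few ways to approach this: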

import numpy as np

# First, to pull back only sequential columns, use np.arange() to generate the column positions
df.iloc[:, np.arange(2, 5).tolist()]

# Second, to pull back sequential columns but skip one in the middle, pop it from the list of positions
column_list = np.arange(2, 5).tolist()
# pop(1) removes the element at list index 1 (here the value 3, i.e. the fourth column)
column_list.pop(1)
df.iloc[:, column_list]
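
If a single expression is the goal, one possible variation on this positional idea (not part of the answer above) is NumPy's np.r_, which concatenates scalars and slices into one integer array. The sketch assumes the ten-column sample frame from Solution 1, where a is at position 0 and c through g are at positions 2 to 6.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[*range(10)]] * 5, columns=[*"abcdefghij"])

# np.r_[0, 2:7] builds the positions 0, 2, 3, 4, 5, 6 (the slice stop is excluded).
positions = np.r_[0, 2:7]

# Select column 'a' plus columns 'c' through 'g' by position in one step.
df2 = df.iloc[:, positions]
print(list(df2.columns))
```

Because this relies on positions rather than labels, it breaks silently if the column order changes, so the label-based approaches are usually safer.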

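【Discussion】:

Thanks for your answer. I would like to do all of this as a one-liner.

【Solution 4】:

One option is select_columns from pyjanitor, which provides an abstraction for this.

I'll reuse @CameronRiddell's sample data: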

# pip install pyjanitor
import pandas as pd 
import janitor

df = pd.DataFrame([[*range(10)]]*5, columns=[*'abcdefghij'])

# pass in the arguments:
df.select_columns('a', slice('c','g'))

   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6

You can also do this without any extra library, using pandas filter:

df.filter(regex = '[ac-g]')
   a  c  d  e  f  g
0  0  2  3  4  5  6
1  0  2  3  4  5  6
2  0  2  3  4  5  6
3  0  2  3  4  5  6
4  0  2  3  4  5  6
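
One caveat with the regex approach: filter applies the pattern with re.search, so '[ac-g]' keeps any column whose name contains one of those letters anywhere. That is fine for single-letter names, but with longer names you would probably want to anchor the pattern. The frame below, with its 'abc' column, is a made-up example to show the difference.

```python
import pandas as pd

# Hypothetical frame that mixes single-letter and longer column names.
wide = pd.DataFrame([[0, 1, 2, 3]], columns=["a", "b", "c", "abc"])

# Unanchored: 'abc' is kept too, because it contains 'a'.
loose = wide.filter(regex="[ac-g]")

# Anchored: only names that consist of exactly one matching letter are kept.
strict = wide.filter(regex="^[ac-g]$")

print(list(loose.columns))
print(list(strict.columns))
```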

【Discussion】:
