Selecting non-consecutive and consecutive columns from a pandas DataFrame: answers

【Question title】: Selecting non-consecutive and consecutive columns from a pandas dataframe
【Posted】: 2020-07-04 18:49:49
【Question description】:

I'm trying to select several columns from a pandas DataFrame and am having trouble doing so. Suppose I have the following DataFrame:

import pandas as pd
import numpy as np

cols = ['test','one','two','three','four','five','six','seven','eight','nine','ten']
df = pd.DataFrame(np.random.rand(10,11).round(2),columns=cols)

I want to select the columns test, two, four, five, six, seven, and eight.

I know that if I want to select individual columns,

df[['test','two']]

and if I want to select consecutive columns,

df.loc[:,'four':'eight']

both work fine, but how can I combine the two concisely?

I realize that for this specific example, writing

df[['test', 'two', 'four', 'five', 'six', 'seven', 'eight']]

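would also work, but I'd like to know whether there is a way to take advantage of the fact that most of the columns here are consecutive, to save some typing.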

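【Comments】:

You can use iloc.
df.iloc[:,start:end:step] follows the usual list-slicing syntax.
@Datanovice So how would I use that to get the output I want? It looks like step would skip over columns I want to keep. df0.iloc[:,0:9:2] skips five and seven, which I want to keep.
See (stackoverflow.com/a/48545390/8953890)
What is the logic behind your sequence? It looks arbitrary to me.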
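
To make the point about step concrete, here is a minimal sketch, assuming the DataFrame from the question. It shows that a single start:end:step slice can only pick evenly spaced positions, while np.r_ can join individual positions and a contiguous range into one index array. The variable names stepped, positions and selected are illustrative only.

```python
import numpy as np
import pandas as pd

cols = ['test', 'one', 'two', 'three', 'four', 'five',
        'six', 'seven', 'eight', 'nine', 'ten']
df = pd.DataFrame(np.random.rand(10, 11).round(2), columns=cols)

# One start:end:step slice picks every second position from 0 up to 8,
# so 'five' and 'seven' are skipped.
stepped = df.iloc[:, 0:9:2]
print(list(stepped.columns))   # ['test', 'two', 'four', 'six', 'eight']

# np.r_ concatenates scalars and slices into a single integer array:
# positions 0 and 2, followed by the contiguous range 4..8.
positions = np.r_[0, 2, 4:9]
selected = df.iloc[:, positions]
print(list(selected.columns))  # ['test', 'two', 'four', 'five', 'six', 'seven', 'eight']
```

This version hard-codes integer positions, so it would need to be updated if the column order changes.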

Tags: python python-3.x pandas

【Solution 1】:

np.r_, as suggested by @Pooja, but with label-based slicing using get_loc and get_indexer:

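a = ['test','two']    # individual columns, given by label
b = ['four','eight']  # first and last label of the consecutive range
# get_indexer maps the labels in a to positions; get_loc gives the range bounds (+1 so 'eight' is included)
idx = np.r_[df.columns.get_indexer(a), df.columns.get_loc(b[0]):df.columns.get_loc(b[1])+1]
print(df.iloc[:,idx])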

   test   two  four  five   six  seven  eight
0  0.11  0.91  0.13  0.99  0.17   0.56   0.21
1  0.70  0.94  0.72  0.48  0.53   0.99   0.27
2  0.37  0.03  0.81  0.18  0.47   0.94   0.77
3  0.13  0.69  0.16  0.80  0.02   0.42   0.48
4  0.79  0.91  0.97  0.83  0.20   0.32   0.58
5  0.12  0.86  0.44  0.01  0.71   0.65   0.03
6  0.77  0.31  0.21  0.73  0.70   0.95   0.11
7  0.09  0.91  0.45  0.35  0.91   0.21   0.92
8  0.28  0.32  0.73  0.93  0.97   0.03   0.93
9  0.55  0.77  0.02  0.18  0.65   0.50   0.85
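
If this pattern comes up often, the same idea could be wrapped in a small function. The following is only a sketch under stated assumptions: select_columns is a hypothetical helper (not part of pandas), and it assumes the column labels are unique and all present. Note that get_indexer returns -1 for a label it cannot find, which iloc would then treat as the last column. The sketch also compares the result with a label-only alternative that builds the list of names from a .loc slice.

```python
import numpy as np
import pandas as pd

def select_columns(frame, singles, first, last):
    """Hypothetical helper: return the `singles` columns followed by the
    inclusive label range first..last. Assumes unique, existing labels."""
    columns = frame.columns
    start = columns.get_loc(first)
    stop = columns.get_loc(last) + 1  # +1 because positional slices exclude the end
    idx = np.r_[columns.get_indexer(singles), start:stop]
    return frame.iloc[:, idx]

cols = ['test', 'one', 'two', 'three', 'four', 'five',
        'six', 'seven', 'eight', 'nine', 'ten']
df = pd.DataFrame(np.random.rand(10, 11).round(2), columns=cols)

result = select_columns(df, ['test', 'two'], 'four', 'eight')
print(list(result.columns))

# Label-only alternative: collect the names first, then index once.
names = ['test', 'two'] + list(df.loc[:, 'four':'eight'].columns)
same = df[names]
print(result.equals(same))
```

The label-only variant avoids integer positions entirely, at the cost of slicing the frame once just to read the column names.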

【Discussion】: