Getting a subset of arrays from a pandas data frame: answers

【Question title】: getting a subset of arrays from a pandas data frame
【Posted】: 2015-05-10 07:07:24
【Question】:

I have a numpy array named arr that contains 1154 elements.

array([502, 502, 503, ..., 853, 853, 853], dtype=int64)

I have a data frame named df:

    team    Count
0   512     11
1   513     21
2   515     18
3   516     8
4   517     4

How can I get the subset of the data frame df that contains only the values in the array arr?

For example:

team         count
arr1_value1    45
arr1_value2    67

To make the question clearer: I have a numpy array ['45', '55', '65']

I have a data frame like this:

team  count
34      156
45      189
53       90
65       99
23       77
55       91

I need a new data frame like this:

team    count
 45      189
 55       91
 65       99

【Question discussion】:

Tags: python python-2.7 numpy pandas

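【Solution 1】:

I don't know whether it's a typo that your array values look like strings. Assuming it isn't and they are actually integers, you can filter your df by calling isin: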

In [6]:

a = np.array([45, 55, 65])
df[df.team.isin(a)]
Out[6]:
   team  count
1    45    189
3    65     99
5    55     91
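If the array really does hold strings, as in the question's second example, isin will not match them against an integer column, because it compares by value and '45' is not equal to 45. A minimal sketch, assuming the team column is integer as shown, that casts the array first:

```python
import numpy as np
import pandas as pd

# The second example from the question: team numbers stored as ints
df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})

# The question's array holds strings, so cast it to int before calling isin
teams = np.array(['45', '55', '65'])
subset = df[df['team'].isin(teams.astype(int))]
print(subset)
```

isin keeps the rows in the data frame's original order (45, 65, 55), not the order of the array.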

【Discussion】:

Perfect: the same idea as my attempt, but a better implementation! +1

【Solution 2】:

You can use the DataFrame.loc method.

Using your example (note that team is the index):

import numpy as np
import pandas as pd

arr = np.array(['45', '55', '65'])
frame = pd.DataFrame([156, 189, 90, 99, 77, 91], index=['34', '45', '53', '65', '23', '55'])
ans = frame.loc[arr]

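This kind of indexing is type-sensitive: if frame.index is of type int, make sure your index array is also of type int, not str as in this example.

【Discussion】:

If arr contains extra elements that are not in frame.index, they will be added as rows with NaN values, and you then need to remove them from the ans table.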
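To illustrate the type sensitivity, here is a small sketch that assumes the teams are stored as an integer index while the lookup array still holds strings. Without the cast, none of the string labels would match the integer index; casting the array to the index's dtype first makes the lookup work:

```python
import numpy as np
import pandas as pd

# Same data as above, but with an integer index instead of a string one
frame = pd.DataFrame({'count': [156, 189, 90, 99, 77, 91]},
                     index=[34, 45, 53, 65, 23, 55])
arr = np.array(['45', '55', '65'])

# Cast the labels to the index's dtype before the label-based lookup
ans = frame.loc[arr.astype(frame.index.dtype)]
print(ans)
```

Unlike isin, .loc returns the rows in the order of the lookup array (45, 55, 65).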
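The NaN-insertion behaviour described in this comment is from older pandas; since pandas 1.0, passing a list to .loc that contains a missing label raises a KeyError instead. A hedged sketch of two ways to cope with extra labels (the label '99' below is hypothetical, added only to have something missing):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'count': [156, 189, 90, 99, 77, 91]},
                     index=['34', '45', '53', '65', '23', '55'])
# '99' is a hypothetical label that does not exist in frame.index
arr = np.array(['45', '55', '65', '99'])

# Option 1: look up only the labels that exist, keeping arr's order
present = [label for label in arr if label in frame.index]
ans = frame.loc[present]

# Option 2: reindex (missing labels become NaN rows), then drop those rows;
# note the NaNs make the count column float
ans_via_reindex = frame.reindex(arr).dropna()
print(ans)
print(ans_via_reindex)
```

Option 1 keeps the original integer dtype; option 2 is closer to what the comment describes, but converts count to float because of the temporary NaN values.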

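【Solution 3】:

I'm answering the question as posed after "To make the question clearer". As a side note: you could have provided the first 4 lines yourself, so I wouldn't have had to type them in, which can also lead to errors or misunderstandings.

The idea is to build a boolean Series to use as an index, then simply create a new data frame based on that index. I've only just started playing with pandas, so there may be a more efficient way to do this.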

import numpy as np
import pandas as pd

# starting with the df and teams as string
df = pd.DataFrame(data={'team': [34, 45, 53, 65, 23, 55], 'count': [156, 189, 90, 99, 77, 91]})
teams = np.array(['45', '55', '65'])

# we want the team number as int
teams_int = [int(t) for t in teams]

# small helper that checks whether a team should be kept
def filter_teams(x):
    return x in teams_int

# build a boolean Series (True where the team should be kept) and use it
# to select only those rows of the original df
index = df['team'].apply(filter_teams)
df_filtered = df[index]

It returns this data frame:

count  team
1    189    45
3     99    65
5     91    55

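Note that in this case df_filtered uses 1, 3, 5 as its index (the index of the original data frame). Your question is not clear on this point, because the index was not shown to us.

【Discussion】: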
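If a fresh 0, 1, 2, ... index is wanted instead of the original 1, 3, 5, reset_index(drop=True) renumbers the rows. The sketch below also swaps the apply-based helper for a vectorized isin mask (the approach from Solution 1), which produces the same boolean Series without a Python-level function call per row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})
teams_int = np.array(['45', '55', '65']).astype(int)

# Vectorized boolean mask, equivalent to df['team'].apply(filter_teams)
mask = df['team'].isin(teams_int)
df_filtered = df[mask]                              # keeps original index 1, 3, 5
df_renumbered = df_filtered.reset_index(drop=True)  # index becomes 0, 1, 2
print(df_renumbered)
```

drop=True discards the old index instead of inserting it as a new column.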