Access the index of a pandas Series: answers

【Question title】: Access the index of a pandas Series
【Posted】: 2016-02-06 02:14:24
【Question description】:

I am trying to determine which word has the highest count in a pandas DataFrame (df_temp in my code). So far I have this:

l = df_temp['word'].value_counts()

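l is evidently a pandas Series whose first row is indexed by the most frequent value in df_temp['word'] (in my case, the most frequent word). Although I can see the word in the console, I cannot work out how to retrieve it. The only way I have found so far is to convert the Series to a dictionary, so I have: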

dl = dict(l)

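Then I can easily retrieve my index... after sorting the dictionary. This gets the job done, but I am fairly sure there is a smarter solution, because this one is dirty and inelegant.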

【Comments】:

Tags: python dictionary pandas series

【Solution 1】:

The index of the value_counts() result holds your values:

l.index

will give you the values that were counted.

Example:

In [163]:
df = pd.DataFrame({'a':['hello','world','python','hello','python','python']})
df

Out[163]:
        a
0   hello
1   world
2  python
3   hello
4  python
5  python

In [165]:    
df['a'].value_counts()

Out[165]:
python    3
hello     2
world     1
Name: a, dtype: int64

In [164]:    
df['a'].value_counts().index

Out[164]:
Index(['python', 'hello', 'world'], dtype='object')

So you can get the count for a particular word by indexing the Series with that word:

In [167]:
l = df['a'].value_counts()
l['hello']

Out[167]:
2
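
To connect this back to the original question (retrieving the most frequent word itself), here is a minimal sketch. It assumes the same sample data as the example above and relies on value_counts() sorting by count in descending order by default, so the first index label is the most frequent value. The names counts, top_word and top_count are illustrative only.

```python
import pandas as pd

# Sample data copied from the example above
df = pd.DataFrame({'a': ['hello', 'world', 'python', 'hello', 'python', 'python']})

# value_counts() sorts by count, largest first, unless sort=False is passed
counts = df['a'].value_counts()

top_word = counts.index[0]   # index label of the first row: the most frequent word
top_count = counts.iloc[0]   # the count stored in the first row

print(top_word, top_count)
```

This avoids the round trip through dict() and the manual sort described in the question.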

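【Comments】:

What if you want a negated lookup, such as "the counts for all words other than python"? Is it l[-['python']] or l[l != ['python']]?
@PierreLafortune You would have to do df[~df['a'].str.contains('python')]
But that does not give the counts. Would you then extend it to df[~df['a'].str.contains('python')]['a'].value_counts()?
@PierreLafortune Sorry, I did not understand your question. Try l[~l.index.str.contains('python')]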
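
The filtering discussed in these comments can be sketched as follows, again assuming the sample data from Solution 1. Note that str.contains matches substrings, so it would also exclude a label such as 'pythonic'; comparing the index with != (or dropping the label) excludes only the exact label. The names not_python, by_substring and dropped are illustrative.

```python
import pandas as pd

# Sample data copied from Solution 1
df = pd.DataFrame({'a': ['hello', 'world', 'python', 'hello', 'python', 'python']})
l = df['a'].value_counts()

# Exact match: keep every label that is not exactly 'python'
not_python = l[l.index != 'python']

# Substring match, as suggested in the last comment
by_substring = l[~l.index.str.contains('python')]

# Alternative for an exact match: drop the label from the Series
dropped = l.drop('python')

print(not_python)
```

With this data all three give the same result, because no other label contains the substring 'python'.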

【Solution 2】:

With pandas, you can find the most frequent value in the word column:

df['word'].value_counts().idxmax()

The following code gives you the count for that value, which is the maximum count in that column:

df['word'].value_counts().max()
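
A self-contained sketch of both calls follows. The DataFrame here is hypothetical sample data (the words from Solution 1 placed in a word column), not the asker's df_temp. If several words tie for the highest count, idxmax() returns only the first such label.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame
df = pd.DataFrame({'word': ['hello', 'world', 'python', 'hello', 'python', 'python']})

counts = df['word'].value_counts()

most_common = counts.idxmax()  # index label with the largest count
highest = counts.max()         # the largest count itself

print(most_common, highest)
```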

【Comments】: