pandas: detect and print outliers in a dataframe

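【Question title】: pandas: detect and print outliers in a dataframe
【Posted】: 2021-06-28 12:51:33
【Question】:

I am trying to identify and print the rows of a dataframe that contain outliers. As an experiment, I am treating as outliers all values in the 'xy' column that lie between 6 and 10 and belong to class 'C' in the 'x' column. I don't know why, but my code prints an empty list.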

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = [['A', 1, 2, 5],
        ['B', 5, 5, 6],
        ['C', 4, 6, 7],
        ['A', 6, 5, 4],
        ['B', 9, 9, 3],
        ['C', 7, 9, 1],
        ['A', 2, 3, 1],
        ['B', 5, 1, 2],
        ['C', 2, 10, 9],
        ['B', 8, 2, 8],
        ['B', 5, 4, 3],
        ['C', 8, 5, 3]]
df = pd.DataFrame(data, columns=['x','y','z','xy'])
plt.scatter(df['x'], df['xy']) 
outliers= (df['xy'].between(6,10,inclusive=False)  & df['x']=='C')
outliers_location=(df[outliers].index.values.tolist())

print(outliers_location) # should not print an empty list

【Question comments】:

Tags: python pandas dataframe outliers

【Solution 1】:

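You need to wrap the second condition in parentheses, otherwise the expression is parsed incorrectly. In Python, & binds more tightly than ==, so without the parentheses the expression first evaluates df['xy'].between(6,10,inclusive=False) & df['x'] and only then compares that result with 'C'.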
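
The precedence rule can be checked without pandas. The following is a minimal sketch using plain integers; the values and variable names are chosen only for illustration and do not come from the original post.

```python
# In Python, & binds more tightly than ==,
# so `a & b == c` is parsed as `(a & b) == c`.
a, b, c = 1, 3, 3

unparenthesized = a & b == c    # parsed as (1 & 3) == 3, i.e. 1 == 3
explicit_group = (a & b) == c   # the same expression with the grouping spelled out
intended = a & (b == c)         # 1 & True, the grouping the asker wanted

print(unparenthesized, explicit_group, intended)  # False False 1
```

The first two results agree, which shows that the unparenthesized form is not the "condition AND comparison" that was intended.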

>>> outliers= (df['xy'].between(6,10,inclusive=False)  & (df['x']=='C'))
>>> outliers_location=(df[outliers].index.values.tolist())
>>> print(outliers_location)
[2, 8]
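
Since the goal was to print the rows themselves, here is a self-contained sketch of the corrected filter. It writes the exclusive range check with explicit comparisons rather than between(), because recent pandas releases expect a string such as inclusive="neither" instead of a boolean for that argument. The names in_range, is_class_c and outlier_rows are illustrative and not part of the original post.

```python
import pandas as pd

data = [['A', 1, 2, 5], ['B', 5, 5, 6], ['C', 4, 6, 7],
        ['A', 6, 5, 4], ['B', 9, 9, 3], ['C', 7, 9, 1],
        ['A', 2, 3, 1], ['B', 5, 1, 2], ['C', 2, 10, 9],
        ['B', 8, 2, 8], ['B', 5, 4, 3], ['C', 8, 5, 3]]
df = pd.DataFrame(data, columns=['x', 'y', 'z', 'xy'])

# Build each condition separately so no precedence surprises can occur.
in_range = (df['xy'] > 6) & (df['xy'] < 10)   # strictly between 6 and 10
is_class_c = df['x'] == 'C'

# Combine the two boolean Series element-wise.
outliers = in_range & is_class_c

outlier_rows = df[outliers]                      # the matching rows
outliers_location = outlier_rows.index.tolist()  # their index labels

print(outlier_rows)
print(outliers_location)  # [2, 8]
```

Naming the intermediate masks is a design choice: it keeps each comparison in its own statement, so the final & only ever combines two boolean Series.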

【Comments】: