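Remove the missing values from the rows having greater than 5 missing values and then print the percentage of missing values in each column: answers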

【Question title】: Remove the missing values from the rows having greater than 5 missing values and then print the percentage of missing values in each column
【Posted】: 2019-08-08 00:41:42
【Question description】:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d= df.loc[df.isnull().sum(axis=1)>5]
d.dropna(axis=0,inplace=True)
print(round(100*(1-df.count()/len(df)),2))

The output I get is

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.24
Discount               0.65
Order_Quantity         0.65
Profit                 0.65
Shipping_Cost          0.65
Product_Base_Margin    1.30

dtype: float64

But the expected output is

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06

dtype: float64

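【Question comments】:

Can you create a small example that reproduces the problem? Right now it is not very clear (IMO) what exactly you are trying to achieve. Check this

Tags: python pandas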

【Solution 1】:

Try this approach:

df.drop(df[df.isnull().sum(axis=1)>5].index,axis=0,inplace=True)

print(round(100*(1-df.count()/len(df)),2))
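
A minimal sketch of why the question's code leaves df unchanged while the drop above works. The frame `toy` and its column names are made up for illustration and are not the dataset from the question; the `.copy()` call is only there to keep pandas from warning about modifying a selection.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 7 columns, 4 rows. Row 1 has 6 missing values, row 2 has 1.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [1.0, np.nan, np.nan, 4.0],
        "c": [1.0, np.nan, 3.0, 4.0],
        "d": [1.0, np.nan, 3.0, 4.0],
        "e": [1.0, np.nan, 3.0, 4.0],
        "f": [1.0, np.nan, 3.0, 4.0],
    }
)

# The question's approach: select the sparse rows into a separate object
# and call dropna on that object. The original frame is never modified.
sparse = toy.loc[toy.isnull().sum(axis=1) > 5].copy()
sparse.dropna(axis=0, inplace=True)
rows_after_question_approach = len(toy)

# Solution 1's approach: look up the index labels of the sparse rows
# and drop those labels from the original frame.
cleaned = toy.drop(toy[toy.isnull().sum(axis=1) > 5].index, axis=0)
rows_after_drop = len(cleaned)

print(rows_after_question_approach, rows_after_drop)  # 4 3
```

In the question, the percentages are computed from df, so any change made only to d cannot show up in the printed result.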

【Discussion】:

Instead of print(round(100*(1-df.count()/len(df)),2)), use df.isnull().sum()
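
For comparison, a small sketch of what the two expressions return. Note that df.isnull().sum() gives the number of missing cells per column rather than a percentage; taking the mean of the null mask and multiplying by 100 is one way to get the percentage directly. The frame `toy` below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, "x" has 1 missing value, "y" has 2.
toy = pd.DataFrame(
    {
        "x": [1.0, np.nan, 3.0, 4.0],
        "y": [np.nan, np.nan, 3.0, 4.0],
    }
)

# Count of missing cells per column.
counts = toy.isnull().sum()

# Percentage of missing cells per column, written as in the question.
pct_from_count = (100 * (1 - toy.count() / len(toy))).round(2)

# The same percentage computed from the null mask directly.
pct_from_mean = (100 * toy.isnull().mean()).round(2)

print(counts.tolist())          # [1, 2]
print(pct_from_count.tolist())  # [25.0, 50.0]
print(pct_from_mean.tolist())   # [25.0, 50.0]
```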

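【Solution 2】:

I think you are trying to find the indices of the rows that have more than 5 null values. Use np.where instead of df.loc to find those indices, then drop them.

Try: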

import pandas as pd
import numpy as np
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
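d = np.where(df.isnull().sum(axis=1) > 5)[0]  # integer positions of rows with more than 5 missing values
df = df.drop(df.index[d])  # map positions to index labels, then drop those rows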
print(round(100*(1-df.count()/len(df)),2))

Output:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06
dtype: float64
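
A short sketch of the positions-versus-labels step in this approach, assuming a frame whose index is not the default 0..n-1. np.where returns a tuple of arrays of integer positions, while DataFrame.drop expects index labels, which is why the positions are passed through df.index first. The frame `toy` and its labels are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with string row labels; row "r1" has 6 missing values.
toy = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6, 7],
        [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        [1, 2, np.nan, 4, 5, 6, 7],
    ],
    index=["r0", "r1", "r2"],
    columns=list("abcdefg"),
)

# np.where returns a tuple with one array per dimension; take the first one.
positions = np.where(toy.isnull().sum(axis=1) > 5)[0]

# Translate integer positions into the row labels that drop() understands.
labels = toy.index[positions]
cleaned = toy.drop(labels)

print(positions.tolist())   # [1]
print(list(labels))         # ['r1']
print(list(cleaned.index))  # ['r0', 'r2']
```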


【Solution 3】:

Try this; it should work.

df = df[df.isnull().sum(axis=1) <= 5]
print(round(100*(1-df.count()/len(df)),2))
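
As a possible alternative to the boolean mask, DataFrame.dropna accepts a thresh argument giving the minimum number of non-missing values a row needs in order to be kept. Keeping rows with at most 5 missing values is the same as requiring at least (number of columns - 5) non-missing values. A sketch on made-up data, where `toy` and `max_missing` are illustrative names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 7 columns:
# row 0: no missing values, row 1: 6 missing, row 2: exactly 5 missing, row 3: none.
toy = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6, 7],
        [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        [1, 2, np.nan, np.nan, np.nan, np.nan, np.nan],
        [1, 2, 3, 4, 5, 6, 7],
    ],
    columns=list("abcdefg"),
)

max_missing = 5

# Boolean-mask version, as in the answer above.
by_mask = toy[toy.isnull().sum(axis=1) <= max_missing]

# dropna version: keep rows with at least (n_columns - max_missing) non-missing values.
by_thresh = toy.dropna(thresh=toy.shape[1] - max_missing)

print(list(by_mask.index))        # [0, 2, 3]
print(by_mask.equals(by_thresh))  # True
```

The row with exactly 5 missing values is kept by both versions, matching the "greater than 5" condition in the question.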


【Solution 4】:

Try this solution:


import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1)<=5]
print(round(100*(df.isnull().sum()/len(df.index)),2))


【Solution 5】:

This should work.

df = df.drop(df[df.isnull().sum(axis=1) > 5].index)

print(round(100 * (df.isnull().sum() / len(df.index)), 2))
