How can I delete row containing None from pandas dict? Answers

【Question Title】: How can I delete row containing None from pandas dict?
【Posted】: 2021-04-12 19:48:40
【Question】:

My dataframe is as follows:

df
    time                 home_team     away_team           full_time_result                   both_teams_to_score        double_chance                         League
--  -------------------  ------------  ------------------  ---------------------------------  -------------------------  ------------------------------------  ----------------
 0  2021-01-08 19:45:00  Charlton      Accrington Stanley  {'1': 2370, 'X': 3400, '2': 3000}  {'yes': 1900, 'no': 1900}  {'1X': 1360, '12': 1300, '2X': 1530}  England League 1
 1  2021-01-09 12:30:00  Lincoln City  Peterborough        {'1': 2290, 'X': 3400, '2': 3100}  {'yes': 1800, 'no': 1950}  {'1X': 1360, '12': 1300, '2X': 1570}  England League 1
 2  2021-01-09 13:00:00  Gillingham    Burton Albion       {'1': 2200, 'X': 3400, '2': 3300}  {'yes': 1700, 'no': 2040}  {'1X': 1330, '12': 1300, '2X': 1610}  England League 1
 3  2021-01-09 17:30:00  Ipswich       Swindon             {'1': None, 'X': None, '2': None}  {'yes': 1750, 'no': 2000}  {'1X': 1220, '12': 1250, '2X': 1900}  England League 1

How can I delete the rows that contain None? In this example, in the full_time_result column, I want to delete the row with {'1': None, 'X': None, '2': None}.

Thanks

【Comments】:

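As an FYI: since you are already expanding your dictionary columns into separate columns, as per your previous question, the best option is to use df_normalized = df_normalized.dropna() after normalizing the columns. That will be much faster than any of the provided solutions.
That is exactly what I did while waiting for your solutions. However, I wanted a more robust way of handling this in code, so I went with the solution by @david-erickson.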
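
The normalize-then-dropna approach suggested in the first comment could look roughly like the sketch below. The thread does not show how the columns were normalized in the earlier question, so the use of pd.json_normalize, the add_prefix naming, and the two-row sample frame are assumptions made for illustration only.

```python
import pandas as pd

# Small sample frame modeled on the question's data (illustrative values only).
df = pd.DataFrame({
    "home_team": ["Charlton", "Ipswich"],
    "full_time_result": [
        {"1": 2370, "X": 3400, "2": 3000},
        {"1": None, "X": None, "2": None},
    ],
})

# Assumption: expand the dict column into one column per key with json_normalize.
odds = pd.json_normalize(df["full_time_result"].tolist()).add_prefix("full_time_result_")
df_normalized = pd.concat([df.drop(columns="full_time_result"), odds], axis=1)

# The None entries are treated as missing values in the expanded columns,
# so dropna() removes the Ipswich row.
df_normalized = df_normalized.dropna()
print(df_normalized)
```

Note that dropna() with default arguments drops a row if any column is missing, so in a wider frame it may be worth restricting it with the subset parameter.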

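Tags: python pandas list dictionary

【Solution 1】:

You can create a boolean mask that marks the rows where full_time_result has None under both '1' and '2', and then filter those rows out. To extract the values we can use operator.itemgetter, then use __eq__ to check for equality, i.e. check whether the result is (None, None).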

from operator import itemgetter
m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
df[~m]

# Alternative
# m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__ne__)
# df[m]

Details

_.map(itemgetter('1', '2')).map((None, None).__eq__)
# All of this can be written using lambda in single line.

_.map(lambda x: itemgetter('1', '2')(x).__eq__((None, None)))

example_dict = {'1': 10, '2': 20}
itemgetter('1', '2')(example_dict)
# (10, 20)

# Since you want to identify values with `None`. We can leverage on __eq__
itemgetter('1', '2')(example_dict).__eq__((10, 20))
# True # equivalent to (10, 20) == (10, 20)
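
The mask above looks only at the keys '1' and '2'. As a minimal sketch, assuming you also want the 'X' key to be part of the check, the same itemgetter pattern can be extended to all three keys. The sample frame below is made up for illustration and is not the asker's full data.

```python
from operator import itemgetter

import pandas as pd

# Illustrative two-row frame in the shape of the question's data.
df = pd.DataFrame({
    "home_team": ["Charlton", "Ipswich"],
    "full_time_result": [
        {"1": 2370, "X": 3400, "2": 3000},
        {"1": None, "X": None, "2": None},
    ],
})

# True where the values under '1', 'X' and '2' are all None.
m = df["full_time_result"].map(itemgetter("1", "X", "2")).map((None, None, None).__eq__)

# Indexing returns a new DataFrame; assign it if you want to keep the result.
filtered = df[~m]
print(filtered)
```

This only matches rows where every listed key is None; a row with a single None value would be kept.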

timeit results

# Benchmarking setup
import pandas as pd
s = pd.Series([{'1':10, '2':20}, {'1':None, '2':None}, {'1':1, '2':2}])
df = s.repeat(1_000_000).to_frame('full_time_result')
df.shape
# (3000000, 1) # 3 million rows, 1 column


# @david's
In [33]: %timeit df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
1.59 s ± 82.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# @Ch3steR's
In [34]: %%timeit
    ...: m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
    ...: df[~m]
    ...:
    ...:
834 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

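≈ 2X faster than using the lambda

【Comments】:

When I run print(df[m]) I get the row with the value {'1': None, 'X': None, '2': None}, but I cannot delete it.
Use df[~m]. My bad, @PyNoob.

【Solution 2】:

With lambda x:, you iterate over every row of the specified column. From there you can use ordinary Python operations such as any(), access the values() of each row's dictionary, and check whether any of them equals None. That returns True for such rows, so we filter out those True results with ~: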

df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
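
A slightly more idiomatic variant of the same idea, offered as a sketch rather than as the answer's own code, uses `is None` with a generator expression instead of building a list. The three-row sample frame is hypothetical and includes one row with only a single None value to show how this check behaves on it.

```python
import pandas as pd

# Hypothetical sample: one complete row, one partially missing, one fully missing.
df = pd.DataFrame({
    "home_team": ["Charlton", "Gillingham", "Ipswich"],
    "full_time_result": [
        {"1": 2370, "X": 3400, "2": 3000},
        {"1": 2200, "X": None, "2": 3300},
        {"1": None, "X": None, "2": None},
    ],
})

# True for rows whose dict contains at least one None value.
has_none = df["full_time_result"].apply(lambda d: any(v is None for v in d.values()))

# Keep only the rows without any None.
filtered = df[~has_none]
print(filtered)
```

Unlike the (None, None) equality check in Solution 1, this drops a row as soon as any value in the dict is None, so pick whichever rule matches your data.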

【Comments】: