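【Posted】: 2018-11-08 23:01:25
【Question】:
I have a dataset with a column named "Self_Employed". The column contains the values 'Yes', 'No', and NaN. I want to replace the NaN values with a value computed by calc(). I tried several approaches I found here, but none of them worked for me. Here is my code; the things I tried are in the comments: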
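# Handling missing data - Self_employed
# Assumption: randint comes from the standard library (the original import is not shown)
from random import randint

SEyes = (df['Self_Employed'] == 'Yes').sum()  # number of 'Yes' rows
SEno = (df['Self_Employed'] == 'No').sum()    # number of 'No' rows

def calc():
    # Draw a random integer and map it to 'Yes' or 'No' using a hard-coded threshold
    rand_SE = randint(0, (SEno + SEyes))
    if rand_SE > 81:
        return 'No'
    else:
        return 'Yes'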
# df['Self_Employed'] = df['Self_Employed'].fillna(randint(0,100))
# df['Self_Employed'].isnull().apply(lambda v: calc())

# df[df['Self_Employed'].isnull()] = df[df['Self_Employed'].isnull()].apply(lambda v: calc())
# df[df['Self_Employed']]

# df_nan['Self_Employed'] = df_nan['Self_Employed'].isnull().apply(lambda v: calc())
# df_nan

# for i in range(df['Self_Employed'].isnull().sum()):
#     print(df.Self_Employed[i])
df[df['Self_Employed'].isnull()] = df[df['Self_Employed'].isnull()].apply(lambda v: calc())
df
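The line where I tried df_nan seems to work, but it leaves me with a separate set that contains only the rows that previously had missing values, whereas I want to fill the missing values in the entire dataset. The last line gives me an error; I have linked a screenshot of it. Do you understand my problem? If so, can you help?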
This is the set with only the rows where Self_Employed is NaN
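For reference, the goal described above (filling only the NaN cells of one column while keeping the whole dataset) could be sketched as follows. This is a minimal illustration, not the code from the question: the toy DataFrame, the "LoanAmount" column, and the seed are hypothetical, and the weighted draw with NumPy is swapped in for calc() so that the proportions come from the observed counts rather than a hard-coded threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Self_Employed": ["Yes", "No", np.nan, "No", np.nan, "No"],
    "LoanAmount": [100, 120, 90, 110, 130, 95],  # made-up second column
})

# Observed counts of each category (NaN is excluded by value_counts).
counts = df["Self_Employed"].value_counts()

# Boolean mask marking the rows whose Self_Employed value is missing.
mask = df["Self_Employed"].isna()

# Draw one label per missing row, weighted by the observed frequencies.
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
fills = rng.choice(
    counts.index.to_numpy(),
    size=int(mask.sum()),
    p=(counts / counts.sum()).to_numpy(),
)

# Assign into the original frame: only the masked rows of this one
# column change, and every other row and column is left untouched.
df.loc[mask, "Self_Employed"] = fills
```

The same `df.loc[mask, "Self_Employed"] = ...` assignment should also accept a list built from the original function, for example `[calc() for _ in range(mask.sum())]`, if keeping calc() is preferred.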
【Discussion】:
Tags: python pandas data-cleaning