【Question title】: How to deal with NaN values in data in Python?
【Posted】: 2019-11-09 15:48:54
【Question】:

I have a large dataset with many NaN values in multiple columns.

I tried the following code, but it does not remove the NaN values from the dataset:

import pandas as pd

df = pd.read_excel('sec3_data.xlsx')
df.dropna(subset=["Deviation from Partisanship"])
df['Deviation from Partisanship'].unique()

Output:

array([nan, 'Vote for opposing party', 'Vote for own party'], dtype=object)

This clearly shows that some NaN values are still present. How do I remove them?

【Comments】:

    Tags: python data-science data-analysis missing-data


    【Solution 1】:

    dropna returns a new DataFrame and leaves df unchanged, so you need to assign the result to a variable:

    df2 = df.dropna(subset=["Deviation from Partisanship"])
    

    Or drop the rows in place:

    df.dropna(subset=["Deviation from Partisanship"], inplace=True)
    

    You can find more information in the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
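
    To see why the original call appeared to do nothing, here is a minimal sketch that replaces the Excel file with a small in-memory frame. The sample rows and the respondent_id column are made up for illustration; only the "Deviation from Partisanship" column name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from sec3_data.xlsx.
df = pd.DataFrame({
    "Deviation from Partisanship": [
        np.nan,
        "Vote for opposing party",
        "Vote for own party",
        np.nan,
    ],
    "respondent_id": [1, 2, 3, 4],  # hypothetical extra column
})

# dropna returns a new DataFrame; without an assignment the result is discarded.
df.dropna(subset=["Deviation from Partisanship"])
rows_without_assignment = len(df)

# Assigning the result keeps the filtered rows.
cleaned = df.dropna(subset=["Deviation from Partisanship"])
rows_after_assignment = len(cleaned)

print(rows_without_assignment, rows_after_assignment)
```

    In this sketch df still has all four rows after the bare call, while cleaned holds only the two rows with a value in that column.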

    【Comments】:

      【Solution 2】:

      You need to write it as:

      df = df.dropna(subset=["Deviation from Partisanship"])
      

      Or:

      df.dropna(subset=["Deviation from Partisanship"], inplace=True)
      

      【Comments】:

        【Solution 3】:
        # Method 1
        df = pd.read_excel('sec3_data.xlsx')
        df.dropna(subset=["Deviation from Partisanship"], inplace=True)
        df['Deviation from Partisanship'].unique()
        
        # Method 2
        df = pd.read_excel('sec3_data.xlsx')
        df2 = df.dropna(subset=["Deviation from Partisanship"])
        df2['Deviation from Partisanship'].unique()
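
        Since the question mentions NaN values in several columns, it may help to count them per column before and after the drop. This is only a sketch on made-up data: the age column and the sample rows are assumptions, and the real frame would come from pd.read_excel as in the methods above.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the Excel sheet.
df = pd.DataFrame({
    "Deviation from Partisanship": [
        np.nan,
        "Vote for own party",
        "Vote for opposing party",
    ],
    "age": [34, np.nan, 51],  # hypothetical second column with its own gap
})

# NaN count per column before dropping anything.
missing_before = df.isna().sum()

df2 = df.dropna(subset=["Deviation from Partisanship"])

# Only the column named in subset is guaranteed NaN-free afterwards;
# other columns can still contain missing values.
missing_after = df2.isna().sum()
remaining_values = df2["Deviation from Partisanship"].unique()

print(missing_before)
print(missing_after)
print(remaining_values)
```

        With subset, rows are dropped only when that one column is missing, so the hypothetical age column still reports one NaN in this sketch.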
        

        【Comments】:
