【Question title】: Creating column on filtered pandas DataFrame
【Posted】: 2017-03-02 11:26:03
【Question】:

From an initial DataFrame loaded from a csv file,

df = pd.read_csv("file.csv",sep=";")

I get a filtered copy

df_filtered = df[df["filter_col_name"]== value]

However, when creating a new column with the diff() method,

df_filtered["diff"] = df_filtered["feature"].diff()

I get the following warning:

/usr/local/bin/ipython3:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/bin/python3

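I also noticed that the processing time is very long.

Surprisingly (at least to me...), if I do the same thing on the unfiltered DataFrame, it runs fine.

How should I proceed to create a "diff" column on the filtered data?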

【Comments】:

    Tags: python pandas data-science


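    【Solution 1】:

    You need copy():

    If you modify values in df_filtered later, you will find that the modifications do not propagate back to the original data (df), and pandas issues a warning.

    # to process only the filtered rows and get back the filtered DataFrame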
    df_filtered = df[df["filter_col_name"]== value].copy()
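The effect of the copy can be checked directly. The following is a minimal sketch, assuming a small hypothetical DataFrame that reuses the column names from the question; it only illustrates that the new column lands on the copy and leaves df untouched.

```python
import pandas as pd

# Hypothetical sample data, reusing the column names from the question.
df = pd.DataFrame({"filter_col_name": [1, 1, 3], "feature": [4, 5, 6]})
value = 1

# .copy() makes df_filtered an independent DataFrame.
df_filtered = df[df["filter_col_name"] == value].copy()
df_filtered["diff"] = df_filtered["feature"].diff()

# The new column exists only on the copy; df itself is unchanged.
print("diff" in df_filtered.columns)
print("diff" in df.columns)
```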
    

    Or:

    # to process only the filtered rows but keep the whole DataFrame
    mask = df["filter_col_name"] == value
    df.loc[mask, 'feature'] = df.loc[mask, 'feature'].diff()
    

    Sample:

    df = pd.DataFrame({'filter_col_name':[1,1,3],
                       'feature':[4,5,6],
                       'C':[7,8,9],
                       'D':[1,3,5],
                       'E':[5,3,6],
                       'F':[7,4,3]})
    
    print (df)
       C  D  E  F  feature  filter_col_name
    0  7  1  5  7        4                1
    1  8  3  3  4        5                1
    2  9  5  6  3        6                3
    
    value = 1
    
    df_filtered = df[df["filter_col_name"]== value].copy()
    df_filtered["diff"] = df_filtered["feature"].diff()
    print (df_filtered)
       C  D  E  F  feature  filter_col_name  diff
    0  7  1  5  7        4                1   NaN
    1  8  3  3  4        5                1   1.0
    

    value = 1
    
    mask = df["filter_col_name"] == value
    df.loc[mask, 'feature'] = df.loc[mask, 'feature'].diff()
    
    print (df)
       C  D  E  F  feature  filter_col_name
    0  7  1  5  7      NaN                1
    1  8  3  3  4      1.0                1
    2  9  5  6  3      6.0                3
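The output above shows the feature column being overwritten. If the goal is instead a separate "diff" column on the full DataFrame, as the question asks, one possible variant is to assign the filtered Series to a new column and let pandas align it on the index. This is a sketch under assumptions: the sample data is hypothetical, and the groupby line is an alternative technique that is not part of the answer. It also avoids writing NaN into the integer feature column.

```python
import pandas as pd

# Hypothetical sample data, reusing the column names from the question.
df = pd.DataFrame({"filter_col_name": [1, 1, 3], "feature": [4, 5, 6]})
value = 1
mask = df["filter_col_name"] == value

# Column assignment aligns on the index, so rows outside the mask get NaN
# and the original feature column is left as it was.
df["diff"] = df.loc[mask, "feature"].diff()

# Alternative (not from the answer): diff within every filter value at once.
df["diff_by_group"] = df.groupby("filter_col_name")["feature"].diff()

print(df)
```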
    

    【Comments】:

      【Solution 2】:

      Try using

      df_filtered.loc[:, "diff"] = df_filtered["feature"].diff()
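Another way to avoid setting a value on a slice is to build the filtered frame and the new column in one expression with DataFrame.assign, which returns a new DataFrame. This is a swapped-in alternative rather than what the answer proposes, shown as a minimal sketch on hypothetical sample data with the question's column names.

```python
import pandas as pd

# Hypothetical sample data, reusing the column names from the question.
df = pd.DataFrame({"filter_col_name": [1, 1, 3], "feature": [4, 5, 6]})
value = 1

# assign() returns a new DataFrame, so nothing is set on a slice of df.
df_filtered = df[df["filter_col_name"] == value].assign(
    diff=lambda d: d["feature"].diff()
)

print(df_filtered)
```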
      

      【Comments】:
