【Question title】: Applying Pandas Create Column Method With a Function
【Posted】: 2018-08-17 08:11:15
【Question】:

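I'm trying to optimize my code and save time. My current solution works, but it becomes redundant and unmaintainable once I apply similar logic to multiple dataframes.

How can I automate creating a new column based on a condition on another column?

Some data: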

import pandas as pd

df = {'Column1': [1,2,3,4,5],
        'Column2': ["A","B","C","D","E"]}
df = pd.DataFrame(df, columns=['Column1','Column2'])

df


   Column1 Column2
0        1       A
1        2       B
2        3       C
3        4       D
4        5       E

Method 1: works, but is not maintainable, because I have to repeat it every time I need a similar operation on a new dataframe

# create band if column 2 contains A-C
df['Col_2_Band V1'] = "D-E"
df['Col_2_Band V1'][df['Column2'].isin(['A','B','C'])] = "A-C"
df


   Column1 Column2 Col_2_Band V1
0        1       A           A-C
1        2       B           A-C
2        3       C           A-C
3        4       D           D-E
4        5       E           D-E

Method 2: does not work

def applyV2(row):
    row['Col_2_Band V2'] = "D-E"
    row['Col_2_Band V2'][df['Column2'].isin(['A','B','C'])] = "A-C"
    return row

df = df.apply(applyV2, axis=1)

**Error:**
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-cf5d31427d02> in <module>()
      4     return row
      5 
----> 6 df = df.apply(applyV2, axis=1)

C:\Users\cfeld\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4852                         f, axis,
   4853                         reduce=reduce,
-> 4854                         ignore_failures=ignore_failures)
   4855             else:
   4856                 return self._apply_broadcast(f, axis)

C:\Users\cfeld\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4948             try:
   4949                 for i, v in enumerate(series_gen):
-> 4950                     results[i] = func(v)
   4951                     keys.append(v.name)
   4952             except Exception as e:

<ipython-input-8-cf5d31427d02> in applyV2(row)
      1 def applyV2(row):
      2     row['Col_2_Band V2'] = "D-E"
----> 3     row['Col_2_Band V2'][df['Column2'].isin(['A','B','C'])] = "A-C"
      4     return row
      5 

TypeError: ("'str' object does not support item assignment", 'occurred at index 0')

End goal: apply this method to multiple dfs

# for example

df_10 = df10.apply(applyV2, axis=1)
df_20 = df20.apply(applyV2, axis=1)
df_30 = df30.apply(applyV2, axis=1)

【Question comments】:

    Tags: python python-3.x pandas loops dataframe


    【Solution 1】:

    It's not the cleanest, but you can do this:

    import pandas as pd
    df = {'Column1': [1,2,3,4,5],
      'Column2': ["A","B","C","D","E"]}
    df = pd.DataFrame(df, columns=['Column1','Column2'])
    
    def applyV2(x):
        df['Col_2_Band v2'] = df['Column2'].map(lambda x: "A-C" if "A" in x
                                                           else 'A-C' if 'B' in x
                                                           else 'A-C' if 'C' in x
                                                           else 'D-E')
        return x
    df.apply(applyV2)
    

    Output:

       Column1 Column2 Col_2_Band v2
    0        1       A           A-C
    1        2       B           A-C
    2        3       C           A-C
    3        4       D           D-E
    4        5       E           D-E
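
Note that in the snippet above `applyV2` ignores its argument and writes to the global `df`, and `df.apply(applyV2)` calls it once per column. A possible cleanup, as a rough sketch only: the helper name `add_band_map` and its parameters are hypothetical, not part of the answer. It also swaps the substring test (`"A" in x`) for an exact-match dictionary lookup, which gives the same result on this sample data but would differ for values such as "AB".

```python
import pandas as pd

def add_band_map(frame, source="Column2", target="Col_2_Band v2"):
    """Hypothetical helper: return a copy of `frame` with a band column added."""
    # Exact-match lookup; values missing from the dict map to NaN
    lookup = {"A": "A-C", "B": "A-C", "C": "A-C"}
    out = frame.copy()  # leave the caller's dataframe untouched
    # Anything not found in the lookup falls back to the 'D-E' band
    out[target] = out[source].map(lookup).fillna("D-E")
    return out

df = pd.DataFrame({"Column1": [1, 2, 3, 4, 5],
                   "Column2": ["A", "B", "C", "D", "E"]})
result = add_band_map(df)
print(result)
```

Because the function takes the dataframe as a parameter and returns a new one, it can be reused on any dataframe that has the source column.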
    


      【Solution 2】:

      Where possible, do not use pd.DataFrame.apply for functions that are easy to vectorize. df.apply is just a thinly veiled loop.
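
To make the "loop" point concrete: with `axis=1`, the function receives one row at a time as a Series, so `row['Col_2_Band V2']` in the question holds a plain `str`, and assigning into it with a boolean mask raises the `TypeError` shown there. If a row-wise version were really wanted, it would need a scalar test, roughly like the sketch below (the name `band_for_row` is made up for illustration).

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2, 3, 4, 5],
                   "Column2": ["A", "B", "C", "D", "E"]})

def band_for_row(row):
    # `row` is a Series for a single row, so row["Column2"] is a scalar
    return "A-C" if row["Column2"] in ("A", "B", "C") else "D-E"

# One Python-level call per row: this is the hidden loop
row_wise = df.apply(band_for_row, axis=1)
print(row_wise.tolist())  # ['A-C', 'A-C', 'A-C', 'D-E', 'D-E']
```

This works, but it runs a Python function call for every row, which is why a vectorized assignment is usually preferable.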

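      In this case, the following is more efficient and easier to maintain. pd.DataFrame.pipe simply passes the dataframe into a function. We use the .loc accessor to assign values based on the given condition.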

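      def add_row(df):
          # Default every row to the 'D-E' band
          df['Col_2_Band V2'] = 'D-E'
          # Overwrite the rows whose Column2 value is A, B, or C
          df.loc[df['Column2'].isin({'A','B','C'}), 'Col_2_Band V2'] = 'A-C'
          return df
      
      # pipe passes df as the first argument to add_row
      df = df.pipe(add_row)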
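
For the stated end goal of reusing this across several dataframes, one option is to parameterize the function and pass the extra arguments through `pipe`. This is only a sketch under assumptions: `add_band`, its parameter names, and the contents of `df10` and `df20` are invented here for illustration.

```python
import pandas as pd

def add_band(frame, source, target, members, inside, outside):
    """Hypothetical generalization of the .loc approach."""
    out = frame.copy()                    # avoid mutating the input
    out[target] = outside                 # default band for every row
    out.loc[out[source].isin(members), target] = inside
    return out

# Made-up sample dataframes standing in for df10, df20, ...
df10 = pd.DataFrame({"Column2": ["A", "D", "C"]})
df20 = pd.DataFrame({"Column2": ["E", "B"]})
frames = {"df10": df10, "df20": df20}

# pipe forwards keyword arguments to the function after the dataframe
banded = {
    name: frame.pipe(add_band, source="Column2", target="Col_2_Band V2",
                     members=["A", "B", "C"], inside="A-C", outside="D-E")
    for name, frame in frames.items()
}
print(banded["df10"])
```

Keeping the dataframes in a dict avoids one line of near-identical code per dataframe, and the band definition lives in a single place.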
      
