【Question title】: Python get data from columns based on condition
【Posted】: 2021-11-29 21:46:51
【Question description】:

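Given a dataframe, I want to check whether DS1.ColA or DS1.ColB contains "Type 1"; if so, I want to insert the corresponding DS1.Val into the column Value. The same goes for DS2: check whether DS2.ColA or DS2.ColB contains "Type 1", and if so, insert the corresponding DS2.Val into the column Value.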

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
        'DS1.ColA': ["Type 1", "Undef", np.nan, "Undef", "Type 1", ""],
        'DS1.ColB': ["N", "Type 1", "", "", "Y", np.nan],
        'DS1.Val': [85, 87, 18, 94, 81, 54],
        'DS2.ColA': ["Type 1", "Undef", "Type 1", "Undef", "Type 1", ""],
        'DS2.ColB': ["N", "Type 2", "", "", "Y", "Type 1"],
        'DS2.Val': [45, 98, 1, 45, 66, 36]
    }
)

var_check = "Type 1"
ds1_col_check = ["DS1.ColA","DS1.ColB","DS1.Val"]
ds2_col_check = ["DS2.ColA","DS2.ColB","DS2.Val"]

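The last element of ds1_col_check and ds2_col_check is always the column whose value should be placed in the new column (the lists may contain more columns to check). The final df should have an additional Value column filled in this way. How can I achieve this in Python?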

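【Comments】:

  • What does the last column, Value, represent in the desired output?
  • It comes from DS1.Val or DS2.Val: if a DS1 column contains the required string, the value is taken from DS1.Val; otherwise, if a DS2 column contains the required string, the value is taken from DS2.Val.
  • In the desired output, why is there a row for AB04, given that none of the columns DS1.ColA, DS1.ColB, DS2.ColA, DS2.ColB contains "Type 1"?
  • In some cases "Type 1" does not appear in any DS1 or DS2 column, so the value is NaN.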

Tags: python pandas dataframe numpy data-manipulation


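【Solution 1】:

If there are several lists, you can collect them in a list L, test each sublist for rows that match the condition, and set the corresponding values in the column Value. To avoid overwriting values that are already set, use Series.fillna: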

var_check = "Type 1"
ds1_col_check = ["DS1.ColA","DS1.ColB","DS1.Val"]
ds2_col_check = ["DS2.ColA","DS2.ColB","DS2.Val"]

L = [ds1_col_check, ds2_col_check]

df['Value'] = np.nan
for val in L:
    df.loc[df[val[:-1]].eq(var_check).any(axis=1), 'Value'] = df['Value'].fillna(df[val[-1]])
    
print(df)
     ID DS1.ColA DS1.ColB  DS1.Val DS2.ColA DS2.ColB  DS2.Val  Value
0  AB01   Type 1        N       85   Type 1        N       45   85.0
1  AB02    Undef   Type 1       87    Undef   Type 2       98   87.0
2  AB03      NaN                18   Type 1                 1    1.0
3  AB04    Undef                94    Undef                45    NaN
4  AB05   Type 1        Y       81   Type 1        Y       66   81.0
5  AB06               NaN       54            Type 1       36   36.0
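
To make the row filter in the loop easier to follow, here is a minimal sketch that evaluates it step by step for a single group of columns. The three-row frame and the names check_cols, value_col, matches and row_mask are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (first three rows of the DS1 columns).
df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB03'],
    'DS1.ColA': ['Type 1', 'Undef', np.nan],
    'DS1.ColB': ['N', 'Type 1', ''],
    'DS1.Val': [85, 87, 18],
})

cols = ['DS1.ColA', 'DS1.ColB', 'DS1.Val']
# All but the last entry are checked; the last entry holds the value.
check_cols, value_col = cols[:-1], cols[-1]

# Element-wise exact comparison: NaN and '' compare unequal to 'Type 1'.
matches = df[check_cols].eq('Type 1')

# A row qualifies if any of its checked columns matched.
row_mask = matches.any(axis=1)

print(row_mask.tolist())
print(df.loc[row_mask, value_col].tolist())
```

Note that eq tests for exact equality, so a cell such as "Type 10" would not match; if substring matching is wanted, str.contains would be the tool to reach for instead.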

Or, written out without the loop:

var_check = "Type 1"
ds1_col_check = ["DS1.ColA","DS1.ColB","DS1.Val"]
ds2_col_check = ["DS2.ColA","DS2.ColB","DS2.Val"]

df.loc[df[ds1_col_check[:-1]].eq(var_check).any(axis=1), 'Value'] = df[ds1_col_check[-1]]
df.loc[df[ds2_col_check[:-1]].eq(var_check).any(axis=1), 'Value'] = df['Value'].fillna(df[ds2_col_check[-1]])
    
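
As a possible alternative that is not part of the original answer, the same "first matching group wins" rule can be sketched with numpy.select, which picks, for each row, the choice belonging to the first condition that is true. The variable names groups, conditions and choices are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
    'DS1.ColA': ["Type 1", "Undef", np.nan, "Undef", "Type 1", ""],
    'DS1.ColB': ["N", "Type 1", "", "", "Y", np.nan],
    'DS1.Val': [85, 87, 18, 94, 81, 54],
    'DS2.ColA': ["Type 1", "Undef", "Type 1", "Undef", "Type 1", ""],
    'DS2.ColB': ["N", "Type 2", "", "", "Y", "Type 1"],
    'DS2.Val': [45, 98, 1, 45, 66, 36],
})

var_check = "Type 1"
# Each group lists the columns to check, followed by the value column.
groups = [
    ["DS1.ColA", "DS1.ColB", "DS1.Val"],
    ["DS2.ColA", "DS2.ColB", "DS2.Val"],
]

# One boolean array per group: True where any checked column equals var_check.
conditions = [df[g[:-1]].eq(var_check).any(axis=1).to_numpy() for g in groups]
# One array of candidate values per group.
choices = [df[g[-1]].to_numpy() for g in groups]

# For each row, take the choice of the first true condition, else NaN.
df['Value'] = np.select(conditions, choices, default=np.nan)
print(df[['ID', 'Value']])
```

Because the order of groups decides priority, putting the DS2 list first would make DS2.Val win whenever both groups match.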

【Comments】:

    【Solution 2】:

    The development version of pyjanitor has a case_when implementation that may help here by abstracting over multiple conditions (under the hood, it uses pd.Series.mask):

    # pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
    import numpy as np
    import pandas as pd
    import janitor as jn  # importing janitor registers case_when on DataFrame

    # case_when takes its arguments in this order:
    #   condition, value,
    #   condition, value,
    #   ... more condition/value pairs ...,
    #   default value used if none of the conditions match,
    #   column_name: the column to assign the values to
    # similar to CASE WHEN in SQL
    df.case_when(
        df['DS1.ColA'].str.contains('Type 1') | df['DS1.ColB'].str.contains('Type 1'), df['DS1.Val'],
        df['DS2.ColA'].str.contains('Type 1') | df['DS2.ColB'].str.contains('Type 1'), df['DS2.Val'],
        np.nan,
        column_name='Value')
    
         ID DS1.ColA DS1.ColB  DS1.Val DS2.ColA DS2.ColB  DS2.Val  Value
    0  AB01   Type 1        N       85   Type 1        N       45   85.0
    1  AB02    Undef   Type 1       87    Undef   Type 2       98   87.0
    2  AB03      NaN                18   Type 1                 1    1.0
    3  AB04    Undef                94    Undef                45    NaN
    4  AB05   Type 1        Y       81   Type 1        Y       66   81.0
    5  AB06               NaN       54            Type 1       36   36.0
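
Since the answer mentions pd.Series.mask, the idea can be sketched with pandas alone. This is only an illustration of chaining mask calls, not pyjanitor's actual implementation; the reduced four-row frame and the names cond1, cond2 and value are assumptions.

```python
import numpy as np
import pandas as pd

# Reduced illustrative frame with one checked column per group.
df = pd.DataFrame({
    'DS1.ColA': ['Type 1', 'Undef', np.nan, 'Undef'],
    'DS1.Val': [85, 87, 18, 94],
    'DS2.ColA': ['Type 1', 'Undef', 'Type 1', 'Undef'],
    'DS2.Val': [45, 98, 1, 45],
})

# na=False makes missing strings count as "no match" instead of NaN.
cond1 = df['DS1.ColA'].str.contains('Type 1', na=False)
cond2 = df['DS2.ColA'].str.contains('Type 1', na=False)

# Start from the default value, then apply the conditions in reverse
# priority so that the highest-priority condition is written last and wins.
value = pd.Series(np.nan, index=df.index)
value = value.mask(cond2, df['DS2.Val'])
value = value.mask(cond1, df['DS1.Val'])

df['Value'] = value
print(df)
```

Series.mask replaces entries where the condition is True, which is why the lower-priority condition is applied first here.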
    

    【Comments】:
