Apply a custom function on columns in a pandas dataframe: answers

【Question title】: Apply a custom function on columns in a pandas dataframe
【Posted】: 2018-03-13 13:57:26
【Question】:

I want to do something equivalent to:

Select x,y,z from data where f(x, Y);

where f is my custom function, which looks at the values of specific columns in a row and returns True or False. I tried the following:

df = df.ix[_is_detection_in_window(df['Product'], df['CreatedDate'])== True]

but I get:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I think it is not iterating over the rows. I also tried:

i = 0
for index, row in df.iterrows():
    if _is_detection_in_window(row['Product'], row['CreatedDate']):
        print('in range ')
        new_df.iloc[i] = row
        i += 1
df = new_df

but I get:

IndexError: single positional indexer is out-of-bounds

【Question comments】:

Tags: python string pandas numpy dataframe

【Solution 1】:

It looks like your function does not accept Series, but you can change that with np.vectorize:

v = np.vectorize(_is_detection_in_window)
df = df.loc[v(df['Product'], df['CreatedDate'])]

Also, you should avoid .ix: it has been deprecated since pandas v0.20 and was removed in pandas 1.0.
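
For readers who want to run the np.vectorize approach end to end, here is a minimal sketch. The body of `_is_detection_in_window`, the `windows` lookup table, and the sample data are all assumptions made up for illustration; the asker's real function is not shown in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's function: it takes two scalars
# (a product name and a date string) and returns True or False.
def _is_detection_in_window(product, created_date):
    windows = {
        "A": ("2018-01-01", "2018-01-31"),
        "B": ("2018-02-01", "2018-02-28"),
    }
    start, end = windows[product]
    # ISO-formatted date strings compare correctly as plain strings.
    return start <= created_date <= end

df = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "CreatedDate": ["2018-01-15", "2018-03-01", "2018-02-10", "2018-01-05"],
})

# np.vectorize calls the scalar function once per pair of elements
# and collects the results into a boolean NumPy array.
v = np.vectorize(_is_detection_in_window)
mask = v(df["Product"], df["CreatedDate"])

# Keep only the rows where the mask is True.
filtered = df.loc[mask]
print(filtered)
```

In this stand-in, calling `_is_detection_in_window(df['Product'], df['CreatedDate'])` directly would fail with a TypeError, because a whole Series is used as a dictionary key and a Series is not hashable. That may or may not be what happens inside the asker's function. Note also that np.vectorize is a convenience wrapper around a Python-level loop, so it is not faster than iterating by hand.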

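【Comments】:

Yes, my function takes two strings, so I want to iterate over the rows and select those that match the function f defined on specific columns. I have added my other attempt to my post, but it does not work.
Perfect, man! Awesome, thanks. Can you tell me what is wrong with my second solution?
@Alex There is probably something wrong with your indexing. You may want new_df.iloc[index]. Even so, if you want to filter out rows, what you are doing is not correct.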
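
One way to make the row-by-row attempt work is to collect one True/False flag per row and filter with that list, instead of assigning into new_df with .iloc. .iloc cannot add rows to a frame, so the IndexError presumably means new_df had no row at position i. The sketch below uses a made-up predicate and made-up data; only the function name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's scalar predicate.
def _is_detection_in_window(product, created_date):
    return product == "A" and created_date >= "2018-01-10"

df = pd.DataFrame({
    "Product": ["A", "B", "A"],
    "CreatedDate": ["2018-01-15", "2018-01-20", "2018-01-05"],
})

# Evaluate the predicate once per row and keep the results in a list.
keep = [
    _is_detection_in_window(product, created)
    for product, created in zip(df["Product"], df["CreatedDate"])
]

# Boolean indexing selects the rows whose flag is True.
df = df[keep]
print(df)
```

Iterating with zip over the two columns avoids building a Series per row, which is what iterrows does.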

【Solution 2】:

I am not sure what your function looks like, but I assume it returns a list of booleans with one entry per row of your df:

df = df.iloc[_is_detection_in_window(df['Product'], df['CreatedDate']), :]
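
A sketch of what such a column-wise function could look like is given below. The date window, the product list, and the sample data are assumptions for illustration only. This version returns a boolean Series rather than a list, so it filters with .loc, which is the usual indexer for a boolean Series.

```python
import pandas as pd

# Hypothetical column-wise version of the predicate: it takes two whole
# Series and returns a boolean Series with one entry per row.
def _is_detection_in_window(products, created_dates):
    dates = pd.to_datetime(created_dates)
    in_window = dates.between(pd.Timestamp("2018-01-01"),
                              pd.Timestamp("2018-01-31"))
    return in_window & products.isin(["A", "B"])

df = pd.DataFrame({
    "Product": ["A", "C", "B"],
    "CreatedDate": ["2018-01-15", "2018-01-20", "2018-02-03"],
})

mask = _is_detection_in_window(df["Product"], df["CreatedDate"])

# Keep only the rows where the mask is True.
df = df.loc[mask]
print(df)
```

When the predicate can be expressed with Series operations like this, no Python-level loop is needed at all.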

【Comments】: