Assigning values to a column: how to avoid the warning (answers)

【Question title】: Giving values to a column: how to avoid the WARNING
【Posted】: 2020-11-26 19:16:36
【Question description】:

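I have a dataframe with various columns. I want to check whether each row satisfies a condition. The condition comes from another CSV file, but here I give a simplified example to illustrate my problem:

The condition is that the price is below 26000.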

import numpy as np
import pandas as pd

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }

mydata = pd.DataFrame(cars, columns = ['Brand','Price'], index=['Car_1','Car_2','Car_3','Car_4'])

The data looks like this:

print(mydata)
                Brand  Price
Car_1     Honda Civic  22000
Car_2  Toyota Corolla  25000
Car_3      Ford Focus  27000
Car_4         Audi A4  35000

So I created another column filled with np.nan, and in a for loop I check whether each row satisfies the condition; if it does, I assign True to that cell.

mydata['condition'] = np.nan


                Brand  Price  condition
Car_1     Honda Civic  22000        NaN
Car_2  Toyota Corolla  25000        NaN
Car_3      Ford Focus  27000        NaN
Car_4         Audi A4  35000        NaN

My for loop looks like this:

for i in range(mydata.shape[0]):
    # Chained indexing (column first, then .iloc): these assignments raise the warning
    mydata.condition.iloc[i] = None

    if mydata.Price.iloc[i] <= 26000:
        mydata.condition.iloc[i] = True

Now mydata looks like this:

                Brand  Price condition
Car_1     Honda Civic  22000      True
Car_2  Toyota Corolla  25000      True
Car_3      Ford Focus  27000      None
Car_4         Audi A4  35000      None

If I use dropna(), I get the result I want:

filtered_results=mydata.dropna()


                Brand  Price condition
Car_1     Honda Civic  22000      True
Car_2  Toyota Corolla  25000      True

My problem is that I get a warning, shown below:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  iloc._setitem_with_indexer(indexer, value)

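My question is: what is the correct/efficient way to assign a value to the dataframe in this line so that the warning above does not appear: mydata.condition.iloc[i] = True

Thanks for your help.

【Question discussion】:

Tags: python python-3.x pandas dataframe for-loop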

【Solution 1】:

Don't loop; you can do it in one step:

mydata.loc[mydata.Price <= 26000, 'condition'] = True
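When more than one condition is involved, the boolean masks can be combined with `&` (and) or `|` (or) before the single `.loc` assignment. The following is only a sketch: the second condition (a lower price bound of 23000) is a hypothetical stand-in that does not come from the question, and the column is initialised with `None` (object dtype) rather than `np.nan` so that storing `True` needs no dtype change.

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}
mydata = pd.DataFrame(cars, index=['Car_1', 'Car_2', 'Car_3', 'Car_4'])

# Object-dtype placeholder column; None counts as missing for dropna()
mydata['condition'] = None

# First mask is the condition from the question; the second is hypothetical
below_limit = mydata['Price'] <= 26000
above_floor = mydata['Price'] >= 23000

# Rows and column are selected in one indexing step on mydata itself
mydata.loc[below_limit & above_floor, 'condition'] = True

filtered_results = mydata.dropna()
print(filtered_results)
```

Each parenthesised comparison yields a boolean Series aligned on the index, so further conditions can be chained in the same way without a loop.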

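【Discussion】:

I need the loop because there are two conditions, not just one. The example is simplified. Also, my question is about the assignment part that gives the error.
What is wrong with this: mydata.condition.iloc[i] = True ? @Quang
That is the whole point of not looping: you don't have to deal with that error. You can, and should, search SO for that error to understand the cause. If you have 2 conditions, do it again for the other condition.
@Sean As for why it doesn't work or gives you a warning: see this doc.
@Sean That was not mentioned in your original post. Hear me out: looping is not the only way, nor the best way, to do "I want to check each row...". What you are asking is most likely an XY problem. As I see it, my answer solves the problem you posted. I also gave you additional information on why your code does not work. Sorry if you feel this does not suit you.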
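For readers who really do need a row-by-row loop: the warning comes from the chained form `mydata.condition.iloc[i] = ...`, which first selects the column and then assigns into that intermediate result. A minimal sketch of the same loop with the row label and the column name passed to one indexer (`.at` here, `.loc` works the same way), assuming the frame from the question and an object-dtype placeholder column:

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}
mydata = pd.DataFrame(cars, index=['Car_1', 'Car_2', 'Car_3', 'Car_4'])

# Object-dtype placeholder column instead of np.nan
mydata['condition'] = None

for label in mydata.index:
    # Single indexing step on mydata: row label and column name together
    if mydata.at[label, 'Price'] <= 26000:
        mydata.at[label, 'condition'] = True

filtered_results = mydata.dropna()
print(filtered_results)
```

This keeps the loop structure from the question, but it is still slower than the vectorized `.loc` assignment on larger frames.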

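【Solution 2】:

You can apply a function to every row, however many conditions you have. You can add more conditions to price_check to meet your requirements. From your question, and from reading your comments, it is not entirely clear what your exact problem is. If Quang Hoang's solution works for your problem, it is more efficient than using apply.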

def price_check(row):
    if row['Price'] <= 26000:
        return True
    else:
        return False

mydata['Price_check'] = mydata.apply(price_check, axis=1)

                Brand  Price  Price_check
Car_1     Honda Civic  22000         True
Car_2  Toyota Corolla  25000         True
Car_3      Ford Focus  27000        False
Car_4         Audi A4  35000        False
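As a point of comparison, the same boolean column can be produced without `apply` by comparing the whole `Price` column at once. The sketch below builds both versions on the frame from the question and checks that they agree; the names `via_apply` and `via_vector` are only illustrative.

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}
mydata = pd.DataFrame(cars, index=['Car_1', 'Car_2', 'Car_3', 'Car_4'])

def price_check(row):
    # Row-wise check, called once per row by apply
    return row['Price'] <= 26000

# Row-by-row version from the answer above
via_apply = mydata.apply(price_check, axis=1)

# Vectorized version: one comparison over the whole column
via_vector = mydata['Price'] <= 26000

# Both approaches give the same True/False value for every row
print((via_apply == via_vector).all())

mydata['Price_check'] = via_vector
```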

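【Discussion】:

Why you should not apply a function: because apply is generally a poor choice. More information in this question.
OP needs to apply multiple conditions, so apply would iterate once and check both conditions in a single pass instead of having to iterate several times, which would justify the overhead that apply incurs; wouldn't it?
No, it wouldn't. apply is generally a loop, and loops are generally slow. You still need to make just as many comparisons, but you lose vectorization. Besides, you have np.select for matching multiple conditions.
Agreed. Here is a better article comparing naive loops, apply, and vectorization; it may also clear up OP's confusion: engineering.upside.com/…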
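The `np.select` suggestion from the comments can be sketched as follows. The price tiers and the `tier` column name are hypothetical, chosen only to show several conditions being matched in one vectorized call; when more than one condition is true for a row, the first one in the list wins.

```python
import numpy as np
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}
mydata = pd.DataFrame(cars, index=['Car_1', 'Car_2', 'Car_3', 'Car_4'])

# Hypothetical tiers: conditions are evaluated in order, first match wins
conditions = [
    mydata['Price'] <= 23000,
    mydata['Price'] <= 26000,
]
choices = ['cheap', 'affordable']

# Rows matching none of the conditions receive the default label
mydata['tier'] = np.select(conditions, choices, default='expensive')
print(mydata)
```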