Pandas, for each row getting value of largest column between two columns: answers

【Question title】: Pandas, for each row getting value of largest column between two columns
【Posted】: 2018-04-14 04:19:02
【Question description】:

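I would like to express the following on a pandas DataFrame, but I don't know how to do it other than by slowly iterating over every cell by hand.

For context: I have a DataFrame with two categories of columns, which we'll call read_columns and non_read_columns. Given a column name, I have a function that returns true or false to tell you which category that column belongs to.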

Given a specific read column A:
    For each row:
        1. Inspect the read column A to get the value X
        2. Find the read column with the smallest value Y that is greater than X.
            If no read column has a value greater than X, then substitute the largest value
            found in all of the *non*-read columns, call it Z, and skip to step 4.
        3. Find the non-read column with the greatest value between X and Y and call its value Z.
        4. Compute Z - X

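In the end I would like a Series of Z - X values with the same index as the original DataFrame. Note that the sort order of the column values is not consistent from row to row.

What is the best way to do this?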

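【Question discussion】:

Why the downvote? This seems like an ordinary, rigorously stated programming question...?
Can you create a sample input and expected output? How to ask a good Pandas question?
I think what I've written describes the problem completely unambiguously. The lack of sample data may be a good reason not to spend time answering for someone who might otherwise be interested, but I don't see why it deserves a downvote. Now the question will be buried, and even if I add data it won't get the visibility it needs to actually be answered.
"What is the best way" is very broad, and anyone could answer it with zero direction. Providing an example and an attempt gives us a starting point for benchmarking.
@AndrewL The alternative is to manually iterate over all the cells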

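Tags: python pandas dataframe

【Solution 1】:

It's hard to give an answer without seeing a sample DF, but you could do the following:

Separate your read columns, which hold the candidate Y values, into a new DF.
Transpose this new DF so that the Y values run down columns rather than across rows.
Use built-in vectorized functions on the Series of Y values instead of manually iterating over rows and columns. You can first filter for the values greater than X, then apply min() to the filtered Series.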
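
The filter-then-min idea above can be sketched for the whole calculation as follows. This is only an illustration under assumptions, not the answerer's code: it masks row-wise with `axis=1` instead of transposing, the helper name `z_minus_x` and the sample data are hypothetical, `is_read` stands in for the question's column classifier, and "between X and Y" is assumed to mean strictly greater than X and strictly less than Y. Rows where no non-read value falls in that interval come out as NaN, since the question does not say what should happen there.

```python
import pandas as pd


def z_minus_x(df, col, is_read):
    """Row-wise Z - X for the read column `col` (hypothetical helper)."""
    read_cols = [c for c in df.columns if is_read(c)]
    non_read_cols = [c for c in df.columns if not is_read(c)]

    x = df[col]
    reads = df[read_cols]
    non_reads = df[non_read_cols]

    # Step 2: smallest read value strictly greater than X (NaN if none exists).
    y = reads.where(reads.gt(x, axis=0)).min(axis=1)

    # Step 3: largest non-read value in the open interval (X, Y) (assumed bounds).
    between = non_reads.gt(x, axis=0) & non_reads.lt(y, axis=0)
    z = non_reads.where(between).max(axis=1)

    # Step 2 fallback: no read value above X, so Z is the largest non-read value.
    z = z.where(y.notna(), non_reads.max(axis=1))

    # Step 4.
    return z - x


# Made-up sample data: columns starting with "r" are treated as read columns.
df = pd.DataFrame(
    {"r1": [1, 5, 1], "r2": [4, 2, 4], "n1": [2, 9, 0], "n2": [3, 1, 8]},
    index=["a", "b", "c"],
)
result = z_minus_x(df, "r1", lambda c: c.startswith("r"))
```

For this sample, row a takes Y = 4 and Z = 3, row b has no read value above X and falls back to the largest non-read value, and row c has no non-read value between X and Y, so it is NaN. The result keeps the original index.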

【Discussion】: