Getting all row combinations from a pandas DataFrame based on certain column conditions? (Answers)

【Question title】: Getting all row combinations from a pandas dataframe based on certain column conditions?
【Posted】: 2019-08-27 15:47:16
【Question】:

I have a pandas DataFrame that stores one food item per row, in the following format:

Id   Calories   Protein   IsBreakfast   IsLunch   IsDinner
1      300        6           0           1          0    
2      400        12          1           1          0
.
.
.   
100    700        25          0           1          1

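I want to print every three-row combination that satisfies the following conditions:

The combination should include at least one each of breakfast, lunch, and dinner.
The total calories should fall within a given range (say, above minCal and below an upper limit).
The same applies to protein.

Right now, I first iterate over all breakfast items and pick a lunch item, then iterate over all dinner items. Once a combination is selected, I sum the relevant columns and check whether the totals fall within the required range.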
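
For reference, the loop-based approach described above might look roughly like the sketch below. The sample rows and the numeric bounds are made up for illustration (the question gives neither), and the rule that one row cannot fill two meal slots is an assumption about what "three-row combination" means.

```python
from itertools import product

import pandas as pd

# Hypothetical sample data in the question's format (values are invented).
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Calories": [300, 400, 500, 700],
    "Protein": [6, 12, 20, 25],
    "IsBreakfast": [0, 1, 1, 0],
    "IsLunch": [1, 1, 0, 1],
    "IsDinner": [0, 0, 1, 1],
})

# Hypothetical bounds; the question does not state concrete numbers.
min_cal, max_cal = 1000, 1500
min_prot, max_prot = 30, 60

breakfast = df[df["IsBreakfast"] == 1]
lunch = df[df["IsLunch"] == 1]
dinner = df[df["IsDinner"] == 1]

matches = []
for b, l, d in product(breakfast.index, lunch.index, dinner.index):
    if len({b, l, d}) < 3:
        continue  # assumption: the same row cannot fill two meal slots
    rows = df.loc[[b, l, d]]
    total_cal = rows["Calories"].sum()
    total_prot = rows["Protein"].sum()
    if min_cal < total_cal < max_cal and min_prot < total_prot < max_prot:
        # Record the Ids in (breakfast, lunch, dinner) order.
        matches.append(tuple(int(df.at[i, "Id"]) for i in (b, l, d)))

print(matches)
```

Because items can carry more than one meal flag, the same three rows may show up under more than one (breakfast, lunch, dinner) assignment; deduplicate on the sorted Ids if only the set of rows matters.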

【Comments】:

What do you mean by three rows? Can you give an example of the expected output?

Tags: python pandas

【Solution 1】:

You can use the approach described in this answer to enumerate all combinations of three rows from the original data:

from itertools import combinations
import pandas as pd

# Using skbrhmn's df
df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

comb_rows = list(combinations(df.index, 3))
comb_rows

Output:

[(0, 1, 2),
 (0, 1, 3),
 (0, 1, 4),
 (0, 2, 3),
 (0, 2, 4),
 (0, 3, 4),
 (1, 2, 3),
 (1, 2, 4),
 (1, 3, 4),
 (2, 3, 4)]
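
It may help to check how many triples this enumeration produces before running it on the full table. The count is n(n-1)(n-2)/6; the sketch below confirms it for the 5-row example and evaluates it for 100 rows, assuming the question's table really has 100 items (its Ids run from 1 to 100).

```python
from itertools import combinations
from math import comb  # available in Python 3.8+

n_rows = 5
triples = list(combinations(range(n_rows), 3))

# The enumeration and the closed-form count agree.
assert len(triples) == comb(n_rows, 3)

# Assumption: the real table has 100 rows.
n_full = comb(100, 3)
print(len(triples), n_full)
```

A few hundred thousand triples is still manageable in memory, but the count grows cubically, so a much larger table would call for pruning rows first.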

Then build a new DataFrame that holds, for every possible combination of three rows, the sum of each numeric column in the original frame:

# Named combo_sums so that it does not shadow itertools.combinations imported above
combo_sums = pd.DataFrame([df.loc[list(c)].sum() for c in comb_rows], index=comb_rows)

print(combo_sums)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 2)       600       60            2        1         3
(0, 1, 3)       700       70            2        1         2
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
(0, 3, 4)      1000      100            1        2         2
(1, 2, 3)       900       90            1        0         2
(1, 2, 4)      1000      100            1        1         3
(1, 3, 4)      1100      110            1        1         2
(2, 3, 4)      1200      120            0        1         2

Finally, apply whatever filters you need:

# Each meal flag must be covered at least once, and both totals must be in range
filtered = combo_sums[
    (combo_sums.IsBreakfast > 0) &
    (combo_sums.IsLunch > 0) &
    (combo_sums.IsDinner > 0) &
    (combo_sums.Calories > 600) &
    (combo_sums.Calories < 1000) &
    (combo_sums.Protein >= 80) &
    (combo_sums.Protein < 120)
]
print(filtered)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3


【Solution 2】:

You can combine filter conditions on a DataFrame with the | and & operators. For example, create a dummy DataFrame:

import pandas as pd

df1 = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                    "Protein": [10, 20, 30, 40, 50],
                    "IsBreakfast": [1, 1, 0, 0, 0],
                    "IsLunch": [1, 0, 0, 0, 1],
                    "IsDinner": [1, 1, 1, 0, 1]})
print(df1)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
0       100       10            1        1         1
1       200       20            1        0         1
2       300       30            0        0         1
3       400       40            0        0         0
4       500       50            0        1         1

Now add all the conditions:

min_cal = 100
max_cal = 600
min_prot = 10
max_prot = 40
df_filtered = df1[
    ((df1['IsBreakfast']==1) | (df1['IsLunch']==1) | (df1['IsDinner']==1)) &
    ((df1['Calories'] > min_cal) & (df1['Calories'] < max_cal)) &
    ((df1['Protein'] > min_prot) & (df1['Protein'] < max_prot))
]

print(df_filtered)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
1       200       20            1        0         1
2       300       30            0        0         1
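
The filter above works on individual rows rather than on three-row combinations, but the same kind of boolean mask can be used to shrink the table before combinations are enumerated. The sketch below is one way to do that under two stated assumptions: an item with no meal flag should never appear in a combination (Solution 1's filter does allow such rows), and calories are non-negative, so a single item at or above the upper bound can never be part of a valid triple. The bound `max_cal = 1000` is hypothetical.

```python
from itertools import combinations

import pandas as pd

df1 = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                    "Protein": [10, 20, 30, 40, 50],
                    "IsBreakfast": [1, 1, 0, 0, 0],
                    "IsLunch": [1, 0, 0, 0, 1],
                    "IsDinner": [1, 1, 1, 0, 1]})

max_cal = 1000  # hypothetical upper bound on a triple's total calories
meal_cols = ["IsBreakfast", "IsLunch", "IsDinner"]

# Row-level masks: the item belongs to at least one meal,
# and its calories alone do not already reach the bound.
has_meal = (df1[meal_cols] == 1).any(axis=1)
fits = df1["Calories"] < max_cal

candidates = df1[has_meal & fits]

# Enumerate triples only over the surviving rows.
triples = list(combinations(candidates.index, 3))
print(candidates.index.tolist(), len(triples))
```

Here row 3 carries no meal flag and is dropped, which cuts the number of triples from 10 to 4 before any sums are computed.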
