You can use the method described in this answer to generate a new DataFrame containing every combination of three rows from the original data:
from itertools import combinations
import pandas as pd
# Using skbrhmn's df
df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})
comb_rows = list(combinations(df.index, 3))
comb_rows
Output:
[(0, 1, 2),
(0, 1, 3),
(0, 1, 4),
(0, 2, 3),
(0, 2, 4),
(0, 3, 4),
(1, 2, 3),
(1, 2, 4),
(1, 3, 4),
(2, 3, 4)]
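The list above has ten entries because there are C(5, 3) = 10 ways to pick three rows out of five. That count grows roughly with the cube of the number of rows, so it may be worth checking it before materialising the list on a larger frame. A minimal sketch, assuming Python 3.8+ for `math.comb`; the helper name `count_row_combinations` is hypothetical and not part of the original answer:

```python
from itertools import combinations
from math import comb


def count_row_combinations(n_rows: int, k: int = 3) -> int:
    """Return how many k-row subsets an n_rows-row frame has (n choose k)."""
    return comb(n_rows, k)


# Five rows, as in the example frame.
n_example = count_row_combinations(5)

# Cross-check against what itertools actually generates.
assert n_example == len(list(combinations(range(5), 3)))

# A 100-row frame already produces far more triples.
n_large = count_row_combinations(100)
print(n_example, n_large)
```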
Then create a new DataFrame holding the sum of every numeric column of the original frame for each possible combination of three rows:
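# Note: this assignment rebinds the name `combinations`, hiding the itertools function imported above.
# list(c) makes it explicit that each tuple is a list of row labels to select.
combinations = pd.DataFrame([df.loc[list(c)].sum() for c in comb_rows], index=comb_rows)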
print(combinations)
           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 2)       600       60            2        1         3
(0, 1, 3)       700       70            2        1         2
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
(0, 3, 4)      1000      100            1        2         2
(1, 2, 3)       900       90            1        0         2
(1, 2, 4)      1000      100            1        1         3
(1, 3, 4)      1100      110            1        1         2
(2, 3, 4)      1200      120            0        1         2
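Summing one selection per combination is fine for a handful of rows, but it can become slow once there are many triples. One possible vectorised variant is sketched below. It is not the original answer's method, and it assumes that every column is numeric and that the frame has the default RangeIndex, so positions and labels coincide. The names `iter_combinations`, `idx`, `values` and `summed` are illustrative only:

```python
from itertools import combinations as iter_combinations

import numpy as np
import pandas as pd

df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

# Positional triples; with the default RangeIndex these equal the labels.
comb_rows = list(iter_combinations(range(len(df)), 3))
idx = np.array(comb_rows)   # shape (n_combos, 3)
values = df.to_numpy()      # shape (n_rows, n_cols)

# values[idx] has shape (n_combos, 3, n_cols); summing over axis 1
# adds up the three selected rows of each combination in one step.
summed = pd.DataFrame(values[idx].sum(axis=1),
                      columns=df.columns, index=comb_rows)
print(summed)
```

Importing the itertools function under an alias also leaves the name `combinations` free for other uses.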
Finally, you can apply whatever filters you need:
filtered = combinations[
    (combinations.IsBreakfast > 0) &
    (combinations.IsLunch > 0) &
    (combinations.IsDinner > 0) &
    (combinations.Calories > 600) &
    (combinations.Calories < 1000) &
    (combinations.Protein >= 80) &
    (combinations.Protein < 120)
]
print(filtered)
           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
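If the thresholds are likely to change, the same filter could be wrapped in a small function. The sketch below is only one way to do that: `filter_plans`, `MEAL_FLAGS` and the parameter names are hypothetical, and the bounds mirror the ones used above (Calories strictly between the limits, Protein in a half-open range).

```python
from itertools import combinations as iter_combinations

import pandas as pd

MEAL_FLAGS = ["IsBreakfast", "IsLunch", "IsDinner"]


def filter_plans(sums, calories=(600, 1000), protein=(80, 120)):
    """Keep combinations that cover every meal type and fall inside the bounds.

    Calories must lie strictly between calories[0] and calories[1];
    Protein must satisfy protein[0] <= Protein < protein[1].
    """
    mask = (
        (sums[MEAL_FLAGS] > 0).all(axis=1)
        & (sums["Calories"] > calories[0]) & (sums["Calories"] < calories[1])
        & (sums["Protein"] >= protein[0]) & (sums["Protein"] < protein[1])
    )
    return sums[mask]


df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

# Rebuild the per-combination sums the same way as in the answer.
comb_rows = list(iter_combinations(df.index, 3))
sums = pd.DataFrame([df.loc[list(c)].sum() for c in comb_rows], index=comb_rows)

result = filter_plans(sums)
print(result)
```

Calling `filter_plans(sums, calories=(600, 900))`, for instance, would tighten only the calorie window while leaving the other conditions unchanged.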