How to get the average of one column's values based on another column's value in Python (pandas, Jupyter) - Answers

【Question title】: How to get the average of values for one column based on another column value in Python (pandas, Jupyter)
【Posted】: 2020-05-13 00:07:46
【Question description】:

The image shows the test dataset I am using to verify whether the right averages are being calculated.

I want to be able to get the average of the corresponding values in column "G" based on filtered values in column "T".

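So I set the values for column "T", and based on those values I want to sum the values in column "G", then divide the total by the count to get an average, which is appended to a list. However, the average is not being calculated correctly. See the screenshot below.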

total=0
g_avg=[]
output=[]
counter=0
for i, row in df_new.iterrows():
    if (row['T'] > 2):
        counter+=1
        total+=row['G']
    if (counter != 0 and row['T']==10):
        g_avg.append(total/counter)
        counter = 0
        total = 0

print(g_avg)

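Below is a better set of data. Because the "T" values repeat, I need a counter so that I get the average of the G values each time T is within a certain range (e.g. from 2 am to 10 am), and so on. Sorry, it won't allow me to just paste the dataset, so I've taken a snip of it.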

【Question comments】:

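Welcome to Stack Overflow. Please provide a minimal reproducible example that includes the data. Column "T" does not seem to contain the value 10, so you will never even enter the second if. You could also just use df_new[df_new['T'] > 2]['G'].mean()
@sim It won't allow me to paste the dataset, but I have uploaded the code.
@sim Basically, the point of the program is that the dataset contains 1 week of data, and we want to take the average of column "G" whenever column "T" is within a range, e.g. 2-7pm, and append it to a list.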

Tags: python pandas jupyter-notebook

【Solution 1】:

If you want the mean of the column G values where T is between 2 and 7 (both bounds excluded):

df_new.loc[(df_new['T']>2) & (df_new['T']<7), 'G'].mean()
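
As a minimal sketch of the one-liner above: the asker's dataset is only available as a screenshot, so the frame below is rebuilt from the t and g values quoted in the asker's later comment, which is an assumption about what df_new looks like. The name mask is introduced here only for readability.

```python
import pandas as pd

# Assumed stand-in for df_new, built from the values quoted in the comments.
df_new = pd.DataFrame({
    "T": [0, 2, 3, 3, 3, 10, 2, 2, 2, 10],
    "G": [1, 2, 3, 1, 2, 4, 5, 5, 5, 5],
})

# Boolean mask: True where T is strictly between 2 and 7.
mask = (df_new["T"] > 2) & (df_new["T"] < 7)

# Select column G for the matching rows and average it.
g_mean = df_new.loc[mask, "G"].mean()
print(g_mean)  # 2.0 (rows with T == 3 have G = 3, 1, 2)
```

Note that this yields a single number for the whole frame, not one average per repeated interval.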

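Update

Without any expected output, it is hard to know exactly what you want. If you have some data that looks like this:

and you want something like this:

then you can use boolean indexing and DataFrame.loc: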

avg = df.loc[(df['T']>2) & (df['T']<7), 'G'].mean()
df.loc[(df['T']>2) & (df['T']<7), 'G'] = avg
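
A hedged sketch of those two lines on a small frame follows. The data is again taken from the values quoted in the comments (an assumption, since the example tables are not shown), and the cast to float is an addition of this sketch so that a non-integer mean can be written back without changing the column's dtype.

```python
import pandas as pd

# Assumed sample data; the original example tables are not available.
df = pd.DataFrame({
    "T": [0, 2, 3, 3, 3, 10, 2, 2, 2, 10],
    "G": [1, 2, 3, 1, 2, 4, 5, 5, 5, 5],
})

# Store G as float so any mean can be assigned into the column.
df["G"] = df["G"].astype(float)

# Rows where T is strictly between 2 and 7.
mask = (df["T"] > 2) & (df["T"] < 7)

# Compute the mean once, then overwrite G in exactly those rows.
avg = df.loc[mask, "G"].mean()
df.loc[mask, "G"] = avg

print(df["G"].tolist())
# [1.0, 2.0, 2.0, 2.0, 2.0, 4.0, 5.0, 5.0, 5.0, 5.0]
```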

Update 2

If you have some sample data:

Method 1: To simply get a list of these means, you can create groups for the intervals and filter them with m:

m = df['T'].between(0,5,inclusive='neither')  # on pandas older than 1.3, use inclusive=False
g = m.ne(m.shift()).cumsum()[m]
lst = df.groupby(g).mean()['G'].tolist()

print(lst)                                                                              
[2.0, 5.0]
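
A self-contained sketch of Method 1 follows, using the t and g values quoted in the comments as assumed sample data. It swaps Series.between for two plain comparisons so that it does not depend on the inclusive argument, and the name run_id is introduced here for illustration.

```python
import pandas as pd

# Assumed sample data, taken from the values quoted in the comments.
df = pd.DataFrame({
    "T": [0, 2, 3, 3, 3, 10, 2, 2, 2, 10],
    "G": [1, 2, 3, 1, 2, 4, 5, 5, 5, 5],
})

# True where T is strictly between 0 and 5.
m = (df["T"] > 0) & (df["T"] < 5)

# The mask differs from its shifted copy wherever a run starts;
# the cumulative sum turns those start points into run ids.
run_id = m.ne(m.shift()).cumsum()

# Keep only the in-range rows, then average G within each run.
lst = df.loc[m, "G"].groupby(run_id[m]).mean().tolist()
print(lst)  # [2.0, 5.0]
```

Each consecutive stretch of in-range T values contributes one entry to the list, which is the "append an average every time T is in the range" behaviour asked for.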

Method 2: If you want to keep these means alongside their respective T values, you can do this:

m = df['T'].between(0,5,inclusive='neither')  # on pandas older than 1.3, use inclusive=False
g = m.ne(m.shift()).cumsum()
df['G_new'] = df.groupby(g)['G'].transform('mean')

print(df)                                                                               
    T  G  G_new
0   0  1      1
1   2  2      2
2   3  3      2
3   3  1      2
4   3  2      2
5  10  4      4
6   2  5      5
7   2  5      5
8   2  5      5
9  10  5      5
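
The following is a runnable sketch of Method 2 under the same assumptions: sample data from the values quoted in the comments, plain comparisons in place of Series.between, and run_id as an illustrative name.

```python
import pandas as pd

# Assumed sample data, taken from the values quoted in the comments.
df = pd.DataFrame({
    "T": [0, 2, 3, 3, 3, 10, 2, 2, 2, 10],
    "G": [1, 2, 3, 1, 2, 4, 5, 5, 5, 5],
})

# True where T is strictly between 0 and 5.
m = (df["T"] > 0) & (df["T"] < 5)

# One id per consecutive run of equal mask values (in-range or out-of-range).
run_id = m.ne(m.shift()).cumsum()

# transform('mean') broadcasts each run's mean back onto its own rows,
# so the result has the same length and index as df.
df["G_new"] = df.groupby(run_id)["G"].transform("mean")

print(df["G_new"].tolist())
# [1.0, 2.0, 2.0, 2.0, 2.0, 4.0, 5.0, 5.0, 5.0, 5.0]
```

Because out-of-range rows also form runs, consecutive out-of-range rows would be averaged together as well. In this sample every out-of-range run is a single row, so G_new equals G there.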

【Discussion】:

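Yes, but I don't want a single average; I want to append an average to a list every time the T value falls within that range. For example, T is time (2 am to 7 am), so this repeats again and again over the week, and I want to add the average to the list each time.
I have updated my answer. Let me know if this answers your question.
Yes, that is correct, but it only solves part of the problem (probably because of my vague explanation). The 'T' values will contain repeats, e.g. t=[0,2,3,3,3,10,2,2,2,10] and g=[1,2,3,1,2,4,5,5,5,5]. In this case, suppose I want, for each instance where 'T' > 0 &
I have updated the answer to cover this example. Is this what you are looking for?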