[Posted]: 2020-01-11 13:07:29
[Question]:
I want to run a t-test comparing the mean hourly wage of male and female staff.
`df1 = df[["gender","hourly_wage"]] #creating a sub-dataframe with only the columns of gender and hourly wage
staff_wages=df1.groupby(['gender']).mean() #grouping the data frame by gender and assigning it to a new variable 'staff_wages'
staff_wages.head()`
To be honest, I think I have confused myself. I want to do a t-test, so I wrote this code:
`mean_val_salary_female = df1[staff_wages['gender'] == 'female'].mean()
mean_val_salary_female = df1[staff_wages['gender'] == 'male'].mean()
t_val, p_val = stats.ttest_ind(mean_val_salary_female, mean_val_salary_male)
# obtain a one-tail p-value
p_val /= 2
print(f"t-value: {t_val}, p-value: {p_val}")`
It only returns errors.
I have gone a bit crazy trying different things...
`#married_vs_dependents = df[['married', 'num_dependents', 'years_in_employment']]
#married_vs_dependents = df[['married', 'num_dependents', 'years_in_employment']]
#married_vs_dependents.head()
#my_data = df(married_vs_dependents)
#my_data.groupby('married').mean()
mean_gender = df.groupby("gender")["hourly_wage"].mean()
married_vs_dependents.head()
mean_gender.groupby('gender').mean()
mean_val_salary_female = df[staff_wages['gender'] == 'female'].mean()
mean_val_salary_female = df[staff_wages['gender'] == 'male'].mean()
#cat1 = mean_gender['male']==['cat1']
#cat2 = mean_gender['female']==['cat2']
ttest_ind(cat1['gender'], cat2['hourly_wage'])`
Could anyone please guide me through the correct steps?
[Comments]:
-
Where does your mean_val_salary_male come from (t_val, p_val = stats.ttest_ind(mean_val_salary_female, mean_val_salary_male))?
-
You assign mean_val_salary_female twice but never define mean_val_salary_male.
-
@MehdiHamzeloee Oh, I see. Let me take a look.
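Beyond the variable name, `scipy.stats.ttest_ind` expects the two samples themselves (every wage in each group), not the two group means. A minimal sketch of that approach follows. The small `df` here is made-up stand-in data, since the real frame is not shown, and the choice of `equal_var=False` (Welch's t-test) is an assumption rather than something the question specifies.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in data; replace with the real `df` from the question.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "hourly_wage": [20.0, 22.5, 19.0, 25.0, 21.5, 23.0],
})

# Select the raw wages for each group (a Series per group, not a single mean).
female_wages = df.loc[df["gender"] == "female", "hourly_wage"]
male_wages = df.loc[df["gender"] == "male", "hourly_wage"]

# Independent two-sample t-test on the raw samples.
# equal_var=False (Welch's test) is an assumption; drop it for the pooled-variance test.
t_val, p_val = stats.ttest_ind(female_wages, male_wages, equal_var=False)

print(f"t-value: {t_val}, two-sided p-value: {p_val}")
```

Halving the two-sided p-value to get a one-tailed value, as in the question, is only appropriate when the sign of `t_val` matches the direction of the hypothesis being tested.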
Tags: python pandas p-value t-test