R Multiple T-test: Grouping factor must have 2 variables

【Question title】: R Multiple T-test: Grouping factor must have 2 variables
【Posted】: 2019-07-16 09:07:26
【Question】:

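I am trying to compare a control group with an experimental group on a range of variables, to show that they are similar at baseline.

I therefore need to run multiple t-tests (unpaired / Welch t-tests). My data is in long format; the first variable is named "Group" and takes the value 1 or 2. Some of my other variables have missing values, but they are fairly random.

When I run a t-test manually with this line of code: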

t.test(variable_1 ~ Group,df)

it works.

I then tried to run all of them at once with this line of code:

 sapply(df[,2:71], function(i) t.test(i ~ df$Group)$p.value)

But I get the following error:

grouping factor must have exactly 2 levels

Can anyone help?

Here is what the structure looks like:

structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 2, 2), EM_Accuracy_Time_Airport = c(3, 3, 0, 
1, 1, 2, 2, 1, 1, 3, 3, 2, 2, 2, 1, 3, 1, 3, 1, 1), EM_Accuracy_Place_Airport = c(2, 
2, 1, 2, 1, 2, 2, 1, 1, 2, 0, 2, 2, 0, 2, 2, 2, 1, 1, 1), EM_Accuracy_Expl_Airport = c(2, 
2, 2, 0, 2, 2, 2, 1, 2, 2, 2, 2, 2, 0, 0, 1, 0, 2, 2, 1), EM_Accuracy_Death_Airport = c(0, 
2, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0), EM_Accuracy_Time_Metro = c(3, 
1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 3, 3), EM_Accuracy_Death_Metro = c(3, 
0, 1, 0, 1, 1, 0, 0, 0, 3, 0, 0, 1, 0, 3, 1, 1, 1, 0, 0), EM_Accuracy_PC_Time_Airpot = c(100, 
100, 0, 33.3333333333333, 33.3333333333333, 66.6666666666667, 
66.6666666666667, 33.3333333333333, 33.3333333333333, 100, 100, 
66.6666666666667, 66.6666666666667, 66.6666666666667, 33.3333333333333, 
100, 33.3333333333333, 100, 33.3333333333333, 33.3333333333333
), EM_Accuracy_PC_Place_Airport = c(100, 100, 50, 100, 50, 100, 
100, 50, 50, 100, 0, 100, 100, 0, 100, 100, 100, 50, 50, 50), 
    EM_Accuracy_PC_Expl_Airport = c(100, 100, 100, 0, 100, 100, 
    100, 50, 100, 100, 100, 100, 100, 0, 0, 50, 0, 100, 100, 
    50), EM_Accuracy_PC_Death_Airport = c(0, 66.6666666666667, 
    0, 0, 33.3333333333333, 66.6666666666667, 0, 0, 0, 0, 0, 
    0, 66.6666666666667, 0, 0, 0, 100, 0, 0, 0), EM_Accuracy_PC_Time_Metro = c(100, 
    33.3333333333333, 0, 0, 33.3333333333333, 33.3333333333333, 
    0, 33.3333333333333, 33.3333333333333, 33.3333333333333, 
    33.3333333333333, 66.6666666666667, 33.3333333333333, 100, 
    33.3333333333333, 33.3333333333333, 66.6666666666667, 33.3333333333333, 
    100, 100), EM_Accuracy_PC_Death_Metro = c(100, 0, 33.3333333333333, 
    0, 33.3333333333333, 33.3333333333333, 0, 0, 0, 100, 0, 0, 
    33.3333333333333, 0, 100, 33.3333333333333, 33.3333333333333, 
    33.3333333333333, 0, 0), EM_ACCURACY_PC = c(83.3333333333333, 
    66.6666666666667, 30.5555555555556, 22.2222222222222, 47.2222222222222, 
    66.6666666666666, 44.4444444444444, 27.7777777777778, 36.1111111111111, 
    72.2222222222222, 38.8888888888889, 55.5555555555555, 66.6666666666666, 
    27.7777777777778, 44.4444444444444, 52.7777777777778, 55.5555555555556, 
    52.7777777777778, 47.2222222222222, 38.8888888888889), EM_Certainty_Time_Airport = c(3, 
    1, 1, 1, 2, 2, 1, 1, 2, 3, 3, 2, 2, 2, 4, 2, 3, 3, 2, 2), 
    EM_Certainty__Place_Airport = c(3, 4, 2, 2, 2, 2, 4, 1, 3, 
    4, 4, 4, 4, 3, 3, 4, 4, 3, 2, 3), EM_Certainty__Expl_Airport = c(4, 
    2, 3, 1, 2, 3, 2, 1, 2, 4, 1, 3, 2, 2, 1, 3, 1, 2, 2, 3), 
    EM_Certainty__Death_Airport = c(1, 1, NA, 1, 2, 1, 3, 1, 
    2, 3, NA, 3, 2, 1, 2, 1, 1, 1, 4, 4), EM_Certainty__Time_Metro = c(3, 
    3, 1, 1, 2, 2, 2, 1, 3, 2, 3, 2, 3, 2, 2, 2, 3, 1, 2, 2), 
    EM_Certainty__Death_Metro = c(2, 1, 1, NA, 2, 1, 1, 1, 2, 
    1, NA, 3, 2, 1, 1, 1, 1, 1, 1, 4), EM_CERTAINTY = c(2.66666666666667, 
    2, 1.6, 1.2, 2, 1.83333333333333, 2.16666666666667, 1, 2.33333333333333, 
    2.83333333333333, 2.75, 2.83333333333333, 2.5, 1.83333333333333, 
    2.16666666666667, 2.16666666666667, 2.16666666666667, 1.83333333333333, 
    2.16666666666667, 3), EM_CONFIDENCE = c(5, 5, 1, 2, 2, 4, 
    5, 2, 3, 4, 5, 5, 3, 3, 4, 4, 3, 2, 3, 2), FBM_CONFIDENCE = c(4, 
    6, 7, 7, 5, 4, 2, 7, 5, 6, 6, 7, 6, 7, 3, 6, 6, 4, 5, 6), 
    FBM_Vividness_Time = c(3, 3, 1, 4, 3, 2, 4, 3, 4, 4, 1, 3, 
    4, 4, 3, 3, 3, 2, 4, 3), FBM_Vividness_How = c(4, 4, 2, 4, 
    4, 3, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 4, 4), FBM_Vividness_Where = c(4, 
    4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), 
    FBM_Vividness_WithWhom = c(4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 
    4, 4, 4, 4, 4, 4, 4, 4, 4, 4), FBM_Vividness_WereDoing = c(4, 
    4, 1, 4, 3, 4, 4, 4, 4, 4, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4), 
    FBM_Vividness_Did_After = c(4, 4, 3, 4, 2, 3, 4, 4, 2, 4, 
    1, 4, 4, 4, 3, 4, 4, 3, 4, 4), FBM_VIVIDNESS = c(3.83333333333333, 
    3.83333333333333, 2, 4, 3.16666666666667, 3.33333333333333, 
    4, 3.83333333333333, 3.66666666666667, 4, 2.33333333333333, 
    3.83333333333333, 3.83333333333333, 4, 3.66666666666667, 
    3.83333333333333, 3.83333333333333, 3.5, 4, 3.83333333333333
    ), FBM_Details_NB_T2 = c(3, 5, 0, 5, 5, 5, 2, 5, 1, 5, 3, 
    5, 5, 5, 2, 4, 2, 3, 5, 5), P_Novelty_5 = c(5, 6.2, 6.5, 
    5.6, 4.8, 5.4, 4, 4.2, 4.4, 5.8, 3.4, 5.8, 6, 5.8, 3.8, 6.4, 
    6.8, 6.6, 7, 3), P_Suprise_emotion = c(6, 6, 6, 6, 4, 5, 
    1, 7, 1, 5, 4, 5, 7, 7, 6, 4, 7, 7, 2, 5), P_Surprise_Expected = c(1, 
    3, 5, 2, 4, 3, 6, 2, 2, 1, 6, 4, 3, 1, 5, 1, 1, 1, 5, 4), 
    P_Surprise_Unbelievable = c(5, 4, 1, 6, 4, 4, 2, 7, 1, 4, 
    1, 6, 7, 7, 6, 3, 7, 7, 5, 3), `P_Consequence-Importance_5` = c(5.6, 
    4.8, 3.4, 5, 4.8, 4, 5, 5.4, 3, 5.2, 6.8, 5.4, 4, 4.4, 6, 
    3.8, 4, 4.8, 5, 5.2), P_Emotional_Intensity_4 = c(5.25, 5.75, 
    3, 4.75, 4.75, 6, 4, 5.25, 2.5, 5.5, 7, 6.5, 5.75, 6.75, 
    6.75, 6, 6.25, 6, 5, 2.5), P_Social_Sharing_6 = c(3.66666666666667, 
    3.83333333333333, 3.4, 3.16666666666667, 3, 3.33333333333333, 
    3.8, 3.16666666666667, 2.16666666666667, 4.16666666666667, 
    4, 4.5, 4.5, 4.33333333333333, 4, 3.16666666666667, 3.66666666666667, 
    4, NA, NA), P_Media_3 = c(4.66666666666667, 4, 3, 2.66666666666667, 
    2.66666666666667, 2.33333333333333, 3, 2.33333333333333, 
    2.33333333333333, 3.33333333333333, 4.33333333333333, 5, 
    4.33333333333333, 5, 4, 2, 3, 3.33333333333333, 2, 1.66666666666667
    ), P_Ruminations = c(3, NA, 3, 2, 4, NA, 4, 2, 1, 4, 4, 4, 
    2, 4, 2, 3, 3, 3, 4, 3), P_Novelty_Common_rev = c(6, 7, 7, 
    7, 4, 6, 4, 7, 2, 6, 3, 7, 7, 7, 3, 6, 7, 7, 7, 3), P_Novelty_Unusual = c(2, 
    5, 7, 7, 3, 5, 3, 3, 5, 6, 1, 4, 7, 1, 4, 6, 6, 6, 7, 2), 
    P_Novelty_Special = c(6, 6, NA, 6, 5, 5, 4, 3, 5, 4, 1, 5, 
    6, 7, 4, 6, 7, 7, 7, 3), P_Novelty_Singular = c(4, 6, 5, 
    1, 5, 5, 4, 1, 3, 6, 5, 6, 4, 7, 3, 7, 7, 6, 7, 2), P_Novelty_Ordinary_rev = c(7, 
    7, 7, 7, 7, 6, 5, 7, 7, 7, 7, 7, 6, 7, 5, 7, 7, 7, 7, 5), 
    P_Consequence = c(6, 7, 5, 4, 5, 4, 5, 3, 5, 5, 7, 5, 5, 
    2, 6, 6, 1, 4, 6, 3), P_Importance_self = c(4, 3, 3, 4, 4, 
    3, 5, 6, 1, 5, 7, 5, 3, 3, 5, 2, 2, 4, 5, 3), `P_Importance_friends&family` = c(4, 
    4, 3, 4, 4, 4, 4, 6, 1, 5, 6, 5, 3, 3, 5, 2, 6, 4, 5, 10), 
    P_Importance_Belgium = c(7, 5, 3, 7, 6, 5, 6, 7, 3, 7, 7, 
    7, 5, 7, 7, 5, 6, 7, 6, 6), P_Importance_International = c(7, 
    5, 3, 6, 5, 4, 5, 5, 5, 4, 7, 5, 4, 7, 7, 4, 5, 5, 3, 4), 
    P_Emotional_Intensity_Upset = c(4, 5, NA, 3, 3, 5, 3, 5, 
    2, 5, 7, 5, 5, 6, 7, 6, 6, 5, 5, 3), P_Emotional_Intensity_Indiferent_rev = c(7, 
    7, 5, 7, 6, 7, 4, 6, 4, 7, 7, 7, 7, 7, 7, 7, 7, 7, NA, 4), 
    P_Emotional_Intensity_Affected = c(6, 6, 3, 5, 5, 6, 5, 6, 
    2, 5, 7, 7, 5, 7, 7, 6, 6, 6, NA, 2), P_Emotional_Intensity_Shaken = c(4, 
    5, 1, 4, 5, 6, 4, 4, 2, 5, 7, 7, 6, 7, 6, 5, 6, 6, 5, 1), 
    P_Rehearsal_Media_TV = c(5, 3, NA, 3, 2, 3, NA, 1, 1, 4, 
    3, 5, 5, 5, 2, 3, 2, 2, 2, 2), P_Rehearsal_Media_Internet = c(4, 
    4, 1, 3, 2, 2, 2, 4, 3, 2, 5, 5, 3, 5, 5, 1, 5, 4, 2, 1), 
    P_Rehearsal_Media_Social_Networks = c(5, 5, 5, 2, 4, 2, 4, 
    2, 3, 4, 5, 5, 5, 5, 5, 2, 2, 4, 2, 2), P_Social_Sharing_How_Often = c(4, 
    5, 4, 4, 4, 3, 3, 3, 3, 5, 4, 5, 5, 5, 5, 3, 4, 4, 5, NA), 
    P_Social_Sharing_With_How_Many_People = c(5, 4, NA, 3, 3, 
    3, 3, 3, 2, 5, 3, 5, 5, 3, 5, 3, 3, 4, 3, NA), PK_Shops_YN = c(0, 
    1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1), 
    PK_Comic = c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 
    0, 0, 0, 1, 0), PK_Hotel = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 
    0, 0, 1, 1, 0, 0, 0, 0, 0, 0), PK_Decoration_Maelbeek = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1), 
    PK_Stations_before_after_Maelbeek = c(0, 0.5, 0, 0, 0, 0, 
    0, 0, 0.5, 1, 0, 0, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0), PK_TOTAL_PC = c(0, 
    50, 0, 40, 40, 40, 20, 0, 10, 60, 20, 40, 90, 70, 20, 0, 
    30, 20, 70, 40), SI_Attachment_BXL = c(6, 4, 1, 4, 2, 5, 
    1, 6, 5, 4, 2, 6, 6, 7, 1, 3, 6, 4, 5, 4), SI_Pride_BXL = c(1, 
    2, 1, 2, 1, 2, 1, 5, 1, 6, 1, 1, 7, 7, 1, 2, 6, 1, 3, 3), 
    SI_Attachment_Belgium = c(7, 3, 5, 5, 4, 6, 7, 6, 5, 6, 7, 
    7, 7, 7, 5, 6, 7, 6, 4, 2), SI_Pride_Belgium = c(7, 2, 6, 
    4, 2, 6, 4, 5, 1, 5, 1, 6, 7, 7, 5, 7, 7, 6, 2, 2), SI_Attachment_EU = c(6, 
    4, 2, 5, 4, 4, 5, 4, 7, 4, 1, 6, 7, 7, 5, 4, 6, 6, 2, 6), 
    SI_Pride_EU = c(7, 1, 1, 4, 3, 4, 4, 4, 1, 4, 1, 6, 7, 7, 
    4, 3, 6, 6, 2, 4)), .Names = c("Group", "EM_Accuracy_Time_Airport", 
"EM_Accuracy_Place_Airport", "EM_Accuracy_Expl_Airport", "EM_Accuracy_Death_Airport", 
"EM_Accuracy_Time_Metro", "EM_Accuracy_Death_Metro", "EM_Accuracy_PC_Time_Airpot", 
"EM_Accuracy_PC_Place_Airport", "EM_Accuracy_PC_Expl_Airport", 
"EM_Accuracy_PC_Death_Airport", "EM_Accuracy_PC_Time_Metro", 
"EM_Accuracy_PC_Death_Metro", "EM_ACCURACY_PC", "EM_Certainty_Time_Airport", 
"EM_Certainty__Place_Airport", "EM_Certainty__Expl_Airport", 
"EM_Certainty__Death_Airport", "EM_Certainty__Time_Metro", "EM_Certainty__Death_Metro", 
"EM_CERTAINTY", "EM_CONFIDENCE", "FBM_CONFIDENCE", "FBM_Vividness_Time", 
"FBM_Vividness_How", "FBM_Vividness_Where", "FBM_Vividness_WithWhom", 
"FBM_Vividness_WereDoing", "FBM_Vividness_Did_After", "FBM_VIVIDNESS", 
"FBM_Details_NB_T2", "P_Novelty_5", "P_Suprise_emotion", "P_Surprise_Expected", 
"P_Surprise_Unbelievable", "P_Consequence-Importance_5", "P_Emotional_Intensity_4", 
"P_Social_Sharing_6", "P_Media_3", "P_Ruminations", "P_Novelty_Common_rev", 
"P_Novelty_Unusual", "P_Novelty_Special", "P_Novelty_Singular", 
"P_Novelty_Ordinary_rev", "P_Consequence", "P_Importance_self", 
"P_Importance_friends&family", "P_Importance_Belgium", "P_Importance_International", 
"P_Emotional_Intensity_Upset", "P_Emotional_Intensity_Indiferent_rev", 
"P_Emotional_Intensity_Affected", "P_Emotional_Intensity_Shaken", 
"P_Rehearsal_Media_TV", "P_Rehearsal_Media_Internet", "P_Rehearsal_Media_Social_Networks", 
"P_Social_Sharing_How_Often", "P_Social_Sharing_With_How_Many_People", 
"PK_Shops_YN", "PK_Comic", "PK_Hotel", "PK_Decoration_Maelbeek", 
"PK_Stations_before_after_Maelbeek", "PK_TOTAL_PC", "SI_Attachment_BXL", 
"SI_Pride_BXL", "SI_Attachment_Belgium", "SI_Pride_Belgium", 
"SI_Attachment_EU", "SI_Pride_EU"), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

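【Comments】:

Please add a sample of your data using dput(head(df, n)), choosing n large enough to reproduce the problem.
Googling your error message led me to this post, stackoverflow.com/questions/29421475/…, which suggests t.test(i, df$Group)$p.value. Can you try that in your sapply call?
@NelsonGon: I have added the structure.
@Ronak Shah: Using a comma runs a different kind of test: it compares two variables (the one before the comma and the one after it). That works for wide format, but my data is in long format with a grouping variable.
What is variable_1?

Tags: r t-test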

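【Solution 1】:

The error you are getting means that at least one variable in your dataset has a problem.

Here is a procedure that helps you spot the problematic variables: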

library(tidyverse)

df %>%
  group_by(Group) %>%                   # for each group value
  summarise_all(~sum(!is.na(.))) %>%    # count non NA values for each variable
  gather(var,value,-Group) %>%          # reshape
  spread(Group, value, sep = "_") %>%   # reshape
  filter(Group_2 < 2)                   # get problematic variables

# # A tibble: 5 x 3
#   var                                   Group_1 Group_2
#   <chr>                                   <int>   <int>
# 1 P_Emotional_Intensity_Affected             18       1
# 2 P_Emotional_Intensity_Indiferent_rev       18       1
# 3 P_Social_Sharing_6                         18       0
# 4 P_Social_Sharing_How_Often                 18       1
# 5 P_Social_Sharing_With_How_Many_People      17       1

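A count of 0 raises the error about the grouping factor needing exactly 2 levels: rows with a missing response are dropped before the test, so only one group remains.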

A count of 1 raises a different error, saying that one of the groups needs more observations.
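Both failure modes can be reproduced on a small example. The sketch below uses a made-up data frame (`toy`, `all_na` and `one_ok` are hypothetical names, not from the question) and a small helper, `err_msg`, that captures the error message instead of stopping the script.

```r
# Toy data (hypothetical): group 2 has only two rows
toy <- data.frame(
  Group  = c(1, 1, 1, 1, 2, 2),
  all_na = c(5, 6, 7, 8, NA, NA),  # group 2 has 0 non-missing values
  one_ok = c(5, 6, 7, 8, 9, NA)    # group 2 has 1 non-missing value
)

# Hypothetical helper: return the error message, or NA if no error occurred
err_msg <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = function(e) conditionMessage(e))
}

# Group 2 disappears once NA rows are dropped, so one factor level is left
msg_zero <- err_msg(t.test(all_na ~ Group, data = toy))

# Both levels survive, but group 2 is too small for a Welch test
msg_one <- err_msg(t.test(one_ok ~ Group, data = toy))

print(msg_zero)
print(msg_one)
```

Printing the two messages should show that the zero-count case fails on the grouping factor, while the one-count case fails on the number of observations.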

Once you have spotted these variables, you have to handle them accordingly, and then your original t.test code should work.
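One possible way to handle such variables, sketched here as an assumption rather than as the answerer's method, is to skip the test and return NA whenever a group has fewer than 2 non-missing values. The helper name `safe_p` and the toy data are hypothetical.

```r
# Hypothetical helper: Welch t-test p-value, or NA when either group has
# fewer than 2 non-missing observations
safe_p <- function(x, g) {
  n_per_group <- tapply(!is.na(x), g, sum)  # non-missing count per group
  if (length(n_per_group) != 2 || any(n_per_group < 2)) {
    return(NA_real_)
  }
  t.test(x ~ g)$p.value
}

# Toy data (hypothetical) with one usable and two problematic variables
toy <- data.frame(
  Group  = c(1, 1, 1, 1, 2, 2, 2),
  ok     = c(5, 6, 7, 8, 9, 10, 12),
  all_na = c(5, 6, 7, 8, NA, NA, NA),
  one_ok = c(5, 6, 7, 8, 9, NA, NA)
)

# One p-value per variable; problematic variables come back as NA
p_values <- sapply(toy[, -1], safe_p, g = toy$Group)
print(p_values)
```

With the data from the question, the equivalent call would presumably be `sapply(df[, 2:71], safe_p, g = df$Group)`. Returning NA keeps the result aligned with the column names, so the skipped variables stay visible instead of silently vanishing.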

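【Comments】:

Thanks, I will use this in the future!

【Solution 2】:

So my problem was simply missing data in one of the variables.

However, if you are looking to run multiple t-tests on long-format data, this line of code works: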

sapply(df[,2:71], function(i) t.test(i ~ df$Group)$p.value)
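The line above hard-codes the column positions 2:71. As a small variation (an assumption, not part of the original answer), the outcome columns can be selected by name, which keeps working if the grouping column is not the first one. The toy data and the name `outcome_vars` are hypothetical.

```r
# Toy data (hypothetical); the grouping column is deliberately not first
toy <- data.frame(
  score_a = c(5, 6, 7, 8, 9, 10, 12),
  Group   = c(1, 1, 1, 1, 2, 2, 2),
  score_b = c(3, 1, 4, 1, 5, 9, 2)
)

# Every column except the grouping variable, selected by name
outcome_vars <- setdiff(names(toy), "Group")

# Same call as in the answer, applied to the name-based selection
p_values <- sapply(toy[outcome_vars], function(i) t.test(i ~ toy$Group)$p.value)
print(p_values)
```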

【Comments】: