【Question title】:Paired samples t.test with concatenated vectors
【Posted】:2017-02-16 14:06:33
【Question】:

I have a very large dataset and want to write concise code for the analysis.

Here is an illustrative example:

df <- data.frame(
ID = factor(sample(c("A","B","C","D","E","F","G"), 20, replace=TRUE)),
a1 = runif(20),
a2 = runif(20),
a3 = runif(20),
a4 = runif(20),
b1 = runif(20),
b2 = runif(20),
b3 = runif(20),
b4 = runif(20))

I want to run paired-samples t-tests like this (example):

t.test(df$a1, df$b1, paired=TRUE, na.rm=TRUE)
t.test(df$a2, df$b2, paired=TRUE, na.rm=TRUE)

This works, but I wanted shorter code and tried:

object_a <- paste("a", 1:4, sep="")
object_b <- paste("b", 1:4, sep="")

t.test.func.paired <- function(x) {
 t.test(x, y, paired = TRUE, na.rm=TRUE)
}
df %>%
select_(.dots = c(object_a, object_b)) %>%
sapply(., t.test.func.paired) %>%
.[c("statistic", "parameter", "p.value"), ] %>%
View()

Unfortunately, this does not work. Where is the error? Thanks!

【Comments】:

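  • You can use df[, "a1"] instead of df$a1; then your paste approach will work. Alternatively, you can store the A and B columns in separate lists and refer to the list elements by position.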
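
A minimal base-R sketch of what the comment suggests, under a few assumptions: the attempt in the question most likely fails because sapply() hands the function one column at a time as x, while y is never defined inside t.test.func.paired. Looking columns up by name and walking both name vectors in parallel avoids that. The helper name paired_summary, the fixed seed, and the reduced two-pair data frame are illustrative choices, not part of the original post.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# smaller stand-in for the data frame in the question
df <- data.frame(
  a1 = runif(20), a2 = runif(20),
  b1 = runif(20), b2 = runif(20)
)

object_a <- paste0("a", 1:2)
object_b <- paste0("b", 1:2)

# hypothetical helper: takes two column NAMES and looks them up in df
paired_summary <- function(a, b) {
  res <- t.test(df[[a]], df[[b]], paired = TRUE)
  c(statistic = unname(res$statistic),
    parameter = unname(res$parameter),
    p.value   = res$p.value)
}

# mapply() pairs the names element-wise: (a1, b1), (a2, b2)
results <- mapply(paired_summary, object_a, object_b)
colnames(results) <- paste(object_a, object_b, sep = "_vs_")
results
```

The result is a matrix with one column per pair and the rows statistic, parameter, and p.value, which is the shape the question tried to extract with .[c("statistic", "parameter", "p.value"), ].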

Tags: r statistics dplyr


【Solution 1】:

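Here is an approach using the dplyr and broom packages. broom automatically stores the t.test results in a data frame, so you do not have to extract the individual pieces yourself.

The key is to build all the variable combinations you want and run the appropriate test for each one. Note that this relies on the column names being in matching order (a1, a2, ..., b1, b2, ...). dplyr lets you avoid writing a for loop over the variable combinations.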

library(dplyr)
library(broom)

# dataset
df <- data.frame(
  ID = factor(sample(c("A","B","C","D","E","F","G"), 20, replace=TRUE)),
  a1 = runif(20),
  a2 = runif(20),
  a3 = runif(20),
  a4 = runif(20),
  b1 = runif(20),
  b2 = runif(20),
  b3 = runif(20),
  b4 = runif(20))

# split the column names into the "a" and "b" groups by prefix
# (anchoring with ^ avoids matching names that merely contain the letter)
object_a = names(df)[grep("^a", names(df))]
object_b = names(df)[grep("^b", names(df))]


cbind(object_a, object_b) %>%                      # pair up the column names
  data.frame(., stringsAsFactors = FALSE) %>%      # one row per (a, b) pair
  rowwise() %>%                                    # for each row
  do(data.frame(.,                                 # keep the column names
                tidy(t.test(df[,.$object_a],       # t.test results as a data frame for the pair named in that row
                            df[,.$object_b],
                            paired = TRUE)))) %>%  # na.rm is not a t.test argument; incomplete pairs are dropped automatically
  ungroup                                          # forget the grouping

# # A tibble: 4 × 10
#   object_a object_b    estimate  statistic   p.value parameter   conf.low  conf.high        method alternative
# *    <chr>    <chr>       <dbl>      <dbl>     <dbl>     <dbl>      <dbl>      <dbl>        <fctr>      <fctr>
# 1       a1       b1 -0.03689665 -0.5253532 0.6054150        19 -0.1838941 0.11010078 Paired t-test   two.sided
# 2       a2       b2 -0.09111585 -1.2358669 0.2315703        19 -0.2454267 0.06319499 Paired t-test   two.sided
# 3       a3       b3  0.07515723  0.7721983 0.4494961        19 -0.1285545 0.27886900 Paired t-test   two.sided
# 4       a4       b4  0.04359102  0.4317255 0.6708003        19 -0.1677402 0.25492223 Paired t-test   two.sided
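
To help read the table above, here is a small base-R check of what the columns mean for a paired test, sketched on made-up data (the seed and the vectors x and y are assumptions, not the data from the answer): estimate is the mean of the pairwise differences, statistic is the t value computed from those differences, and parameter is the degrees of freedom, n - 1, which is why it is 19 for 20 rows.

```r
set.seed(42)  # assumption: arbitrary seed for reproducibility
x <- runif(20)
y <- runif(20)
d <- x - y    # pairwise differences

paired <- t.test(x, y, paired = TRUE)

# estimate: mean of the differences
est_ok  <- isTRUE(all.equal(unname(paired$estimate), mean(d)))

# statistic: mean difference divided by its standard error
stat_ok <- isTRUE(all.equal(unname(paired$statistic),
                            mean(d) / (sd(d) / sqrt(length(d)))))

# parameter: degrees of freedom, n - 1
df_ok   <- unname(paired$parameter) == length(d) - 1

c(estimate = est_ok, statistic = stat_ok, parameter = df_ok)
```

A paired test is equivalent to a one-sample t-test on d, so t.test(d) should give the same statistic and p-value.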
