Summarizing a dataset with continuous and categorical variables: answers

【Question title】: Summarizing a dataset with continuous and categorical variables
【Posted】: 2015-08-23 11:04:52
【Question】:

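If a dataset has mixed variables, both numeric and categorical, is there a way to summarize it other than summary(dataset), so that each categorical variable is reported with a count per category and each numeric variable with its mean and sd?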

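At the moment I have a code snippet that checks whether each column is numeric or categorical and then builds a list. A simpler function would be useful, though.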

For example, given data.frame(v1 = c(1:3), v2 = c("a","b","b")), the desired output is:

V1, type (num/cat), mean(v1), sd(v1)
V2, type (num/cat), a, count(a), b, count(b)

【Comments】:

Try dplyr, i.e. library(dplyr); df1 %>% summarise_each(funs(class, mean, sd)). To get the counts, I guess you need table(df1$v2).

Tags: r summary categorical-data
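The suggestion in this comment can be sketched with current dplyr verbs. This is a minimal sketch, not the commenter's code: it assumes dplyr 1.0.0 or later, where across() is the documented replacement for summarise_each() and funs(), and it reuses the example data frame from the question under the name df1.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 is installed

# Example data from the question
df1 <- data.frame(v1 = 1:3, v2 = c("a", "b", "b"))

# Mean and sd for every numeric column; result columns are named <col>_<fn>
num_summary <- df1 %>%
  summarise(across(where(is.numeric),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))

# Counts per category for the categorical column, as the comment suggests
cat_counts <- table(df1$v2)

print(num_summary)
print(cat_counts)
```

The numeric and categorical summaries end up in two separate objects, so this still does not produce the single combined output the question asks for.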

【Solution 1】:

I think you are looking for the function describe() from the "Hmisc" package. See the documentation for details.

【Comments】:
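A minimal usage sketch, assuming the Hmisc package is installed and using the example data frame from the question (the name df1 is only for illustration). The layout of the printed summary is decided by the package, so it may not contain exactly the columns the question asks for; check the output against what you need.

```r
library(Hmisc)  # assumes install.packages("Hmisc") has been run

# Example data from the question
df1 <- data.frame(v1 = 1:3, v2 = c("a", "b", "b"))

# describe() summarizes every column, choosing statistics by variable type
d <- describe(df1)
print(d)
```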

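【Solution 2】:

Yes, I was looking for tables for the categorical variables and mean + sd for the numeric variables. These are what descriptive statistics in research papers usually report.

I wrote the following: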

agg_function <- function(data_agg)
{
    desc_list <- list()

    for (j in seq_along(data_agg))
    {
        column <- data_agg[[j]]
        ## Character columns count as categorical too: since R 4.0.0,
        ## data.frame() no longer converts strings to factors by default
        if (is.factor(column) || is.character(column))
        {
            ## Table of counts of labels of categorical variables
            desc_list[[j]] <- list(Variable = colnames(data_agg)[j], Counts = table(column))
        }
        else
        {
            ## First and second moments of numerical variables
            desc_list[[j]] <- data.frame(Variable = colnames(data_agg)[j],
                                         Mean = mean(column, na.rm = TRUE),
                                         SD = sd(column, na.rm = TRUE))
        }
    }
    return(desc_list)
}

But is there a more efficient solution?

【Comments】:
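One more compact way to write the same idea in base R is to let lapply() walk over the columns instead of an explicit loop. This is a sketch under assumptions, not a benchmarked improvement: the helper name summarise_mixed is hypothetical, and every non-numeric column (factor, character, logical, and so on) is treated as categorical.

```r
# Hypothetical helper: one summary entry per column, keyed by column name
summarise_mixed <- function(df) {
  lapply(df, function(column) {
    if (is.numeric(column)) {
      # Numeric column: report mean and standard deviation
      list(type = "num",
           mean = mean(column, na.rm = TRUE),
           sd   = sd(column, na.rm = TRUE))
    } else {
      # Anything else is assumed categorical: report counts per level
      list(type = "cat", counts = table(column))
    }
  })
}

# Example data from the question
df1 <- data.frame(v1 = 1:3, v2 = c("a", "b", "b"))
result <- summarise_mixed(df1)
str(result)
```

Because a data frame is a list of columns, lapply() keeps the column names, so result$v1 and result$v2 can be looked up directly.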