Applying a function to one column over all subsets of a split dataframe

【Question title】: Applying a function to one column over all subsets of a split dataframe
【Posted】: 2014-01-26 16:06:21
【Question】:

I have split my data frame according to a range of sub-intervals over one column of continuous data:

Data1 <- read.csv(file.choose(), header = TRUE)

# Order rows (ascending) by group size
Group.order <- order(Data1$GroupN)

# Assign label to data frame ordered by group
Data1.group.order <- Data1[Group.order, ]

# Set a range of sub-intervals we wish to split the ordered data into
range <- seq(0, 300, by = 75)

# Use split() to divide the ordered data, with cut() binning the numeric
# vector GroupN into the intervals defined by 'range'
Split.Data1 <- split(Data1.group.order, cut(Data1.group.order$GroupN, range))

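With the data split, I now need to find the mean of one of the columns in every subset of the data frame, but despite a lot of effort I am struggling.

I have been able to use lapply to find the means of multiple columns across the whole split data frame, but not of a single column on its own.

Any help would be much appreciated.

Edit: I am new to R, so what I really want to do is look at the distribution of a variable x within each subset of the data frame, i.e. x-axis = 0-75, 75-150, 150-225, 225-300, y-axis = variable x. My plan was to split the data, find the mean of variable x for each subset, and then plot variable x against the intervals I used to subset the data frame. However, I am sure there is a better way to do this!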
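
The plan described in the edit can be sketched roughly as follows. This is only an illustration: the column name x is an assumption (the question just calls it "variable x"), and the data frame is synthetic rather than read from the asker's CSV file.

```r
# Synthetic stand-in for the CSV data (assumption: columns GroupN and x)
set.seed(1)
Data1 <- data.frame(GroupN = sample(1:300, 200, replace = TRUE),
                    x = rnorm(200))

# Same sub-intervals as in the question
breaks <- seq(0, 300, by = 75)
Split.Data1 <- split(Data1, cut(Data1$GroupN, breaks))

# Mean of the single column x within each subset;
# sapply() returns a named numeric vector, one element per interval
x.means <- sapply(Split.Data1, function(d) mean(d$x))
x.means
```

The names of x.means are the interval labels produced by cut(), so something like barplot(x.means) should give one bar per interval.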

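【Comments】:

How to make a great R reproducible example?
Perhaps something like lapply(split(DF, f), function(x) mean(x$column_of_interest)) would help.
Why split it in the first place? It may be better to use the plyr, dplyr, or data.table packages.
If you want help, you really need to post your data (or a representative subset) and show the code you have tried.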
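
One way to follow the "why split it" suggestion without any extra packages is to keep the interval as a factor column and summarise by it. The sketch below uses base R only; the column name x and the synthetic data are assumptions, not the asker's actual data.

```r
# Synthetic stand-in for the asker's data (assumption: columns GroupN and x)
set.seed(42)
Data1 <- data.frame(GroupN = sample(1:300, 200, replace = TRUE),
                    x = rnorm(200))

# Store the interval of each row as a factor column
Data1$interval <- cut(Data1$GroupN, seq(0, 300, by = 75))

# Mean of x per interval, without creating a list of data frames
x.means <- tapply(Data1$x, Data1$interval, mean)

# The same summary as a data frame
x.agg <- aggregate(x ~ interval, data = Data1, FUN = mean)

# Distribution of x within each interval (drawn only in interactive sessions)
if (interactive()) boxplot(x ~ interval, data = Data1)
```

A boxplot per interval shows the whole distribution of x rather than just its mean, which may be closer to what the edit in the question is asking for.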

Tags: r function split apply

【Solution 1】:

How about something like this with plyr:

library(plyr)  # load the plyr package

dat<-data.frame(x=sample(1:300,300),y=runif(300)*10)   # create random data
head(dat)

#    x        y
#1 193 2.580328
#2 119 4.519489
#3  51 5.340437
#4 114 9.249253
#5 236 4.756849
#6 108 5.926478

ddply(dat,                                                 # use dat
      .(grp=cut(dat$x,seq(0,300,75),seq(0,300,75)[-1])),   # group by formula (cut)
      summarise,                                           # tell ddply to summarise
      mean=mean(y),                                        # calc mean
      sum=sum(y))                                          # calc sum

#  grp     mean      sum
#1  75 4.620653 346.5490
#2 150 5.337813 400.3360
#3 225 4.238518 317.8889
#4 300 4.996709 374.7532
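
The grp values 75, 150, 225 and 300 in the output come from the third argument to cut(), which supplies the labels. A small sketch of that behaviour, using a hypothetical four-element vector chosen to sit on and around the interval boundaries:

```r
breaks <- seq(0, 300, 75)   # 0 75 150 225 300
x <- c(1, 75, 76, 300)      # illustrative values only

# Default labels describe each interval; intervals are closed on the right
default.grp <- cut(x, breaks)
levels(default.grp)
# "(0,75]" "(75,150]" "(150,225]" "(225,300]"

# Passing breaks[-1] as labels names each interval by its upper bound
upper.grp <- cut(x, breaks, labels = breaks[-1])
as.character(upper.grp)
# "75" "75" "150" "300"
```

Because the intervals are open on the left by default, a value of exactly 0 would fall outside the first interval and become NA unless include.lowest = TRUE is passed to cut().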
