【Posted】: 2018-08-31 19:17:47
【Question】:
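I have a list of data frames, generated by permuting the column order of an initial data frame, and I want to apply a complex calculation to each of them with group_by_at() and mutate(). It works on a single data frame but fails in a for loop, because mutate (and some of my calculations) needs the name of the data frame. So I thought: fine, let's create a list in which all the different data frames have the same name, and loop over the original sequence of names. Unfortunately this trick does not work, and I get the following message: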
Error: object of type 'closure' is not subsettable.
Here is a self-contained example showing all my steps. I think the problem comes from mutate. So how can I make mutate work inside a for loop?
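A minimal base-R illustration of where this particular message can come from, assuming a fresh session in which no data frame called `data` exists (the example below calls `rm(data)`). Renaming list elements with `names(res) <- rep("data", N)` does not create a variable, so `data` still refers to the built-in function `utils::data()`, and applying `$` to a function raises the "closure" error. The two toy data frames are hypothetical stand-ins for the elements of `res`.

```r
# In a fresh session there is no data frame called `data`, so the name
# resolves to the function utils::data() on the search path.
print(is.function(data))                  # TRUE

# Renaming list elements does not create a variable called `data`
res <- list(dataGA = data.frame(x = 1), dataAG = data.frame(x = 2))
names(res) <- rep("data", length(res))
print(is.function(data))                  # still TRUE

# With duplicated names, res[["data"]] only ever returns the first element
print(res[["data"]]$x)                    # 1

# Subsetting a function with `$` raises the error from the question
msg <- tryCatch(data$year, error = function(e) conditionMessage(e))
print(msg)
```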
data <- read.table(text = 'obs gender ageclass weight year subdata income
1 F 1 10 yearA sub1 1000
2 M 2 25 yearA sub1 1200
3 M 2 5 yearB sub2 1400
4 M 1 11 yearB sub1 1350',
header = TRUE)
library(dplyr)
library(GiniWegNeg)
dataA <- select(data, gender, ageclass)
dataB <- select(data, -gender, -ageclass)
rm(data)
# Generate all permutations of the column indexes of dataA
library(combinat)
index <- permn(ncol(dataA))
# Attach dataA to the previous list of index
res <- lapply(index, function(x) dataA[x])
# Name the list so that each data frame's name records its column order
names(res) <- unlist(lapply(res,function(x) sprintf('data%s',paste0(toupper(substr(colnames(x),1,1)),collapse = ''))))
# Create a vector containing the name of each data frame
NameList <- unlist(lapply(res,function(x) sprintf('data%s',paste0(toupper(substr(colnames(x),1,1)),collapse = ''))))
# Define N as the number of permutations, i.e. of data frames
N <- length(res)
# Bind the columns of dataB to every permuted data frame
res <- lapply(res,function(x) cbind(x,dataB))
# Change the name of res so that all data frames are named data
names(res) <- rep("data", N)
# APPLY FOR LOOP TO ALL DATAFRAMES
for (j in NameList){
  runCalc <- function(data, y){
    data <- data %>%
      group_by_at(1) %>%
      mutate(Income_1 = weighted.mean(income, weight))
    data <- data %>%
      group_by_at(2) %>%
      mutate(Income_2 = weighted.mean(income, weight))
    gini <- c(Gini_RSV(data$Income_1, data$weight), Gini_RSV(data$Income_2, data$weight))
    # t() gives one row with two columns, matching the two column names below
    Gini <- data.frame(t(gini))
    colnames(Gini) <- c("Income_1","Income_2")
    rownames(Gini) <- c(paste0("Gini_", y))
    return(Gini)
  }
  runOtherCalc <- function(df, y){
    Contrib <- (1/5) * df$Income_1 + df$Income_2
    Contrib <- data.frame(Contrib)
    colnames(Contrib) <- c("myresult")
    rownames(Contrib) <- c(paste0("Contrib_", y))
    return(Contrib)
  }
  # Run runCalc over dataframe data by year
  df1_List <- lapply(unique(data$year), function(i) {
    byperiod <- subset(data, year == i)
    runCalc(byperiod, i)
  })
  # runCalc returns a data frame, which is then passed to runOtherCalc, again by year
  df1_OtherList <- lapply(unique(data$year), function(i) {
    byperiod <- subset(data, year == i)
    df <- runCalc(byperiod, i)
    runOtherCalc(df, i)
  })
  # Run runCalc over dataframe data by subdata
  df2_List <- lapply(unique(data$subdata), function(i) {
    bysubdata <- subset(data, subdata == i)
    runCalc(bysubdata, i)
  })
  # runCalc returns a data frame, which is then passed to runOtherCalc, again by subdata
  df2_OtherList <- lapply(unique(data$subdata), function(i) {
    bysubdata <- subset(data, subdata == i)
    df <- runCalc(bysubdata, i)
    runOtherCalc(df, i)
  })
  # Return all results in separate frames, then append by row in 2 frames
  Gini_df1 <- do.call(rbind, df1_List)
  Contrib_df1 <- do.call(rbind, df1_OtherList)
  Gini_df2 <- do.call(rbind, df2_List)
  Contrib_df2 <- do.call(rbind, df2_OtherList)
  Gini <- rbind(Gini_df1, Gini_df2)
  Contrib <- rbind(Contrib_df1, Contrib_df2)
}
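The by-year and by-subdata blocks repeat the same pattern, and each `*_OtherList` recomputes `runCalc()`. One possible restructuring, sketched in base R under stated assumptions: `split()` replaces `lapply(unique(...), function(i) subset(...))`, `Map()` feeds each first-step result into the second step, and the whole thing is wrapped in a function of the splitting column. `first_step()`, `second_step()` and `run_by()` are hypothetical helpers, and the stand-ins use `mean()`/`max()` instead of `Gini_RSV()` so that the sketch runs without GiniWegNeg.

```r
# Toy data with the two splitting columns used in the question
d <- data.frame(
  year    = c("yearA", "yearA", "yearB", "yearB"),
  subdata = c("sub1", "sub1", "sub2", "sub1"),
  income  = c(1000, 1200, 1400, 1350)
)

# Hypothetical stand-ins with the same output shapes as runCalc / runOtherCalc:
# a one-row data frame per group, labelled through its row name.
first_step <- function(part, label) {
  data.frame(Income_1 = mean(part$income), Income_2 = max(part$income),
             row.names = paste0("Gini_", label))
}
second_step <- function(df, label) {
  data.frame(myresult = (1/5) * df$Income_1 + df$Income_2,
             row.names = paste0("Contrib_", label))
}

# Run both steps for one splitting column. The first step is computed once
# per group and its result is reused by the second step.
run_by <- function(d, col) {
  parts <- split(d, d[[col]])                     # one data frame per level
  step1 <- Map(first_step, parts, names(parts))
  step2 <- Map(second_step, step1, names(step1))
  # unname() so that rbind keeps the Gini_ / Contrib_ row names
  list(Gini    = do.call(rbind, unname(step1)),
       Contrib = do.call(rbind, unname(step2)))
}

by_year    <- run_by(d, "year")
by_subdata <- run_by(d, "subdata")
Gini    <- rbind(by_year$Gini, by_subdata$Gini)
Contrib <- rbind(by_year$Contrib, by_subdata$Contrib)
print(Contrib)
```

Calling something like `run_by()` once per permuted data frame would give one `Gini`/`Contrib` pair per permutation, rather than values that are overwritten on every pass through the loop.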
【Discussion】:
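What you are missing in your dplyr pipeline is purrr::map. While you wait for potential answers below, may I suggest this video in which Hadley explains exactly how to solve this problem: youtube.com/watch?v=rz3_FDVt9eg Slides here: speakerdeck.com/hadley/managing-many-models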
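A sketch of the `purrr::map()` approach suggested in this comment, assuming the list keeps its distinct names (dataGA, dataAG) instead of all being renamed to `data`. The work for one data frame goes into one function, and `map()` applies it to every element of the list. `calc_one()` and the toy data are hypothetical, and only the first grouped weighted mean from `runCalc()` is reproduced.

```r
library(dplyr)
library(purrr)

# Toy data mirroring the question's columns, in both column orders
base <- data.frame(gender = c("F", "M", "M", "M"), ageclass = c(1, 2, 2, 1),
                   weight = c(10, 25, 5, 11), income = c(1000, 1200, 1400, 1350))
datalist <- list(dataGA = base,
                 dataAG = base[c("ageclass", "gender", "weight", "income")])

# One function for one data frame: weighted mean income by the first column
calc_one <- function(d) {
  d %>%
    group_by_at(1) %>%
    summarise(Income_1 = weighted.mean(income, weight))
}

# map() calls calc_one() on every element and keeps the list names
results <- map(datalist, calc_one)
results$dataGA
results$dataAG
```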
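I did (and struggled with the cupcake example). purrr::map() is a function for applying a function to each element of a list; I did not know it. Unfortunately, I don't quite know how to apply it to my particular example. As a newbie, I thought of data <- map(datalist), where datalist is my list of data frames, but I don't understand how to get the results back.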
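On getting the results back: `map()` takes the function to apply as its second argument, and its return value is simply the list you assign. A minimal sketch with hypothetical toy frames, using `dplyr::bind_rows()` with `.id` to stack the per-data-frame results while recording which list element each row came from.

```r
library(dplyr)
library(purrr)

# Hypothetical toy list standing in for datalist
datalist <- list(
  dataGA = data.frame(g = c("F", "M"), income = c(1000, 1200)),
  dataAG = data.frame(g = c("1", "2"), income = c(1350, 1400))
)

# The second argument is the function to apply; the result is a named list
results <- map(datalist, function(d) data.frame(total = sum(d$income)))

# bind_rows() stacks the list and stores each element's name in a column
combined <- bind_rows(results, .id = "permutation")
print(combined)
```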
You are not using the j variable in your for loop.
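A sketch of a loop that does use `j`, assuming `res` keeps the distinct names from `NameList` (that is, without the `names(res) <- rep("data", N)` step). Here `j` selects the current data frame, which is bound to `data` explicitly, and each result is stored in a named list instead of being overwritten on the next iteration. The toy `base` frame and the `results` list are hypothetical; the two `group_by_at()` steps follow `runCalc()`.

```r
library(dplyr)

# Toy stand-in for `res`, keeping the unique names from NameList
base <- data.frame(gender = c("F", "M", "M", "M"), ageclass = c(1, 2, 2, 1),
                   weight = c(10, 25, 5, 11), income = c(1000, 1200, 1400, 1350))
res <- list(dataGA = base,
            dataAG = base[c("ageclass", "gender", "weight", "income")])

results <- list()                       # one entry per permutation
for (j in names(res)) {
  data <- res[[j]]                      # j selects the current data frame
  out <- data %>%
    group_by_at(1) %>%                  # first column differs per permutation
    mutate(Income_1 = weighted.mean(income, weight)) %>%
    group_by_at(2) %>%                  # replaces the previous grouping
    mutate(Income_2 = weighted.mean(income, weight)) %>%
    ungroup()
  results[[j]] <- out                   # store instead of overwriting
}
str(results)
```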
Tags: r list for-loop dataframe dplyr