【发布时间】:2019-03-15 15:50:23
【问题描述】:
我有一个如下所示的数据集:
set.seed(2)
origin <- rep(c("DEU", "GBR", "ITA", "NLD", "CAN", "MEX", "USA", "CHN", "JPN", "KOR","DEU", "GBR", "ITA", "NLD", "CAN", "MEX", "USA", "CHN", "JPN", "KOR"), 2)
year <- rep(c(1998,1998,1998,1998,1998,1998,1998,1998,1998,1998,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000), 2)
value <- sample(1:10000, size=length(origin), replace=TRUE)
test.df <- as.data.frame(cbind(origin, year, value))
rm(origin, year, value)
然后我有 2 个列表。
第一个是使用ISOcodes 库构建的按地区列出的国家/地区列表,如下所示:
library("ISOcodes")
list.continent <- list(asia = c("Central Asia", "Eastern Asia", "South-eastern Asia", "Southern Asia", "Western Asia"),
africa = c("Northern Africa", "Sub-Saharan Africa", "Eastern Africa", "Middle Africa", "Southern Africa", "Western Africa"),
europe = c("Eastern Europe", "Northern Europe", "Channel Islands", "Southern Europe", "Western Europe"),
oceania = c("Australia and New Zealand", "Melanesia", "Micronesia", "Polynesia"),
northamerica = c("Northern America"),
latinamerica = c("South America", "Central America", "Caribbean"))
country.list.continent <- sapply(list.continent, function(item) {
region <- subset(UN_M.49_Regions, Name %in% item)
sub <- subset(UN_M.49_Countries, Code %in% unlist(strsplit(region$Children, ", ")))
return(sub$ISO_Alpha_3)
}, simplify = FALSE)
rm(list.continent)
还有一份年表:
year.list <- levels(as.factor(unique(test.df$year)))
我想用与特定年份的精确区域相对应的计算数字填充矩阵。矩阵如下:
ncol <- length(year.list)
nrow <- length(country.list.continent)
matrix.extraction <- matrix(, nrow = nrow, ncol = ncol)
rownames(matrix.extraction) <- names(country.list.continent)
colnames(matrix.extraction) <- year.list
为了进行我的计算,我有一个循环能够将数据集子集太大,否则......循环基于年份(相当于colnames(matrix.extraction))。这个想法是计算每年代表每个国家/地区价值的(以百分比为单位)。计算部分足够简单并且运行良好。当我需要将值归因于每一行时,我的问题就出现了。
for(i in 1:length(colnames(matrix.extraction))){
### I subset and compute what I want
table.temp <- test.df %>%
subset(year == colnames(matrix.extraction)[i]) %>%
group_by(origin) %>%
summarise(value = sum(value, na.rm = TRUE))
table.temp$percent <- prop.table(table.temp$value)
### then I need to attribute the wanted values
matrix.extraction["ROWNAME",i] <- table.temp %>%
subset(origin %in% country.list.continent$"ROWNAME") %>%
summarise(. ,sum = sum(percent)))
}
我真的不知道我该怎么做。
预期的结果是一个像这样的矩阵:
1998 2000
asia here NA
africa NA NA
europe NA NA
oceania NA NA
northamerica NA NA
latinamerica NA NA
用 [1,1] 中的“here”代替 rowname 中该区域的每个国家/地区在 colname 中年份的值的总和。
任何帮助将不胜感激。
【问题讨论】:
-
@RonakShah 问题已编辑
标签: r for-loop matrix-multiplication