【Question title】: Calculating the weighted mean of all numerical columns
【Posted】: 2020-12-07 22:43:32
【Question】:

Sample data:

library(data.table)
set.seed(1)
DT <- data.table(panelID = sample(50,50),                                                    # Creates a panel ID
                      Country = c(rep("Albania",30),rep("Belarus",50), rep("Chilipepper",20)),       
                      some_NA = sample(0:5, 6),                                             
                      some_NA_factor = sample(0:5, 6),         
                      Group = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                      Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                      wt = 15*round(runif(100)/10,2),
                      Income = round(rnorm(10,-5,5),2),
                      Happiness = sample(10,10),
                      Sex = round(rnorm(10,0.75,0.3),2),
                      Age = sample(100,100),
                      Educ = round(rnorm(10,0.75,0.3),2))           
DT[, uniqueID := .I]                                                                         # Creates a unique ID
# https://stackoverflow.com/questions/11036989/replace-all-0-values-to-na
DT$some_NA_factor <- factor(DT$some_NA_factor)

I want to calculate the weighted mean of every numeric column, so I tried:

DT_w <- DT[,lapply(Filter(is.numeric,.SD), function(x) weighted.mean(DT$wt, x, na.rm=TRUE)), by=c("Country", "Time")]

It then fails with:

Error in weighted.mean.default(DT$wt, x, na.rm = TRUE) : 
  'x' and 'w' must have the same length

I think I may have misunderstood the syntax. Am I doing this right?

【Comments】:

    Tags: r syntax data.table mean weighted


    【Solution 1】:

    There are two problems:

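    • DT$wt explicitly refers to the entire wt column of DT, so the by argument has no effect on it. Grouping with by only applies to columns referenced without the DT$ prefix. Here that means a full-length vector was passed alongside a per-group subset, which is why the lengths did not match.

    • weighted.mean() takes x first and w (the weights) second; you appear to have them reversed.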
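
Both points can be seen on a small scale. The sketch below uses a hypothetical four-row table named toy (not the question's data) to show that a bare column name inside j is subset per group while toy$wt is always the full column, and that weighted.mean() expects the values first and the weights second.

```r
library(data.table)

# Hypothetical toy table, used only for illustration
toy <- data.table(g  = c("a", "a", "b", "b"),
                  x  = c(1, 3, 10, 20),
                  wt = c(1, 3, 1, 1))

# Inside j, the bare name wt holds only the current group's rows,
# whereas toy$wt always returns the whole column
len <- toy[, .(n_bare = length(wt), n_dollar = length(toy$wt)), by = g]
print(len)   # n_bare is 2 per group, n_dollar is 4

# Argument order is weighted.mean(x, w): values first, weights second
wm <- weighted.mean(c(1, 3), w = c(1, 3))   # (1*1 + 3*3) / (1 + 3)
print(wm)
```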

    Fixing both:

    DT_w <- DT[,lapply(Filter(is.numeric,.SD), function(x) weighted.mean(x, w = wt, na.rm=TRUE)), by=c("Country", "Time")]
    # runs without errors
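
Note that with Filter(is.numeric, .SD) the result also contains a weighted mean of wt itself and of any numeric ID columns. If that is unwanted, one possible variant (a sketch on a hypothetical toy table, not the answer's own code) selects the columns up front with .SDcols and passes the weights by bare name:

```r
library(data.table)

# Hypothetical toy table, used only for illustration
toy <- data.table(g   = c("a", "a", "b", "b"),
                  x   = c(1, 3, 10, 20),
                  y   = c(2, 2, 0, 4),
                  lab = c("p", "q", "r", "s"),   # non-numeric, should be skipped
                  wt  = c(1, 3, 1, 1))

# Numeric columns, excluding the weight column itself
num_cols <- setdiff(names(toy)[sapply(toy, is.numeric)], "wt")

# wt is referenced without a prefix, so it is subset per group
res <- toy[, lapply(.SD, weighted.mean, w = wt, na.rm = TRUE),
           by = g, .SDcols = num_cols]
print(res)
```

The same pattern should carry over to the question's DT by extending the setdiff() exclusion list with any ID columns that should not be averaged.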
    

    【Comments】:
