【Question Title】: Imputation of Missing Values by Categorical Mean?
【Posted】: 2019-08-10 10:45:05
【Question Description】:

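I have a dataset with multiple columns, one of which is missing chunks of data that I need.

The column with missing data, df$Variable, always belongs to a specific person, df$Name. Whenever data is missing in df$Variable, is there a way to impute the mean for each person rather than the mean of the entire dataset?

I have been playing around with the imputeTS library.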

【Question Discussion】:

Tags: r statistics imputation


【Solution 1】:

Without seeing your data frame, I believe this will work.

set.seed(7)
# make some fake data
df <- data.frame(Name = rep(as.character(c("A", "B", "C", "D")), 10), Variable = sample(1:100, 40))
# change some to NA
df[which(df$Variable > 40),"Variable"] <- NA

# Fill in NA's for D with the mean of D
df[which(df$Name == "D" & is.na(df$Variable)),"Variable"] <-
  mean(df[which(df$Name == "D"),"Variable"], na.rm = TRUE)

You can also loop over the other "variables" (here, the remaining values of Name):

variable_vec <- c("A", "B", "C", "D")
# Loop over the names themselves so that df$Name == nm compares strings
for (nm in variable_vec) {
  df[which(df$Name == nm & is.na(df$Variable)), "Variable"] <-
    mean(df[which(df$Name == nm), "Variable"], na.rm = TRUE)
}
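
The loop above can also be written without an explicit loop. The following is a minimal sketch, not part of the original answer, that uses base R's ave() to fill each person's missing values with that person's mean. The small data frame is hypothetical and only mirrors the Name/Variable structure described in the question.

```r
# Hypothetical example data with the same column names as in the question
df <- data.frame(
  Name = rep(c("A", "B"), each = 4),
  Variable = c(1, NA, 3, NA, 10, 20, NA, 30)
)

# ave() applies FUN within each Name group and returns a vector
# aligned with the original row order
df$Variable <- ave(df$Variable, df$Name, FUN = function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)  # fill NAs with the group mean
  x
})

print(df)
```

Note that if every value for a given name is missing, mean(x, na.rm = TRUE) returns NaN, so that group would be filled with NaN rather than a number.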

【Discussion】:

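    【Solution 2】:

    It is hard to give a definitive answer without a reproducible example, but given what you have described, something like this should work: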

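    library('tidyverse')
    
    # Example data: two people, with values missing at random
    df <- data.frame(Name = c(rep("A", 5), rep("B", 5)),
                     Variable = sample(c(1, 2, 3, NA), 10, replace = TRUE))
    
    df %>%
      group_by(Name) %>%
      # per-person mean, ignoring missing values
      mutate(non_na_mean = mean(Variable, na.rm = TRUE)) %>%
      ungroup() %>%
      # keep observed values; fill NAs with that person's mean
      mutate(newVariable = ifelse(is.na(Variable), non_na_mean, Variable))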
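
    One way to sanity-check a result like the one above is to confirm that no NAs remain and that each person's mean is unchanged by the imputation. This is a rough sketch in base R, not part of the original answer; the data frame `res` is hypothetical and simply has the same columns the pipeline produces.

```r
# Hypothetical result with the columns produced by the pipeline above
res <- data.frame(
  Name        = c("A", "A", "A", "B", "B", "B"),
  Variable    = c(1, NA, 3, 2, 2, NA),
  newVariable = c(1, 2, 3, 2, 2, 2)
)

# No missing values should remain in the imputed column
stopifnot(!anyNA(res$newVariable))

# Filling NAs with the group mean leaves each group's mean unchanged
before <- tapply(res$Variable, res$Name, mean, na.rm = TRUE)
after  <- tapply(res$newVariable, res$Name, mean)
stopifnot(isTRUE(all.equal(before, after)))
```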
    

    【Discussion】:
