【Question title】: Determine the average value of a variable for a specific level in R
【Posted】: 2016-09-27 01:12:26
【Question description】:

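I am trying to find the mean of a variable for each level of a grouping variable that I derived from a different variable.

So far I have created a new variable with the following levels: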

  • Level 1: values <= 0%
  • Level 2: values > 0% and < 1%
  • Level 3: values >= 1%
pincome$income_growth <- ifelse(pincome$incomechng <= 0, "level 1",
                                ifelse(pincome$incomechng < 1,"level 2","level 3"))

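Now I would like to determine the mean of another variable for each of these levels (for example, the average income for level 1, where income growth is below 0%).

I hope this makes sense. I am very new to R and trying to get the hang of it!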

【Comments】:

  • I guess the right way is with(DF, ave(v, level)) or with(DF, tapply(v, level, mean)), where DF is your data.frame, v is your variable, and level is your grouping variable. To learn more, type ?ave or ?tapply.
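
To make the difference between the two functions in this comment concrete, here is a minimal sketch. The data frame DF and its columns v and level are hypothetical toy data, not the asker's. tapply returns one mean per group, while ave returns the group mean repeated on every row.

```r
# Hypothetical toy data: v is the variable to average, level is the grouping variable
DF <- data.frame(v = c(1, 2, 3, 4, 5, 6),
                 level = c("a", "a", "b", "b", "b", "c"))

# tapply: one mean per group, returned with the group names attached
group_means <- with(DF, tapply(v, level, mean))

# ave: the group mean repeated for every row (FUN defaults to mean)
row_means <- with(DF, ave(v, level))

group_means
row_means
```

tapply is the one to reach for when you want a summary table; ave is handy when you want to add the group mean as a new column next to the original values.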

Tags: r binning


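【Solution 1】:

If you want base R, try by (?by). If you start doing more complicated things, the plyr/dplyr packages are great, and if you work with large datasets and don't mind a steeper initial learning curve, the data.table package is great too.

A reproducible example would be great.

For example: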

set.seed(1) # so your random numbers are the same as mine
pincome <- data.frame(incomechng = runif(20, min=-1, max=3))

# What you had was fine too; ?cut is another way to do it and is
# shown here for demonstration purposes. Note that `cut` uses
# intervals like (a, b] (the default) or [a, b), whereas yours are
# (-Inf, 0], (0, 1), [1, Inf), so a value of exactly 1 lands in a different level.
pincome$income_growth <- cut(pincome$incomechng,
                             breaks=c(-Inf, 0, 1, Inf),
                             labels=paste("level", 1:3))
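
The boundary difference mentioned in the comment can be checked directly. This is a small sketch on a hypothetical vector x chosen to include the break points; with continuous random data such as runif, exact hits on a break are unlikely, so the two approaches normally agree.

```r
# Hypothetical values, including the break points 0 and 1
x <- c(-0.5, 0, 0.5, 1, 1.5)

# Nested ifelse from the question: (-Inf, 0], (0, 1), [1, Inf)
by_ifelse <- ifelse(x <= 0, "level 1",
                    ifelse(x < 1, "level 2", "level 3"))

# cut with its default right = TRUE: (-Inf, 0], (0, 1], (1, Inf]
by_cut <- as.character(cut(x,
                           breaks = c(-Inf, 0, 1, Inf),
                           labels = paste("level", 1:3)))

# Values that the two approaches classify differently
x[by_ifelse != by_cut]
# [1] 1
```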

Now we can take the mean of each group. I have shown three options; I'm sure there are more.

# base R ?by
by(pincome$incomechng, pincome$income_growth, mean)
# pincome$income_growth: level 1
# [1] -0.6848674
# ------------------------------------------
# pincome$income_growth: level 2
# [1] 0.4132334
# ------------------------------------------
# pincome$income_growth: level 3
# [1] 1.772039

# plyr (dplyr has pipe syntax you may prefer but is otherwise the same)
library(plyr)
ddply(pincome, .(income_growth), summarize, avgIncomeGrowth=mean(incomechng))
#   income_growth avgIncomeGrowth
# 1       level 1      -0.6848674
# 2       level 2       0.4132334
# 3       level 3       1.7720395

# data.table
library(data.table)
setDT(pincome)
pincome[, list(avgIncomeGrowth=mean(incomechng)), by=income_growth]
#    income_growth avgIncomeGrowth
# 1:       level 2       0.4132334
# 2:       level 3       1.7720395
# 3:       level 1      -0.6848674
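
The examples above average incomechng itself, whereas the question asks for the mean of another variable within each level. A minimal sketch of that with base R's aggregate follows; the income column is a hypothetical stand-in filled with random numbers, since the asker's real data is not shown.

```r
set.seed(1)
pincome <- data.frame(incomechng = runif(20, min = -1, max = 3))

# Hypothetical second variable, made up purely for illustration
pincome$income <- runif(20, min = 20000, max = 80000)

pincome$income_growth <- cut(pincome$incomechng,
                             breaks = c(-Inf, 0, 1, Inf),
                             labels = paste("level", 1:3))

# Mean of income within each income_growth level, as a data.frame
avg_income <- aggregate(income ~ income_growth, data = pincome, FUN = mean)
avg_income
```

The same pattern works with by, ddply, or data.table: keep income_growth as the grouping variable and swap the column passed to mean.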

【Comments】:

    【Solution 2】:

    If you want a tidyverse solution:

    library(tidyverse)
    pincome %>%
     mutate(income_growth = case_when(incomechng <= 0 ~ "level 1",
                                      incomechng < 1 ~ "level 2",
                                      TRUE ~ "level 3")) %>%
     group_by(income_growth) %>%
     summarize(avgIncomeGrowth = mean(incomechng,na.rm=TRUE))
    

    【Comments】:
