【Question title】: summarise() doesn't recognize a variable
【Posted】: 2021-02-27 22:18:20
【Question】:

I'm wondering why the code below gives me Error: Problem with summarise() input wt_avg

library(tidyverse)

CA_vacc <- read_csv('https://raw.githubusercontent.com/rnorouzian/e/master/2017-2018%20CA%20Vaccination%20Data.csv',
na = c(".","--*"))

CA_vacc %>% summarise(
    wt_avg = sum(HEPB_percent * ENROLLMENT, na.rm = TRUE) / sum(ENROLLMENT, na.rm = TRUE)
  )


# Error: Problem with `summarise()` input `wt_avg`.

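【Comments】:

  • HEPB_percent is a character column, e.g. "783%". Your data also contains 99%, which may be their version of NA; I would check the source.
  • You need to convert it to numeric, with some preprocessing: as.numeric(str_remove_all(CA_vacc$HEPB_percent, "\\?|%"))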
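
To make the diagnosis in the comments concrete, here is a minimal sketch in base R. The data frame `toy` and its values are hypothetical stand-ins for the real file, assumed only to have the same column types (a character percentage and a numeric enrollment).

```r
# Hypothetical toy data mimicking the column types described in the comments
toy <- data.frame(
  HEPB_percent = c("98%", "95%"),
  ENROLLMENT   = c(100, 50),
  stringsAsFactors = FALSE
)

class(toy$HEPB_percent)  # "character", so it cannot be multiplied directly

# Multiplying a character vector by a number raises an error; this is the
# failure that summarise() wraps in "Problem with `summarise()` input `wt_avg`"
res <- tryCatch(
  toy$HEPB_percent * toy$ENROLLMENT,
  error = function(e) conditionMessage(e)
)
res  # the captured error message (a single string)

# Strip the "?" and "%" characters, then convert to numeric
pct <- as.numeric(gsub("[?%]", "", toy$HEPB_percent))

# Enrollment-weighted average on the toy data
wt_avg <- sum(pct * toy$ENROLLMENT) / sum(toy$ENROLLMENT)
wt_avg
```

Running `class()` or `str()` on the real column is a quick first check whenever `summarise()` fails on an arithmetic expression.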

Tags: r function dataframe dplyr tidyverse


【Solution 1】:

Does this work:

library(dplyr)
library(readr)
CA_vacc %>% summarise(
  wt_avg = sum(parse_number(HEPB_percent) * ENROLLMENT, na.rm = TRUE) / sum(ENROLLMENT, na.rm = TRUE)
  )
# A tibble: 1 x 1
#   wt_avg
#    <dbl>
# 1   96.8
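
For readers unfamiliar with readr::parse_number(), here is a small sketch of what it does to strings of this kind. The input vector `x` is made up for illustration; the real column may contain other patterns.

```r
library(readr)

# Hypothetical example strings, not taken from the actual data
x <- c("96%", "?99%", "100%", NA)

# parse_number() keeps the first number it finds in each string and drops
# non-numeric characters before and after it; NA stays NA
parsed <- parse_number(x)
parsed
```

Because it keeps only the first number in each string, it is worth spot-checking a few unusual values in the real column before relying on the result.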


【Solution 2】:

library(tidyverse)

CA_vacc %>%
  # Refer to the column directly inside mutate(); no need for CA_vacc$
  mutate(HEPB_percent = as.numeric(str_remove_all(HEPB_percent, "\\?|%"))) %>%
  summarise(
    wt_avg = sum(HEPB_percent * ENROLLMENT, na.rm = TRUE) / sum(ENROLLMENT, na.rm = TRUE)
  )
    


【Solution 3】:

Using base R:

with(CA_vacc, sum(as.numeric(gsub("[?%]", "", HEPB_percent)) *
       ENROLLMENT, na.rm = TRUE) / sum(ENROLLMENT, na.rm = TRUE))
# [1] 96.76707
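
One detail of the formula used in all three answers may be worth checking: with na.rm = TRUE applied to each sum separately, a row with a missing percentage drops out of the numerator but its enrollment still counts in the denominator. The sketch below uses made-up vectors `pct` and `enr` to show the difference; whether it matters for the real data depends on how many percentages are missing.

```r
# Hypothetical toy data: the third row has a missing percentage
pct <- c(90, 100, NA)
enr <- c(100, 100, 200)

# Formula from the answers: the NA row is dropped from the numerator only
wt_all <- sum(pct * enr, na.rm = TRUE) / sum(enr, na.rm = TRUE)
wt_all      # (9000 + 10000) / 400 = 47.5

# Restrict both sums to rows where the percentage is known
ok <- !is.na(pct)
wt_known <- sum(pct[ok] * enr[ok]) / sum(enr[ok])
wt_known    # 19000 / 200 = 95

# stats::weighted.mean() with na.rm = TRUE drops the same rows and weights
wt_builtin <- weighted.mean(pct, enr, na.rm = TRUE)
wt_builtin  # 95
```

If rows with unknown percentages should not pull the average down, weighted.mean(x, w, na.rm = TRUE) is a compact alternative to the two separate sums.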
      
