【Question title】: Replace values using the median in R
【Posted】: 2021-08-13 11:58:08
【Question description】:

Suppose my data look like this:

df<-data.frame(name=c(rep("Aust", 20), rep("Fr", 20), rep("Spa", 20)),
       Threshold = c(rep(38.9, 20), rep(50.5, 20), rep(20, 20)),
       Fitted_Data= c(38.20784,35.52096, 37.05763, 36.19203,39.91685,38.19453,36.86204,38.51312,35.14895,35.41919,35.13218,35.46005,
                      37.48999,37.54950,38.68705,36.23085, 35.90234,38.50205,39.27153,38.03129, 48.19456,48.23224,51.25736,50.59195,
                      51.35283, 48.45300,50.81403,51.03964,50.97189,50.38674,50.59499,49.76958,49.93091,48.90412,51.19752,51.31885, 
                      50.54078,48.77288,48.11736,48.60201, 18.99013,21.63701,21.45867,21.96485,19.73159,18.76820,21.73579,18.68561,
                      21.62721,20.88826,21.66602,19.29559,21.39014,21.40296,20.17120,21.42481,19.05561,21.71352,19.36918,18.95138),
       Day= c("jueves", "martes",  "miércoles", "jueves", "viernes", "viernes", "domingo" , "lunes",  " martes", "miércoles", "domingo", "lunes"),
       Month = c(rep(c(1, 3, 1, 4, 1, 1, 7, 2, 3, 4, 6, 7, 12, 11, 1, 2, 10, 10, 7, 3),3))

)

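I want to create a new variable New_prices from Threshold and Fitted_Data. If an observation in Fitted_Data is greater than its Threshold, I want to compute the median of Fitted_Data for the same name, day and month and replace the observation with it. I tried something like this: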

df%>%group_by(name, Month, Day)%>%
mutate(New_prices = replace(New_prices, Fitted_Data>Threshold, median(Fitted_Data)))

But I get an error saying: Problem with `mutate()` input `New_prices`.
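The error most likely comes from passing New_prices to replace(): that column does not exist yet, so there is nothing to replace into. A minimal sketch of the same call with only that change, assuming dplyr is installed, starts from Fitted_Data instead. The five-row df_small below is a hypothetical subset of the Aust rows, chosen so that two (name, Month, Day) groups each contain one value above the threshold.

```r
library(dplyr)

# Hypothetical five-row subset of the question's "Aust" rows:
# (viernes, 1) and (domingo, 7) each contain one value above Threshold.
df_small <- data.frame(
  name        = "Aust",
  Threshold   = 38.9,
  Fitted_Data = c(39.91685, 38.19453, 36.86204, 39.27153, 35.52096),
  Day         = c("viernes", "viernes", "domingo", "domingo", "martes"),
  Month       = c(1, 1, 7, 7, 3)
)

# Start from Fitted_Data (an existing column) rather than New_prices, and
# overwrite only the values above Threshold with their group's median.
res <- df_small %>%
  group_by(name, Month, Day) %>%
  mutate(New_prices = replace(Fitted_Data, Fitted_Data > Threshold,
                              median(Fitted_Data))) %>%
  ungroup()

print(res)
```

Because replace() receives Fitted_Data, the original column is left untouched and the adjusted values land in the new New_prices column.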

For example, my desired output for the first group (Aust) would be:

name Threshold Fitted_Data       Day Month
Aust      38.9    38.20784    jueves     1
Aust      38.9    35.52096    martes     3
Aust      38.9    37.05763 miércoles     1
Aust      38.9    36.19203    jueves     4
Aust      38.9    39.05569   viernes     1  #Replaced with median(38.19453, 39.91685)
Aust      38.9    38.19453   viernes     1
Aust      38.9    36.86204   domingo     7
Aust      38.9    38.51312     lunes     2
Aust      38.9    35.14895    martes     3
Aust      38.9    35.41919 miércoles     4
Aust      38.9    35.13218   domingo     6
Aust      38.9    35.46005     lunes     7
Aust      38.9    37.48999    jueves    12
Aust      38.9    37.54950    martes    11
Aust      38.9    38.68705 miércoles     1
Aust      38.9    36.23085    jueves     2
Aust      38.9    35.90234   viernes    10
Aust      38.9    38.50205   viernes    10
Aust      38.9    38.06678   domingo     7 #replaced with median(39.27153,  36.86204)
Aust      38.9    38.03129     lunes     3
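Each group named in the comments above holds exactly two values, and the median of two numbers is their mean, so the two replaced entries can be checked by hand:

```latex
\begin{aligned}
\operatorname{median}(38.19453,\ 39.91685) &= \frac{38.19453 + 39.91685}{2} = \frac{78.11138}{2} = 39.05569 \\
\operatorname{median}(39.27153,\ 36.86204) &= \frac{39.27153 + 36.86204}{2} = \frac{76.13357}{2} = 38.066785 \approx 38.06678
\end{aligned}
```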

【Question discussion】:

    Tags: r replace tidyverse dplyr


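    【Solution 1】:

    I'm not sure exactly what you're trying to do; the code you provided (with replace) throws an error because it refers to a variable, New_prices, that doesn't exist in the data.

    But using mutate with if_else should work: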

    library(dplyr)
    df %>%
      group_by(name, Month, Day) %>%
      mutate(Fitted_Data = if_else(Fitted_Data > Threshold, median(Fitted_Data), Fitted_Data))
    

    When I run it on your data, it gives the result you want:

    # A tibble: 60 x 5
    # Groups:   name, Month, Day [51]
       name  Threshold Fitted_Data Day         Month
       <fct>     <dbl>       <dbl> <fct>       <dbl>
     1 Aust       38.9        38.2 "jueves"        1
     2 Aust       38.9        35.5 "martes"        3
     3 Aust       38.9        37.1 "miércoles"     1
     4 Aust       38.9        36.2 "jueves"        4
     5 Aust       38.9        39.1 "viernes"       1
     6 Aust       38.9        38.2 "viernes"       1
     7 Aust       38.9        36.9 "domingo"       7
     8 Aust       38.9        38.5 "lunes"         2
     9 Aust       38.9        35.1 " martes"       3
    10 Aust       38.9        35.4 "miércoles"     4
    # … with 50 more rows
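For comparison, here is a base R sketch of the same rule without dplyr (a swapped-in technique, not part of the answer above). ave() returns each row's group median aligned to the original rows, and ifelse() keeps a value unless it exceeds Threshold. Unlike the answer, it writes a separate New_prices column, as the question asked, instead of overwriting Fitted_Data. df_small is a hypothetical five-row subset of the question's Aust rows.

```r
# Hypothetical five-row subset of the question's "Aust" rows
df_small <- data.frame(
  name        = "Aust",
  Threshold   = 38.9,
  Fitted_Data = c(39.91685, 38.19453, 36.86204, 39.27153, 35.52096),
  Day         = c("viernes", "viernes", "domingo", "domingo", "martes"),
  Month       = c(1, 1, 7, 7, 3)
)

# Median of Fitted_Data within each (name, Month, Day) group, one value per row
group_median <- ave(df_small$Fitted_Data, df_small$name, df_small$Month,
                    df_small$Day, FUN = median)

# Keep the fitted value unless it exceeds the threshold
df_small$New_prices <- ifelse(df_small$Fitted_Data > df_small$Threshold,
                              group_median, df_small$Fitted_Data)

print(df_small)
```

FUN must be named in the ave() call because the grouping variables are passed through its ... argument.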
    

    【Discussion】:
