【Question title】: R ifelse by rows?
【Posted】: 2021-02-23 00:30:13
【Question】:

Here is a sample of the dataset I am working with:

 ex <- structure(list(reg_desc = c("1-Northeast Region", "1-Northeast Region", 
"1-Northeast Region", "1-Northeast Region", "1-Northeast Region"
), state = c("04-Connecticut", "05-Maine", "04-Connecticut", 
"05-Maine", NA), trigger_city = c("14860-Bridgeport-Stamford-Norwalk", 
"12620-Bangor", NA, NA, NA), Category = c("M", "M", "S", "S", 
"R"), Cred_Fac = c(0, 0, 0.317804971641414, 0, 1), Mean = c(50323.3311111111, 
48944.4266666667, 44220.8220792079, 43724.1495, 50492.0654351396
)), row.names = c(1L, 7L, 118L, 119L, 136L), class = "data.frame")

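I have a Category column in which M marks metro-level rows, S marks state-level rows, and R marks region-level rows. I want to create a new column based on an if-else statement, but I can't seem to get it right.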

The code is:

ex %>% mutate(New_Mean = if(any(Cred_Fac == 1) Mean else if(any(Cred_Fac < 1) if(Category == 'S' & Cred_Fac == 1) M_Mean * M_Cred+R_Mean*(1-M_Cred_Fac) if(Category == 'R' & Cred_Fac == 1) M_Mean * M_Cred+R_Mean*(1-M_Cred_Fac))

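My logic should be: if Cred_Fac is 1 at the M level, keep Mean unchanged; if it is less than 1, go to the state level, and if Cred_Fac is 1 at the state level, compute M_Mean * M_Cred+R_Mean*(1-M_Cred_Fac); if Cred_Fac is not 1 at the state level, repeat the process at the region level.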

One idea I had is to create new columns so that each row also carries its state and region information, for example:

    hi1 <- data.frame(reg_desc = c("1-Northeast Region", "1-Northeast Region", 
                                  "1-Northeast Region", "1-Northeast Region", "1-Northeast Region"
), state = c("04-Connecticut", "05-Maine", "04-Connecticut", 
             "05-Maine", NA), trigger_city = c("14860-Bridgeport-Stamford-Norwalk", 
                                               "12620-Bangor", NA, NA, NA), Category = c("M", "M", "S", "S", 
                                                                                         "R"), Cred_Fac = c(0, 0, 0.317804971641414, 0, 1), Mean = c(50323.3311111111, 
                                                                                                                                                     48944.4266666667, 44220.8220792079, 43724.1495, 50492.0654351396),
State_Cred_Fac = c(0.317805,0.000000,NA,NA,NA),Mean_State = c(44220.82,43724.15,NA,NA,NA),
Reg_Cred_Fac = c(1.000000,1.000000,1.000000,1.000000,NA),
Mean_Region = c(50492.07,50492.07,50492.07,50492.07,NA))

After that, I can do:

new <- hi1 %>% mutate(New_Mean = ifelse(Cred_Fac == 1,Mean,ifelse(Cred_Fac < 1 & (State_Cred_Fac == 1 & !is.na(State_Cred_Fac)), Mean*Cred_Fac+Mean_State*(1-Cred_Fac),
                                                      ifelse(Cred_Fac < 1 & Reg_Cred_Fac == 1, Mean*Cred_Fac+Mean_Region*(1-Cred_Fac),NA))))

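This gives me the end result I am looking for, but is there any way to do this row by row without inserting new columns? I did this on a small example, so I am not sure how to create columns such as Mean_State and State_Cred_Fac at a larger scale without entering the values by hand. Any suggestions and comments would be greatly appreciated!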

【Comments】:

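  • When you use any, it checks all the rows and returns a single TRUE or FALSE.
  • Are you doing a group_by, i.e. ex %>% group_by(reg_desc) %>% ..
  • The parentheses in the first two if statements are unbalanced. You also have consecutive expressions with nothing separating them.
  • On the larger data I will end up doing a group_by on reg_desc. @akrun
  • Could I apply the ifelse() statement from my last code snippet to the original data, instead of the first if statement?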
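
To illustrate the first comment: `any()` collapses a whole column into one logical value, so `if (any(...))` picks a single branch for the entire data frame, whereas `ifelse()` decides element by element. A minimal base R sketch, using a made-up vector `cred` and placeholder labels that are not part of the question's data:

```r
# Hypothetical credibility values, for illustration only
cred <- c(0, 0.3, 1)

# any() returns one TRUE/FALSE for the whole vector
whole_column <- any(cred == 1)
whole_column
#> [1] TRUE

# ifelse() is vectorized: one result per element (per row inside mutate())
per_row <- ifelse(cred == 1, "keep", "blend")
per_row
#> [1] "blend" "blend" "keep"
```

Inside `mutate()`, the per-row form is usually what is wanted; `if`/`else` only makes sense there when the condition is a single value.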

Tags: r if-statement tidyverse dplyr


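【Solution 1】:

Here is an approach that joins the appropriate state and region data onto each row so that every row has what it needs for the calculation.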

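library(tidyverse)

ex_augmented <- ex %>%
  # attach the region-level (R) row's values to every row of that region
  left_join(ex %>% filter(Category == "R") %>%
              select(reg_desc, R_Cred_Fac = Cred_Fac, R_Mean = Mean),
            by = "reg_desc") %>%
  # attach the state-level (S) row's values to every row of that state
  left_join(ex %>% filter(Category == "S") %>%
              select(state, S_Cred_Fac = Cred_Fac, S_Mean = Mean),
            by = "state") %>%
  mutate(M_Cred = if_else(Category == "M", Cred_Fac, 0),
         M_Mean = if_else(Category == "M", Mean, 0),
         # type-matched replacements: newer tidyr versions do not coerce a numeric 0 to character
         across(where(is.numeric), ~replace_na(.x, 0)),
         across(where(is.character), ~replace_na(.x, "0"))) %>%
  select(-Cred_Fac, -Mean)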


#> ex_augmented
#            reg_desc          state                      trigger_city Category R_Cred_Fac   R_Mean S_Cred_Fac   S_Mean M_Cred   M_Mean
#1 1-Northeast Region 04-Connecticut 14860-Bridgeport-Stamford-Norwalk        M          1 50492.07   0.317805 44220.82      0 50323.33
#2 1-Northeast Region       05-Maine                      12620-Bangor        M          1 50492.07   0.000000 43724.15      0 48944.43
#3 1-Northeast Region 04-Connecticut                                 0        S          1 50492.07   0.317805 44220.82      0     0.00
#4 1-Northeast Region       05-Maine                                 0        S          1 50492.07   0.000000 43724.15      0     0.00
#5 1-Northeast Region              0                                 0        R          1 50492.07   0.000000     0.00      0     0.00

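Now we can do what seems to me a simpler calculation. My result differs from yours in the first row because I used what seemed like simpler logic for applying partial credibility: weight each level by its credibility, then add the influence of the next broader level until total credibility = 1. So for row 1, with 0% Metro cred, I used 32% state + 68% region rather than 100% region. That seems more consistent to me, but perhaps I need to digest more of your logic to understand it.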

ex_augmented %>%
  # State and Region credibility add successively to Metro to get to 1
  mutate(S_Cred = pmax(0, S_Cred_Fac - M_Cred),
         R_Cred = pmax(0, R_Cred_Fac - S_Cred_Fac - M_Cred)) %>%

  # new_mean is wtd avg of all terms
  mutate(new_mean = M_Mean * M_Cred + S_Mean * S_Cred + R_Mean * R_Cred) %>%

  # sorting the columns for nicer reading
  select(reg_desc:Category, M_Mean, S_Mean, R_Mean, M_Cred, S_Cred, R_Cred, new_mean)



            reg_desc          state                      trigger_city Category   M_Mean   S_Mean   R_Mean M_Cred   S_Cred   R_Cred new_mean
1 1-Northeast Region 04-Connecticut 14860-Bridgeport-Stamford-Norwalk        M 50323.33 44220.82 50492.07      0 0.317805 0.682195 48499.03
2 1-Northeast Region       05-Maine                      12620-Bangor        M 48944.43 43724.15 50492.07      0 0.000000 1.000000 50492.07
3 1-Northeast Region 04-Connecticut                                 0        S     0.00 44220.82 50492.07      0 0.317805 0.682195 48499.03
4 1-Northeast Region       05-Maine                                 0        S     0.00 43724.15 50492.07      0 0.000000 1.000000 50492.07
5 1-Northeast Region              0                                 0        R     0.00     0.00 50492.07      0 0.000000 1.000000 50492.07
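
As a cross-check, the `new_mean` in row 1 can be reproduced by hand. The sketch below is plain base R; the lower-case variable names are illustrative only, and the values are the full-precision figures for the Connecticut metro row from the question's `ex` data (0% metro, about 32% state, about 68% region):

```r
# Full-precision inputs for row 1, copied from the question's data
m_mean <- 50323.3311111111; m_cred     <- 0
s_mean <- 44220.8220792079; s_cred_fac <- 0.317804971641414
r_mean <- 50492.0654351396; r_cred_fac <- 1

# Same weighting as in the answer's mutate() calls
s_cred <- max(0, s_cred_fac - m_cred)               # share taken from the state
r_cred <- max(0, r_cred_fac - s_cred_fac - m_cred)  # remainder taken from the region

new_mean <- m_mean * m_cred + s_mean * s_cred + r_mean * r_cred
round(new_mean, 2)
#> [1] 48499.03
```

The three weights sum to 1 for this row, which is why the result sits between the state mean and the region mean.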
