How to substitute "Ethiopia" with "Ethiopia (-1992)" and "Ethiopia (1993-)" based on the year: answers

【Question title】: How to substitute "Ethiopia" with "Ethiopia (-1992)" and "Ethiopia (1993-)" based on the year
【Posted】: 2019-06-30 01:07:44
【Question】:

I want to replace "Ethiopia" with "Ethiopia (-1992)" if location_1 says "Ethiopia" and the year is 1992 or earlier, and with "Ethiopia (1993-)" if location_1 says "Ethiopia" and the year is 1993 or later.

Unfortunately, the code I came up with replaces every "Ethiopia" with "Ethiopia (-1992)", even for the years after 1992.

Here is the code:

if (mydata$year >= 1992) {
  mydata$location_1 <- sub("Ethiopia", "Ethiopia (-1992)", mydata$location_1)
} else mydata$location_1 <- sub("Ethiopia", "Ethiopia (1993-)", mydata$location_1)

I expected every "Ethiopia" to become either "Ethiopia (-1992)" or "Ethiopia (1993-)" depending on the year. Instead, every "Ethiopia" became "Ethiopia (-1992)".
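A likely explanation of the behavior described above: if() expects a single TRUE/FALSE, whereas mydata$year >= 1992 is a vector with one value per row. Before R 4.2.0, R warns and uses only the first element; from R 4.2.0 on it stops with an error. Either way, a single branch runs sub() on the whole column. The sketch below contrasts this with the element-wise ifelse(); the two-element years vector is invented for illustration.

```r
# Invented example years: one before and one after the split
years <- c(1984, 1997)

# ifelse() evaluates the condition element by element
labels <- ifelse(years <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)")
print(labels)

# if() needs a single TRUE/FALSE; a length-2 condition triggers a warning
# (R < 4.2.0) or an error (R >= 4.2.0), so catch either one and report it
if_result <- tryCatch(
  if (years <= 1992) "branch 1" else "branch 2",
  warning = function(w) "flagged",
  error = function(e) "flagged"
)
print(if_result)
```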

【Discussion】:

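Assuming your year column contains values >= 1992, the error is in your if condition: you convert every value whose year satisfies >= 1992 to "Ethiopia (-1992)", and everything that does not (the else block) to "Ethiopia (1993-)". That is exactly the opposite of what you describe in the question.
Can you share some of your data? See here for how: How to make a reproducible example in r?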
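If you want to keep the sub() call from the question, a minimal sketch is to compute the row masks first and apply sub() only to the matching rows. Anchoring the pattern with ^ and $ is my addition: it means that running the code a second time will not turn "Ethiopia (-1992)" into "Ethiopia (-1992) (-1992)". The small mydata below is invented, and the sketch assumes year has no missing values.

```r
# Invented sample data for illustration (no missing years assumed)
mydata <- data.frame(
  year = c(1984, 1997, 1990),
  location_1 = c("Ethiopia", "Ethiopia", "China"),
  stringsAsFactors = FALSE
)

# Compute both masks before modifying anything
early <- mydata$year <= 1992
late  <- mydata$year >  1992

# sub() is vectorized, so apply it only to the selected rows;
# "^Ethiopia$" matches the bare name only, so re-running is harmless
mydata$location_1[early] <- sub("^Ethiopia$", "Ethiopia (-1992)", mydata$location_1[early])
mydata$location_1[late]  <- sub("^Ethiopia$", "Ethiopia (1993-)", mydata$location_1[late])

print(mydata)
```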

Tags: r dataframe if-statement substitution

【Solution 1】:

You can replace the column within a subset of the rows:

mydata[which(mydata$location_1=="Ethiopia" & mydata$year <= 1992), 
       "location_1"] <- "Ethiopia (-1992)"

mydata[which(mydata$location_1=="Ethiopia" & mydata$year >  1992), 
       "location_1"] <- "Ethiopia (1993-)"

Or with dplyr:

library(dplyr)
mydata <- mydata %>% 
  mutate(location_1 = case_when(location_1 == "Ethiopia" & year <= 1992 ~ "Ethiopia (-1992)",
                                location_1 == "Ethiopia" & year > 1992 ~ "Ethiopia (1993-)",
                                TRUE ~ location_1))

【Discussion】:

which() is redundant here; you can subset directly with the logical vector.
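Whether which() is redundant depends on missing values. which() returns the positions of the TRUE elements and drops NA, so a row with a missing year is simply skipped. As far as I know, assigning into a data frame through a logical index that contains NA is rejected with an error, so which() can be a deliberate safeguard. A small sketch with invented data:

```r
# Invented data with one missing year
mydata <- data.frame(
  year = c(1984, NA, 1997),
  location_1 = c("Ethiopia", "Ethiopia", "Ethiopia"),
  stringsAsFactors = FALSE
)

cond <- mydata$location_1 == "Ethiopia" & mydata$year <= 1992
print(cond)         # comparing with the missing year gives NA in row 2
print(which(cond))  # which() keeps only the positions that are TRUE

# Only row 1 is relabelled; the row with the missing year is left alone
mydata[which(cond), "location_1"] <- "Ethiopia (-1992)"
print(mydata)
```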

【Solution 2】:

A data.table approach. data.table is a very fast package; see ?data.table for details:

library(data.table)
setDT(mydata)  # convert the data frame to a data.table in place

mydata[location_1 == "Ethiopia" & !is.na(year), 
       location_1 := ifelse(year <= 1992, 
                            "Ethiopia (-1992)", 
                            "Ethiopia (1993-)")]

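What it does:

mydata[location_1 == "Ethiopia" & !is.na(year), selects all rows where location_1 is "Ethiopia" and the year is known (we don't want to assign a period label to a row whose year is missing).

location_1 := is an assignment call (:= is data.table's assign-by-reference operator).

ifelse(year <= 1992, x, y) returns x for each element where the condition is TRUE and y otherwise.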
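The !is.na(year) filter matters because ifelse() propagates missing conditions: where the test is NA, the result is NA rather than either label, so without the filter a missing year would overwrite "Ethiopia" with NA. A base-R sketch with made-up years:

```r
# Made-up years, one of them missing
years <- c(1984, NA, 1997)

# Where years <= 1992 is NA, ifelse() returns NA instead of a label
labels <- ifelse(years <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)")
print(labels)

# Filtering out the missing year first leaves only real labels
labels_known <- ifelse(years[!is.na(years)] <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)")
print(labels_known)
```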

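【Discussion】:

【Solution 3】:

Because if tests only a single condition, the if-else you used has to run inside a loop that goes through the rows one at a time, for example a for loop: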

for (i in seq_len(nrow(mydata))) {
    if (mydata$location_1[i] == "Ethiopia") {
        if (mydata$year[i] <= 1992) mydata$location_1[i] <- "Ethiopia (-1992)"
        else mydata$location_1[i] <- "Ethiopia (1993-)"
    }
}

#### OUTPUT ####

   year       location_1
1  1994          Germany
2  1998          Germany
3  1993 Ethiopia (1993-)
4  1982          Germany
5  1989            China
6  1997 Ethiopia (1993-)
7  2001            China
8  1990            China
9  1984 Ethiopia (-1992)
10 1999 Ethiopia (1993-)

With the vectorized function ifelse() you can do the same thing more compactly (and probably faster):

mydata$location_1 <- ifelse(mydata$location_1 == "Ethiopia",
       ifelse(mydata$year <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)"),
       mydata$location_1
       )
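One caveat: the nested ifelse() relies on location_1 being a character column (the sample data below uses stringsAsFactors = F). If the column were a factor, ifelse() can return the factor's underlying integer codes instead of the country names in the no branch, so converting with as.character() first is a safe habit. A sketch with a hypothetical factor column:

```r
# Hypothetical data where location_1 is a factor
mydata <- data.frame(
  year = c(1984, 1990),
  location_1 = factor(c("Ethiopia", "Germany"))
)

# Convert to character first so ifelse() keeps the country names
loc <- as.character(mydata$location_1)
mydata$location_1 <- ifelse(loc == "Ethiopia",
                            ifelse(mydata$year <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)"),
                            loc)
print(mydata)
```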

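Personally, I would probably just create a new variable holding the country name followed by (-1992) or (1993-). It is syntactically compact, relatively fast, and keeps all the original information, which is useful for subsetting later: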

mydata$cy <- paste(mydata$location_1, ifelse(mydata$year <= 1992,
                                             "(-1992)", "(1993-)"
                                             ))
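To illustrate the subsetting point, here is a sketch that builds cy the same way on a few invented rows and then selects by period. Using base R's endsWith() for the "all countries in one period" case is my choice; grepl() would also work.

```r
# Invented sample rows
mydata <- data.frame(
  year = c(1984, 1997, 1990),
  location_1 = c("Ethiopia", "Ethiopia", "China"),
  stringsAsFactors = FALSE
)

# Same construction as above: paste() joins the parts with a space
mydata$cy <- paste(mydata$location_1, ifelse(mydata$year <= 1992, "(-1992)", "(1993-)"))

# Rows from 1993 onwards, for any country
post_1992 <- mydata[endsWith(mydata$cy, "(1993-)"), ]

# A single country-period combination
eth_early <- subset(mydata, cy == "Ethiopia (-1992)")

print(post_1992)
print(eth_early)
```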

#### OUTPUT ####

   year location_1               cy
1  1994    Germany  Germany (1993-)
2  1998    Germany  Germany (1993-)
3  1993   Ethiopia Ethiopia (1993-)
4  1982    Germany  Germany (-1992)
5  1989      China    China (-1992)
6  1997   Ethiopia Ethiopia (1993-)
7  2001      China    China (1993-)
8  1990      China    China (-1992)
9  1984   Ethiopia Ethiopia (-1992)
10 1999   Ethiopia Ethiopia (1993-)

Data:

set.seed(123)

mydata <- data.frame(year = sample(1980:2004, 10, TRUE),
                     location_1 = sample(c("Ethiopia", "Germany", "China"), 10, TRUE),
                     stringsAsFactors = FALSE
                     )

【Discussion】: