【Question title】: R - Getting values from other columns when conditions are met
【Posted】: 2020-06-07 05:26:57
【Question】:

I have a data table that looks like this.

Firm Year Moveyear Address OriginAddress DestinationAddress
  A  2000                     
  A  2001 2001              15Grand_Ave     700Grand_Ave
  A  2002
  A  2003 2003              700Grand_Ave    20Washington_Ave
  A  2004
  B  2000
  B  2001 
  B  2002 2002              2730State_st    40Washington_Ave
  B  2003
  B  2004
  C
  .
  .

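This is a panel dataset showing each firm's relocation information over several years. I want to fill in the 'Address' column using the 'OriginAddress' and 'DestinationAddress' columns.

For example, 15Grand_Ave should be assigned to firm A's Address for 2000, because it was the firm's original address before it moved to 700Grand_Ave in 2001. And 700Grand_Ave should be assigned to firm A's Address for 2001 and 2002, because that was its address before it moved to 20Washington_Ave in 2003.

This is the result I want: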

Firm Year Moveyear  Address        OriginAddress DestinationAddress
  A  2000         15Grand_Ave            
  A  2001 2001    700Grand_Ave      15Grand_Ave     700Grand_Ave
  A  2002         700Grand_Ave
  A  2003 2003    20Washington_Ave  700Grand_Ave    20Washington_Ave
  A  2004         20Washington_Ave
  B  2000         2730State_st
  B  2001         2730State_st
  B  2002 2002    40Washington_Ave  2730State_st    40Washington_Ave
  B  2003         40Washington_Ave
  B  2004         40Washington_Ave
  C
  .
  .

I guess I need to use a for-loop and ifelse statements in R, but I am having trouble writing the code. Please share any ideas.

【Comments】:

  • Hi Chicago2017, please call the dput function on your data.frame/data.table, then copy the console output and paste it into your question.
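
What the comment asks for can be illustrated as follows. This is a minimal sketch, assuming a hypothetical two-row `sample_df` that stands in for the real data: `dput()` prints R code that recreates the object, so others can paste it and get the same data frame.

```r
# Hypothetical two-row sample standing in for the real data
sample_df <- data.frame(
  Firm = c("A", "A"),
  Year = c(2000L, 2001L),
  OriginAddress = c("", "15Grand_Ave"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object
dput(sample_df)

# Round trip: capture the printed code and evaluate it to rebuild the object
rebuilt <- eval(parse(text = capture.output(dput(sample_df))))
identical(rebuilt, sample_df)
```

For a large dataset, `dput(head(df, 10))` keeps the pasted output short.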

Tags: r for-loop if-statement


【Solution 1】:

Here is one way using dplyr:

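library(dplyr)

df %>%
  # Replace blank strings with NA in the character columns
  # (recent dplyr versions reject na_if('') applied to a whole data frame)
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Arrange data by Firm and Year
  arrange(Firm, Year) %>%
  # Copy destination address to Address
  mutate(Address = DestinationAddress) %>%
  # For each Firm
  group_by(Firm) %>%
  # Fill NA values with the previous non-NA value
  tidyr::fill(Address) %>%
  # Replace the remaining NA with the first non-NA value in OriginAddress
  mutate(Address = replace(Address, is.na(Address), first(na.omit(OriginAddress)))) %>%
  # Drop the grouping so later steps are not applied per firm by accident
  ungroup()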


#  Firm   Year Moveyear Address          OriginAddress DestinationAddress
#   <chr> <int>    <int> <chr>            <chr>         <chr>             
# 1 A      2000       NA 15Grand_Ave      NA            NA                
# 2 A      2001     2001 700Grand_Ave     15Grand_Ave   700Grand_Ave      
# 3 A      2002       NA 700Grand_Ave     NA            NA                
# 4 A      2003     2003 20Washington_Ave 700Grand_Ave  20Washington_Ave  
# 5 A      2004       NA 20Washington_Ave NA            NA                
# 6 B      2000       NA 2730State_st     NA            NA                
# 7 B      2001       NA 2730State_st     NA            NA                
# 8 B      2002     2002 40Washington_Ave 2730State_st  40Washington_Ave  
# 9 B      2003       NA 40Washington_Ave NA            NA                
#10 B      2004       NA 40Washington_Ave NA            NA   
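
Since the question thinks of this as a loop, the rule the pipeline applies can also be written row by row. This is only an illustrative sketch, not the answer's method: `fill_address()` is a hypothetical helper, and the vectors are firm A's columns from the example, assumed to be sorted by Year with blanks already turned into NA.

```r
# Firm A's columns from the example, sorted by Year, blanks already NA
origin      <- c(NA, "15Grand_Ave", NA, "700Grand_Ave", NA)
destination <- c(NA, "700Grand_Ave", NA, "20Washington_Ave", NA)

# fill_address() is a hypothetical helper, not part of any package
fill_address <- function(origin, destination) {
  address <- rep(NA_character_, length(origin))
  # Before the first move the firm sits at the first recorded origin
  current <- origin[!is.na(origin)][1]
  for (i in seq_along(address)) {
    # A move recorded in this row switches the current address
    if (!is.na(destination[i])) current <- destination[i]
    address[i] <- current
  }
  address
}

fill_address(origin, destination)
```

A firm with no recorded move has no origin to start from, so under this sketch its addresses stay NA.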

Data

df <- structure(list(Firm = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B"), Year = c(2000L, 2001L, 2002L, 2003L, 2004L, 2000L, 
2001L, 2002L, 2003L, 2004L), Moveyear = c(NA, 2001L, NA, 2003L, 
NA, NA, NA, 2002L, NA, NA), Address = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA), OriginAddress = c("", "15Grand_Ave", "", "700Grand_Ave", 
"", "", "", "2730State_st", "", ""), DestinationAddress = c("", 
"700Grand_Ave", "", "20Washington_Ave", "", "", "", "40Washington_Ave", 
"", "")), class = "data.frame", row.names = c(NA, -10L))             

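【Comments】:

  • Thank you so much for the quick reply, Ronak! This is incredible! I will upvote once I have applied this to my own code. I haven't run it on my dataset yet, but I think it will definitely work (if not, I'll comment here). Again, thank you very much!

【Solution 2】: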

Base R solution:

# Replace empty strings with NA: df_clean => data.frame
df_clean <- replace(df, df == "", NA_character_)

# Split-apply-combine: fill Address within each firm => data.frame
result <- do.call(rbind, lapply(split(df_clean, df_clean$Firm), function(x) {
  # Order the firm's rows by year
  x <- x[order(x$Year), ]
  # The first year gets the first recorded origin address
  if (is.na(x$Address[1])) {
    x$Address[1] <- x$OriginAddress[which(!is.na(x$OriginAddress))[1]]
  }
  # Rows that are still NA take the destination address (non-NA only in move years)
  still_na <- which(is.na(x$Address))
  x$Address[still_na] <- x$DestinationAddress[still_na]
  # Carry the last non-NA address forward
  x$Address <- na.omit(x$Address)[cumsum(!is.na(x$Address))]
  x
}))
rownames(result) <- NULL
result
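
The last step inside the function relies on a compact indexing idiom that is easier to follow on a toy vector. This is a small sketch with made-up values `x` and `y`: `cumsum(!is.na(x))` counts the non-NA values seen so far, and using that count to index the NA-free vector repeats each value forward.

```r
x <- c("a", NA, NA, "b", NA)

# Running count of non-NA values seen so far: 1 1 1 2 2
idx <- cumsum(!is.na(x))

# na.omit(x) holds "a" and "b"; indexing by idx carries each value forward
filled <- as.character(na.omit(x))[idx]
filled

# Caveat: a leading NA gives a count of 0, and index 0 selects nothing,
# so the result comes out shorter than the input
y <- c(NA, "a", NA)
short <- na.omit(y)[cumsum(!is.na(y))]
length(short)
```

In the solution the first row is filled from OriginAddress beforehand, so the leading-NA case only arises for a firm with no recorded move; such firms would need separate handling.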

This uses the same df as in Solution 1 (data courtesy of @Ronak Shah).
