[Question title]: R: mutate(indicator variable) using conditions from multiple datasets
[Posted]: 2020-05-04 21:33:30
[Question description]:

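I have two datasets, one for 2008 and one for 2009. The idea is to identify new molecules by looking at their Sales_Units and Dollar_Value. If a molecule had no sales units and no dollar value in a country in 2008, but has positive sales units and a positive dollar value in the same country in 2009, I want to flag it as a new molecule. I think a good way to do this is to create an indicator variable called New_Molecule that is 1 for a new molecule and 0 otherwise.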
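The rule itself is a single vectorised condition. A minimal sketch in base R, assuming the 2008 and 2009 values for the same Country and Molecule are already aligned position by position (`is_new_molecule` is a hypothetical helper name, not part of any package):

```r
# Hypothetical helper: TRUE when a molecule had zero dollar value and zero
# sales units in the earlier year but positive values for both in the later
# year. Assumes all four vectors are aligned row by row (same Country/Molecule).
is_new_molecule <- function(dollar_prev, units_prev, dollar_cur, units_cur) {
  dollar_prev == 0 & units_prev == 0 & dollar_cur > 0 & units_cur > 0
}

# US molecule A (0/0 in 2008, 500/60 in 2009) versus US molecule C
# (100/20 in both years); as.integer() turns TRUE/FALSE into 1/0.
as.integer(is_new_molecule(c(0, 100), c(0, 20), c(500, 100), c(60, 20)))
# [1] 1 0
```

The condition is the easy part; the real work is getting each molecule's 2008 and 2009 values into the same row, matched on Country and Molecule.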

######YEAR 2008 data##########
    Year <- c("2008", "2008", "2008", "2008","2008", "2008", "2008", "2008")
    Country <- c("US", "US","US", "US", "Canada", "Canada","Canada", "Canada")
    Molecule <- c("A", "B", "C", "D","A", "B", "C", "D")
    Dollar_Value <- c(0, 0, 100, 200, 75, 0, 0 ,0)
    Sales_Units <- c(0, 0, 20, 40, 5, 0, 0, 0)
    df_2008 <- data.frame(Year,Country, Molecule, Dollar_Value,Sales_Units)

######YEAR 2009 data##########

    Year <- c("2009", "2009", "2009", "2009","2009","2009", "2009", "2009", "2009","2009")
    Country <- c("US", "US","US", "US","US", "Canada", "Canada","Canada", "Canada","Canada")
    Molecule <- c("A", "B", "C", "D", "E","A", "B", "C", "D", "E")
    Dollar_Value <- c(500, 0, 100, 200,0, 75, 0, 0 ,99,0)
    Sales_Units <- c(60, 0, 20, 40,0,5, 0, 0, 27,0)
    df_2009 <- data.frame(Year, Country, Molecule, Dollar_Value,Sales_Units)

######Want to generate This##########

    Year <- c("2009", "2009", "2009", "2009","2009","2009", "2009", "2009", "2009","2009")
    Country <- c("US", "US","US", "US","US", "Canada", "Canada","Canada", "Canada","Canada")
    Molecule <- c("A", "B", "C", "D", "E","A", "B", "C", "D", "E")
    Dollar_Value <- c(500, 0, 100, 200,0, 75, 0, 0 ,99,0)
    Sales_Units <- c(60, 0, 20, 40,0,5, 0, 0, 27,0)
    New_Molecule <- c(1, 0, 0, 0,0,0,0,0,1,0)
    df_2009_NewColumn <- data.frame(Year, Country, Molecule, Dollar_Value, Sales_Units, New_Molecule)

What I tried: first I grouped both datasets by Year, Country and Molecule, and then used mutate.

    library(dplyr)

    df_2008 <- group_by(df_2008, Year, Country, Molecule)
    df_2009 <- group_by(df_2009, Year, Country, Molecule)

    withnew <- mutate(df_2009,
                      New_Molecule = case_when(df_2008$Dollar_Value == 0 &
                                               df_2008$Sales_Units == 0 &
                                               df_2009$Dollar_Value > 0 &
                                               df_2009$Sales_Units > 0 ~ 1,
                                               TRUE ~ 0))

But this gives the following error:

    Error: Column `New_Molecule` must be length 1 (the group size), not 10
    In addition: Warning message:
    In df_2008$Dollar_Value == 0 & df_2008$Sales_Units == 0 & df_2009$Dollar_Value >  :
      longer object length is not a multiple of shorter object length

I then tried plain mutate without grouping, but it did not produce the indicator variable I need.

[Question comments]:

    Tags: r dplyr


    [Answer 1]:

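    In your attempt, `df_2008$...` and `df_2009$...` refer to whole columns (8 and 10 values), so they are compared by position rather than by Country and Molecule, and the length-10 result cannot fit into the one-row groups created by group_by. This is easier if you first combine the data into wide format with right_join. That way every variable you need sits in the same row, and you can compare them with ifelse: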

    library(dplyr)

    right_join(df_2008, df_2009,
               by = c("Country", "Molecule"),
               suffix = c("_2008", "_2009")) %>%
       group_by(Country, Molecule) %>%
       mutate(New_Molecule = ifelse(Dollar_Value_2008 == 0 &
                                    Sales_Units_2008  == 0 &
                                    Dollar_Value_2009 >  0 &
                                    Sales_Units_2009  >  0, 1, 0)) %>%
       ungroup() %>%
       transmute(Year = Year_2009, Country = Country, Molecule = Molecule,
                 Dollar_Value = Dollar_Value_2009, Sales_Units = Sales_Units_2009,
                 New_Molecule = New_Molecule)
    #> # A tibble: 10 x 6
    #>    Year  Country Molecule Dollar_Value Sales_Units New_Molecule
    #>    <fct> <fct>   <chr>           <dbl>       <dbl>        <dbl>
    #>  1 2009  US      A                 500          60            1
    #>  2 2009  US      B                   0           0            0
    #>  3 2009  US      C                 100          20            0
    #>  4 2009  US      D                 200          40            0
    #>  5 2009  US      E                   0           0            0
    #>  6 2009  Canada  A                  75           5            0
    #>  7 2009  Canada  B                   0           0            0
    #>  8 2009  Canada  C                   0           0            0
    #>  9 2009  Canada  D                  99          27            1
    #> 10 2009  Canada  E                   0           0            0
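For comparison, here is a sketch of the same join-then-compare idea without dplyr, using base R's `merge()` with `all.x = TRUE` in place of `right_join`. It adds one assumption that the dplyr code above does not make: a molecule with no 2008 row at all (E here) is treated as having zero 2008 sales. With `right_join`, the 2008 columns for such a molecule are `NA`, so a molecule absent in 2008 but selling in 2009 would get `NA` instead of 1. The names `prev`, `merged` and `key` are illustrative only.

```r
# Rebuild the question's example data; stringsAsFactors = FALSE keeps
# Country and Molecule as character on any R version.
df_2008 <- data.frame(
  Year = "2008",
  Country = rep(c("US", "Canada"), each = 4),
  Molecule = rep(c("A", "B", "C", "D"), times = 2),
  Dollar_Value = c(0, 0, 100, 200, 75, 0, 0, 0),
  Sales_Units = c(0, 0, 20, 40, 5, 0, 0, 0),
  stringsAsFactors = FALSE
)
df_2009 <- data.frame(
  Year = "2009",
  Country = rep(c("US", "Canada"), each = 5),
  Molecule = rep(c("A", "B", "C", "D", "E"), times = 2),
  Dollar_Value = c(500, 0, 100, 200, 0, 75, 0, 0, 99, 0),
  Sales_Units = c(60, 0, 20, 40, 0, 5, 0, 0, 27, 0),
  stringsAsFactors = FALSE
)

# Give the 2008 value columns a "_2008" suffix, then keep every 2009 row
# (all.x = TRUE) and attach the matching 2008 values by Country and Molecule.
prev <- df_2008[, c("Country", "Molecule", "Dollar_Value", "Sales_Units")]
names(prev)[3:4] <- paste0(names(prev)[3:4], "_2008")
merged <- merge(df_2009, prev, by = c("Country", "Molecule"),
                all.x = TRUE, sort = FALSE)

# Assumption: no 2008 row means zero dollar value and zero sales in 2008.
merged$Dollar_Value_2008[is.na(merged$Dollar_Value_2008)] <- 0
merged$Sales_Units_2008[is.na(merged$Sales_Units_2008)] <- 0

merged$New_Molecule <- as.integer(
  merged$Dollar_Value_2008 == 0 & merged$Sales_Units_2008 == 0 &
    merged$Dollar_Value > 0 & merged$Sales_Units > 0
)

# merge() does not guarantee row order, so restore the 2009 order and keep
# only the columns of the desired output.
key <- paste(df_2009$Country, df_2009$Molecule)
merged <- merged[match(key, paste(merged$Country, merged$Molecule)),
                 c("Year", "Country", "Molecule", "Dollar_Value",
                   "Sales_Units", "New_Molecule")]
rownames(merged) <- NULL
merged$New_Molecule
# [1] 1 0 0 0 0 0 0 0 1 0
```

On this data the result matches the desired New_Molecule column; the NA-to-zero step only changes the outcome for molecules that are missing from 2008 and have positive 2009 values.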
    
