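[Posted]: 2020-05-04 21:33:30
[Question]:
I have two datasets for two different years (2008 and 2009). The idea is to identify new molecules by looking at their Sales_Units and Dollar_Value. If a molecule had no sales units or dollar value in 2008 but has positive sales units and dollar value in 2009, I want to flag it as a new molecule. I think a good approach is to generate an indicator variable called New_Molecule that takes the value 1 for a new molecule and 0 otherwise.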
######YEAR 2008 data##########
Year <- c("2008", "2008", "2008", "2008","2008", "2008", "2008", "2008")
Country <- c("US", "US","US", "US", "Canada", "Canada","Canada", "Canada")
Molecule <- c("A", "B", "C", "D","A", "B", "C", "D")
Dollar_Value <- c(0, 0, 100, 200, 75, 0, 0 ,0)
Sales_Units <- c(0, 0, 20, 40, 5, 0, 0, 0)
df_2008 <- data.frame(Year,Country, Molecule, Dollar_Value,Sales_Units)
######YEAR 2009 data##########
Year <- c("2009", "2009", "2009", "2009","2009","2009", "2009", "2009", "2009","2009")
Country <- c("US", "US","US", "US","US", "Canada", "Canada","Canada", "Canada","Canada")
Molecule <- c("A", "B", "C", "D", "E","A", "B", "C", "D", "E")
Dollar_Value <- c(500, 0, 100, 200,0, 75, 0, 0 ,99,0)
Sales_Units <- c(60, 0, 20, 40,0,5, 0, 0, 27,0)
df_2009 <- data.frame(Year, Country, Molecule, Dollar_Value,Sales_Units)
######Want to generate This##########
Year <- c("2009", "2009", "2009", "2009","2009","2009", "2009", "2009", "2009","2009")
Country <- c("US", "US","US", "US","US", "Canada", "Canada","Canada", "Canada","Canada")
Molecule <- c("A", "B", "C", "D", "E","A", "B", "C", "D", "E")
Dollar_Value <- c(500, 0, 100, 200,0, 75, 0, 0 ,99,0)
Sales_Units <- c(60, 0, 20, 40,0,5, 0, 0, 27,0)
New_Molecule <- c(1, 0, 0, 0,0,0,0,0,1,0)
df_2009_NewColumn <- data.frame(Year, Country, Molecule, Dollar_Value, Sales_Units, New_Molecule)
What I tried: first I grouped each dataset by Year, Country, and Molecule, and then used mutate.
df_2008 <- group_by(df_2008,Year,Country,Molecule)
df_2009 <- group_by(df_2009,Year,Country,Molecule)
withnew <- mutate(df_2009, New_Molecule = case_when(df_2008$Dollar_Value ==0 & df_2008$Sales_Units ==0 & df_2009$Dollar_Value >0 & df_2009$Sales_Units >0 ~1,
TRUE~0))
But this gives an error message:
Error: Column `New_Molecule` must be length 1 (the group size), not 10
In addition: Warning message:
In df_2008$Dollar_Value == 0 & df_2008$Sales_Units == 0 & df_2009$Dollar_Value > :
  longer object length is not a multiple of shorter object length
Then I tried mutate alone, but it did not generate the indicator variable the way I need.
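One likely reason the attempt fails is that `df_2008$...` and `df_2009$...` inside `mutate` are compared position by position (8 rows against 10), rather than matched by Country and Molecule. A possible way around this, sketched below, is to join the 2008 values onto the 2009 rows first and only then compute the indicator. The names `prev`, `df_2009_new`, `Dollar_Value_2008`, and `Sales_Units_2008` are illustrative, and the sketch assumes that a molecule missing from the 2008 data (such as E) should be treated as having zero sales in 2008.

```r
library(dplyr)

# Data copied from the question (ungrouped)
df_2008 <- data.frame(
  Year = "2008",
  Country = rep(c("US", "Canada"), each = 4),
  Molecule = rep(c("A", "B", "C", "D"), times = 2),
  Dollar_Value = c(0, 0, 100, 200, 75, 0, 0, 0),
  Sales_Units = c(0, 0, 20, 40, 5, 0, 0, 0)
)
df_2009 <- data.frame(
  Year = "2009",
  Country = rep(c("US", "Canada"), each = 5),
  Molecule = rep(c("A", "B", "C", "D", "E"), times = 2),
  Dollar_Value = c(500, 0, 100, 200, 0, 75, 0, 0, 99, 0),
  Sales_Units = c(60, 0, 20, 40, 0, 5, 0, 0, 27, 0)
)

# Keep only the 2008 columns needed for the comparison,
# renamed so they do not clash with the 2009 columns
prev <- df_2008 %>%
  select(Country, Molecule,
         Dollar_Value_2008 = Dollar_Value,
         Sales_Units_2008 = Sales_Units)

df_2009_new <- df_2009 %>%
  # Match each 2009 row to its 2008 counterpart by Country and Molecule
  left_join(prev, by = c("Country", "Molecule")) %>%
  mutate(
    # Assumption: no 2008 row means no 2008 sales
    Dollar_Value_2008 = coalesce(Dollar_Value_2008, 0),
    Sales_Units_2008 = coalesce(Sales_Units_2008, 0),
    # 1 when there was nothing in 2008 and positive values in 2009
    New_Molecule = as.numeric(
      Dollar_Value_2008 == 0 & Sales_Units_2008 == 0 &
        Dollar_Value > 0 & Sales_Units > 0
    )
  ) %>%
  # Drop the helper columns again
  select(-Dollar_Value_2008, -Sales_Units_2008)

print(df_2009_new)
```

Under these assumptions, New_Molecule is 1 for molecule A in the US and molecule D in Canada, which matches the desired output above.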
[Discussion]: