Posted: 2020-04-29 19:30:51
Question:
I have two data frames that I want to merge.
"data" looks like:
Filled_Ticker2LP publishYear CO_1_Name
1: SONC 2005 sonic corp
2: SONC 2005 sonic corp
3: <NA> 2005 cascade bancorp inc.
4: JCP 2005 jc penney company inc
"comp" looks like:
tic fyear conm
<chr> <int> <chr>
1 JCP 2004 penney (j c) co
2 JCP 2005 penney (j c) co
3 JCP 2006 penney (j c) co
4 JCP 2007 penney (j c) co
5 JCP 2008 penney (j c) co
I want to join these two datasets using left_join (or something from the data.table package, etc.).
I can currently join them on year and ticker, e.g. SONC, JCP:
library(dplyr)
mergedData <- data %>%
  left_join(comp, by = c("Filled_Ticker2LP" = "tic", "publishYear" = "fyear"))
"mergedData" looks like:
Filled_Ticker2LP publishYear CO_1_Name conm
1: SONC 2005 sonic corp sonic corp
2: SONC 2005 sonic corp sonic corp
3: <NA> 2005 cascade bancorp inc. <NA>
4: JCP 2005 jc penney company inc penney (j c) co
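This works well, but there is an NA in the Filled_Ticker2LP column (from the "data" dataset).
I would like to keep joining with my current approach, but where Filled_Ticker2LP is NA, I want to switch the "matching key" from tic/Filled_Ticker2LP to the company name, i.e. conm/CO_1_Name.
That is, observation 3 of "data" currently cannot be joined, because its Filled_Ticker2LP value is NA. However, it could still be matched to "comp": its data$CO_1_Name value is "cascade bancorp inc.", and "cascade bancorp" also appears in the conm column of "comp" for observations 30 to 53.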
I was thinking of an if statement:
If data$Filled_Ticker2LP is not NA, join Filled_Ticker2LP to tic; otherwise, join CO_1_Name to conm.
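That if/else rule can be sketched in base R as a row-wise lookup: build one key from (ticker, year) and one from (normalised name, year), and for each row of "data" pick whichever key applies. This is a minimal sketch under several assumptions: normalise_name() and conditional_left_join() are hypothetical helpers (not from any package), the list of legal-form suffixes stripped from names is a guess, and comp is assumed to be a plain data.frame (convert a tibble or data.table with as.data.frame() first). It uses match(), which returns only the first matching comp row, so unlike left_join() it never duplicates rows of "data".

```r
# Hypothetical helper (not from any package): normalise company names so that
# "cascade bancorp inc." and "cascade bancorp" compare equal.
normalise_name <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[[:punct:]]", "", x)  # drop ".", "(", ")", "'"
  # Assumed suffix list: drop one trailing legal-form word.
  x <- gsub("[[:space:]]+(inc|incorporated|corp|co|company)$", "", x)
  trimws(gsub("[[:space:]]+", " ", x))
}

# Hypothetical helper: look up comp by (ticker, year) when the ticker is present,
# otherwise by (normalised name, year). match() returns the FIRST matching comp
# row, so rows of data are never duplicated.
conditional_left_join <- function(data, comp) {
  tic_key_comp  <- paste(comp$tic, comp$fyear, sep = "|")
  name_key_comp <- paste(normalise_name(comp$conm), comp$fyear, sep = "|")
  tic_key_data  <- paste(data$Filled_Ticker2LP, data$publishYear, sep = "|")
  name_key_data <- paste(normalise_name(data$CO_1_Name), data$publishYear, sep = "|")

  idx <- ifelse(
    is.na(data$Filled_Ticker2LP),
    match(name_key_data, name_key_comp),  # ticker missing: use the name
    match(tic_key_data, tic_key_comp)     # ticker present: use the ticker
  )
  matched <- comp[idx, c("tic", "conm"), drop = FALSE]  # NA index gives an all-NA row
  rownames(matched) <- NULL
  cbind(data, matched)
}

# Usage on a small excerpt of the question's data (plain data.frames).
comp_small <- data.frame(
  tic   = c("JCP", "SONC", "CACB", "CACB"),
  fyear = 2005L,
  conm  = c("penney (j c) co", "sonic corp", "cascade bancorp", "cascade bancorp"),
  stringsAsFactors = FALSE
)
data_small <- data.frame(
  Filled_Ticker2LP = c("SONC", "SONC", NA, "JCP"),
  publishYear = 2005L,
  CO_1_Name = c(" sonic corp", " sonic corp", " cascade bancorp inc.", " jc penney company inc"),
  stringsAsFactors = FALSE
)
merged <- conditional_left_join(data_small, comp_small)
print(merged)
```

With dplyr, the same rule could instead be written by splitting "data" with filter(is.na(Filled_Ticker2LP)), running two left_join() calls with different by = keys, and recombining with bind_rows(). That keeps every matching comp row, so keys that appear more than once in comp would duplicate rows of "data".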
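Addendum
I also noticed that some values have leading whitespace (e.g. " sonic corp" in CO_1_Name), which I remove with: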
library(dplyr)
library(stringr)
data <- data %>%
  mutate(
    CO_1_Name = str_trim(CO_1_Name)
  )
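Trimming alone may not be enough for a name-based fallback. A quick base-R check on three of the names from the question (trimws() is the base-R counterpart of str_trim()) suggests that only "sonic corp" then matches a conm value exactly; "cascade bancorp inc." still differs from "cascade bancorp" by its suffix, so some further normalisation would be needed.

```r
# Names copied from data$CO_1_Name and comp$conm in the question.
co_names   <- c(" sonic corp", " cascade bancorp inc.", " jc penney company inc")
comp_names <- c("sonic corp", "cascade bancorp", "penney (j c) co")

trimmed <- trimws(co_names)     # base-R counterpart of stringr::str_trim()
print(trimmed %in% comp_names)  # exact matching after trimming: TRUE FALSE FALSE
```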
Data 1:
comp <- structure(list(tic = c("JCP", "JCP", "JCP", "JCP", "JCP", "JCP",
"JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP",
"SONC", "SONC", "SONC", "SONC", "SONC", "SONC", "SONC", "SONC",
"SONC", "SONC", "SONC", "SONC", "SONC", "SONC", "CACB", "CACB",
"CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB",
"CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB", "CACB",
"CACB", "CACB", "CACB", "CACB", "CACB", "CACB"), fyear = c(2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 2017L, 2018L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L,
2018L, 2005L, 2005L, 2006L, 2006L, 2007L, 2007L, 2008L, 2008L,
2009L, 2009L, 2010L, 2010L, 2011L, 2011L, 2012L, 2012L, 2013L,
2013L, 2014L, 2014L, 2015L, 2015L, 2016L, 2016L), conm = c("penney (j c) co",
"penney (j c) co", "penney (j c) co", "penney (j c) co", "penney (j c) co",
"penney (j c) co", "penney (j c) co", "penney (j c) co", "penney (j c) co",
"penney (j c) co", "penney (j c) co", "penney (j c) co", "penney (j c) co",
"penney (j c) co", "penney (j c) co", "sonic corp", "sonic corp",
"sonic corp", "sonic corp", "sonic corp", "sonic corp", "sonic corp",
"sonic corp", "sonic corp", "sonic corp", "sonic corp", "sonic corp",
"sonic corp", "sonic corp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp", "cascade bancorp", "cascade bancorp",
"cascade bancorp", "cascade bancorp")), row.names = c(NA, -53L
), class = c("tbl_df", "tbl", "data.frame"))
Data 2:
data <- structure(list(Filled_Ticker2LP = c("SONC", "SONC", NA, "JCP",
"JCP", "JCP", "SONC", "SONC", "JCP", "JCP", "JCP", "JCP", "SONC",
"JCP", "JCP", "JCP", "SONC", "JCP", "JCP", "SONC", "JCP", "JCP",
"JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP", "JCP",
"JCP", "JCP", "JCP", "JCP", "SONC"), publishYear = c(2005L, 2005L,
2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 2008L, 2008L, 2008L,
2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L,
2011L, 2011L, 2012L, 2013L, 2015L, 2015L, 2016L), CO_1_Name = c(" sonic corp",
" sonic corp", " cascade bancorp inc.", " jc penney company inc",
" jc penney company inc", " jc penney company inc", " sonic corp",
" sonic corp", " jc penney company inc", " jc penney company inc",
" jc penney company inc", " jc penney company inc", " sonic corp",
" jc penney company inc", " jc penney company inc", " jc penney company inc",
" sonic corp", " jc penney company inc", " jc penney company inc",
" sonic corp", " jc penney company inc", " jc penney company inc",
" jc penney company inc", " macy's incorporated", " macy's incorporated",
" jc penney company inc", " macy's incorporated", " macy's incorporated",
" jc penney company inc", " apple inc", " apple inc", " macy's incorporated",
" jc penney company inc", " jc penney company inc", " jc penney company inc",
" sonic corp")), .internal.selfref = <pointer: 0x55603dbefe00>, row.names = c(NA,
-36L), class = c("data.table", "data.frame"))
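One thing worth checking before any of these joins: in the comp data (Data 1), CACB appears twice for every fyear from 2005 to 2016 (rows 30 to 53), with identical values in the three columns shown. A left join on (tic, fyear) or on (conm, fyear) would therefore return two rows for every matching row of "data". A minimal base-R check on an excerpt of those rows is below; whether the duplicates can simply be dropped is an assumption about the real data, since the full comp table may have other columns that differ.

```r
# Excerpt of comp: CACB is listed twice per fiscal year.
comp_excerpt <- data.frame(
  tic   = c("SONC", "CACB", "CACB", "CACB", "CACB"),
  fyear = c(2005L, 2005L, 2005L, 2006L, 2006L),
  conm  = c("sonic corp", "cascade bancorp", "cascade bancorp",
            "cascade bancorp", "cascade bancorp"),
  stringsAsFactors = FALSE
)

# Flag rows whose (tic, fyear) key already appeared earlier.
dup_key <- duplicated(comp_excerpt[, c("tic", "fyear")])
print(comp_excerpt[dup_key, ])

# Keep the first row per key before joining (assumes the duplicates
# carry no extra information you need).
comp_unique <- comp_excerpt[!dup_key, ]
print(nrow(comp_unique))
```

In dplyr, distinct(comp, tic, fyear, .keep_all = TRUE) keeps the first row per key in the same way.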
Comments:
Tags: r