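Posted: 2020-06-09 12:54:09
Question:
I have been looking for possible solutions and found one that partially works for my application. I am adding missing data to an existing data frame in "chunks" via a loop. Each update contains rows that do not exist in the main table.
The problem I am running into is that I need to insert the rows from the second table that do not exist in the first table, and fill the Value1 column with 0 wherever a new row is added.
(Each SubDF has more than 50k rows, and each MainDF can have 50 SubDFs to iterate over, resulting in a MainDF of about 2.5 million rows.)
Current code (please excuse the loop; it is only a sketch for illustration):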
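library(data.table)
# Main_DF and Sub_DF1..Sub_DF3 are the tables shown further down
setDT(Main_DF)
# list(), not c(): c() would flatten the tables into one list of columns
df_list <- list(Sub_DF1, Sub_DF2, Sub_DF3)
for (Sub_DF in df_list) {
  ############## Code in question
  setDT(Sub_DF)
  # Update join: only rows already present in Main_DF receive a Value2
  Main_DF[Sub_DF,
          on = c("Path1", "Path2", "File_Name", "ID"),
          Value2 := i.Value2]
  ###############
}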
Various other permutations I have tried:
#
setDT(Main_DF)
setDT(Sub_DF)
setkeyv(Main_DF, c("Path1", "Path2","File_Name", "ID"))
setkeyv(Sub_DF, c("Path1", "Path2","File_Name", "ID"))
Main_DF<- Main_DF[Sub_DF]
#
Main_DF<- merge(Main_DF, Sub_DF, by = c("Path1", "Path2","File_Name", "ID"),
all = TRUE)
#
Main_DF[Sub_DF, on=c("Path1", "Path2","File_Name", "ID"),
c("Value2") := .(i.Value2)]
#
Main_DF[Sub_DF,]
#
Main_DF<- Main_DF[Sub_DF, on=c("Path1", "Path2","File_Name", "ID")]
#
Main_DF[Sub_DF, on=.("Path1", "Path2","File_Name", "ID"), `:=` (Value2 = i.Value2)]
#
Main_DF <- merge(Main_DF, Sub_DF, by=c("Path1", "Path2","File_Name", "ID"), all=T, fill.NA = 0)
Main DF
Path1 Path2 File_Name ID Value1
root home Sample1 1 1
root home Sample1 2 0
root home Sample1 7 1
root home Sample2 1 0
root home Sample2 2 1
root home Sample2 3 1
root home Sample2 8 1
root home Sample3 1 0
root home Sample3 2 1
root home Sample3 6 1
Sub DF (first iteration of the loop)
Path1 Path2 File_Name ID Value2
root home Sample1 1 5000
root home Sample1 2 9000
root home Sample1 5 400
root home Sample1 6 3500
root home Sample1 7 8500
root home Sample1 8 2200
Sub DF (second iteration of the loop)
Path1 Path2 File_Name ID Value2
root home Sample2 1 5000
root home Sample2 2 9000
root home Sample2 3 700
root home Sample2 5 400
root home Sample2 6 3500
root home Sample2 7 8500
root home Sample2 8 2200
Sub DF (third iteration of the loop)
Path1 Path2 File_Name ID Value2
root home Sample3 1 5000
root home Sample3 2 9000
root home Sample3 5 400
root home Sample3 6 3500
root home Sample3 7 8500
root home Sample3 8 2200
Actual updated Main DF (after iterating over the 3 Sub DFs)
Path1 Path2 File_Name ID Value1 Value2
root home Sample1 1 1 5000
root home Sample1 2 0 9000
root home Sample1 7 1 8500
root home Sample2 1 0 5000
root home Sample2 2 1 9000
root home Sample2 3 1 700
root home Sample2 8 1 2200
root home Sample3 1 0 5000
root home Sample3 2 1 9000
root home Sample3 6 1 3500
Desired updated Main DF
Path1 Path2 File_Name ID Value1 Value2
root home Sample1 1 1 5000
root home Sample1 2 0 9000
root home Sample1 5 0 400
root home Sample1 6 0 3500
root home Sample1 7 1 8500
root home Sample1 8 0 2200
root home Sample2 1 0 5000
root home Sample2 2 1 9000
root home Sample2 3 1 700
root home Sample2 5 0 400
root home Sample2 6 0 3500
root home Sample2 7 0 8500
root home Sample2 8 1 2200
root home Sample3 1 0 5000
root home Sample3 2 1 9000
root home Sample3 5 0 400
root home Sample3 6 1 3500
root home Sample3 7 0 8500
root home Sample3 8 0 2200
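One way to reach a table of this shape is sketched below. It is only a sketch under assumptions, not a confirmed answer from the thread: it assumes all Sub DFs share the same columns, so they can be stacked with rbindlist() and joined once with merge(..., all = TRUE) instead of once per loop iteration. The names keys, Sub_all and Result are made up for illustration, and the tables are rebuilt from the sample data in the question.

```r
library(data.table)

# Join columns shared by the main table and every sub table
keys <- c("Path1", "Path2", "File_Name", "ID")

# Sample data copied from the question
Main_DF <- data.table(
  Path1 = "root", Path2 = "home",
  File_Name = rep(c("Sample1", "Sample2", "Sample3"), times = c(3, 4, 3)),
  ID     = c(1, 2, 7, 1, 2, 3, 8, 1, 2, 6),
  Value1 = c(1, 0, 1, 0, 1, 1, 1, 0, 1, 1)
)
Sub_DF1 <- data.table(Path1 = "root", Path2 = "home", File_Name = "Sample1",
                      ID = c(1, 2, 5, 6, 7, 8),
                      Value2 = c(5000, 9000, 400, 3500, 8500, 2200))
Sub_DF2 <- data.table(Path1 = "root", Path2 = "home", File_Name = "Sample2",
                      ID = c(1, 2, 3, 5, 6, 7, 8),
                      Value2 = c(5000, 9000, 700, 400, 3500, 8500, 2200))
Sub_DF3 <- data.table(Path1 = "root", Path2 = "home", File_Name = "Sample3",
                      ID = c(1, 2, 5, 6, 7, 8),
                      Value2 = c(5000, 9000, 400, 3500, 8500, 2200))

# Stack all sub tables so that a single join is enough
Sub_all <- rbindlist(list(Sub_DF1, Sub_DF2, Sub_DF3))

# Full outer join: keeps rows from both sides, sorted by the key columns
Result <- merge(Main_DF, Sub_all, by = keys, all = TRUE)

# Rows that came only from the sub tables have NA in Value1; set them to 0
Result[is.na(Value1), Value1 := 0]

print(Result)
```

Stacking first also avoids the Value2.x / Value2.y columns that merge() would create if the join were repeated against a Main_DF that already contains Value2.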
Comments:
-
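@chinsoon12 It's the first one! I need to update a table by adding an extra column plus the rows it does not yet contain, and then keep updating that table with successive tables in the same format as the first.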
-
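@chinsoon12 In my block titled "Current code" there is a section marked "Code in question". That code performs a left join on the tables, but I need a full join so that rows from the right table that do not exist on the left are added.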
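The difference described in this comment can be seen on a tiny example. This is a minimal sketch with made-up tables (left, right, upd and full are hypothetical names, not from the question): an update join with := can only fill rows that already exist in the left table, while merge(..., all = TRUE) returns the union of the keys from both sides.

```r
library(data.table)

# Hypothetical toy tables: ID 1 is only on the left, ID 3 only on the right
left  <- data.table(ID = c(1, 2), Value1 = c(1, 0))
right <- data.table(ID = c(2, 3), Value2 = c(9000, 400))

# Update join: adds Value2 by reference, but never adds rows
upd <- copy(left)
upd[right, on = "ID", Value2 := i.Value2]

# Full outer join: new table containing IDs 1, 2 and 3
full <- merge(left, right, by = "ID", all = TRUE)

print(upd)
print(full)
```

In the full join the row for ID 3 has NA in Value1, which is where the fill with 0 would then be applied.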
Tags: r merge data.table