【Question title】: Lookup value in one dataframe based on column name stored as a value in another dataframe
【Posted】: 2014-02-18 19:16:41
【Question】:

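Please see the reproducible (cut + paste) example below. The actual data set has over 4000 serial observations on 11000 people. I need to create columns A, B, C, etc. showing the number in the "Drug" variables X, Y, Z, etc. that corresponds to the first occurrence of a particular value of a "Disease" variable. The numbers refer to actions taken with particular drugs (start, stop, increase dose, etc.). The "Disease" variable indicates whether the disease is flaring, in a disease that has many stages, including flares and remissions.

For example: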

Animal <- c("aardvark", "1", "cheetah", "dromedary", "eel", "1", "bison", "cheetah", "dromedary",
            "eel")
Plant <- c("apple_tree", "blossom", "cactus", "1", "bronze", "apple_tree", "bronze", "cactus",
           "dragonplant", "1")
Mineral <- c("amber", "bronze", "1", "bronze", "emerald", "1", "bronze", "bronze", "diamond",
             "emerald")
Bacteria <- c("acinetobacter", "1", "1", "d-strep", "bronze", "acinetobacter", "bacillus",
              "chlamydia", "bronze", "enterobacter")
AnimalDrugA <- c(1, 11, 12, 13, 14, 15, 16, 17, 18, 19)
AnimalDrugB <- c(20, 1, 22, 23, 24, 25, 26, 27, 28, 29)
PlantDrugA <- c(301, 302, 1, 304, 305, 306, 307, 308, 309, 310)
PlantDrugB <- c(401, 402, 1, 404, 405, 406, 407, 408, 409, 410)
MineralDrugA <- c(1, 2, 3, 4, 1, 6, 7, 8, 9, 10)
MineralDrugB <- c(11, 12, 13, 1, 15, 16, 17, 18, 19, 20)
BacteriaDrugA <- c(1, 2, 3, 4, 5, 6 , 7, 8, 9, 1)
BacteriaDrugB <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
dummy_id <- c(1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009, 10101)


Elements <- data.frame(dummy_id, Animal, Plant, Mineral, Bacteria, AnimalDrugA, AnimalDrugB,
                       PlantDrugA, PlantDrugB, MineralDrugA, MineralDrugB, BacteriaDrugA, BacteriaDrugB)
ds <- Elements[,order(names(Elements))]
ds  #Got it in alphabetical order... The real data set will be re-ordered chronologically


#Now I want the first occurrence of the word "bronze" for each id
# for each subject 1 through 10.  (That is, "bronze" corresponds to start of disease flare.)
first.bronze <- colnames(ds)[apply(ds,1,match,x="bronze")]
first.bronze

#Now, I want to find the number in the DrugA, DrugB variable that corresponds to the first
#occurrence of bronze.
#Using the alphabetically ordered data set, the answer should be:
#dummy_id  DrugA  DrugB
#1...      NA     NA
#2...      2      12
#3...      NA     NA
#4...      4      1
#5...      5      6
#6...      NA     NA
#7...      7      17
#8...      8      18
#9...      9      2
#10...     NA     NA
#Note that all first occurrences of "bronze"
# are in Mineral or Bacteria.
#As a first step, join first.bronze to the ds
ds$first.bronze <- first.bronze 
ds

#Make a new ds where those who have an NA for first.bronze are excluded:
ds2 <- ds[complete.cases(ds$first.bronze),]
ds2


# Create a template data frame
out <- data.frame(matrix(nr = 1, nc = 3))
colnames(out) <- c("Form Number", "DrugA", "DrugB")  # Gives correct column names
out

#Then grow the data frame...yes I realize potential slowness of computation
test <- for(i in ds2$first.bronze){
    data <- rbind(colnames(ds2)[grep(i, names(ds2), ignore.case = FALSE, fixed = TRUE)])
    colnames(data) <- c("Form Number", "DrugA", "DrugB")  # Gives correct column names
    out <- rbind(out, data)
}
out

#Then delete the first row of NAs
out <- na.omit(out)
out

#Then add the appropriate dummy_ids
dummy_id <- ds2$dummy_id
out_with_ids <- as.data.frame(cbind(dummy_id, out))
out_with_ids

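Now I'm stuck. The out_with_ids data set holds the names of the ds2 columns as the values of DrugA and DrugB, rather than the numbers stored in those columns. I've searched Stack Overflow thoroughly, but solutions based on match, merge, replace, and the data.table package don't seem to work.

Thanks!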

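【Comments】:

  • Hi, +1 for the cut + paste example. However, if you could simplify the problem further, it would help us post an answer faster.
  • I'll try to simplify: basically, df1 contains some variables whose values are the names of variables found in df2. I need to replace the values of those variables in df1 with the actual values stored under the matching variable names in df2.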
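
The simplified statement above can be sketched in base R. This is only an illustration under assumed names: `df1`, `df2`, `id`, `source`, `MineralA`, and `PlantA` are hypothetical toy data, not the data set from the question. It relies on the fact that a matrix can be indexed with a two-column (row, column) index matrix, and that a row of the index matrix containing NA yields NA in the result.

```r
# Hypothetical toy data: df2 holds the actual values, one row per subject.
df2 <- data.frame(
  id       = c(1, 2, 3),
  MineralA = c(10, 20, 30),
  PlantA   = c(100, 200, 300)
)

# df1 stores, per subject, the NAME of the df2 column to read
# (NA means there is nothing to look up for that subject).
df1 <- data.frame(
  id     = c(1, 2, 3),
  source = c("PlantA", NA, "MineralA"),
  stringsAsFactors = FALSE
)

# Align rows by id, then turn each stored name into a column position in df2.
row_idx <- match(df1$id, df2$id)
col_idx <- match(df1$source, names(df2))

# Index with a (row, column) matrix to pick exactly one cell per subject.
# All df2 columns are numeric here, so as.matrix() keeps them numeric.
values <- as.matrix(df2)
df1$value <- values[cbind(row_idx, col_idx)]

df1
```

If `df2` mixed character and numeric columns, `as.matrix()` would coerce everything to character, so in that case it is probably safer to build the matrix from the numeric columns only and compute `col_idx` against that subset.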

Tags: r replace match data.table lookup


【Solution 1】:

I think the problem here is the data format. May I suggest storing it in a "long" table, like this:

library(data.table)
dt <- data.table(dummy_id = rep(dummy_id, 4),
                 type = rep(c("Animal", "Bacteria", "Mineral", "Plant"), each = 10),
                 name = c(Animal, Bacteria, Mineral, Plant),
                 drugA = c(AnimalDrugA, BacteriaDrugA, MineralDrugA, PlantDrugA),
                 drugB = c(AnimalDrugB, BacteriaDrugB, MineralDrugB, PlantDrugB))

Filtering and other operations then become much easier. For example,

dt[name == "bronze"][order(dummy_id)]

Frankly, I'm not sure I understand what you ultimately want to achieve.
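
If the goal is the per-id table listed in the question (DrugA and DrugB at the first occurrence of "bronze"), one possible way to get there from the long table is sketched below. It assumes that "first" means first in alphabetical order of `type`, which mirrors the alphabetically ordered columns in the question; with the real, chronologically ordered data a proper ordering column would be needed instead. The names `first_bronze` and `result` are made up for this sketch.

```r
library(data.table)

# Data copied from the question.
dummy_id <- c(1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009, 10101)
Animal   <- c("aardvark", "1", "cheetah", "dromedary", "eel", "1", "bison",
              "cheetah", "dromedary", "eel")
Plant    <- c("apple_tree", "blossom", "cactus", "1", "bronze", "apple_tree",
              "bronze", "cactus", "dragonplant", "1")
Mineral  <- c("amber", "bronze", "1", "bronze", "emerald", "1", "bronze",
              "bronze", "diamond", "emerald")
Bacteria <- c("acinetobacter", "1", "1", "d-strep", "bronze", "acinetobacter",
              "bacillus", "chlamydia", "bronze", "enterobacter")
AnimalDrugA   <- c(1, 11, 12, 13, 14, 15, 16, 17, 18, 19)
AnimalDrugB   <- c(20, 1, 22, 23, 24, 25, 26, 27, 28, 29)
PlantDrugA    <- c(301, 302, 1, 304, 305, 306, 307, 308, 309, 310)
PlantDrugB    <- c(401, 402, 1, 404, 405, 406, 407, 408, 409, 410)
MineralDrugA  <- c(1, 2, 3, 4, 1, 6, 7, 8, 9, 10)
MineralDrugB  <- c(11, 12, 13, 1, 15, 16, 17, 18, 19, 20)
BacteriaDrugA <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
BacteriaDrugB <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)

# Long table, built the same way as in the answer.
dt <- data.table(dummy_id = rep(dummy_id, 4),
                 type = rep(c("Animal", "Bacteria", "Mineral", "Plant"), each = 10),
                 name = c(Animal, Bacteria, Mineral, Plant),
                 drugA = c(AnimalDrugA, BacteriaDrugA, MineralDrugA, PlantDrugA),
                 drugB = c(AnimalDrugB, BacteriaDrugB, MineralDrugB, PlantDrugB))

# Assumption: within each id, "first" follows the alphabetical order of type.
setorder(dt, dummy_id, type)

# Keep only the first "bronze" row for each id.
first_bronze <- dt[name == "bronze", .SD[1], by = dummy_id]

# Left-join back onto all ids so that ids without "bronze" get NA.
result <- merge(data.table(dummy_id = dummy_id),
                first_bronze[, .(dummy_id, type, drugA, drugB)],
                by = "dummy_id", all.x = TRUE)
result
```

On the example data this should give the same DrugA and DrugB values as the table in the question, with NA for ids 1001, 3003, 6006, and 10101, which have no "bronze" entry.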

【Comments】:
