Question title: Why does melt return an NA column in R?
Posted: 2021-05-11 15:25:02
Question:

I have the following df in R:

structure(list(disease = structure(c(1L, 1L), .Label = "Barcelona", class = "factor"), 
    `<18` = structure(list(0.193103448275862, 
        0.0445344129554656), .Names = c(NA_character_, NA_character_
    )), `19-25` = structure(list(0.0413793103448276, 
        0.345748987854251), .Names = c(NA_character_, NA_character_
    )), `26-64` = structure(list(0.448275862068966, 0.167611336032389), .Names = c(NA_character_, 
    NA_character_)), `46-64` = structure(list(0.0344827586206897, 
        0.00647773279352227), .Names = c(NA_character_, NA_character_
    )), `>65` = structure(list(0.282758620689655, 
        0.435627530364373), .Names = c(NA_character_, NA_character_
    )), type = structure(1:2, .Label = c("Clinical Trial", "Real-World"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

I want to reshape the data frame so that I can use melt to get each value by city, unit and age group. However, I get an extra column in the output:

melt(df)
           city           type           variable      value          NA
1  Barcelona       flat                  <18           0.19310345 0.044534413
2  Barcelona       house                 <18           0.19310345 0.044534413
3  Barcelona       flat                  19 - 25       0.04137931 0.345748988
4  Barcelona       house                 19 - 25       0.04137931 0.345748988
5  Barcelona       flat                  26 - 45       0.44827586 0.167611336
6  Barcelona       house                 26 - 45       0.44827586 0.167611336
7  Barcelona       flat                  46 - 64       0.03448276 0.006477733
8  Barcelona       house                 46 - 64       0.03448276 0.006477733
9  Barcelona       flat                  > 65          0.28275862 0.435627530
10 Barcelona       house                 > 65          0.28275862 0.435627530

Is there a way to avoid the NA column and get a single value per row in the value column?

Comments:

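  • Can you please reproduce your problem? When I read in the data you provided in the question and run reshape2::melt() on it, I get the expected output without the NA column. (Please share data with dput() so it can be copied and pasted; having to clean up line breaks when reading the data in is annoying.)
  • Ah, glad you used dput(). The problem is that your columns are lists, not numeric.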
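To check for this yourself, you can test each column with is.list(). A minimal sketch on a small hypothetical data frame toy (not the asker's data), assuming the list column is created by assigning a list with $<-:

```r
# Hypothetical toy data: `score` is a list column, like the age-group columns in the question
toy <- data.frame(id = c("a", "b"))
toy$score <- list(0.1, 0.2)

# Class of every column; list columns report "list" instead of "numeric"
col_classes <- sapply(toy, class)
print(col_classes)

# Logical vector flagging which columns are lists
is_list_col <- sapply(toy, is.list)
print(is_list_col)
```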

Tags: r reshape reshape2 melt data-wrangling


Solution 1:

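The problem is that your measure columns are of class list, not numeric. If we convert them to numeric, melt works fine. (I show one way to do that below, but it would be better to do it earlier in your workflow and prevent the columns from being created as lists in the first place. That is definitely what you should do if my code works on your sample data but runs into problems on your larger data; tidyr::unnest might help in that case.)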
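A slightly more defensive version of the conversion, sketched here as a hypothetical helper flatten_list_cols() (not from the original answer), checks that every list cell holds exactly one value before unlisting. It also drops the NA names that the list cells carry in the question's dput() output:

```r
# Hypothetical helper (not part of the original answer): turn list columns
# into atomic vectors, but only if every cell holds exactly one value
flatten_list_cols <- function(df) {
  list_cols <- names(df)[sapply(df, is.list)]
  for (col in list_cols) {
    # lengths() gives the length of each cell; if any cell held zero or
    # several values, unlist() would no longer match values to rows one-to-one
    if (any(lengths(df[[col]]) != 1L)) {
      stop("Column '", col, "' has cells that do not hold exactly one value")
    }
    # use.names = FALSE discards per-cell names (NA in the question's data)
    df[[col]] <- unlist(df[[col]], use.names = FALSE)
  }
  df
}

# Example on a small hypothetical data frame with one list column
toy <- data.frame(id = c("a", "b"))
toy$score <- list(0.1, 0.2)
flat <- flatten_list_cols(toy)
str(flat)
```

On the question's data this would be called as df <- flatten_list_cols(df) before melting.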

sapply(df, class)
#  disease      <18    19-25    26-64    46-64      >65     type 
# "factor"   "list"   "list"   "list"   "list"   "list" "factor" 

list_cols = sapply(df, is.list)

df[list_cols] = lapply(df[list_cols], unlist)

reshape2::melt(df, id.vars = c("disease", "type"))
#      disease           type variable       value
# 1  Barcelona Clinical Trial      <18 0.193103448
# 2  Barcelona     Real-World      <18 0.044534413
# 3  Barcelona Clinical Trial    19-25 0.041379310
# 4  Barcelona     Real-World    19-25 0.345748988
# 5  Barcelona Clinical Trial    26-64 0.448275862
# 6  Barcelona     Real-World    26-64 0.167611336
# 7  Barcelona Clinical Trial    46-64 0.034482759
# 8  Barcelona     Real-World    46-64 0.006477733
# 9  Barcelona Clinical Trial      >65 0.282758621
# 10 Barcelona     Real-World      >65 0.435627530
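If you would rather not depend on reshape2, the same long format can be built in base R. A minimal sketch, assuming the measure columns are already numeric (as after the conversion above); the data frame is rebuilt from the values in the question so the snippet runs on its own:

```r
# Data from the question, with the age-group columns already numeric
df <- data.frame(
  disease = factor(c("Barcelona", "Barcelona")),
  `<18`   = c(0.193103448275862, 0.0445344129554656),
  `19-25` = c(0.0413793103448276, 0.345748987854251),
  `26-64` = c(0.448275862068966, 0.167611336032389),
  `46-64` = c(0.0344827586206897, 0.00647773279352227),
  `>65`   = c(0.282758620689655, 0.435627530364373),
  type    = factor(c("Clinical Trial", "Real-World")),
  check.names = FALSE  # keep names like "<18" instead of "X.18"
)

id_vars <- c("disease", "type")
measure_vars <- setdiff(names(df), id_vars)

# Stack the measure columns one after another and repeat the id columns
# once per measure column, giving the same row order as the melt() output above
long <- data.frame(
  disease  = rep(df$disease, times = length(measure_vars)),
  type     = rep(df$type, times = length(measure_vars)),
  variable = factor(rep(measure_vars, each = nrow(df)), levels = measure_vars),
  value    = unlist(df[measure_vars], use.names = FALSE)
)
print(long)
```

Here variable is a factor whose levels follow the original column order; drop the factor() call if you prefer a character column.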
