【Question title】: R unlisting nested row values
【Posted】: 2016-08-22 10:55:21
【Question】:

Hello, I have a data frame in which some rows hold multiple values as a list.

var1
A8
A9
c("A1", "A1", "D3")
c("A1", "D1")
c("D1", "D1")
c("D2", "A2")
c("D5", "A1")

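I am trying to "unlist" the rows that have multiple values by keeping only the first observation. I have been playing with the unlist command, but without any luck. What is the simplest way to accomplish this?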

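【Comments】:

  • Can you provide dput(head(df))?
  • If this came from reading a file with a ragged number of columns, then let's go back to the read.csv() command, and please dput() the header plus the first few lines of the file for us.
  • structure(list(Var1 = structure(1:6, .Label = c("", "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "c(\"B1\", \"B1\")", "c(\"B3\", \"B4\")", "c(\"B4\", \"B2\")"), class = "factor"), Freq = c(2538L, 633L, 458L, 328L, 135L, 56L)), .Names = c("Var1", "Freq"), row.names = c(NA, 6L), class = "data.frame")
  • The structure shows that you have a regular data.frame.
  • Your column is a factor column, so @akrun's suggestion does not work (and in any case, this must be a duplicate of other questions).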

Tags: r list dataframe multiple-value


【Solution 1】:

As indicated in the comments, the column must first be coerced (converted) from its current factor class to character class using as.character.
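
As a small illustration of why the coercion is needed, the sketch below uses a made-up factor `f` (not the asker's data). strsplit() refuses a factor, so the call is wrapped in tryCatch() to capture the error message instead of stopping the script.

```r
# Hypothetical factor standing in for the Var1 column
f <- factor(c("B1,B2,B3", "B4,B5", "B3"))

# strsplit() only accepts character input; capture the error it raises for a factor
err <- tryCatch(strsplit(f, ","), error = function(e) conditionMessage(e))
print(err)

# After coercion the same call works and returns one character vector per row
parts <- strsplit(as.character(f), ",")
print(parts)
```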

This can be avoided at the file-reading stage by passing the argument stringsAsFactors=FALSE.
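
A minimal sketch of the read-stage option, assuming the data arrives as a CSV. The inline `csv_text` is invented for illustration, since the asker's file was never shown. In R 4.0.0 and later, stringsAsFactors already defaults to FALSE, so the argument mainly matters on older versions.

```r
# Hypothetical CSV content standing in for the user's file
csv_text <- 'Var1,Freq
"B1,B2,B3",2538
"B4,B5",633
B3,328'

# Read strings as character rather than factor
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df)

# Var1 is already character, so it can be split without as.character()
first_vals <- vapply(strsplit(df$Var1, ","), function(p) p[1], character(1))
print(first_vals)
```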

Splitting each row and keeping only the first value can be done as follows:

copyDF$Var1 = sapply(strsplit(copyDF$Var1,","),head,1)

Let us know whether this works:

#user input data with factor class
userDF = structure(list(Var1 = structure(1:6, .Label = c("", "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "c(\"B1\", \"B1\")", "c(\"B3\", \"B4\")", "c(\"B4\", \"B2\")"), class = "factor"), Freq = c(2538L, 633L, 458L, 328L, 135L, 56L)), .Names = c("Var1", "Freq"), row.names = c(NA, 6L), class = "data.frame")
userDF
#  Var1 Freq
#1      2538
#2   B1  633
#3   B2  458
#4   B3  328
#5   B4  135
#6   B5   56

str(userDF)
#   'data.frame':   6 obs. of  2 variables:
#$ Var1: Factor w/ 12 levels "","B1","B2","B3",..: 1 2 3 4 5 6
#$ Freq: int  2538 633 458 328 135 56

#Since userDF had no multiple values, adding them here
newDF = structure(list(Var1 = structure(1:6, .Label = c("B1,B2,B3", "B4,B5", "B6,B7,B8", "B3", "B4", "B5", "B6", "B7", "B8", "c(\"B1\", \"B1\")", "c(\"B3\", \"B4\")", "c(\"B4\", \"B2\")"), class = "factor"), Freq = c(2538L, 633L, 458L, 328L, 135L, 56L)), .Names = c("Var1", "Freq"), row.names = c(NA, 6L), class = "data.frame")
newDF
#      Var1 Freq
#1 B1,B2,B3 2538
#2    B4,B5  633
#3 B6,B7,B8  458
#4       B3  328
#5       B4  135
#6       B5   56


#Make a copy of the dataset
copyDF = newDF

#Var1 is of class factor, which is not amenable to string operations, hence convert to character class
copyDF$Var1 = as.character(copyDF$Var1)

#Split each row, unlist and retain only first value

copyDF$Var1 = sapply(strsplit(copyDF$Var1,","),head,1)

copyDF
#  Var1 Freq
#1   B1 2538
#2   B4  633
#3   B6  458
#4   B3  328
#5   B4  135
#6   B5   56
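
The example above splits plain comma-separated strings. The question's sample column instead holds text such as c("A1", "A1", "D3"), where a plain split on "," would leave the c(" wrapper on the first piece. The sketch below is one possible variant for that shape, not part of the original answer. The names `qDF` and `first_value` are hypothetical. It also uses vapply() with p[1] so that an empty string gives NA rather than a zero-length result, which would otherwise make sapply() return a list.

```r
# Hypothetical data shaped like the question's sample column, stored as a factor
qDF <- data.frame(
  var1 = c("A8", "A9", 'c("A1", "A1", "D3")', 'c("A1", "D1")', ""),
  stringsAsFactors = TRUE
)

# Hypothetical helper: return the first value of each entry
first_value <- function(x) {
  x <- as.character(x)
  # Strip the c( ... ) wrapper, double quotes and whitespace
  cleaned <- gsub('^c\\(|\\)$|"|[[:space:]]', "", x)
  # p[1] is NA for an empty split, so every row yields exactly one string
  vapply(strsplit(cleaned, ",", fixed = TRUE), function(p) p[1], character(1))
}

qDF$var1 <- first_value(qDF$var1)
print(qDF$var1)
```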
