【Question title】: Splitting a column, containing comma-separated string values, into new header columns in R
【Posted】: 2018-02-24 06:53:30
【Question】:

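I have a data frame in which one column contains strings of comma-separated values. Is there an efficient way to turn each distinct comma-separated value into a new column header, with a binary value indicating whether that value appears in the original row? A sample of my data can be reproduced as follows: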

data <- structure(list(
  id = c(6901257L, 6304928L, 7919400L),
  amenities = c(
    "Wireless Internet,Air conditioning,Kitchen,Heating,Family/kid friendly,Essentials,Hair dryer,Iron,translation missing: en.hosting_amenity_50",
    "Wireless Internet,Air conditioning,Kitchen,Heating,Family/kid friendly,Washer,Dryer,Smoke detector,Fire extinguisher,Essentials,Shampoo,Hangers,Hair dryer,Iron,translation missing: en.hosting_amenity_50",
    "TV,Cable TV,Wireless Internet,Air conditioning,Kitchen,Breakfast,Buzzer/wireless intercom,Heating,Family/kid friendly,Smoke detector,Carbon monoxide detector,Fire extinguisher,Essentials,Shampoo,Hangers,Hair dryer,Iron,Laptop friendly workspace,translation missing: en.hosting_amenity_50"
  )
), .Names = c("id", "amenities"), class = "data.frame", row.names = c(NA, 3L))

My current, inefficient approach is to reshape the data to long format and then use dcast from reshape2. It can be reproduced as follows:

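library(dplyr)  # %>% and mutate()
library(tidyr)  # unnest()

# One row per (id, amenity) pair
data.long <- data %>%
  mutate(amenities = strsplit(as.character(amenities), ",")) %>%
  unnest(amenities)

# Marker value that dcast spreads into the new columns
data.long$amenities.value <- 1

data.wide <- reshape2::dcast(data.long, id ~ amenities, value.var = "amenities.value") # desired result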

Is there a more efficient way to get the desired result from the original data structure?

【Comments】:

    Tags: r string reshape


    【Solution 1】:

    Here is one approach using the splitstackshape library:

    library(splitstackshape) 
    library(tidyverse)
    
    cSplit(data, "amenities", sep = ",", direction = "long") %>%
      mutate(value = 1) %>%
      spread(amenities, value) -> df.wide
    
    all.equal(df.wide, data.wide)
    #TRUE
    

    As suggested by @A5C1D2H2I1M1N2O1R2T1, a more compact and faster solution is

    cSplit_e(data, "amenities", ",", mode = "binary", type = "character", drop = TRUE)
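
For readers who want to see what a binary expansion of this kind amounts to, here is a minimal base R sketch. It is not how cSplit_e is implemented; it only reproduces the same idea (one 0/1 column per distinct value) on a small made-up data frame. The names `toy`, `parts`, `levels_all`, and `indicator` are illustrative assumptions, and the column names it produces are plain amenity names, which need not match the names cSplit_e generates.

```r
# Toy data standing in for the question's `data` (values are illustrative).
toy <- data.frame(
  id = c(1L, 2L, 3L),
  amenities = c("TV,Kitchen", "Kitchen,Heating", "TV"),
  stringsAsFactors = FALSE
)

# Split each row's string into a character vector of amenities.
parts <- strsplit(toy$amenities, ",", fixed = TRUE)

# Every distinct amenity becomes one new column.
levels_all <- sort(unique(unlist(parts)))

# For each row, flag which amenities are present (1) or absent (0).
# vapply() returns one column per row, so transpose to get one row per id.
indicator <- t(vapply(
  parts,
  function(p) as.integer(levels_all %in% p),
  integer(length(levels_all))
))
colnames(indicator) <- levels_all

toy.wide <- cbind(toy["id"], indicator)
print(toy.wide)
```

The sketch assumes at least two distinct amenities overall; with a single distinct value, vapply() returns a plain vector and the transpose would need adjusting.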
    

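    【Comments】:

    • Your approach is not very different from what they are already doing. Instead, I would suggest cSplit_e(data, "amenities", ",", mode = "binary", type = "character", drop = TRUE)
    • @A5C1D2H2I1M1N2O1R2T1 Thanks for the suggestion, I have updated the answer. If you would like to post it as a separate answer, please do, and I will delete mine.
    • No need. I'm always happy when someone finds my "splitstackshape" package useful :-)
    【Solution 2】: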

    Using only the tidyverse

    library(tidyverse)
    data %>% 
      separate_rows(amenities, sep = ",") %>% 
      table() %>% 
      data.frame() %>% 
      spread(amenities,Freq)
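
The table() and data.frame() steps in this pipeline may be less familiar than the rest, so here is a small base R sketch of what they do, run on a made-up long-format data frame. The names `toy.long`, `counts`, and `counts.long` are assumptions for illustration only.

```r
# Toy long-format data: one row per (id, amenity) pair, as
# separate_rows() would produce. Values are illustrative.
toy.long <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L),
  amenities = c("TV", "Kitchen", "Kitchen", "Heating", "TV"),
  stringsAsFactors = FALSE
)

# table() on a two-column data frame builds an id x amenities
# contingency table; absent combinations are counted as 0.
counts <- table(toy.long)

# Converting the table to a data frame gives one row per
# (id, amenities) combination with the count in a column named Freq.
counts.long <- as.data.frame(counts)
print(counts.long)

# The same table can also be turned into a wide data frame directly.
counts.wide <- as.data.frame.matrix(counts)
print(counts.wide)
```

Note that after the table() round trip the id column is a factor rather than an integer, and that the values are counts, so an amenity repeated within one row would show up as 2 rather than 1.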
    

    【Comments】:
