【Question Title】: R - re-order columns based on match (template)
【Posted】: 2017-06-07 09:39:03
【Question】:

So I have a large dataset that looks like this:

     V1       V2   V3         V4
1 Sleep Domestic  Eat Child Care
2 Sleep Domestic  Eat       Paid
3 Sleep Domestic  Eat Child Care
4 Sleep      Eat Paid       <NA>

What I would like to do is reorder the columns based on a "template"

["Sleep", "Eat", "Domestic", "Paid", "Child care"] 

to get (output)

   V1    V2       V3      V4            V5
Sleep   Eat Domestic      NA    Child Care
Sleep   Eat Domestic    Paid            NA
Sleep   Eat Domestic      NA    Child Care
Sleep   Eat       NA    Paid            NA

So Sleep goes in column 1, Eat in column 2, ...

I don't know where to start. Any ideas?

Data

x = structure(list(V1 = c("Sleep", "Sleep", "Sleep", "Sleep"), V2 = c("Domestic", 
"Domestic", "Domestic", "Eat"), V3 = c("Eat", "Eat", "Eat", "Paid"
), V4 = c("Child Care", "Paid", "Child Care", NA)), .Names = c("V1", 
"V2", "V3", "V4"), row.names = c(NA, 4L), class = "data.frame")

template = c('Sleep', 'Eat', 'Domestic', 'Paid', 'Child care')

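【Comments】:

  • Your case doesn't match: "Child care" vs. "Child Care"
  • I can't quite follow your question, so let me state what I think you're asking, and you tell me where I'm wrong, OK? Basically, each column should represent having a value or not. For example, [4,'V5'] should be either "Child Care" (meaning "yes" for child care) or "NA" (meaning "no" for child care). And those yes/no values should be ordered within each row according to the template. Is that right?
  • @TravisHeeter Hi, yes, that is actually another way of looking at it. I hadn't thought of it that way, but yes.
  • Expanding on @TravisHeeter's comment, something like table(row(x), factor(as.matrix(x), template)) might be useful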
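
The `table()` idea from the last comment can be expanded into a complete base R answer. The following is only a sketch of one way to do that, not the commenter's own code: it assumes the template has been corrected to "Child Care" so that it matches the data, and the names `counts`, `present`, `labels`, and `result` are made up for illustration.

```r
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)
# Assumption: case fixed ("Child Care") so the template matches the data
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")

m <- as.matrix(x)

# Rows of the table are data rows, columns are template values (in template order);
# NA cells are dropped by table() by default
counts <- table(row(m), factor(m, levels = template))

# TRUE where a template value occurs at least once in that row
present <- unclass(counts) > 0

# Start from a matrix whose every row is the template, then blank out absent values
labels <- matrix(template, nrow = nrow(m), ncol = length(template), byrow = TRUE)
labels[!present] <- NA

result <- setNames(
  as.data.frame(labels, stringsAsFactors = FALSE),
  paste0("V", seq_along(template))
)
result
```

Because the factor levels follow `template`, the column order of the table already is the desired order, so no separate sorting step is needed.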

Tags: r list sorting


【Solution 1】:

Here is a tidyverse option:

library(dplyr)
library(tidyr)
library(tibble)
rownames_to_column(x, 'id') %>% 
       gather(Var, Val, -id, na.rm = TRUE) %>% 
       mutate(Var = factor(Val, levels = template)) %>% 
       spread(Var, Val) %>% 
       select(-id) %>% 
       setNames(., paste0("V", seq_along(template)))
#     V1  V2       V3   V4         V5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid       <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat     <NA> Paid       <NA>
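
In current tidyr, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The same idea might look roughly like the sketch below. This is not the answerer's code: it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, it assumes the template uses "Child Care" to match the data, and the helper column `key` is a name chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)
# Assumption: case fixed ("Child Care") so the template matches the data
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")

result <- x %>%
  rownames_to_column("id") %>%
  # long format: one row per (id, value), dropping the NA cells
  pivot_longer(-id, names_to = "Var", values_to = "Val", values_drop_na = TRUE) %>%
  # use the value itself as the name of the column it will land in
  mutate(key = Val) %>%
  pivot_wider(id_cols = id, names_from = key, values_from = Val) %>%
  # put the columns in template order, then rename them V1..V5
  select(all_of(template)) %>%
  setNames(paste0("V", seq_along(template)))

result
```

Here the column order comes from `select(all_of(template))` rather than from factor levels. `all_of()` raises an error if a template value never occurs in the data, which may or may not be what you want; `any_of()` would skip such values silently.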

【Comments】:

  • @giacomoV Thanks for your comment.
【Solution 2】:

A reshape2 and dplyr solution. Clearly not as compact as the others. The idea is to melt (make tall), order the factor, and cast.

library(reshape2)
library(dplyr)

# make an id column
x$id <- row.names(x)

# make a tall result id, var, value
tall <- x %>% 
  melt(id.vars="id") %>%
  select(id, value) 

# make an ordered factor with the template
tall$value <- factor(tall$value, levels=template, ordered = TRUE) 

# make wide result with dcast
result <-  tall %>%  
  filter(!is.na(value)) %>%  # drop the NAs 
  mutate(var = value) %>%    # name the column the same as the value
  dcast(id ~ var)            # make into wide format

result
#  id Sleep Eat Domestic Paid Child Care
#1  1 Sleep Eat Domestic <NA> Child Care
#2  2 Sleep Eat Domestic Paid       <NA>
#3  3 Sleep Eat Domestic <NA> Child Care
#4  4 Sleep Eat     <NA> Paid       <NA>

【Comments】:

【Solution 3】:

Check the rowSums for each template value, then piece it back together:

    template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")
    # I've fixed this template so the case matches the values for 'Child Care'
    
    data.frame(lapply(
      setNames(template, seq_along(template)),
      function(v) c(NA,v)[(rowSums(x==v,na.rm=TRUE)>0)+1]
    ))
    
    #     X1  X2       X3   X4         X5
    #1 Sleep Eat Domestic <NA> Child Care
    #2 Sleep Eat Domestic Paid       <NA>
    #3 Sleep Eat Domestic <NA> Child Care
    #4 Sleep Eat     <NA> Paid       <NA>
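
The indexing expression inside the `lapply()` call is compact, so it may help to step through it for a single template value. The sketch below does that for "Paid"; the intermediate names `hits`, `idx`, and `col` are introduced here for illustration and do not appear in the answer.

```r
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)
v <- "Paid"

# x == v is a logical matrix; rowSums counts the matches per row, ignoring NA cells
hits <- rowSums(x == v, na.rm = TRUE)

# (hits > 0) is FALSE/TRUE; adding 1 turns it into index 1 (absent) or 2 (present)
idx <- (hits > 0) + 1

# position 1 of c(NA, v) is NA, position 2 is the value itself
col <- c(NA, v)[idx]
col
# NA "Paid" NA "Paid"
```

Repeating this for every value in `template` gives one output column per value, which is what `lapply()` followed by `data.frame()` assembles.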
    

Or an alternative using pmax:

    data.frame(
      lapply(
        setNames(template, seq_along(template)), 
        function(v) do.call(pmax, c(replace(x, x != v,NA),na.rm=TRUE)) 
      )
    )
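
To see why `pmax` works here, consider a single template value. After `replace()`, every cell that differs from `v` is NA, so the only non-NA value left in any row is `v` itself, and the parallel maximum with `na.rm = TRUE` returns either `v` or NA. A small sketch for "Domestic", with `masked` and `col` as illustrative names:

```r
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)
v <- "Domestic"

# Blank out every cell that is not equal to v (cells that were NA stay NA)
masked <- replace(x, x != v, NA)

# c(masked, na.rm = TRUE) is a list of the columns plus the na.rm argument,
# so this is equivalent to pmax(V1, V2, V3, V4, na.rm = TRUE)
col <- do.call(pmax, c(masked, na.rm = TRUE))
col
# "Domestic" "Domestic" "Domestic" NA
```

Row 4 contains no "Domestic", so all of its cells are NA after masking and `pmax` returns NA for it.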
    

【Comments】:
