【Question title】:Produce two columns of word combinations from one column of words where ID column values are equal
【Posted】:2018-01-17 18:27:02
【Question description】:

I am trying to prepare a data frame to feed into the forceNetwork function.

Here is a sample of my data:

structure(list(Case.Number = c("127967", "127967", "127967", 
"127967", "141330", "141330", "141330", "141330", "141240", "141240", 
"141240"), Word = c("account", "want", "membership", "sort", 
"unhappi", "vr", "info", "miss", "csrf", "unhappi", "dissatisfi"
)), .Names = c("Case.Number", "Word"), class = c("data.table", 
"data.frame"), row.names = c(NA, -11L))

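For the words under each case number, I want to produce a data frame with two columns holding every possible (and unique) two-word combination, as shown below. There should be no duplicate combinations (reversed order counts as a duplicate) and no word paired with itself.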

127967 account want
127967 account membership
127967 account sort
127967 want    membership
127967 want    sort
141330 unhappi vr
141330 unhappi info...

excluding
141330 unhappi unhappi

I tried the following to get the combinations:

source <- c("remove")
target <- c("remove")
ID <- c("remove")
df <- data.frame(ID = c("remove"), source = c("remove"), target = c("remove"))

for(i in unique(tbl$Case.Number)){
  for (r in grep(i, tbl$Case.Number)) {
    if(r < max(grep(i, tbl$Case.Number))){
      ID <- i
      source <- tbl$Word[r]
      target <- tbl$Word[r+1]
      rbind(df, cbind(ID, source,target))
    }

  }

}

View(df) 

But it does not work.

Is there a cleaner way?

【Question comments】:

    Tags: r networkd3


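    【Solution 1】:

    Self-join, then filter:

    library(data.table)

    # dd holds the sample data.table from the question
    setkey(dd, Case.Number)
    # join dd to itself within each case, then keep each unordered pair once
    dd[dd, allow.cartesian = TRUE][Word < i.Word]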
    #     Case.Number       Word     i.Word
    #  1:      127967    account       want
    #  2:      127967 membership       want
    #  3:      127967       sort       want
    #  4:      127967    account membership
    #  5:      127967    account       sort
    #  6:      127967 membership       sort
    #  7:      141240       csrf    unhappi
    #  8:      141240 dissatisfi    unhappi
    #  9:      141240       csrf dissatisfi
    # 10:      141330       info    unhappi
    # 11:      141330       miss    unhappi
    # 12:      141330    unhappi         vr
    # 13:      141330       info         vr
    # 14:      141330       miss         vr
    # 15:      141330       info       miss
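
    Why the filter is enough: the self-join yields every ordered pair within a case, including (a, a) and both (a, b) and (b, a), and the strict Word < i.Word comparison keeps exactly one of each unordered pair. Below is a minimal base R sketch of the same idea, assuming the sample data as a plain data.frame. Here merge() stands in for the data.table join, and the Word.other column name is my own choice, not part of the answer above.

```r
# Sample data from the question, as a plain data.frame
dd <- data.frame(
  Case.Number = c(rep("127967", 4), rep("141330", 4), rep("141240", 3)),
  Word = c("account", "want", "membership", "sort",
           "unhappi", "vr", "info", "miss",
           "csrf", "unhappi", "dissatisfi"),
  stringsAsFactors = FALSE
)

# Self-join on Case.Number: every word paired with every word of the same case
pairs <- merge(dd, dd, by = "Case.Number", suffixes = c("", ".other"))

# Strict '<' drops self-pairs (a, a) and keeps only one of (a, b) / (b, a)
pairs <- pairs[pairs$Word < pairs$Word.other, ]
nrow(pairs)
```

    For the sample data this leaves 6 + 6 + 3 = 15 rows. If a word can occur more than once within a case, running unique(dd) before the join avoids duplicated pairs.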
    

    【Comments】:

    • Simple, and does exactly what I wanted. Thanks.
    【Solution 2】:

    Update

    Using tidyr::expand...

    df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
    Case.Number Word
    127967    account
    127967       want
    127967 membership
    127967       sort
    141330    unhappi
    141330         vr
    141330       info
    141330       miss
    141240       csrf
    141240    unhappi
    141240 dissatisfi
    ")
    
    library(dplyr)
    library(tidyr)
    
    df %>% 
      group_by(Case.Number) %>% 
      expand(Word, i.Word = Word) %>% 
      filter(Word < i.Word)
    

    Here is a tidyverse way of doing it (more concise than the original approach below, and borrowing @Gregor's nice, simple filtering method)...

    df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
    Case.Number Word
    127967    account
    127967       want
    127967 membership
    127967       sort
    141330    unhappi
    141330         vr
    141330       info
    141330       miss
    141240       csrf
    141240    unhappi
    141240 dissatisfi
    ")
    
    library(dplyr)
    library(tidyr)
    
    df %>% 
      group_by(Case.Number) %>% 
      mutate(i.Word = Word) %>% 
      complete(Word, i.Word) %>% 
      filter(Word < i.Word)
    
    # A tibble: 15 x 3
    # Groups: Case.Number [3]
    #    Case.Number Word       i.Word
    #          <int> <chr>      <chr>
    #  1      127967 account    membership
    #  2      127967 account    sort
    #  3      127967 account    want
    #  4      127967 membership sort
    #  5      127967 membership want
    #  6      127967 sort       want
    #  7      141240 csrf       dissatisfi
    #  8      141240 csrf       unhappi
    #  9      141240 dissatisfi unhappi
    # 10      141330 info       miss
    # 11      141330 info       unhappi
    # 12      141330 info       vr
    # 13      141330 miss       unhappi
    # 14      141330 miss       vr
    # 15      141330 unhappi    vr
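
    Since the question's goal is forceNetwork, the pair table likely needs one more step: as far as I know, networkD3 expects links to refer to nodes by zero-based row position in a separate nodes data frame. Below is a rough base R sketch of that conversion, assuming pair columns named Word and i.Word as above. The pairs, nodes and links names, and the constant group and value columns, are placeholders of mine. The forceNetwork call is left commented out because it needs networkD3 installed.

```r
# A few pairs in the shape produced above (columns: Case.Number, Word, i.Word)
pairs <- data.frame(
  Case.Number = c(127967, 127967, 127967, 141240),
  Word   = c("account", "account", "membership", "csrf"),
  i.Word = c("membership", "sort", "sort", "unhappi"),
  stringsAsFactors = FALSE
)

# One row per distinct word; 'group' is a placeholder grouping column
nodes <- data.frame(name = sort(unique(c(pairs$Word, pairs$i.Word))),
                    group = 1L, stringsAsFactors = FALSE)

# Zero-based positions of each word within 'nodes'
links <- data.frame(
  source = match(pairs$Word,   nodes$name) - 1L,
  target = match(pairs$i.Word, nodes$name) - 1L,
  value  = 1L
)

# library(networkD3)
# forceNetwork(Links = links, Nodes = nodes, Source = "source",
#              Target = "target", Value = "value",
#              NodeID = "name", Group = "group")
```

    Note that a word appearing under several case numbers (such as unhappi in the sample data) becomes a single node here, which may or may not be what you want.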
    

    Here is a tidyverse approach (if a little convoluted)...

    df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
    Case.Number Word
    127967    account
    127967       want
    127967 membership
    127967       sort
    141330    unhappi
    141330         vr
    141330       info
    141330       miss
    141240       csrf
    141240    unhappi
    141240 dissatisfi
    ")
    
    library(dplyr)
    library(tidyr)
    
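    as_tibble(df) %>%
      group_by(Case.Number) %>%
      # as_tibble() replaces the deprecated as_data_frame(); name the two columns explicitly
      mutate(Word = list(as_tibble(t(combn(unlist(Word), 2)),
                                   .name_repair = ~ c("V1", "V2")))) %>%
      unique() %>%
      unnest(Word)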
    

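    It is easier to understand if you run the following commands one at a time and look at what each returns. combn does the heavy lifting: it expands a vector into all possible combinations of the requested size.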

    vec <- c("account", "want", "membership", "sort")
    combn(vec, 2)
    t(combn(vec, 2))
    as_tibble(t(combn(vec, 2)), .name_repair = ~ c("V1", "V2"))  # as_data_frame() is deprecated
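
    The same per-case combn idea can also be written without any packages. Below is a minimal base R sketch, assuming the sample data from the question. The source and target column names and the pairs_by_case helper name are my own choices. Cases with fewer than two distinct words are skipped because combn(x, 2) fails on them.

```r
# Sample data from the question, as a plain data.frame
dd <- data.frame(
  Case.Number = c(rep("127967", 4), rep("141330", 4), rep("141240", 3)),
  Word = c("account", "want", "membership", "sort",
           "unhappi", "vr", "info", "miss",
           "csrf", "unhappi", "dissatisfi"),
  stringsAsFactors = FALSE
)

# For each case, build all 2-word combinations of its distinct words
pairs_by_case <- lapply(split(dd$Word, dd$Case.Number), function(words) {
  words <- unique(words)
  if (length(words) < 2) return(NULL)   # nothing to pair up
  m <- t(combn(words, 2))               # one row per combination
  data.frame(source = m[, 1], target = m[, 2], stringsAsFactors = FALSE)
})
pairs_by_case <- Filter(Negate(is.null), pairs_by_case)

# Stack the per-case tables and restore the case number as a column
pairs <- do.call(rbind, pairs_by_case)
pairs$Case.Number <- rep(names(pairs_by_case),
                         vapply(pairs_by_case, NROW, integer(1)))
```

    Unlike the Word < i.Word filter, combn keeps the words in their original order within each case, so the first rows come out as account/want, account/membership, account/sort, as in the question's desired output.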
    

    【Comments】:
