【Question Title】: Predominant calculation for Character fields
【Posted】: 2016-04-21 22:15:54
【Question Description】:

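I am trying to loop through the columns whose type is character and return a data frame containing the predominant (most frequent) value of each character column, grouped by the ID field.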

Is there a way to replicate the following code in some kind of loop?

      library(dplyr)
      library(data.table)

      # Keep the character columns, plus Group_ID (which may not be character, but select() below needs it)
      DF_Characters <- dfr[, sapply(dfr, is.character) | names(dfr) == "Group_ID"]

      ## Predominant Column_1 ##
      Predom <- select(DF_Characters, Group_ID, Column_1)
      Predom <- group_by(Predom, Group_ID, Column_1)
      Predom <- summarise(Predom,
                          CountPredom = n()
                          )
      Predom <- arrange(Predom, Group_ID, desc(CountPredom))
      Predom <- data.table(Predom, key = "Group_ID")
      Predominant_Column_1 <- Predom[, head(.SD, 1), by = Group_ID]


      ## Predominant Column_2 ##
      Predom <- select(DF_Characters, Group_ID, Column_2)
      Predom <- group_by(Predom, Group_ID, Column_2)
      Predom <- summarise(Predom,
                          CountPredom = n()
                          )
      Predom <- arrange(Predom, Group_ID, desc(CountPredom))
      Predom <- data.table(Predom, key = "Group_ID")
      Predominant_Column_2 <- Predom[, head(.SD, 1), by = Group_ID]

      ## Merge final table ##
      Merged <- merge(Predominant_Column_1, Predominant_Column_2, by = "Group_ID")
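Because the two blocks differ only in the column name, the repetition can be factored into a function of the column name and applied to every character column. Below is a minimal base-R sketch of that idea, not the question's dplyr/data.table pipeline: `predominant_by_group()` and `predominant_table()` are hypothetical helper names, the column names follow the question, the data is made up, and ties are assumed to be broken alphabetically.

```r
# Hypothetical helper: the predominant (most frequent) value of one column per group.
predominant_by_group <- function(df, group_col, value_col) {
  # Count every (group, value) pair; table() turns both into factors,
  # so the group IDs come back as character strings
  counts <- as.data.frame(table(df[[group_col]], df[[value_col]]),
                          stringsAsFactors = FALSE)
  names(counts) <- c(group_col, value_col, "CountPredom")
  counts <- counts[counts$CountPredom > 0, ]
  # Sort by group, then by count (descending), then by value to break ties
  counts <- counts[order(counts[[group_col]], -counts$CountPredom, counts[[value_col]]), ]
  # The first row of each group holds its predominant value
  top <- counts[!duplicated(counts[[group_col]]), c(group_col, value_col)]
  rownames(top) <- NULL
  top
}

# Hypothetical wrapper: loop over the character columns and merge the per-column results
predominant_table <- function(df, group_col = "Group_ID") {
  is_char <- vapply(df, is.character, logical(1))
  char_cols <- setdiff(names(df)[is_char], group_col)
  per_column <- lapply(char_cols, function(col) predominant_by_group(df, group_col, col))
  Reduce(function(x, y) merge(x, y, by = group_col), per_column)
}

# Usage with made-up data (not the question's DF_Character_table)
example_df <- data.frame(
  Group_ID = c(1, 1, 1, 2, 2),
  Column_1 = c("Petre", "Petre", "Ana", "Ion", "Ion"),
  Column_2 = c("Car", "Bus", "Car", "Train", "Train"),
  stringsAsFactors = FALSE
)
predominant_table(example_df)  # group 1: Petre / Car; group 2: Ion / Train
```

`Reduce()` with `merge()` plays the role of the final "Merge final table" step, so adding a third character column needs no new code.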

To clarify my question further, I have added a dummy table: DF_Character_table

The result should look like this: Result Table

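So for group 1, Petre is the predominant name in Column 1, and car is the predominant mode of travel. The predominance of Column 1 and Column 2 should be calculated separately.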
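What "calculated separately" means can be seen with a per-group mode computed for each column on its own. This is a minimal base-R sketch on made-up rows for a single group (the original dummy table is not reproduced here); `mode_of()` and `trips` are hypothetical names. Note that the predominant name and the predominant mode of travel need not appear on the same row.

```r
# Hypothetical helper: most frequent value of a vector. table() sorts its
# levels, and which.max() returns the first maximum, so a tie goes to the
# value that sorts first.
mode_of <- function(v) names(which.max(table(v)))

# Made-up rows for group 1: no single row pairs "Petre" with "Car"
trips <- data.frame(
  Group_ID = c(1, 1, 1, 1),
  Column_1 = c("Petre", "Petre", "Ana", "Ion"),
  Column_2 = c("Bus", "Train", "Car", "Car"),
  stringsAsFactors = FALSE
)

# Each column is summarised on its own, per group
tapply(trips$Column_1, trips$Group_ID, mode_of)  # group 1: "Petre"
tapply(trips$Column_2, trips$Group_ID, mode_of)  # group 1: "Car"
```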

Thanks

【Question Discussion】:

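  • All column names are character strings. Do you mean the mode/type of the variables (columns)?
  • Yes, I mean that the values in those columns are of character type, i.e. > class(DF_Characters$Column_1) returns [1] "character".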
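The distinction in this exchange is that the test applies to the class of each column's values, not to the column names. A small base-R sketch on a made-up data frame (`dfr_example` is a hypothetical name): applying is.character() column by column yields one TRUE/FALSE per column, which can index either the columns or their names. vapply() is used here instead of the question's sapply() only because it guarantees a logical result.

```r
# Made-up data: Group_ID is numeric, the other two columns hold character values
dfr_example <- data.frame(
  Group_ID = c(1, 2),
  Column_1 = c("Petre", "Ana"),
  Column_2 = c("Car", "Bus"),
  stringsAsFactors = FALSE  # needed before R 4.0, where strings became factors by default
)

# One logical per column, named after the column: FALSE, TRUE, TRUE
is_char <- vapply(dfr_example, is.character, logical(1))

names(dfr_example)[is_char]          # "Column_1" "Column_2"
char_only <- dfr_example[, is_char]  # data frame with just the character columns
class(dfr_example$Column_1)          # "character"
```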

Tags: r dplyr


【Solution 1】:

This may not be the most elegant solution, but it works.

  library(dplyr)

  ########## Predominant calculations
  # Character fields
  DF_Characters <- as.data.frame(dfr)
  DF_Characters <- DF_Characters[, sapply(DF_Characters, is.character), drop = FALSE]

  # Field names without the group-by ID
  CharactersToMerge <- setdiff(names(DF_Characters), "Groupby_ID")

  # Add the group-by ID back to the character fields
  Character_Field_List <- c("Groupby_ID", CharactersToMerge)
  DF_Characters <- subset(dfr, select = Character_Field_List)

  # Predominant table: one row per group with its number of observations
  fin_table <- DF_Characters %>% group_by(Groupby_ID) %>%
                  tally(sort = TRUE) # count observations

  # Loop over the character columns and merge each result into the predominant table
  for (i in CharactersToMerge) {
    temp_table <- DF_Characters %>%
      group_by(!!!rlang::syms(c("Groupby_ID", i))) %>%  # group by ID and the current column
      tally(sort = TRUE) %>%                 # count each (ID, value) pair, most frequent first
      slice(1) %>%                           # still grouped by Groupby_ID: keep the top row (removes ties)
      ungroup() %>%
      select(all_of(c("Groupby_ID", i)))     # remove the counts

    fin_table <- merge(fin_table, temp_table, by = "Groupby_ID")
  }
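The step that removes ties keeps only the first row of each group after sorting by count, so when two values share the top count, which one survives depends on row order. One way to make that deterministic, sketched here in base R on made-up counts (the data and the alphabetical tie-break are assumptions, not part of the answer), is to add the value itself as a final sort key before keeping the first row per group:

```r
# Made-up per-group counts: in group 1, "Petre" and "Ana" are tied at 2
counts <- data.frame(
  Groupby_ID = c(1, 1, 2),
  Column_1   = c("Petre", "Ana", "Ion"),
  n          = c(2L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Sort by group, then count (descending), then value, so ties resolve alphabetically
counts <- counts[order(counts$Groupby_ID, -counts$n, counts$Column_1), ]

# Keep the first row of each group, i.e. its predominant value
top <- counts[!duplicated(counts$Groupby_ID), c("Groupby_ID", "Column_1")]
top$Column_1  # "Ana" "Ion"
```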

【Discussion】:
