【Question title】:Factorize all columns by their levels, with the number of times each level occurs in the attributes of my data set
【Posted】:2025-12-07 14:30:01
【Question description】:

This is my data set. I want to summarize every attribute (column) of the file by its factor levels, with a count for each level. This is my code:

    library(dplyr)
    # read the file
    h_Data <- read.csv(file.choose())
    # keep only the University attribute
    h_Data <- h_Data$University

    # count the rows for each level
    h_DataDF <- data.frame(h_Data)
    h_dataLevels <- h_DataDF %>%
      group_by(h_Data) %>%
      summarise(no_rows = length(h_Data))
    h_dataLevels

    # number of missing values
    h_DataMissing <- sum(is.na(h_Data))
    h_DataMissing

    # percentage of each level
    h_DataPer <- prop.table(table(h_Data)) * 100

    # combine everything into one table
    h_DataTable <- data.frame(levels_data = h_dataLevels, levels_perc = h_DataPer, missing_data = h_DataMissing)
    h_DataTable

I want the summary to look like this:

    levels_University  no.of_timesLevels  Percentage_of_Level  MissingAttributes
    IBA                4                  57.14                0
    库                 1                  14.28                0
    UIT                2                  28.57                0

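【Comments】:

  • Please make this question reproducible. That includes sample code (listing any non-base R packages), sample data (e.g., dput(head(x))), and the expected output. References: *.com/questions/5963269, *.com/help/mcve, *.com/tags/r/info. Since you mention a "file", perhaps include the first "n" lines of it, where "n" is chosen by balancing relevance, sufficiency, and compactness.
  • The title should be a very short summary of the problem, not the problem itself, for a start...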

Tags: r machine-learning deep-learning analytics data-mining


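【Solution 1】:

Without some sample data and the desired output it is hard to know exactly what you want, but here is some code that takes a data frame and, for every column that is a factor, returns a data frame listing the number of observations at each factor level.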

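    library(dplyr)    # %>%
    library(purrr)    # map(), compact()
    library(forcats)  # fct_count()

    ## dummy data; stringsAsFactors = TRUE is needed in R >= 4.0,
    ## where character columns are no longer converted to factors by default
    df <- data.frame(Sex = c("m", "f", "m", "f"),
                     department = c("bs", "el", "bs", "se"),
                     numbers = c(1, 2, 3, 4),
                     stringsAsFactors = TRUE)

    ## function that takes a column of data
    ## and returns factor counts if the column is a factor
    countFactors <- function(col) {
      if (is.factor(col)) {
        fct_count(col)
      } else {
        NULL
      }
    }

    ## use purrr::map to iterate through the columns of the
    ## data frame and apply the function; compact() drops the
    ## NULL entries produced for non-factor columns
    df %>%
      map(~ countFactors(.)) %>%
      compact()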
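
To get closer to the layout asked for in the question (level, count, percentage, and missing count for every column), the same idea can be sketched in base R. This is only a sketch under assumptions: the sample data below is hypothetical (the `XYZ` level, the `Campus` column and the `Score` column are invented for illustration), `summariseLevels` is a made-up helper name, and the percentage is computed over non-missing values only, which may or may not be what you want.

```r
## Hypothetical sample data (not from the question's file)
h_Data <- data.frame(
  University = c("IBA", "IBA", "UIT", "IBA", "XYZ", "UIT", "IBA"),
  Campus     = c("Main", "City", NA, "Main", "Main", "City", "Main"),
  Score      = c(1, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

## Hypothetical helper: one summary row per level of a single column
summariseLevels <- function(col) {
  counts <- table(col)  # table() drops NA values by default
  data.frame(
    level      = names(counts),
    n          = as.vector(counts),
    # percentage among the non-missing values, rounded to 2 decimals
    percentage = round(100 * as.vector(counts) / sum(counts), 2),
    # number of NA values in the column, repeated on every row
    missing    = rep(sum(is.na(col)), length(counts)),
    stringsAsFactors = FALSE
  )
}

## Keep only factor or character columns, then summarise each one
isCategorical <- vapply(h_Data,
                        function(x) is.factor(x) || is.character(x),
                        logical(1))
h_Summary <- lapply(h_Data[isCategorical], summariseLevels)

## One data frame per categorical column, e.g.:
h_Summary$University
```

The result is a named list, so `h_Summary$Campus` gives the same kind of table for the `Campus` column, and the numeric `Score` column is skipped. If you would rather have a single table, something like `do.call(rbind, h_Summary)` should stack the pieces, at the cost of losing the per-column separation.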

【Comments】: