【问题标题】:Creating a table in R with several columns with frequency of string matches over time在 R 中创建一个表,其中包含几列随着时间的推移字符串匹配的频率
【发布时间】:2018-07-31 01:30:03
【问题描述】:

我有一个看起来像这样的数据框:

     Day           text          place    gender
Feb 20 2016   #geom and #stats      SP          M
Feb 20 2016   #geom and #stats      SP          F
Feb 20 2016   #bio and #stats       SP          M

我想从“文本”中提取主题标签,然后用这些信息(摘要)构建一个表:

Day          Hashtag    Daily_Freq  %men    %women  Freq_UK Freq_SP
Feb 20 2016   #stats      2              0.5      0.5     1       1
Feb 20 2016   #maths      1              1        0       1       0
Feb 20 2016   #geom       1              0        1       0       1

我不知道该怎么做!谁能帮我?

【问题讨论】:

    标签: r frequency hashtag summary


    【解决方案1】:
    options(stringsAsFactors = FALSE)
    df = read.table(text = " Day           text            place           gender
                            'Feb 20 2016'   '#stats and #maths'      UK          M
                            'Feb 20 2016'   '#geom and #stats'       SP          F", 
                     header = TRUE)
    
    # extract tags
    tags =  lapply(strsplit(df$text, split = "[\\s,\\t]+", perl = TRUE), 
               function(item) item[substr(item, 1, 1)=="#"])
    
    # create list of data.frames 
    long_list1 = lapply(seq_len(NROW(df)), function(i) {
        data.frame(
            Day = df[["Day"]][i],
            Hashtag = tags[[i]],
            place = df[["place"]][i],
            gender = df[["gender"]][i]
        )
    })
    
    # long form - each hashtag on each own row
    long = do.call(rbind, long_list1)
    
    # compute list of data.frames with statistics 
    long_list2 = 
            lapply(
                split(long, list(long$Day, long$Hashtag)), 
                function(item){
                    with(item, data.frame(
                        Day = Day[1],
                        Hashtag = Hashtag[1],
                        Daily_Freq  = NROW(item), 
                        '%men' = mean(gender == "M"),   
                        '%women' = mean(gender == "F"),   
                        Freq_UK  = sum(place == "UK"), 
                        Freq_SP = sum(place == "SP"),
                        check.names = FALSE
    
                    ))
                })
    
    # combine result
    res = do.call(rbind, c(long_list2, make.row.names = FALSE))
    res
    # 
    #         Day Hashtag Daily_Freq %men %women Freq_UK Freq_SP
    # 1 Feb 20 2016   #geom          1  0.0    1.0       0       1
    # 2 Feb 20 2016  #maths          1  1.0    0.0       1       0
    # 3 Feb 20 2016  #stats          2  0.5    0.5       1       1
    

    【讨论】:

      猜你喜欢
      • 2016-09-07
      • 1970-01-01
      • 2016-12-22
      • 2020-09-04
      • 2018-02-01
      • 1970-01-01
      • 2011-10-21
      • 2023-02-02
      • 1970-01-01
      相关资源
      最近更新 更多