【Question Title】: R code (Rstats): calculating the unemployment rate from columns in long-format data
【Posted】: 2021-04-23 06:36:48
【Question Description】:

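I'm trying to calculate the unemployment rate from the data below and add it to the data table as new rows. For each date, I want to divide the number of unemployed by the labour force and add each resulting data point as a row.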

Basically, I'm trying to go from this

date series_1 value
2021-01-01 labourforce 13793
2021-02-01 labourforce 13812
2021-03-01 labourforce 13856
2021-01-01 unemployed 875
2021-02-01 unemployed 805
2021-03-01 unemployed 778

to this

date series_1 value
2021-01-01 labourforce 13793
2021-02-01 labourforce 13812
2021-03-01 labourforce 13856
2021-01-01 unemployed 875
2021-02-01 unemployed 805
2021-03-01 unemployed 778
2021-01-01 unemploymentrate 6.3
2021-02-01 unemploymentrate 5.8
2021-03-01 unemploymentrate 5.6

This is my code so far. I know the last line is wrong. Any suggestions or ideas are welcome!

longdata %>% 
  group_by(date) %>%
  summarise(series_1 = 'unemploymentrate',
  value = series_1$unemployed/series_1$labourforce))

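【Question Comments】:

  • It looks like you might benefit from tidier data. If your data.frame had the columns date, labourforce and unemployed, you could easily add another column called unemploymentrate. Once that is done, you can still melt the data.frame with reshape2::melt() to get it into the form you posted.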
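The approach suggested in the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's own code: the object names `wide` and `long` are made up here, the wide table is typed in by hand from the question's numbers, and rounding to one decimal is an assumption chosen to match the desired output in the question.

```r
library(reshape2)

# Hypothetical wide version of the question's data: one row per date.
wide <- data.frame(
  date        = c("2021-01-01", "2021-02-01", "2021-03-01"),
  labourforce = c(13793, 13812, 13856),
  unemployed  = c(875, 805, 778)
)

# With one column per series, the rate is a plain column operation.
wide$unemploymentrate <- round(wide$unemployed / wide$labourforce * 100, 1)

# Melt back to long form, keeping `date` as the identifier column.
long <- melt(wide, id.vars = "date",
             variable.name = "series_1", value.name = "value")
long
```

Note that `melt()` returns `series_1` as a factor whose levels follow the column order of `wide`, so the rows come out grouped as labourforce, unemployed, unemploymentrate.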

Tags: r group-by summarize


【Solution 1】:

For each date, you can compute the ratio of 'unemployed' to 'labourforce' and add it as new rows to your original dataset.

library(dplyr)

df %>% 
  group_by(date) %>%
  summarise(value = value[series_1 == 'unemployed']/value[series_1 == 'labourforce'] * 100, 
            series_1 = 'unemploymentrate') %>%
  bind_rows(df) %>%
  arrange(series_1)

#   date          value series_1        
#  <chr>         <dbl> <chr>           
#1 2021-01-01 13793    labourforce     
#2 2021-02-01 13812    labourforce     
#3 2021-03-01 13856    labourforce     
#4 2021-01-01   875    unemployed      
#5 2021-02-01   805    unemployed      
#6 2021-03-01   778    unemployed      
#7 2021-01-01     6.34 unemploymentrate
#8 2021-02-01     5.83 unemploymentrate
#9 2021-03-01     5.61 unemploymentrate
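For comparison, the same per-date ratio can be sketched in base R without dplyr. This is not part of the original answer; the intermediate names `lf`, `un`, `both`, `rate` and `result` are illustrative, and `df` is rebuilt here from the question's numbers so that the block runs on its own.

```r
# Question's data in long form (values typed in from the question).
df <- data.frame(
  date     = rep(c("2021-01-01", "2021-02-01", "2021-03-01"), 2),
  series_1 = rep(c("labourforce", "unemployed"), each = 3),
  value    = c(13793, 13812, 13856, 875, 805, 778),
  stringsAsFactors = FALSE
)

# Split the two series and join them on date.
lf <- df[df$series_1 == "labourforce", c("date", "value")]
un <- df[df$series_1 == "unemployed",  c("date", "value")]
both <- merge(un, lf, by = "date",
              suffixes = c("_unemployed", "_labourforce"))

# Build the new rows and append them to the original data.
rate <- data.frame(
  date     = both$date,
  series_1 = "unemploymentrate",
  value    = both$value_unemployed / both$value_labourforce * 100,
  stringsAsFactors = FALSE
)
result <- rbind(df, rate)
result
```

Because `merge()` performs an inner join by default, a date that appears in only one of the two series is dropped from `rate` instead of producing a mismatched division.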

【Comments】:

    【Solution 2】:

    Try:

    library(dplyr)
    library(tidyr)
    
     
    df %>%
      # One column per series, so the rate is a simple column operation
      pivot_wider(names_from = series_1, values_from = value) %>%
      mutate(unemploymentrate = round(unemployed * 100 / labourforce, 2)) %>%
      # Back to long form; -1 keeps the first column (date) as the identifier
      pivot_longer(-1, names_to = "series_1", values_to = "value") %>%
      mutate(series_1 = factor(series_1, levels = c("labourforce", "unemployed", "unemploymentrate"))) %>%
      arrange(series_1, date)

    #> # A tibble: 9 x 3
    #>   date       series_1            value
    #>   <chr>      <fct>               <dbl>
    #> 1 2021-01-01 labourforce      13793   
    #> 2 2021-02-01 labourforce      13812   
    #> 3 2021-03-01 labourforce      13856   
    #> 4 2021-01-01 unemployed         875   
    #> 5 2021-02-01 unemployed         805   
    #> 6 2021-03-01 unemployed         778   
    #> 7 2021-01-01 unemploymentrate     6.34
    #> 8 2021-02-01 unemploymentrate     5.83
    #> 9 2021-03-01 unemploymentrate     5.61
    

    Created on 2021-04-23 by the reprex package (v2.0.0)

    Data:

    df <- structure(list(
      date = c("2021-01-01", "2021-02-01", "2021-03-01",
               "2021-01-01", "2021-02-01", "2021-03-01"),
      series_1 = c("labourforce", "labourforce", "labourforce",
                   "unemployed", "unemployed", "unemployed"),
      value = c(13793L, 13812L, 13856L, 875L, 805L, 778L)
    ), class = "data.frame", row.names = c(NA, -6L))
    
    

    【Comments】:
