【Question title】: How to sum and weight certain rows in a dataframe in R?
【Posted】: 2020-03-31 22:23:15
【Question】:

I currently have a data.frame that looks like this:

  State      Area_name    LessHSD        HSD    SomeCAD   BDorMore P_LessHSD P_HSD ZIP
1    US  United States 26,948,057 59,265,308 63,365,655 68,867,051      12.3  27.1 1009
1913 NY Richmond County    37,675    101,738     81,014    108,326      11.5  30.9 36085
2    AL        Alabama    470,043  1,020,172    987,148    822,595      14.2  30.9 1020
3    AL Autauga County      4,204     12,119     10,552     10,291      11.3  32.6 7080
1873 NY Bronx County      258,956    255,427    226,620    183,134       28   27.6 36005
1911 NY Queens County     303,881    454,105    369,271    518,999      18.5  27.6 36081  
4    AL Baldwin County     14,310     40,579     46,025     46,075       9.7  27.6 1088
1901 NY New York County   162,237    155,048    171,461    758,325        13  12.4 36061
5    AL Barbour County      4,901      6,486      4,566      2,220      27.0  35.7 20012
1894 NY Kings County      326,469    455,299 3   47,052    648,461      18.4  25.6 36047
6    AL    Bibb County      2,650      7,471      3,846      1,813      16.8  47.3 9012

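I want to sum the LessHSD, HSD, and SomeCAD columns for the 5 New York City boroughs (ZIP 36005, 36047, 36061, 36081, 36085) and use those sums to create a new row with Area_name = New York Proper (see the output below).

For the P_LessHSD and P_HSD columns, I want to combine the borough values into the new row as population-weighted averages. I have already computed my own weights from another dataset. I want to multiply Richmond County by 0.05669632, Bronx County by 0.17051732, Queens County by 0.27133878, New York County by 0.19392188, and Kings County by 0.3075256.

Concretely, for the P_LessHSD column this looks like: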

11.5*0.05669632 
+ 28*0.17051732
+ 18.5*0.27133878 
+ 13*0.19392188 
+ 18.4*0.3075256

which gives 18.6 (rounded to the nearest tenth). The same applies to P_HSD. I want the new row to have ZIP 55555. I also want to remove all 5 borough rows.

The output should be:
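
The weighted sum for P_LessHSD can be checked directly in R. This is a minimal sketch using the percentages and weights quoted in the question; the vector names `p_less_hsd` and `w` are made up for illustration.

```r
# P_LessHSD for Richmond, Bronx, Queens, New York and Kings counties (from the table)
p_less_hsd <- c(11.5, 28, 18.5, 13, 18.4)
# Population weights quoted in the question, in the same order
w <- c(0.05669632, 0.17051732, 0.27133878, 0.19392188, 0.3075256)

# Multiply each percentage by its weight and add the products up
weighted_p <- sum(p_less_hsd * w)
round(weighted_p, 1)  # 18.6
```

Because the weights add up to almost exactly 1, `weighted.mean(p_less_hsd, w)` from base R gives the same value after rounding.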

  State      Area_name    LessHSD        HSD    SomeCAD   BDorMore P_LessHSD P_HSD ZIP
1    US  United States 26,948,057 59,265,308 63,365,655 68,867,051      12.3  27.1 1009
2    AL        Alabama    470,043  1,020,172    987,148    822,595      14.2  30.9 1020
3    AL Autauga County      4,204     12,119     10,552     10,291      11.3  32.6 7080  
4    AL Baldwin County     14,310     40,579     46,025     46,075       9.7  27.6 1088
5    AL Barbour County      4,901      6,486      4,566      2,220      27.0  35.7 20012
6    AL    Bibb County      2,650      7,471      3,846      1,813      16.8  47.3 9012
7    NY New York Proper   1089218    1421617     895418    2217245      18.6  24.2 55555

【Comments】:

  • Please provide your data as input
  • Thanks for reading! What does that mean?

Tags: r dataframe
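
The request is probably for a copy-pasteable definition of the data rather than a printed table. A minimal sketch of what that could look like, using three rows and a subset of the columns from the question; the object name `DF` and storing the counts as plain numbers are assumptions made for illustration.

```r
# Three rows of the question's table, typed as a reproducible data frame
DF <- data.frame(
  State     = c("US", "AL", "NY"),
  Area_name = c("United States", "Alabama", "Bronx County"),
  LessHSD   = c(26948057, 470043, 258956),
  P_LessHSD = c(12.3, 14.2, 28),
  ZIP       = c(1009, 1020, 36005),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, ready to paste into a post
dput(DF)
```

Pasting the `dput()` output into the question lets others rebuild the exact same object, including column types.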


【Solution 1】:

This may help.

It uses the dplyr package, which you need to install first.

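install.packages("dplyr")  # only needed once
library(dplyr)

DF %>% 
  # keep every row except the five borough rows
  filter(!(ZIP %in% c(36005,36047,36061,36081,36085))) %>%
  bind_rows(
        DF %>%
          # the five borough rows, to be collapsed into a single row
          filter(ZIP %in% c(36005,36047,36061,36081,36085)) %>%
          # attach each borough's population weight
          mutate(wg = case_when(Area_name == "Richmond County" ~ 0.05669632, 
                                Area_name == "Bronx County" ~ 0.17051732,
                                Area_name == "Queens County" ~ 0.27133878,
                                Area_name == "New York County" ~ 0.19392188, 
                                Area_name == "Kings County" ~ 0.3075256,
                                TRUE ~ 0),
                 # pre-multiply the percentages so that summing them gives the weighted average
                 P_LessHSD = wg*P_LessHSD,
                 P_HSD = wg*P_HSD,
                 Area_name = "New York Proper") %>%
          group_by(State, Area_name) %>%
          # sum the counts and the weighted percentages (columns LessHSD through P_HSD)
          summarize_at(vars(LessHSD:P_HSD), sum) %>%
          mutate(ZIP = 55555) )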

# # A tibble: 7 x 9
#   State Area_name        LessHSD      HSD  SomeCAD BDorMore P_LessHSD P_HSD   ZIP
#   <chr> <chr>              <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl> <dbl>
# 1 US    United States   26948057 59265308 63365655 68867051      12.3  27.1  1009
# 2 AL    Alabama           470043  1020172   987148   822595      14.2  30.9  1020
# 3 AL    Autauga County      4204    12119    10552    10291      11.3  32.6  7080
# 4 AL    Baldwin County     14310    40579    46025    46075       9.7  27.6  1088
# 5 AL    Barbour County      4901     6486     4566     2220      27    35.7 20012
# 6 AL    Bibb County         2650     7471     3846     1813      16.8  47.3  9012
# 7 NY    New York Proper  1089218  1421617  1195418  2217245      18.6  24.2 55555

PS: It gives a different result for SomeCAD than the expected output in the question (1195418 instead of 895418).
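
The difference in SomeCAD appears to come from the Kings County entry, which is printed in the question as `3   47,052`. A small sketch of the two possible readings, assuming the counts arrive as text with thousands separators; `parse_count` is a hypothetical helper written for this example, not part of dplyr.

```r
# Hypothetical helper: strip thousands separators and convert to numeric
parse_count <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# SomeCAD for Richmond, Bronx, Queens and New York counties, as printed in the question
other_four <- parse_count(c("81,014", "226,620", "369,271", "171,461"))

# Kings County read as 47,052 versus 347,052
sum(other_four) + parse_count("47,052")   # 895418, the question's expected value
sum(other_four) + parse_count("347,052")  # 1195418, the value in the answer's output
```

Which reading is correct cannot be decided from the post itself. The same kind of conversion would be needed for every count column before running the pipeline if the columns were read in as character rather than numeric.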

【Comments】:
