【Question title】: Create count per item by year/decade
【Posted】: 2016-06-03 11:03:47
【Question】:

I have data in a data.table like the following:

> x<-df[sample(nrow(df), 10),]
> x      

                   Importer                        Exporter       Date
 1:                 Ecuador                  United Kingdom 2004-01-13
 2:                  Mexico                   United States 2013-11-19
 3:               Australia                   United States 2006-08-11
 4:           United States                   United States 2009-05-04
 5:                   India                   United States 2007-07-16
 6:               Guatemala                       Guatemala 2014-07-02
 7:                  Israel                          Israel 2000-02-22
 8:                   India                   United States 2014-02-11
 9:                    Peru                            Peru 2007-03-26
10:                  Poland                          France 2014-09-15

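I am trying to create a summary so that, for a given time period (say, a decade), I can find how many times each country appears as an importer and as an exporter. For the example above, grouping by decade, the desired output should look something like this: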

Decade    Country.Name    Importer.Count         Exporter.Count
2000      Ecuador         1                      0
2000      Mexico          1                      1
2000      Australia       1                      0
2000      United States   1                      3
.
.
.
2010     United States    0                      2
.
.
.

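So far I have tried the aggregate and data.table approaches suggested in the post here, but they all seem to give me only the total number of importers/exporters per year (or per decade, which is what I am more interested in).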

> x$Decade<-year(x$Date)-year(x$Date)%%10
> importer_per_yr<-aggregate(Importer ~ Decade, FUN=length, data=x)
> importer_per_yr

   Decade                      Importer
2   2000                       6
3   2010                       4
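The Decade column relies on integer arithmetic. %% is R's modulo operator, so subtracting year %% 10 rounds a year down to the start of its decade (2004 - 4 = 2000). The minimal base-R sketch below shows the same idea. It assumes dates in ISO YYYY-MM-DD form and extracts the year with format() instead of year(). The helper name period_start and its width argument are hypothetical. They are only there to show that other period lengths work the same way.

```r
# Hypothetical helper: round a year down to the start of its period.
# width = 10 gives decades; width = 5 gives five-year bins, and so on.
period_start <- function(year, width = 10) {
  year - year %% width
}

dates <- as.Date(c("2004-01-13", "2013-11-19", "2000-02-22", "2009-05-04"))
# Base-R year extraction (no data.table or lubridate needed)
yr <- as.integer(format(dates, "%Y"))

decades <- period_start(yr)             # decade start for each date
halves  <- period_start(yr, width = 5)  # five-year period start for each date
decades
halves
```

Here decades is 2000 2010 2000 2000, and halves is 2000 2010 2000 2005.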

Since aggregate uses a formula interface, I tried adding another grouping term, but got the following error:

> importer_per_yr<-aggregate(Importer~ Decade + unique(Importer), FUN=length, data=x)
Error in model.frame.default(formula = Importer ~ Decade +  : 
  variable lengths differ (found for 'unique(Importer)')
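The error happens because unique(Importer) returns one value per distinct importer. That vector is shorter than the number of rows, so model.frame cannot line it up with Decade. The country column should go on the right-hand side unchanged, and length() should be applied to some other column on the left-hand side. The minimal base-R sketch below makes some assumptions. It copies the ten sample rows from the question into a plain data.frame. Counting the Date column is an arbitrary choice; any column present in every row works. The output names Importer.Count and Exporter.Count follow the desired output above.

```r
# The ten sample rows from the question; Date is kept as character.
x <- data.frame(
  Importer = c("Ecuador", "Mexico", "Australia", "United States", "India",
               "Guatemala", "Israel", "India", "Peru", "Poland"),
  Exporter = c("United Kingdom", "United States", "United States", "United States",
               "United States", "Guatemala", "Israel", "United States", "Peru", "France"),
  Date = c("2004-01-13", "2013-11-19", "2006-08-11", "2009-05-04", "2007-07-16",
           "2014-07-02", "2000-02-22", "2014-02-11", "2007-03-26", "2014-09-15"),
  stringsAsFactors = FALSE
)
yr <- as.integer(substr(x$Date, 1, 4))
x$Decade <- yr - yr %% 10

# Group by Decade AND the country column itself; length() applied to Date
# then counts the rows in each group.
importer_per_decade <- aggregate(Date ~ Decade + Importer, data = x, FUN = length)
exporter_per_decade <- aggregate(Date ~ Decade + Exporter, data = x, FUN = length)
names(importer_per_decade)[3] <- "Importer.Count"
names(exporter_per_decade)[3] <- "Exporter.Count"
exporter_per_decade
```

The importer and exporter tables stay separate here. The question says that is acceptable.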

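Is there a way to create a summary by decade and by importer/exporter? It does not matter whether the importer and exporter summaries end up in separate tables.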

【Discussion】:

    Tags: r data.table


    【Solution 1】:

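    We can do this with data.table. First create the 'Decade' column by assignment with :=. Then melt the data from 'wide' to 'long' format by specifying the measure columns. Finally, reshape it back to 'wide' with dcast, using length as the fun.aggregate.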

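    library(data.table)
    # Decade = year rounded down to a multiple of 10 (e.g. 2004 -> 2000)
    x[, Decade := year(Date) - year(Date) %% 10]
    # melt: stack Importer/Exporter into one 'Country' column ('variable' holds the role);
    # dcast: count rows per Decade + Country and spread the roles back into columns
    dcast(melt(x, measure.vars = c("Importer", "Exporter"), value.name = "Country"),
          Decade + Country ~ variable, fun.aggregate = length, value.var = "Country")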
    #     Decade        Country Importer Exporter
    # 1:   2000      Australia        1        0
    # 2:   2000        Ecuador        1        0
    # 3:   2000          India        1        0
    # 4:   2000         Israel        1        1
    # 5:   2000           Peru        1        1
    # 6:   2000 United Kingdom        0        1
    # 7:   2000  United States        1        3
    # 8:   2010         France        0        1
    # 9:   2010      Guatemala        1        1
    #10:   2010          India        1        0
    #11:   2010         Mexico        1        0
    #12:   2010         Poland        1        0
    #13:   2010  United States        0        2
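The melt/dcast step stacks both country columns into one long column tagged with its role, then counts per Decade and Country. Readers without data.table could try the rough base-R analogue below. It is a swapped-in technique, not data.table's own melt/dcast. It copies the ten sample rows from the question, turns each role into a 0/1 indicator column and sums those per group. The names long and counts are illustrative only.

```r
# The ten sample rows from the question.
x <- data.frame(
  Importer = c("Ecuador", "Mexico", "Australia", "United States", "India",
               "Guatemala", "Israel", "India", "Peru", "Poland"),
  Exporter = c("United Kingdom", "United States", "United States", "United States",
               "United States", "Guatemala", "Israel", "United States", "Peru", "France"),
  Date = c("2004-01-13", "2013-11-19", "2006-08-11", "2009-05-04", "2007-07-16",
           "2014-07-02", "2000-02-22", "2014-02-11", "2007-03-26", "2014-09-15"),
  stringsAsFactors = FALSE
)
yr <- as.integer(substr(x$Date, 1, 4))
x$Decade <- yr - yr %% 10

# "melt": one row per (trade, role), with the country in a single column
long <- data.frame(
  Decade  = rep(x$Decade, times = 2),
  Country = c(x$Importer, x$Exporter),
  Role    = rep(c("Importer", "Exporter"), each = nrow(x)),
  stringsAsFactors = FALSE
)
# "dcast" with a count: 0/1 indicator per role, summed per Decade + Country
long$Importer <- as.integer(long$Role == "Importer")
long$Exporter <- as.integer(long$Role == "Exporter")
counts <- aggregate(cbind(Importer, Exporter) ~ Decade + Country,
                    data = long, FUN = sum)
counts <- counts[order(counts$Decade, counts$Country), ]
counts
```

Because each trade contributes one row per role, Decade/Country pairs seen only as an exporter (such as France in 2010) still appear with an Importer count of 0.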
    

    【Discussion】:

      【Solution 2】:

      I think with() together with aggregate() will work in base R:

      my.data <- read.csv(text = '
              Importer,             Exporter,           Date
               Ecuador,       United Kingdom,     2004-01-13
                Mexico,        United States,     2013-11-19
             Australia,        United States,     2006-08-11
         United States,        United States,     2009-05-04
                 India,        United States,     2007-07-16
             Guatemala,            Guatemala,     2014-07-02
                Israel,               Israel,     2000-02-22
                 India,        United States,     2014-02-11
                  Peru,                 Peru,     2007-03-26
                Poland,               France,     2014-09-15
      ', header = TRUE, stringsAsFactors = TRUE, strip.white = TRUE)
      
      my.data$my.Date <- as.Date(my.data$Date, format = "%Y-%m-%d")
      
      my.data <- data.frame(my.data,
                       year  = as.numeric(format(my.data$my.Date, format = "%Y")),
                       month = as.numeric(format(my.data$my.Date, format = "%m")),
                       day   = as.numeric(format(my.data$my.Date, format = "%d")))
      
      my.data$my.decade <- my.data$year - (my.data$year %% 10)
      
      importer.count <- with(my.data, aggregate(cbind(count = Importer) ~ my.decade + Importer, FUN = function(x) { NROW(x) }))
      exporter.count <- with(my.data, aggregate(cbind(count = Exporter) ~ my.decade + Exporter, FUN = function(x) { NROW(x) }))
      
      colnames(importer.count) <- c('my.decade', 'country', 'importer.count')
      colnames(exporter.count) <- c('my.decade', 'country', 'exporter.count')
      
      my.counts <- merge(importer.count, exporter.count, by = c('my.decade', 'country'), all = TRUE)
      
      my.counts$importer.count[is.na(my.counts$importer.count)] <- 0
      my.counts$exporter.count[is.na(my.counts$exporter.count)] <- 0
      
      my.counts
      
      #    my.decade        country importer.count exporter.count
      # 1       2000      Australia              1              0
      # 2       2000        Ecuador              1              0
      # 3       2000          India              1              0
      # 4       2000         Israel              1              1
      # 5       2000           Peru              1              1
      # 6       2000  United States              1              3
      # 7       2000 United Kingdom              0              1
      # 8       2010      Guatemala              1              1
      # 9       2010          India              1              0
      # 10      2010         Mexico              1              0
      # 11      2010         Poland              1              0
      # 12      2010  United States              0              2
      # 13      2010         France              0              1
      

      【Discussion】:
