【Question Title】: How to vectorize length-frequency calculation?
【Posted】: 2023-05-07 03:18:01
【Question】:

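I currently have a fairly long piece of code in which a for loop calculates the frequency of each length at the different maturity stages in a dataset. I would like to vectorize the code or find a more elegant solution, but so far I have not been able to figure out how. The frequency calculation is fairly simple: (count of occurrences of a specific length at a certain maturity / total number of females or males) * 100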

Example data:

   Species Sex Maturity    Length
1     HAK   M        1         7
2     HAK   M        2         24
3     HAK   F        2         10
4     HAK   M        3         25
5     HAK   F        5         25
6     HAK   F        4         12
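
To make the formula concrete, here is a minimal sketch in R that applies it to a single cell of the example data (males, maturity 2, length 24). The data frame name `df` and the variables `n_cell`, `total_m`, and `freq` are assumptions for illustration; they do not appear in the original code.

```r
# Example data from the question; `df` is an assumed name.
df <- data.frame(
  Species  = "HAK",
  Sex      = c("M", "M", "F", "M", "F", "F"),
  Maturity = c(1, 2, 2, 3, 5, 4),
  Length   = c(7, 24, 10, 25, 25, 12)
)

# Count of males at maturity 2 with length 24 ...
n_cell <- sum(df$Sex == "M" & df$Maturity == 2 & df$Length == 24)
# ... divided by the total number of males, expressed as a percentage.
total_m <- sum(df$Sex == "M")
freq <- n_cell / total_m * 100
freq
```

With one matching row out of three males, the cell works out to one third of 100. The loop below repeats this calculation for every combination of sex, maturity stage, and length.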

The code I am currently using:

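# Assumes the columns Length, Sex and Maturity are available directly,
# for example because the data frame has been attached.
reps <- seq(min(Length), max(Length), by = 1)
# Totals per sex. These were not shown in the original post; taking them
# to be the number of rows per sex is an assumption.
total.m <- sum(Sex == "M")
total.f <- sum(Sex == "F")
# One vector per maturity stage and sex, the same length as reps,
# filled with NA for the loop to overwrite:
m1 <- m2 <- m3 <- m4 <- m5 <- rep(NA, length(reps))
f1 <- f2 <- f3 <- f4 <- f5 <- rep(NA, length(reps))

# Loop: repeats for each value of the x axis
for (i in seq_along(reps)) {
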
        m1[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 1])/total.m*100
        m2[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 2])/total.m*100
        m3[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 3])/total.m*100
        m4[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 4])/total.m*100
        m5[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 5])/total.m*100
        f1[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 1])/total.f*100
        f2[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 2])/total.f*100
        f3[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 3])/total.f*100
        f4[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 4])/total.f*100
        f5[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 5])/total.f*100

}
# Stitch together the output of the loop.
males_all<-rbind(m1, m2, m3, m4, m5)
females_all<-rbind(f1, f2, f3, f4, f5)

This is the output I usually get from the loop:

 mat       X8       X9       X10       X11      X12       X14       X15
1  m1 0.104712 0.104712 0.6282723 1.3612565 1.884817 0.1047120 0.2094241
2  m2 0.000000 0.000000 0.3141361 0.8376963 2.198953 2.4083770 1.3612565
3  m3 0.000000 0.000000 0.0000000 0.0000000 0.104712 0.2094241 0.1047120
4  m4 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.0000000 0.0000000
5  m5 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.0000000 0.2094241

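The columns after mat are lengths. For brevity I have not included all of them; they go up to about 30. females_all looks the same, except that the mat column contains f1, f2, and so on.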

【Comments】:

    Tags: r for-loop vectorization


    【Solution 1】:

    As far as I can tell, this is what you want:

    library(dplyr)
    counts = count(df, Sex, Maturity, Length)
    totals = count(df, Sex, name = "total")
    
    counts = counts %>% left_join(totals) %>%
      mutate(prop = n / total)
    # # Joining, by = "Sex"
    # # A tibble: 6 x 6
    #   Sex   Maturity Length     n total  prop
    #   <fct>    <int>  <int> <int> <int> <dbl>
    # 1 F            2     10     1     3 0.333
    # 2 F            4     12     1     3 0.333
    # 3 F            5     25     1     3 0.333
    # 4 M            1      7     1     3 0.333
    # 5 M            2     24     1     3 0.333
    # 6 M            3     25     1     3 0.333
    
    counts %>% select(Sex, Maturity, Length, prop) %>%
      tidyr::spread(key = Length, value = prop, fill = 0)
    # # A tibble: 6 x 7
    #   Sex   Maturity   `7`  `10`  `12`  `24`  `25`
    #   <fct>    <int> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1 F            2 0     0.333 0     0     0    
    # 2 F            4 0     0     0.333 0     0    
    # 3 F            5 0     0     0     0     0.333
    # 4 M            1 0.333 0     0     0     0    
    # 5 M            2 0     0     0     0.333 0    
    # 6 M            3 0     0     0     0     0.333
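
The answer's `prop` column is a proportion between 0 and 1, whereas the question asks for a percentage, and `tidyr::spread()` has since been superseded by `tidyr::pivot_wider()`. The following is a hedged sketch of the same idea with those two adjustments, not a replacement for the answer's approach. The names `wide` and `pct` are assumptions, and `values_fill = 0` assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Example data from the question.
df <- data.frame(
  Species  = "HAK",
  Sex      = c("M", "M", "F", "M", "F", "F"),
  Maturity = c(1, 2, 2, 3, 5, 4),
  Length   = c(7, 24, 10, 25, 25, 12)
)

wide <- df %>%
  count(Sex, Maturity, Length) %>%     # rows per Sex/Maturity/Length cell
  group_by(Sex) %>%
  mutate(pct = n / sum(n) * 100) %>%   # percentage within each sex
  ungroup() %>%
  select(Sex, Maturity, Length, pct) %>%
  arrange(Length) %>%                  # so length columns appear in ascending order
  pivot_wider(names_from = Length, values_from = pct, values_fill = 0) %>%
  arrange(Sex, Maturity)

wide
```

Computing the total inside `group_by(Sex)` avoids the separate `totals` table and the join. Like the answer, this only produces columns for lengths that occur in the data and rows for maturity stages that occur for each sex.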
    

    Using this data:

    df = read.table(text = "   Species Sex Maturity    Length
    1     HAK   M        1         7
    2     HAK   M        2         24
    3     HAK   F        2         10
    4     HAK   M        3         25
    5     HAK   F        5         25
    6     HAK   F        4         12", header = TRUE)
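
If the goal is to reproduce the layout of `males_all` and `females_all` from the question (one row per maturity stage, one column per integer length, zeros where nothing was observed), a base R sketch using `table()` is one option. The helper `freq_by_sex` and its arguments are hypothetical names introduced here, and taking the stages to be 1 to 5 is an assumption based on the question's loop.

```r
# Example data from the question.
df <- data.frame(
  Species  = "HAK",
  Sex      = c("M", "M", "F", "M", "F", "F"),
  Maturity = c(1, 2, 2, 3, 5, 4),
  Length   = c(7, 24, 10, 25, 25, 12)
)

# All integer lengths on the x axis, as in the loop's `reps`.
reps <- seq(min(df$Length), max(df$Length), by = 1)

# Hypothetical helper: percentage frequency table for one sex.
freq_by_sex <- function(data, sex, lengths, stages = 1:5) {
  sub <- data[data$Sex == sex, ]
  # Fixing the factor levels keeps unobserved stages and lengths as zero cells.
  counts <- table(
    mat    = factor(sub$Maturity, levels = stages),
    length = factor(sub$Length, levels = lengths)
  )
  # Divide by the number of individuals of this sex and convert to percent.
  counts / nrow(sub) * 100
}

males_all   <- freq_by_sex(df, "M", reps)
females_all <- freq_by_sex(df, "F", reps)
males_all
```

The rows are labelled "1" to "5" rather than m1 to m5, and each table sums to 100 because every individual of that sex falls into exactly one cell.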
    

    【Comments】: