【Question title】: R programming: computation of biotic index?
【Posted】: 2020-05-20 04:05:13
【Question description】:

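I have a species abundance dataset (in .csv format), and I want to write an automated R script that computes the biotic index (BI) for each sampling site. The BI is based on presence-absence data. Here is my data ("P" means the species is present; NA means it is absent):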

df <- data.frame(
  Species = c("Sp1", "Sp2", "Sp3", "Sp4", "Sp5", "Sp6", "Sp7", "Sp8", "Sp9"),
  Site1 = c("P", NA, "P", "P", NA, "P", NA, "P", "P"),
  Site2 = c(NA, "P", "P", "P", "P", NA, "P", "P", NA),
  Site3 = c("P", "P", NA, "P", NA, NA, NA, NA, "P"),
  Site4 = c(NA, "P", NA, "P", "P", "P", NA, "P", NA),
  Site5 = c("P", "P", "P", NA, "P", NA, NA, NA, NA)
)

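The BI for each site is calculated as: (sum of the tolerance values of the species present at that site / number of species present at that site) * 10

Species tolerance values:

Sp1 = 1.2, Sp2 = 1.1, Sp3 = 2.3, Sp4 = 4, Sp5 = 2.5, Sp6 = 7, Sp7 = 2.7, Sp8 = 3.4, Sp9 = 4.5, Sp10 = 5.5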
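
As a sanity check on the formula, the Site1 value can be worked out by hand. This is only a sketch, and it assumes the denominator is the number of species present at the site (6 for Site1) rather than all 9 species in the table; that reading is the one that matches the expected Site1 value of 37.3 in the question. The names `tol_site1` and `bi_site1` are made up for this illustration.

```r
# Tolerance values of the six species marked "P" at Site1.
tol_site1 <- c(Sp1 = 1.2, Sp3 = 2.3, Sp4 = 4, Sp6 = 7, Sp8 = 3.4, Sp9 = 4.5)

# BI = (sum of tolerances / number of species present) * 10
bi_site1 <- sum(tol_site1) / length(tol_site1) * 10

round(bi_site1, 1)
```

Dividing by 9 instead would give about 24.9, which does not match the expected table.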

The output table should look like this:

SiteName	BI
Site1	37.3
Site2	26.7
Site3	27
Site4	36
Site5	17.8

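Can anyone help me with this?

【Comments】:

  • Could you add the data using dput (i.e. dput(df)) and show the expected output for the example above?
  • Hi Ronak, I hope this is what you meant.
  • So your total number of species is 9, right? Or do you want to count the number of sites for each species?
  • Yes, the total number of species is 9, but not every site has all 9 species.

Tags: r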


【Solution 1】:

We can first create a reference data frame that holds each Species and its tolerance value.

ref_df <- data.frame(Species = paste0('Sp', 1:10), 
                     tolerance = c(1.2, 1.1, 2.3, 4, 2.5,7, 2.7, 3.4, 4.5, 5.5))

Reshape the data to long format, join it with ref_df, and for each SiteName compute the sum of tolerance divided by the number of species present, multiplied by 10.

library(dplyr)

df %>%
  tidyr::pivot_longer(cols = -Species,
                      values_drop_na = TRUE, names_to = 'SiteName') %>%
  left_join(ref_df, by = 'Species') %>%
  group_by(SiteName) %>%
  summarise(BI = sum(tolerance) / n_distinct(Species) * 10)
# Alternatively, divide by the number of rows for each site:
#   summarise(BI = sum(tolerance) / n() * 10)


# A tibble: 5 x 2
#  SiteName    BI
#  <chr>    <dbl>
#1 Site1     37.3
#2 Site2     26.7
#3 Site3     27  
#4 Site4     36  
#5 Site5     17.8
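
For readers who prefer not to depend on dplyr and tidyr, the same three steps (reshape to long format, look up tolerances, summarise per site) can be sketched in base R. This is an illustrative alternative, not part of the original answer; the object names `sites`, `long` and `bi_base` are made up here, and the data assume "Sp6" is spelled consistently so that every species finds a match in the tolerance table.

```r
# Question data (species names spelled consistently) and the tolerance table.
df <- data.frame(
  Species = paste0("Sp", 1:9),
  Site1 = c("P", NA, "P", "P", NA, "P", NA, "P", "P"),
  Site2 = c(NA, "P", "P", "P", "P", NA, "P", "P", NA),
  Site3 = c("P", "P", NA, "P", NA, NA, NA, NA, "P"),
  Site4 = c(NA, "P", NA, "P", "P", "P", NA, "P", NA),
  Site5 = c("P", "P", "P", NA, "P", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)
ref_df <- data.frame(
  Species = paste0("Sp", 1:10),
  tolerance = c(1.2, 1.1, 2.3, 4, 2.5, 7, 2.7, 3.4, 4.5, 5.5),
  stringsAsFactors = FALSE
)

# Step 1: long format, one row per (Species, SiteName) pair, keeping presences only.
sites <- names(df)[-1]
long <- data.frame(
  Species = rep(df$Species, times = length(sites)),
  SiteName = rep(sites, each = nrow(df)),
  value = unlist(df[sites], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$value), ]

# Step 2: look up each species' tolerance; stop if any species has no match
# (for example because of a spelling or case mismatch).
long$tolerance <- ref_df$tolerance[match(long$Species, ref_df$Species)]
stopifnot(!anyNA(long$tolerance))

# Step 3: mean tolerance per site (sum / number present), times 10.
bi_base <- aggregate(tolerance ~ SiteName, data = long, FUN = mean)
names(bi_base)[2] <- "BI"
bi_base$BI <- bi_base$BI * 10
bi_base
```

The `stopifnot()` guard is a design choice: an unmatched species would otherwise produce an NA tolerance and silently turn that site's BI into NA.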


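    【Solution 2】:

    After restructuring the dataset, you can use crossprod to get the weighted sum for each site, and then divide it by the number of species present at each site, obtained with colSums, e.g.: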

    x <- df[-1] == "P"
    x[is.na(x)] <- FALSE
    y <- ref_df$tolerance[match(df$Species, ref_df$Species)]
    
    crossprod(x, y) / colSums(x) * 10
    #          [,1]
    #Site1 37.33333
    #Site2 26.66667
    #Site3 27.00000
    #Site4 36.00000
    #Site5 17.75000
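
To see why `crossprod` yields the per-site sum, note that `crossprod(x, y)` computes `t(x) %*% y`, so each site's column of 0/1 presences is multiplied element-wise by the tolerance vector and summed. The toy example below is only a sketch: the species and site names are hypothetical, and it uses a numeric 0/1 matrix rather than the logical matrix built above.

```r
# Toy presence matrix: rows are species, columns are sites (hypothetical names).
# matrix() fills column by column, so SiteA = (1, 0, 1) and SiteB = (1, 1, 0).
x <- matrix(c(1, 0, 1,
              1, 1, 0),
            nrow = 3,
            dimnames = list(c("SpA", "SpB", "SpC"), c("SiteA", "SiteB")))
y <- c(1.2, 1.1, 2.3)              # tolerance of SpA, SpB, SpC

weighted_sum <- crossprod(x, y)    # t(x) %*% y, a 2 x 1 matrix
n_present <- colSums(x)            # number of species present per site

# drop() turns the 2 x 1 matrix into a named vector before dividing.
bi_toy <- drop(weighted_sum) / n_present * 10
bi_toy
```

A site with no species present would give 0 / 0, which is NaN in R, so such sites may need separate handling.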
    

    Data:

    df <- structure(list(Species = c("Sp1", "Sp2", "Sp3", "Sp4", "Sp5", 
    "Sp6", "Sp7", "Sp8", "Sp9"), Site1 = c("P", NA, "P", "P", NA, 
    "P", NA, "P", "P"), Site2 = c(NA, "P", "P", "P", "P", NA, "P", 
    "P", NA), Site3 = c("P", "P", NA, "P", NA, NA, NA, NA, "P"), 
        Site4 = c(NA, "P", NA, "P", "P", "P", NA, "P", NA), Site5 = c("P", 
        "P", "P", NA, "P", NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -9L))
    
    ref_df <- structure(list(Species = c("Sp1", "Sp2", "Sp3", "Sp4", "Sp5", 
    "Sp6", "Sp7", "Sp8", "Sp9", "Sp10"), tolerance = c(1.2, 1.1, 
    2.3, 4, 2.5, 7, 2.7, 3.4, 4.5, 5.5)), class = "data.frame", row.names = c(NA, 
    -10L))
    
