Select rasters in a stack based on a partial layer-name match

【Question title】: Select rasters in stack based on layer partial name match
【Posted】: 2013-05-07 14:40:24
【Question】:

I have a stack of rasters (one per species), and I also have a data frame with lat/long columns and a species name column.

library(raster)
fls <- list.files(pattern = "median")
s <- stack(fls)
# df is a data.frame with the columns "x", "y" and "species name"

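I want to select just one raster at a time to use with the extract function, and I want the selection to be a partial match on the species name column. I need a partial match because the raster names may not exactly match the names in the species list: the case may differ, the raster layer name may be longer (e.g. "species_name_median"), or it may contain "_" instead of a space.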

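# Pseudocode: the <...> placeholder marks the partial-match step in question
for (i in seq_along(df[["species name"]])) {
  result <- extract(s[[ <layer whose name partially matches df[["species name"]][i]> ]], df[, c("x", "y")])
}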

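I hope this makes sense: I just want to run extract on one raster at a time. I can easily select a single raster with s[[i]], but there is no guarantee that every species in the list has an equivalent raster.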

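【Comments】:

Without some examples of the fuzzy matching you want to do, this question is basically impossible to answer in any meaningful way.
@SimonO101 An example would be: a raster named "Lion_median", where the species name column would be "lion". In that case I would need to match lion to Lion. Does that help?
Yes. I have added an answer that works, provided the species names are actually spelled correctly (i.e. the match ignores punctuation, case, and the position of the species name within the layer name). HTH.
If you need more help, please post any follow-up problems you run into... :-)
@SimonO101 I am not familiar with some of the functions you used, so it will take me some time to understand what your answer actually does. Thank you though.

Tags: string r select stack raster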

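【Solution 1】:

If the point data you want to query is a data.frame of x and y coordinates plus the species name of the layer to query for each point, you can do everything with these two commands: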

#  Find the layer to match on using 'grepl' and 'which' converting all names to lowercase for consistency
df$layer <- lapply( df$species , function(x) which( grepl( tolower(x) , tolower(names(s)) ) ) )


# Extract each value from the appropriate layer in the stack
df$Value <- sapply( seq_len(nrow(df)) , function(x) extract( s[[ df$layer[x] ]] , df[ x , 1:2 ] ) )

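How it works

Starting with the first line:

First we define a new column vector, df$layer, which will hold the index of the rasterLayer in the stack that we need to use for that row.
lapply iterates over all the elements of the df$species column and applies an anonymous function to each in turn, using each item of df$species as the input variable x. lapply is a looping construct, even though it does not look like one.
On the first iteration we take the first element of df$species, which is now x, and use it in grepl (a grep-style regular-expression match that returns logicals) to find which of the names of our stack s contain our species pattern. We use tolower() on both the pattern to match (x) and the elements to match against (names(s)) so that the match succeeds even when the case differs; without it, "Tiger" would not find "tiger".
grepl returns a logical vector marking the elements in which it found the pattern, e.g. grepl( "abc" , c("xyz", "wxy" , "acb" , "zxabcty" ) ) returns FALSE, FALSE, FALSE, TRUE (only "zxabcty" contains "abc"). We use which to get the indices of the TRUE elements.
The idea is that one, and only one, layer in the stack matches the species name for each row, so the only TRUE index will be the index of the layer we want in the stack.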
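The matching step described above can be tried on plain character vectors, without any rasters. This is only a sketch: layer_names stands in for names(s) and reuses the layer names from the worked example in this answer, while the variable names themselves are illustrative.

```r
# Stand-in for names(s): the layer names used in the worked example.
layer_names <- c("LIon_medIan", "PANTHeR_MEAN_AVG", "tiger.Mean.JULY_2012")

# One species name whose capitalisation differs from the layer name.
species <- "Tiger"

# grepl() returns one TRUE/FALSE per layer name; lower-casing both
# sides makes the comparison case-insensitive.
is_match <- grepl(tolower(species), tolower(layer_names))

# which() converts the logical vector into the position(s) of TRUE.
layer_index <- which(is_match)

print(is_match)     # FALSE FALSE TRUE
print(layer_index)  # 3
```

Note that grepl treats its first argument as a regular expression, so a species name containing characters such as "." or "(" would be interpreted as a pattern unless fixed = TRUE is passed.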

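On the second line, sapply:

sapply is an iterator much like lapply, but it returns a vector rather than a list of values. TBH you could use either in this use case.
This time we iterate over the sequence of numbers from 1 to nrow(df).
We use the row number as the input variable x of another anonymous function.
We want to extract the value at the "x" and "y" coordinates (columns 1 and 2 respectively) of the current row of the data.frame (given by x), using the layer we found on the previous line.
We assign the result of all this to another column in the data.frame, which holds the value extracted at each row's x/y coordinates from the corresponding layer.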
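The difference between lapply and sapply mentioned above can be seen on the same kind of name matching, again without rasters. This is a sketch under assumptions: find_layer is a hypothetical helper name, and layer_names stands in for names(s).

```r
# Stand-in for names(s) and for the df$species column.
layer_names <- c("LIon_medIan", "PANTHeR_MEAN_AVG", "tiger.Mean.JULY_2012")
species <- c("lion", "Tiger", "panther", "lion")

# Hypothetical helper: position(s) of the layer names containing x,
# ignoring case.
find_layer <- function(x) which(grepl(tolower(x), tolower(layer_names)))

# lapply() always returns a list, one element per species.
layer_list <- lapply(species, find_layer)

# sapply() simplifies to an integer vector here because every species
# matched exactly one layer.
layer_vec <- sapply(species, find_layer, USE.NAMES = FALSE)

str(layer_list)
print(layer_vec)  # 1 3 2 1
```

If any species matched no layer, or more than one, sapply could not simplify and would return a list as well, so the vector form is only safe when every row has exactly one matching layer.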

Hope that helps!!

And a worked example with some data:

require( raster )
#  Sample rasters - note the scale of values in each layer  
# Tens
r1 <- raster( matrix( sample(1:10,100,repl=TRUE) , ncol = 10 ) )    
# Hundreds
r2 <- raster( matrix( sample(1e2:1.1e2,100,repl=TRUE) , ncol = 10 ) )   
# Thousands
r3 <- raster( matrix( sample(1e3:1.1e3,100,repl=TRUE) , ncol = 10 ) )

#  Stack the rasters
s <- stack( r1,r2,r3 )
#  Name the layers in the stack
names(s) <- c("LIon_medIan" , "PANTHeR_MEAN_AVG" , "tiger.Mean.JULY_2012")


#  Data of points to query on
df <- data.frame( x = runif(10) , y = runif(10) , species = sample( c("lion" , "panther" , "Tiger" ) , 10 , repl = TRUE ) )

#  Run the previous code
df$layer <- lapply( df$species , function(x) which( grepl( tolower(x) , tolower(names(s)) ) ) )
df$Value <- sapply( seq_len(nrow(df)) , function(x) extract( s[[ df$layer[x] ]] , df[ x , 1:2 ] ) )

#  And the result (note the scale of Values is consistent with the scale of values in each rasterLayer in the stack)
df
#          x         y species layer Value
#1  0.4827577 0.7517476    lion     1     1
#2  0.8590993 0.9929104    lion     1     3
#3  0.8987446 0.4465397   tiger     3  1084
#4  0.5935572 0.6591223 panther     2   107
#5  0.6382287 0.1579990 panther     2   103
#6  0.7957626 0.7931233    lion     1     4
#7  0.2836228 0.3689158   tiger     3  1076
#8  0.5213569 0.7156062    lion     1     3
#9  0.6828245 0.1352709 panther     2   103
#10 0.7030304 0.8049597 panther     2   105

【Discussion】:

【Solution 2】:

Have you tried to subset your RasterStack?

Something like this:

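for (i in seq_along(df.species.name)) {  # assuming df.species.name holds the partial species names
  # grep() needs the layer names as its second argument; value = TRUE returns the matching names
  result <- subset(s, grep(df.species.name[i], names(s), ignore.case = TRUE, value = TRUE))
}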

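It would be interesting to see the different raster and species names. That would allow a better approach, adjusting the regular expression where necessary. You will find many references to grep on this site. You can also try ?grep.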
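As one possible adjustment for the cases raised in the question (spaces versus "_", differing case, and species with no raster at all), both sides can be normalised before matching. The sketch below uses only base R; normalize_name and match_layer are hypothetical helper names, and the species and layer names are made-up illustration data, not taken from the question.

```r
# Hypothetical helper: lower-case a name and collapse every run of
# non-alphanumeric characters (spaces, dots, underscores) into "_".
normalize_name <- function(x) gsub("[^a-z0-9]+", "_", tolower(x))

# Hypothetical helper: index of the single matching layer, or NA when
# there is no match or more than one.
match_layer <- function(species, layer_names) {
  hits <- which(grepl(normalize_name(species),
                      normalize_name(layer_names), fixed = TRUE))
  if (length(hits) == 1L) hits else NA_integer_
}

# Made-up illustration data standing in for names(s) and the species column.
layer_names <- c("Panthera_leo_median", "panthera.tigris.MEDIAN")
species <- c("Panthera leo", "panthera tigris", "Puma concolor")

# vapply() guarantees an integer vector with one entry per species.
layer_index <- vapply(species, match_layer, integer(1),
                      layer_names = layer_names, USE.NAMES = FALSE)
print(layer_index)  # 1 2 NA
```

Rows whose index is NA have no usable raster and can be skipped before calling extract or subset. Leading or trailing whitespace in a species name would become "_" under this normalisation, so trimming the input first may be needed.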

【Discussion】: