How to combine state-level shapefiles from the United States Census Bureau into a nationwide shape

【Question title】: How to combine state-level shapefiles from the United States Census Bureau into a nationwide shape
【Posted】: 2014-12-09 01:21:06
【Question description】:

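The Census Bureau does not provide a nationwide shapefile of public use microdata areas (the smallest geography available in the American Community Survey). I tried combining the state files with a few different methods, but even the one that de-duplicates identifiers breaks once it reaches California. Am I doing something silly, or does this require a difficult workaround? Here is code that reproduces everything up to the point where things break.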

library(taRifx.geo)
library(maptools)

td <- tempdir() ; tf <- tempfile()
setInternet2( TRUE )
download.file( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/" , tf )

al <- readLines( tf )
tl <- al[ grep( "geo/tiger/TIGER2014/PUMA/tl_2014_" , al ) ]
fp <- gsub( "(.*)geo/tiger/TIGER2014/PUMA/tl_2014_([0-9]*)_puma10\\.zip(.*)" , "\\2" , tl )

# get rid of alaska
fp <- fp[ fp != '02' ]

af <- paste0( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/tl_2014_" , fp , "_puma10.zip" )

d <- NULL
for ( i in af ){
    try( file.remove( z ) , silent = TRUE )
    download.file( i , tf , mode = 'wb' )
    z <- unzip( tf , exdir = td )
    b <- readShapePoly( z[ grep( 'shp$' , z ) ] )
    if ( is.null( d ) ) d <- b else d <- taRifx.geo:::rbind.SpatialPolygonsDataFrame( d , b , fix.duplicated.IDs = TRUE )
}

# Error in `row.names<-.data.frame`(`*tmp*`, value = c("d.0", "d.1", "d.2",  : 
  # duplicate 'row.names' are not allowed
# In addition: Warning message:
# non-unique values when setting 'row.names': ‘d.0’, ‘d.1’, ‘d.10’, ‘d.11’, ‘d.12’, ‘d.13’, ‘d.14’, ‘d.15’, ‘d.16’, ‘d.17’, ‘d.18’, ‘d.19’, ‘d.2’, ‘d.3’, ‘d.4’, ‘d.5’, ‘d.6’, ‘d.7’, ‘d.8’, ‘d.9’

【Question discussion】:

Tags: r map gis shapefile census

【Solution 1】:

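As you probably guessed, the problem is that your object d contains duplicate polygon IDs.

Indeed, the polygon IDs in each "shp" file start over at "0" (the error message lists d.0, d.1, and so on), so they collide from one state to the next. That is why you used fix.duplicated.IDs = TRUE to make them distinct.

What is odd is that taRifx.geo:::rbind.SpatialPolygonsDataFrame should have fixed this once you set fix.duplicated.IDs = TRUE. More precisely, the option is passed on to sp::rbind.SpatialPolygons, which calls the "internal" function sp:::makeUniqueIDs, which in turn uses base::make.unique.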
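For reference, here is what the last link of that chain does on its own. This is only a base-R illustration of make.unique on a made-up vector of IDs, not a diagnosis of where the taRifx.geo call goes wrong:

```r
# made-up IDs, as if two shapefiles both numbered their polygons from "0"
ids <- c("0", "1", "0", "1", "0")

# make.unique appends ".1", ".2", ... to repeated values
unique_ids <- make.unique(ids)
print(unique_ids)
# "0" "1" "0.1" "1.1" "0.2"
```
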

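I did not dig into what goes wrong along that chain. Instead, I suggest you set the polygon IDs yourself rather than relying on the fix.duplicated.IDs option.

To fix it yourself, replace your for loop with the following code: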

d <- NULL
count <- 0
for ( i in af ){
    try( file.remove( z ) , silent = TRUE )
    download.file( i , tf , mode = 'wb' )
    z <- unzip( tf , exdir = td )
    b <- readShapePoly( z[ grep( 'shp$' , z ) ] )

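    # give every polygon in this state an ID that continues the running count,
    # so IDs stay unique across all states
    for (j in seq_along(b@polygons))
        b@polygons[[j]]@ID <- as.character(j + count)
    count <- count + length(b@polygons)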

    if ( is.null( d ) ) 
       d <- b 
    else 
       d <- taRifx.geo:::rbind.SpatialPolygonsDataFrame( d , b )
}

The simple for loop over j just changes the ID of each polygon in the object b before b is bound to d.
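The renumbering idea can be checked without downloading anything. Below is a minimal base-R sketch; the per-state polygon counts are made-up numbers and the names counts, ids and all_ids are only for illustration:

```r
# hypothetical number of polygons in three state files
counts <- c(AL = 3, AZ = 2, CA = 4)

offset <- 0
ids <- list()
for (s in names(counts)) {
  # IDs for this state continue where the previous state stopped
  ids[[s]] <- as.character(seq_len(counts[[s]]) + offset)
  offset <- offset + counts[[s]]
}

all_ids <- unlist(ids, use.names = FALSE)
print(all_ids)
# "1" "2" "3" "4" "5" "6" "7" "8" "9"
```

Because the offset grows by the number of polygons already seen, no two states can end up sharing an ID.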

【Discussion】:

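【Solution 2】:

Here is another approach, which includes a shortcut for getting the FTP directory listing. As @Pop mentioned, the key is to make sure the IDs are all unique.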

library(RCurl) 
library(rgdal)

# get the directory listing
u <- 'ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/'
f <- paste0(u, strsplit(getURL(u, ftp.use.epsv = FALSE, ftplistonly = TRUE), 
                        '\\s+')[[1]])

# download and extract to tempdir/shps
invisible(sapply(f, function(x) {
  path <- file.path(tempdir(), basename(x))
  download.file(x, destfile=path, mode = 'wb')
  unzip(path, exdir=file.path(tempdir(), 'shps'))
}))

# read in all shps, and prepend shapefile name to IDs
shps <- lapply(sub('\\.zip', '', basename(f)), function(x) {
  shp <- readOGR(file.path(tempdir(), 'shps'), x)
  shp <- spChFIDs(shp, paste0(x, '_', sapply(slot(shp, "polygons"), slot, "ID")))
  shp
})

# rbind to a single object
shp <- do.call(rbind, as.list(shps))

# plot (note: clipping to contiguous states for display purposes)
plot(shp, axes=TRUE, xlim=c(-130, -60), ylim=c(20, 50), las=1)

# write out to wd/USA.shp
writeOGR(shp, '.', 'USA', 'ESRI Shapefile')
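
The ID-prefixing step above can also be tried on toy data, without the Census downloads. This is a small sketch assuming the sp package is installed; make_square is a hypothetical helper written for this example, and the two unit squares stand in for two state shapefiles that both use the ID "0":

```r
library(sp)

# hypothetical helper: a one-polygon SpatialPolygonsDataFrame whose only
# polygon has the ID "0", like the first polygon of every state file
make_square <- function(x0, name) {
  # unit square starting at x = x0, ring given clockwise and closed
  coords <- cbind(x0 + c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0))
  polys <- SpatialPolygons(list(Polygons(list(Polygon(coords)), ID = "0")))
  SpatialPolygonsDataFrame(polys, data.frame(name = name, row.names = "0"))
}

a <- make_square(0, "a")
b <- make_square(2, "b")

# prefix each ID with a per-object label so the two "0" IDs no longer clash
a <- spChFIDs(a, paste0("a_", sapply(slot(a, "polygons"), slot, "ID")))
b <- spChFIDs(b, paste0("b_", sapply(slot(b, "polygons"), slot, "ID")))

both <- rbind(a, b)
both_ids <- sapply(slot(both, "polygons"), slot, "ID")
print(both_ids)
```

Prefixing with the shapefile name keeps each ID traceable to its source file, which the running-count scheme does not.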

【Discussion】: