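【Posted】: 2018-08-20 13:18:50
【Question】:
I have a data.table containing some state abbreviations and county names. I want to get approximate coordinates for each row from ggplot2::map_data('county').
I can do this sequentially with multiple lines of code using :=, but I would like to make just one function call.
Here is what I tried:
Data: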
library(data.table)
library(ggplot2)
> dput(dt[1:20, .(state, county, prime_mover)])
structure(list(state = c("AZ", "AZ", "CA", "CA", "CA", "CT",
"FL", "IN", "MA", "MA", "MA", "MN", "NJ", "NJ", "NJ", "NY", "NC",
"SC", "TN", "TX"), county = c("Maricopa", "Maricopa", "Los Angeles",
"Orange", "Los Angeles", "Fairfield", "Hillsborough", "Morgan",
"Barnstable", "Nantucket", "Essex", "Dakota", "Cape May", "Salem",
"Middlesex", "Kings", "Buncombe", "Anderson", "Shelby", "Tarrant"
), prime_mover = c("GT", "GT", "CT", "CT", "CT", "CT", "GT",
"CT", "GT", "GT", "GT", "GT", "CT", "GT", "CT", "GT", "CT", "CT",
"CT", "CT")), .Names = c("state", "county", "prime_mover"), row.names = c(NA,
-20L), class = c("data.table", "data.frame"))
coord_data <- as.data.table(map_data('county'))
Code:
getCoords <- function(state, county){
  prov <- state.name[grep(state, state.abb)]
  ck <- coord_data[region == tolower(prov) & subregion == tolower(county),
                   .(lon = mean(long), lat = mean(lat))]
  return(list(unname(unlist(ck))))
}
# Testing getCoords
> getCoords('AZ', 'Maricopa')
[[1]]
[1] -111.88668 33.58126
Error:
> dt[, c('lon', 'lat') := lapply(.SD, getCoords), .SDcols = c('state', 'county')]
Error in tolower(county) : argument "county" is missing, with no default
In addition: Warning message:
In grep(state, state.abb) :
argument 'pattern' has length > 1 and only the first element will be used
I have seen the following answers, but cannot quite understand what I am doing wrong:
- Loop through data.table and create new columns basis some condition
- R data.table create new columns with standard names
- Add new columns to a data.table containing many variables
- Add multiple columns to R data.table in one function call?
- Assign multiple columns using := in data.table, by group
- Dynamically create new columns in data.table
I can achieve what I want in other ways (multiple lines, dplyr, or even base R), but I would prefer to use the data.table approach.
【Comments】:
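- lapply is for applying a function multiple times, once per column. You can use do.call(fun, .SD) to pass multiple cols as separate arguments to a single function call (or use do.call(fun, unname(.SD)) to pass them by position).
- Or dt[, c('lon', 'lat') := getCoords(state, county), by=1:NROW(dt)]
- @Frank Thanks! Can you explain how/why it works?
- @dww I don't get any errors, but I do get a lot of warnings, mainly about the size of what getCoords returns. The code chunk writes the first element of the getCoords output to both lon and lat.
- As far as I can tell, this works: dt[,lon := getCoords(state, county)[[1]][1],by=1:NROW(dt)][, lat:=getCoords(state, county)[[1]][2],by=1:NROW(dt)] Based on stackoverflow.com/a/11308946/5795592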
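A rough sketch of what the comments above describe, not the original poster's final code: lapply(.SD, getCoords) calls getCoords once per column, so county is never supplied, whereas do.call(fun, .SD) hands the columns to a single call as separate arguments. The example assumes a small made-up coord_data table in place of as.data.table(map_data('county')); getCoordsVec and getCoordsRow are hypothetical helper names, and the coordinates are illustrative values, not real county centroids.

```r
library(data.table)

# ASSUMPTION: tiny stand-in for as.data.table(map_data('county'));
# the numbers are made up for illustration only.
coord_data <- data.table(
  region    = c("arizona", "arizona", "california", "california"),
  subregion = c("maricopa", "maricopa", "orange", "orange"),
  long      = c(-112, -111, -118, -117),
  lat       = c(33, 34, 33.5, 34.5)
)

dt <- data.table(state  = c("AZ", "CA", "AZ"),
                 county = c("Maricopa", "Orange", "Maricopa"))
dt_rowwise <- copy(dt)  # untouched copy for the row-wise variant below

# Hypothetical vectorised helper: takes whole columns and returns a list
# with one element per output column, which is the shape := expects.
getCoordsVec <- function(state, county) {
  lookup <- data.table(
    region    = tolower(state.name[match(state, state.abb)]),  # exact match, unlike grep
    subregion = tolower(county)
  )
  centroids <- coord_data[, .(lon = mean(long), lat = mean(lat)),
                          by = .(region, subregion)]
  res <- centroids[lookup, on = .(region, subregion)]  # one row per input row, in input order
  list(res$lon, res$lat)
}

# One call: .SD's columns (state, county) are passed as named arguments.
dt[, c("lon", "lat") := do.call(getCoordsVec, .SD),
   .SDcols = c("state", "county")]

# Hypothetical row-wise helper: returns list(lon, lat) rather than
# list(c(lon, lat)), so each new column receives exactly one value per group.
getCoordsRow <- function(state, county) {
  prov <- state.name[match(state, state.abb)]
  as.list(coord_data[region == tolower(prov) & subregion == tolower(county),
                     .(lon = mean(long), lat = mean(lat))])
}

dt_rowwise[, c("lon", "lat") := getCoordsRow(state, county),
           by = seq_len(nrow(dt_rowwise))]
```

The vectorised form does one grouped aggregation and one join, so it is likely to scale better than calling the helper once per row; the row-wise form is closer to the code in the question and only changes the shape of the return value.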
Tags: r data.table