【Question title】: Apply a function to each cell in a column of a data frame in R
【Posted】: 2016-02-05 21:40:37
【Question】:

Edit: thanks to @user5249203 for pointing out that geocoding is best done with ggmap's geocode call. Watch out for NAs, though.

I am struggling with the apply family in R.

I am using a function that takes a string and returns a latitude and longitude:

> gGeoCode("Philadelphia, PA")
[1] 39.95258 -75.16522

I have a simple data frame that holds the names of all 50 states:

dput(state_lat_long)
structure(
  list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

To practice my apply skills, I just want to apply gGeoCode to each cell in the only column of the state_lat_long data frame.

It could not be simpler.

So what is wrong with this?

> View(apply(state_lat_long, function(x) gGeoCode(x)))

When I run it, I get:

Error in View : argument "FUN" is missing, with no default  

I don't understand, because FUN is not missing.

So let's try sapply. That should be simple, right?

But what is wrong with this?

View(sapply(state_lat_long$State, function(x) gGeoCode(x)))

When I run it, I get 2 rows and 50 columns, containing NAs. I can't make sense of it.

Next, I tried

View(apply(state_lat_long, 2, function(x) gGeoCode(x)))  

and I got

     State
  40.71278
 -74.00594  

Once again, this makes no sense!

What am I doing wrong? Thanks.

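【Comments】:

  • apply needs three arguments. The first is your object (e.g. the data frame), the second says whether to apply over rows or columns (you want 2 for columns), and the third is FUN. Your call passes only two, so the function lands in the second slot and FUN is reported as missing. Try View(apply(state_lat_long, 2, function(x) gGeoCode(x)))
  • Could you take a look at my edit to the original question?
  • Maybe I mixed them up and it should be View(apply(state_lat_long, 1, function(x) gGeoCode(x)))? If not, it may not be as simple as I thought, and I would need to see the code you use for gGeoCode to tell whether I can help (I may or may not be able to).
  • Break the code up and run it one piece at a time. Whether the margin is row or column also matters. It would also help to see how you built the data frame. How did you generate it?
  • It would also help to know whether you are passing the argument that gGeoCode actually expects.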
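The comments above about MARGIN can be checked without calling any geocoding service. Below is a minimal sketch, assuming a hypothetical stand-in `fake_geocode` (not the asker's gGeoCode) that accepts exactly one string and returns two numbers.

```r
# Hypothetical stand-in for gGeoCode: one string in, two numbers out.
fake_geocode <- function(place) {
  stopifnot(is.character(place), length(place) == 1)
  c(lat = nchar(place), lon = -nchar(place))
}

states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# Without MARGIN, the function is matched to the second argument,
# so apply() reports FUN as missing.
err <- tryCatch(apply(states, function(x) fake_geocode(x)),
                error = conditionMessage)

# MARGIN = 2 hands each whole column to the function in one call.
by_col <- apply(states, 2, function(x) length(x))

# MARGIN = 1 hands over one row at a time; x[[1]] is that row's only cell.
by_row <- apply(states, 1, function(x) fake_geocode(x[[1]]))
```

With MARGIN = 2 the function sees all the names at once, which may be why the question's third attempt returned a single coordinate pair: a function written for one string may only use the first element.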

Tags: r sapply


【Solution 1】:

I know this question is mainly about *apply, but if you are just after the geocoding, a simpler option is to use a vectorised function such as ggmap::geocode

state_lat_long <- structure(
    list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

library(ggmap)

## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
#           lon      lat
# 1   -74.00594 40.71278
# 2  -116.41939 38.80261
# 3   -99.90181 31.96860
# 4  -119.41793 36.77826
# 5   -94.68590 46.72955
# 6  -101.00201 47.55149
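The as.character() call above matters because State is a factor. Here is a small sketch of why, and of one way to keep the coordinates next to the state names. The `coords` data frame is a hand-built stand-in for what geocode returns (its three rows are copied from the output above), so nothing here contacts a geocoding service.

```r
states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# A factor stores integer codes plus labels.
codes  <- as.integer(states$State)    # the underlying codes, not the names
labels <- as.character(states$State)  # the names a geocoder needs

# Stand-in for the geocode result: one lon/lat row per input, in input order.
coords <- data.frame(lon = c(-74.00594, -116.41939, -99.90181),
                     lat = c(40.71278, 38.80261, 31.96860))

# Keep names and coordinates together, then list rows that failed to geocode.
result <- cbind(states, coords)
failed <- result[is.na(result$lon) | is.na(result$lat), ]
```

Here `failed` has zero rows; with real geocoding output it would show which states still need another attempt.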

【Comments】:

  • I think this is the best idea: skip apply and use geocode from ggmap.
【Solution 2】:

Does your data frame look like this?

df = data.frame(State = c(
    32L, 28L, 43L, 5L, 23L, 34L,
    30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
    18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
    17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
    19L, 41L, 50L, 2L, 45L
  ), Label = c(
    "alabama", "alaska", "arizona",
    "arkansas", "california", "colorado", "connecticut", "delaware",
    "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
    "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
    "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
    "montana", "nebraska", "nevada", "new hampshire", "new jersey",
    "new mexico", "new york", "north carolina", "north dakota", "ohio",
    "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
    "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
    "washington", "west virginia", "wisconsin", "wyoming"
  ))

head(df)
  State      Label
1    32    alabama
2    28     alaska
3    43    arizona
4     5   arkansas
5    23 california
6    34   colorado

apply(df, 1, function(x) gGeoCode(x[["Label"]]))

Or,

mapply(FUN = gGeoCode, as.character(df$Label), SIMPLIFY = TRUE)
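Both calls return one column per state, which is where the "2 rows and 50 columns" in the question comes from. A minimal sketch of reshaping that into one row per state, again with a hypothetical `fake_geocode` in place of gGeoCode:

```r
# Hypothetical stand-in for gGeoCode: one string in, two numbers out.
fake_geocode <- function(place) c(lat = nchar(place), lon = -nchar(place))

state_names <- c("new york", "nevada", "texas")

# sapply() binds each length-2 result as a column: a 2 x 3 matrix.
raw <- sapply(state_names, fake_geocode)

# Transpose to get one row per state, then turn it into a data frame.
tidy <- as.data.frame(t(raw))
tidy$State <- rownames(tidy)
```

The transpose is only a presentation step; the values are the same as in the matrix sapply returns.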

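Note: some states still come back as NA. Re-running the code picks up the missing coordinates. If we knew your input format and data frame structure, I would expect this to work more efficiently. It is also important to make sure the argument you pass is what gGeoCode expects.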
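Instead of re-running everything, the retry can be limited to the rows that came back as NA. This is only a sketch: `make_flaky_geocoder` is a made-up helper that simulates a lookup failing on the first try for some names, and `max_tries` is an arbitrary cap.

```r
# Hypothetical geocoder that returns NA the first time it sees some names.
make_flaky_geocoder <- function() {
  seen <- character(0)
  function(place) {
    if (!(place %in% seen)) {
      seen <<- c(seen, place)
      if (nchar(place) %% 2 == 0) return(c(lat = NA_real_, lon = NA_real_))
    }
    c(lat = nchar(place), lon = -nchar(place))
  }
}
geocode_one <- make_flaky_geocoder()

state_names <- c("new york", "nevada", "texas")
coords <- t(sapply(state_names, geocode_one))  # one row per state

max_tries <- 3
for (attempt in seq_len(max_tries)) {
  todo <- which(is.na(coords[, "lat"]))
  if (length(todo) == 0) break
  # Re-query only the states that are still missing.
  coords[todo, ] <- t(sapply(state_names[todo], geocode_one))
}
```

With a real service it would probably make sense to pause between attempts, since NAs of this kind often come from request limits rather than from the input.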

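【Comments】:

  • So the states come back as NA because of the function, not because of apply. I see.
  • I think you need to understand how the function works. When you pass the state names one at a time, it does return coordinates. The issue here is how you pass the data to it, or how you pass the function. apply and mapply just let you apply the function without a for loop. You still need to know which coordinates are right and which are wrong.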