【Question title】: How to use a list of row numbers to look up values in a dataframe column
【Posted】: 2018-10-14 01:41:03
【Question description】:

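I have a large data frame with two columns: one is an ID code called "code", and the other, called "name", holds the names of two train stations separated by a slash.

I want to look up all the codes associated with a station name (and be able to look up several stations at once), so that I end up with a list of vectors holding the codes for each station.

I used lapply to get the rows for each station, but now I can't work out how to look up the values in the "code" column that correspond to those row numbers.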

SearchFor <- c("Chicago", "New York", "Atlanta")
# grep() returns the row numbers whose name matches each search term
List <- lapply(SearchFor, grep, x = datastations$name)

This gives me the following list:

$Chicago
 [1]  29  64 135 160 164 167 176 186 225 247 248

$`New York`
[1]  51  53 109 111 112 164

$Atlanta
[1]   4  78 168 237 291

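Basically, I want to replace each of these numbers with the value of the "code" column in that row.

Here is my data table "datastations" after running dput: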

structure(list(code = c(6000L, 6001L, 6002L, 6003L, 6004L, 6005L, 
6006L, 6007L, 6008L, 6009L, 6010L, 6011L, 6012L, 6013L, 6014L, 
6015L, 6016L, 6017L, 6018L, 6019L, 6020L, 6021L, 6022L, 6023L, 
6024L, 6025L, 6026L, 6027L, 6028L, 6029L, 6030L, 6031L, 6032L, 
6033L, 6034L, 6035L, 6036L, 6037L, 6038L, 6039L, 6040L, 6041L, 
6042L, 6043L, 6044L, 6045L, 6046L, 6047L, 6048L, 6049L, 5000L, 
5001L, 5002L, 5003L, 5004L, 5005L, 5006L, 5007L, 5008L, 6050L, 
6051L, 6052L, 6053L, 6054L, 6055L, 6056L, 6057L, 6058L, 6059L, 
6060L, 6061L, 6062L, 6063L, 6064L, 6065L, 6066L, 6067L, 6068L, 
6069L, 6070L, 6071L, 6072L, 6073L, 6074L, 6075L, 6076L, 6077L, 
6078L, 6079L, 6080L, 6081L, 6082L, 6083L, 6084L, 6085L, 6086L, 
6087L, 6088L, 6089L, 6090L, 6091L, 5009L, 5010L, 5011L, 5012L, 
6092L, 6093L, 6094L, 6095L, 6096L, 6097L), name = c("Atlanta / New York", 
"Atlanta / Chicago", "Atlanta / Miami", "Atlanta / Los Angeles", 
"Atlanta / Toronto", "Atlanta / Washington", "Atlanta / Cleveland", 
"Atlanta / Raleigh", "Atlanta / Newark", "Atlanta / Ottawa", 
"Atlanta / Detroit", "Atlanta / Albany", "Atlanta / Hartford", 
"Atlanta / Providence", "New York / Chicago", "New York / Miami", 
"New York / Los Angeles", "New York / Toronto", "New York / Washington", 
"New York / Cleveland", "New York / Raleigh", "New York / Newark", 
"New York / Ottawa", "New York / Detroit", "New York / Albany", 
"New York / Hartford", "New York / Providence", "Chicago / Miami", 
"Chicago / Los Angeles", "Chicago / Toronto", "Chicago / Washington", 
"Chicago / Cleveland", "Chicago / Raleigh", "Chicago / Newark", 
"Chicago / Ottawa", "Chicago / Detroit", "Chicago / Albany", 
"Chicago / Hartford", "Chicago / Providence", "Miami / Los Angeles", 
"Miami / Toronto", "Miami / Washington", "Miami / Cleveland", 
"Miami / Raleigh", "Miami / Newark", "Miami / Ottawa", "Miami / Detroit", 
"Miami / Albany", "Miami / Hartford", "Miami / Providence", "Toronto /             Washington", 
"Toronto / Cleveland", "Toronto / Raleigh", "Toronto / Newark", 
"Toronto / Ottawa", "Toronto / Detroit", "Toronto / Albany", 
"Toronto / Hartford", "Toronto / Providence", "Los Angeles / Toronto", 
"Los Angeles / Washington", "Los Angeles / Cleveland", "Los Angeles /         Raleigh", 
"Los Angeles / Newark", "Los Angeles / Ottawa", "Los Angeles / Detroit", 
"Los Angeles / Albany", "Los Angeles / Hartford", "Los Angeles / Providence", 
"Washington / Washington", "Washington / Cleveland", "Washington / Raleigh", 
"Washington / Newark", "Washington / Ottawa", "Washington / Detroit", 
"Washington / Hartford", "Washington / Providence", "Raleigh / Newark", 
"Raleigh / Ottawa", "Raleigh / Detroit", "Raleigh / Albany", 
"Raleigh / Hartford", "Raleigh / Providence", "Cleveland / Raleigh", 
"Cleveland / Newark", "Cleveland / Ottawa", "Cleveland / Detroit", 
"Cleveland / Albany", "Cleveland / Hartford", "Cleveland / Providence", 
"New York / Newark", "New York / Ottawa", "New York / Detroit", 
"New York / Albany", "New York / Hartford", "New York / Providence", 
"Newark / Ottawa", "Newark / Detroit", "Newark / Albany", "Newark /         Hartford", 
"Newark / Providence", "Ottawa / Detroit", "Ottawa / Albany", 
"Ottawa / Hartford", "Ottawa / Providence", "Detroit / Albany", 
"Detroit / Hartford", "Detroit / Providence", "Albany / Hartford", 
"Albany / Providence", "Hartford / Providence")), class = "data.frame",     row.names = c(NA, 
-111L))

I obtained this data by reading a .csv file with the following code:

read.csv(file, colClasses = 
c(rep("integer",1),rep("character",1),rep("NULL",2)))

I would like to apply something like:

List[[1]] <- datastations$code[List[[1]]]

but to every vector in the list, however many there are (so essentially without writing a loop).

【Question comments】:

  • lapply(SearchFor,grep,x=datastations$name, value = TRUE)
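  • That gives me the values from the "name" column, whereas I need the values from the "code" column.
  • Could you give us code that generates a data frame with the same format and contents, or use dput to package your data so it can be reproduced? It is much easier to help when we can cut and paste your data, along with what you have tried, into our own R session. Thanks :)
  • I've edited my post to add what I get after running dput. If I knew how, I could also just provide the .csv.
  • Please add an example of your expected result. It isn't entirely clear from your reply to @mysteRious.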

Tags: r lapply


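【Solution 1】:

As others have said in the comments above, the end result you want isn't entirely clear. But if I've understood correctly, I think this may be what you're after.

Here I use map from the purrr package to iterate over the vector of city names and get a vector of codes for each one, then use set_names to name the elements of the resulting list after the cities.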

library(dplyr)
library(stringr)
library(purrr)

# load data as df (see below) 

cities <- c("Chicago", "New York", "Atlanta")

get_city_stations <- function(city, station_data) {
  station_data %>% 
    filter(str_detect(name, city)) %>% 
    pull(code)
}

codes <- map(cities, get_city_stations, station_data = df) %>% set_names(cities)

codes
#> $Chicago
#>  [1] 6001 6014 6027 6028 6029 6030 6031 6032 6033 6034 6035 6036 6037 6038
#> 
#> $`New York`
#>  [1] 6000 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026
#> [15] 6081 6082 6083 6084 6085 6086
#> 
#> $Atlanta
#>  [1] 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013

Created on 2018-10-14 by the reprex package (v0.2.0).

df <- structure(list(code = c(6000L, 6001L, 6002L, 6003L, 6004L, 6005L, 
6006L, 6007L, 6008L, 6009L, 6010L, 6011L, 6012L, 6013L, 6014L, 
6015L, 6016L, 6017L, 6018L, 6019L, 6020L, 6021L, 6022L, 6023L, 
6024L, 6025L, 6026L, 6027L, 6028L, 6029L, 6030L, 6031L, 6032L, 
6033L, 6034L, 6035L, 6036L, 6037L, 6038L, 6039L, 6040L, 6041L, 
6042L, 6043L, 6044L, 6045L, 6046L, 6047L, 6048L, 6049L, 5000L, 
5001L, 5002L, 5003L, 5004L, 5005L, 5006L, 5007L, 5008L, 6050L, 
6051L, 6052L, 6053L, 6054L, 6055L, 6056L, 6057L, 6058L, 6059L, 
6060L, 6061L, 6062L, 6063L, 6064L, 6065L, 6066L, 6067L, 6068L, 
6069L, 6070L, 6071L, 6072L, 6073L, 6074L, 6075L, 6076L, 6077L, 
6078L, 6079L, 6080L, 6081L, 6082L, 6083L, 6084L, 6085L, 6086L, 
6087L, 6088L, 6089L, 6090L, 6091L, 5009L, 5010L, 5011L, 5012L, 
6092L, 6093L, 6094L, 6095L, 6096L, 6097L), name = c("Atlanta / New York", 
"Atlanta / Chicago", "Atlanta / Miami", "Atlanta / Los Angeles", 
"Atlanta / Toronto", "Atlanta / Washington", "Atlanta / Cleveland", 
"Atlanta / Raleigh", "Atlanta / Newark", "Atlanta / Ottawa", 
"Atlanta / Detroit", "Atlanta / Albany", "Atlanta / Hartford", 
"Atlanta / Providence", "New York / Chicago", "New York / Miami", 
"New York / Los Angeles", "New York / Toronto", "New York / Washington", 
"New York / Cleveland", "New York / Raleigh", "New York / Newark", 
"New York / Ottawa", "New York / Detroit", "New York / Albany", 
"New York / Hartford", "New York / Providence", "Chicago / Miami", 
"Chicago / Los Angeles", "Chicago / Toronto", "Chicago / Washington", 
"Chicago / Cleveland", "Chicago / Raleigh", "Chicago / Newark", 
"Chicago / Ottawa", "Chicago / Detroit", "Chicago / Albany", 
"Chicago / Hartford", "Chicago / Providence", "Miami / Los Angeles", 
"Miami / Toronto", "Miami / Washington", "Miami / Cleveland", 
"Miami / Raleigh", "Miami / Newark", "Miami / Ottawa", "Miami / Detroit", 
"Miami / Albany", "Miami / Hartford", "Miami / Providence", "Toronto /             Washington", 
"Toronto / Cleveland", "Toronto / Raleigh", "Toronto / Newark", 
"Toronto / Ottawa", "Toronto / Detroit", "Toronto / Albany", 
"Toronto / Hartford", "Toronto / Providence", "Los Angeles / Toronto", 
"Los Angeles / Washington", "Los Angeles / Cleveland", "Los Angeles /         Raleigh", 
"Los Angeles / Newark", "Los Angeles / Ottawa", "Los Angeles / Detroit", 
"Los Angeles / Albany", "Los Angeles / Hartford", "Los Angeles / Providence", 
"Washington / Washington", "Washington / Cleveland", "Washington / Raleigh", 
"Washington / Newark", "Washington / Ottawa", "Washington / Detroit", 
"Washington / Hartford", "Washington / Providence", "Raleigh / Newark", 
"Raleigh / Ottawa", "Raleigh / Detroit", "Raleigh / Albany", 
"Raleigh / Hartford", "Raleigh / Providence", "Cleveland / Raleigh", 
"Cleveland / Newark", "Cleveland / Ottawa", "Cleveland / Detroit", 
"Cleveland / Albany", "Cleveland / Hartford", "Cleveland / Providence", 
"New York / Newark", "New York / Ottawa", "New York / Detroit", 
"New York / Albany", "New York / Hartford", "New York / Providence", 
"Newark / Ottawa", "Newark / Detroit", "Newark / Albany", "Newark /         Hartford", 
"Newark / Providence", "Ottawa / Detroit", "Ottawa / Albany", 
"Ottawa / Hartford", "Ottawa / Providence", "Detroit / Albany", 
"Detroit / Hartford", "Detroit / Providence", "Albany / Hartford", 
"Albany / Providence", "Hartford / Providence")), class = "data.frame",     row.names = c(NA, 
-111L))
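
For readers who would rather stay with the row-number list from the question, here is a minimal base R sketch of the same idea. It is not the answerer's code: the small `stations` table and the names `search_for`, `rows` and `codes` are made up for illustration, and `fixed = TRUE` is an assumption that the city names should be matched literally rather than as regular expressions.

```r
# Hypothetical miniature of the station table (not the asker's real data)
stations <- data.frame(
  code = c(6000L, 6001L, 6002L, 6014L, 6027L),
  name = c("Atlanta / New York", "Atlanta / Chicago", "Atlanta / Miami",
           "New York / Chicago", "Chicago / Miami"),
  stringsAsFactors = FALSE
)

search_for <- c("Chicago", "New York", "Atlanta")

# Step 1: a named list of matching row numbers, one vector per city.
# setNames() gives the input names, which lapply() carries over to the result.
rows <- lapply(setNames(search_for, search_for), grep,
               x = stations$name, fixed = TRUE)

# Step 2: swap each vector of row numbers for the codes found in those rows.
codes <- lapply(rows, function(i) stations$code[i])

codes
```

Because the second `lapply` call indexes `stations$code` once per list element, it works for a search vector of any length without an explicit loop.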

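【Discussion】:

    【Solution 2】:

    Maybe this is what you're looking for? The way I read the question, you want a list of all the station codes that correspond to a particular city or group of cities. If the result looks odd, your dput may contain wrong station codes.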

    library(dplyr)
    codelist <- df %>% filter(grepl("Chicago",name)) %>% select(code)
    
    > unlist(codelist)
     code1  code2  code3  code4  code5  code6  code7  code8  code9 code10 code11 code12 code13 code14 
      6001   6014   6027   6028   6029   6030   6031   6032   6033   6034   6035   6036   6037   6038 
    

    Or for multiple stations:

    > codelist <- df %>% filter(grepl("Chicago|New York|Atlanta",name)) %>% select(code)
    > unlist(codelist)
     code1  code2  code3  code4  code5  code6  code7  code8  code9 code10 code11 code12 code13 code14 code15 
      6000   6001   6002   6003   6004   6005   6006   6007   6008   6009   6010   6011   6012   6013   6014 
    code16 code17 code18 code19 code20 code21 code22 code23 code24 code25 code26 code27 code28 code29 code30 
      6015   6016   6017   6018   6019   6020   6021   6022   6023   6024   6025   6026   6027   6028   6029 
    code31 code32 code33 code34 code35 code36 code37 code38 code39 code40 code41 code42 code43 code44 code45 
      6030   6031   6032   6033   6034   6035   6036   6037   6038   6081   6082   6083   6084   6085   6086 
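
A single alternation pattern such as "Chicago|New York|Atlanta" returns one combined vector, so it no longer shows which code belongs to which city. If one vector per city is wanted, a sketch along the following lines may help. It uses base R rather than dplyr, and the small `stations` table and the names `cities`, `combined` and `codes_by_city` are hypothetical.

```r
# Hypothetical miniature of the station table (not the asker's real data)
stations <- data.frame(
  code = c(6000L, 6001L, 6002L, 6014L, 6027L),
  name = c("Atlanta / New York", "Atlanta / Chicago", "Atlanta / Miami",
           "New York / Chicago", "Chicago / Miami"),
  stringsAsFactors = FALSE
)

cities <- c("Chicago", "New York", "Atlanta")

# One combined vector: every row that mentions any of the cities
combined <- stations$code[grepl(paste(cities, collapse = "|"), stations$name)]

# One vector per city: simplify = FALSE keeps the result a list, and
# USE.NAMES = TRUE names each element after the city it was built from.
codes_by_city <- sapply(
  cities,
  function(city) stations$code[grepl(city, stations$name, fixed = TRUE)],
  simplify = FALSE, USE.NAMES = TRUE
)

codes_by_city
```

A route that serves two of the searched cities appears in both of their vectors, whereas it appears only once in the combined result.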
    

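    【Discussion】:

    • Strange: when I try your code, it seems to think the data frame is inverted. It says: "Result must have length 1909827, not 2". Also, I would like to get the result in list format as in my post, but with the row numbers replaced by codes. For example, I want one vector for Chicago, one for Atlanta and one for New York, and the "SearchFor" vector may be of any length.
    • Try library(tibble), then create new.df <- as.tibble(df), then try it on new.df. If you still get a strange error, there is a problem with the data in your dput. The df I get from it has 111 unique codes in the ranges 5000-5012 and 6000-6097.
    • OK, that was a silly mistake on my part. But now it needs to work for multiple stations, and I don't want the codes from different stations grouped into the same list.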