【Question title】: How to use select_helpers() [starts_with()] when using readr::read_csv()
【Posted】: 2019-12-13 16:19:19
【Question description】:

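I have a fairly wide dataset to read in. The first 1000+ rows are missing values, but all the variable names follow the same pattern. Is there a way to use starts_with() to force certain variables to be parsed with the correct type?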

MWE:

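library(tidyverse)  # also attaches readr
library(readr)
mwe.csv <- data.frame(id        = c("a", "b"), # not where I actually get the data from
                      amount1   = c(NA, 20),
                      currency1 = c(NA, "USD")
)
readr::write_csv(mwe.csv, "mwe.csv") # write the example to disk so read_csv() has a file to read

mwe <- readr::read_csv("mwe.csv", guess_max = 1) # guess_max = 1 only to reproduce the problem on a tiny file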

I would like to be able to do:

mwe <- read_csv("mwe.csv", guess_max = 1,
                col_types = cols(starts_with("amount")   = "d",
                                 starts_with("currency") = "c"))

> mwe
# A tibble: 2 x 3
  id    amount1 currency1
  <chr>   <dbl> <chr>
1 a          NA NA
2 b          20 USD

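But I get the error "unexpected '=' in: read_csv". Any ideas? I can't hard-code the column names because the number of columns changes regularly, but the pattern (amountN) stays the same. There will also be other columns that are neither id nor amount/currency. For speed, I'd rather not increase the guess_max option.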
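The `=` error comes from R's parser: inside a call, the left side of `=` must be a plain argument name, so `starts_with("amount") = "d"` cannot be written, and `cols()` expects literal column names there. A minimal sketch of a workaround, assuming the file has a header row: read only the header, then build the named spec at run time and pass it through `do.call(cols, ...)`, which generates the equivalent of `cols(id = col_guess(), amount1 = col_double(), currency1 = col_character())`. The variable names `col_names` and `types` are illustrative only.

```r
library(readr)

# Recreate the example file from the question.
write_csv(data.frame(id        = c("a", "b"),
                     amount1   = c(NA, 20),
                     currency1 = c(NA, "USD")),
          "mwe.csv")

# Parse only the header row; all we need here are the column names.
col_names <- names(read_csv("mwe.csv", n_max = 0,
                            col_types = cols(.default = col_character())))

# One collector per column: guess by default, then override by name prefix.
types <- rep(list(col_guess()), length(col_names))
names(types) <- col_names
types[startsWith(col_names, "amount")]   <- list(col_double())
types[startsWith(col_names, "currency")] <- list(col_character())

# do.call() turns the named list into cols(id = ..., amount1 = ..., ...).
mwe <- read_csv("mwe.csv", col_types = do.call(cols, types), guess_max = 1)
```

Columns that match no prefix keep `col_guess()`, so extra columns added to the file later are still handled without editing the code.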

【Question discussion】:

    Tags: r readr


    【Solution 1】:

    The answer is to cheat!

    library(readr)
    hdr   <- read_csv("mwe.csv", n_max = 0)   # read only the header; we just need the column names
    cn    <- names(hdr)                       # same names as in attr(hdr, "spec"), i.e. spec(hdr)
    ctype <- rep("?", length(cn))             # one parser abbreviation per column; "?" = guess
    ctype[startsWith(cn, "currency")] <- "c"  # currencyN columns: always character, no guessing
                                              # (equivalently: grepl("^currency", cn))
    ctype[startsWith(cn, "amount")]   <- "d"  # amountN columns: always double
                                              # add one such line per name pattern as needed
    mwe   <- read_csv("mwe.csv", col_types = paste(ctype, collapse = ""))
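The "one line per pattern" step can be wrapped in a small function so that only a prefix-to-type mapping has to be maintained. This is a sketch, not part of readr: `col_types_by_prefix()` is a hypothetical helper name, and it assumes the file has a header row and that each column should match at most one prefix (if several match, the last one listed wins).

```r
library(readr)

# Recreate the example file from the question.
write_csv(data.frame(id        = c("a", "b"),
                     amount1   = c(NA, 20),
                     currency1 = c(NA, "USD")),
          "mwe.csv")

# Hypothetical helper (not part of readr): returns a compact col_types string.
# `prefixes` maps a column-name prefix to a readr abbreviation such as "d" or "c";
# columns matching no prefix get "?" (guess). If several prefixes match, the last wins.
col_types_by_prefix <- function(file, prefixes) {
  # Header-only read: gives the current column names without parsing data rows.
  cn <- names(read_csv(file, n_max = 0,
                       col_types = cols(.default = col_character())))
  ctype <- rep("?", length(cn))
  for (p in names(prefixes)) {
    ctype[startsWith(cn, p)] <- prefixes[[p]]
  }
  paste(ctype, collapse = "")
}

ct  <- col_types_by_prefix("mwe.csv", c(amount = "d", currency = "c"))
mwe <- read_csv("mwe.csv", col_types = ct, guess_max = 1)
```

For the example file, `ct` is `"?dc"`: `id` is still guessed, while `amount1` and `currency1` are forced to double and character regardless of how many leading rows are missing.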
    
