Formatting Data in R: Answers

【Question Title】: Formatting Data in R
【Posted】: 2012-09-23 05:57:03
【Question】:

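Suppose I import a CSV file into R to create an R dataset. The file contains numeric, character, date, and percentage values. How can I make sure the imported data keeps the same formats as in the original file?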

In SAS, we usually have options to format the data while importing it. Here is an example:

data test ;  
           infile "c:\mydocument\raw.csv" 
           delimiter = ',' MISSOVER DSD lrecl=32767
           firstobs=2 ;

           input 
              varA         
              varB         : $50.
              varC        : date9.
              varD      : Percent5.2
              varE      : $20.
;
run;

Is there any option in R to do the same thing? It would be great if someone could point me to some references!

Example based on the answers below:

Local <- read.csv("C:\\Users\\Raw.csv",
                  colClasses = c("character", "character", "Date", "character",
                                 "character", "character", "character", "character",
                                 "character", "character", "numeric", "numeric",
                                 "numeric", "numeric"),
                  row.names = 1)

I used the following code based on Dason's example, but I get the error below. Can you tell me why it occurs? You have been a great help.

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  scan() expected 'a real', got '.'

Thanks. Rgds.

【Discussion】:

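Perhaps . is used for NA... but it's hard for us to tell, because your example is not reproducible.
Thanks for the comment, Paul. I do have "." in my data, so that error has been dealt with. But now another error appears: "Error in charToDate(x) : character string is not in a standard unambiguous format". I think I will have to sort that one out myself.
Or, if you don't succeed, ask another question, preferably including a reproducible example. I have also added my comment as an answer.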
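The charToDate() error in the comment above is what as.Date() raises when it cannot guess a date format. colClasses = "Date" calls as.Date() on the column without a format, so non-ISO dates fail there in the same way. A minimal sketch, assuming (hypothetically) that the file stores dates as day.month.year: read that column as "character" and convert it afterwards with an explicit format string. The inline data and column names are made up for illustration.

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- "id,when\n1,23.09.2012\n2,05.01.2012"

# With no format, as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d",
# so colClasses = "Date" stops with the charToDate() error here.
failed <- tryCatch({
  read.csv(text = csv_text, colClasses = c("integer", "Date"))
  FALSE
}, error = function(e) TRUE)

# Read the dates as plain text first, then convert with an explicit format.
out <- read.csv(text = csv_text, colClasses = c("integer", "character"))
out$when <- as.Date(out$when, format = "%d.%m.%Y")
str(out)
```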

Tags: r csv import file-io

【Solution 1】:

The colClasses argument of read.csv is what you want. From ?read.csv:

colClasses: character.  A vector of classes to be assumed for the
          columns.  Recycled as necessary, or if the character vector
          is named, unspecified values are taken to be ‘NA’.

          Possible values are ‘NA’ (the default, when ‘type.convert’ is
          used), ‘"NULL"’ (when the column is skipped), one of the
          atomic vector classes (logical, integer, numeric, complex,
          character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
          Otherwise there needs to be an ‘as’ method (from package
          ‘methods’) for conversion from ‘"character"’ to the specified
          formal class.

          Note that ‘colClasses’ is specified per column (not per
          variable) and so includes the column of row names (if any).

Some example usage:

dat <- data.frame(num = 1:4, ch = letters[1:4])
write.csv(dat, file = "test.csv")
read.csv("test.csv", 
          colClasses = c(NA, "numeric", "character"),
          row.names = 1)
#  num ch
#1   1  a
#2   2  b
#3   3  c
#4   4  d
out <- read.csv("test.csv", 
                 colClasses = c(NA, "numeric", "character"),
                 row.names = 1)
str(out)
#'data.frame':  4 obs. of  2 variables:
# $ num: num  1 2 3 4
# $ ch : chr  "a" "b" "c" "d"
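The documentation quoted above also says that any class with an as method (package methods) can be named in colClasses. A sketch of how that might mirror the SAS Percent5.2 informat from the question, assuming percentages are stored as text like 12.5%. The class name pct and the sample data are hypothetical; the date column uses the built-in "Date" class, which only works if the dates are already written as YYYY-MM-DD (or YYYY/MM/DD).

```r
library(methods)  # setClass() and setAs(); attached by default in interactive R

# Hypothetical virtual class, used only as a name that colClasses can refer to.
setClass("pct")
# Conversion read.csv applies via as(): strip "%" and scale to a proportion.
setAs("character", "pct", function(from) {
  as.numeric(sub("%", "", from, fixed = TRUE)) / 100
})

# Inline data shaped like the SAS example's varA..varE columns
csv_text <- "varA,varB,varC,varD,varE
1,alpha,2012-09-23,12.5%,x
2,beta,2012-01-05,5%,y"

out <- read.csv(
  text = csv_text,
  colClasses = c("numeric", "character", "Date", "pct", "character")
)
str(out)
```

The pct class never holds any objects itself; it only gives read.csv a name to pass to as(), and the resulting varD column is an ordinary numeric vector.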

【Discussion】:

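【Solution 2】:

Regarding your second error message, what is probably happening is that . is being used as a special marker, perhaps to show where the NA values are in your dataset. You can use the na.strings argument to tell read.csv which strings should be treated as NA.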
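A small sketch of both behaviours, using made-up inline data instead of the original file: with colClasses = "numeric", a lone . in a numeric column stops scan(), while declaring it in na.strings turns it into NA. Keep in mind that na.strings applies to every column, so a genuine "." in a character column would also become NA.

```r
# Hypothetical inline CSV with "." marking a missing score
csv_text <- "id,score\n1,3.5\n2,.\n3,4.25"

# Without na.strings, scan() cannot parse "." as a number and stops
# (the same "expected 'a real', got '.'" error shown in the question).
failed <- tryCatch({
  read.csv(text = csv_text, colClasses = c("integer", "numeric"))
  FALSE
}, error = function(e) TRUE)

# Keep the default "NA" marker and add "." as a second one.
out <- read.csv(text = csv_text,
                colClasses = c("integer", "numeric"),
                na.strings = c("NA", "."))
str(out)
```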

【Discussion】: