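【Question title】: Trouble with reading CSV file in R
【Posted】: 2015-05-19 07:20:06
【Question】:

I'm new to R. I have a 24 MB CSV file that I read into RStudio on a MacBook Air with OS X Yosemite and 4 GB RAM. R version 3.1.1 (2014-07-10). Viewing the contents with View(df) works fine. When I try to apply a filter, I get no matches. When I try to convert columns from character to numeric, R replaces every value in the converted columns with NA. What is going on here? It seems R cannot read the contents of the cells. Is it something to do with encoding? This is what I did, followed by a summary:

R code: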

eiendommer <- read.csv("eiendommer.csv",  sep = ";", quote = "",  encoding="UTF-8", stringsAsFactors = FALSE)
View(eiendommer)# I can view the content of the csv file
filtereiendommer <- filter(eiendommer, kommune == "0101")# no match
filtereiendom <- eiendommer [eiendommer$kommune == "0101",]#no match
utvalg <- eiendommer[160567:161934,]#manual selection of rows does work
utvalgsortert <- arrange(utvalg, desc(jordbruksareal), desc(skogareal))# works
View(utvalgsortert)
##Try to transform columns from character to number. 
transformedEiendom <- transform(sortertEiendom, jordbruksareal = as.numeric(jordbruksareal),
                       skogareal = as.numeric(skogareal) )
#This results in NA where there were previously character values of length 1-3: "646", "18", "2"

Summary:

kommune           X.gardsnr.         X.bruksnr.         X.festenr.         bruksnavn         jordbruksareal    
 Length:207554      Length:207554      Length:207554      Length:207554           Length:207554      Length:207554     
 Class :character   Class :character   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
 X.annetareal.       skogareal         X.fulldyrket.        X.overflatedyrket. X.innmarksbeite.  
 Length:207554      Length:207554      Length:207554      Length:207554      Length:207554     
 Class :character   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  

Head:

head(eiendommer)
  kommune X.gardsnr. X.bruksnr. X.festenr.    bruksnavn jordbruksareal X.annetareal. skogareal X.fulldyrket.
1  "0101"        "1"        "1"        "0" "PRESTEGÅRD"            "0"           "5"       "0"           "0"
2  "0101"        "1"        "6"        "0"         "MO"            "8"           "4"       "7"           "8"
3  "0101"        "1"        "9"        "0"  "BERG GÅRD"          "415"          "16"      "39"         "415"
4  "0101"        "2"        "1"        "0"     "BOBERG"          "467"          "22"     "276"         "463"
5  "0101"        "4"        "1"        "0"  "LUNDESTAD"          "877"          "62"     "793"         "837"
6  "0101"        "4"        "5"        "0"     "LEIREN"           "74"          "14"     "165"          "74"    

【Comments】:

  • Hei CodeR, your problem is not how to read a CSV file, but how to convert an object from one class to another.

Tags: r csv


【Solution 1】:

You appear to have specified quote = "". You should instead specify something like quote = '"', or simply use the default.

See the example below:

d <- data.frame(x='a',y='"a"',stringsAsFactors=FALSE)
d 
#   x   y
# 1 a "a"

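For an ordinary character vector, print.data.frame does not wrap the values in double quotes. The quotes visible in your head() output are therefore part of the data itself, which is why kommune == "0101" finds no match and as.numeric returns NA.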
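
To make the effect concrete, here is a minimal sketch that parses the same text twice. The two-row `csv_text` is made up for illustration and only mimics the semicolon-separated, quoted layout from the question; the `colClasses` argument is my addition to keep the leading zero in `kommune`.

```r
# Two rows in a semicolon-separated layout with quoted values (made-up sample)
csv_text <- 'kommune;jordbruksareal\n"0101";"415"\n"0101";"8"'

# quote = "" disables quote handling, so the quote characters become data
raw <- read.csv(text = csv_text, sep = ";", quote = "",
                stringsAsFactors = FALSE)
raw$kommune[1] == "0101"                          # FALSE: value still has quotes
suppressWarnings(as.numeric(raw$jordbruksareal))  # NA NA

# Default quoting strips the quotes; colClasses keeps kommune as character
clean <- read.csv(text = csv_text, sep = ";",
                  colClasses = c(kommune = "character"),
                  stringsAsFactors = FALSE)
clean$kommune[1] == "0101"                        # TRUE
clean$jordbruksareal                              # 415 8
```

With the default `quote`, the filter on `kommune` matches and the area column arrives as a number without any explicit conversion.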

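【Comments】:

  • OK, if I understand correctly: all my data carry quotation marks ("") because of the quote argument in read.csv? I added it in order to be able to read the file. Earlier I got this message: Warning message: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : EOF within quoted string. When I included the quote argument in read.csv, I was able to read the file. Thanks.
  • Here I found a hint about quotes: stackoverflow.com/questions/17414776/…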
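
If quote = "" has to stay because the file will not load otherwise, one possible workaround is to strip the literal quotes after reading. This is only a sketch: the small data frame below is a stand-in for what read.csv returns with quote = "", not the real file.

```r
# Stand-in for data read with quote = "": every value carries literal quotes
eiendommer <- data.frame(kommune = c('"0101"', '"0101"'),
                         jordbruksareal = c('"415"', '"8"'),
                         stringsAsFactors = FALSE)

# Remove the double quotes from every column, keeping the data frame shape
eiendommer[] <- lapply(eiendommer,
                       function(col) gsub('"', "", col, fixed = TRUE))

# The area column now holds plain digits and converts cleanly
eiendommer$jordbruksareal <- as.numeric(eiendommer$jordbruksareal)

# The filter from the question now finds the rows
eiendommer[eiendommer$kommune == "0101", ]
```

Note that this also removes any double quote that legitimately belongs inside a name, so fixing the offending line in the file is the cleaner route when that is possible.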
【Solution 2】:

One challenge you will run into with Norwegian kommune numbers (kommunenummer) is the ones that start with 0, for example Halden, "0101".

#Prepare Data
    kommune = rep("0101", 6) 
    jordbruksareal<- c("5","4","16","22","62","14")
    skogareal <- c("0","8","415","463","837", "74")
    eiendommer <- cbind(kommune, jordbruksareal, skogareal)
    eiendommer <- as.data.frame(cbind(kommune, jordbruksareal, skogareal), stringsAsFactors=FALSE)
    #Transform into numeric
    str(eiendommer) #All is Character
    eiendommer$skogareal<-as.numeric(eiendommer$skogareal)
    eiendommer$jordbruksareal<-as.numeric(eiendommer$jordbruksareal)
    eiendommer$kommune<-as.numeric(eiendommer$kommune)
    str(eiendommer) #All numeric now, but kommune has lost its leading zero
#Make a filter
    require(dplyr)
    filterA <- filter(eiendommer, eiendommer$jordbruksareal == "4")
    filter <- subset(eiendommer, eiendommer$kommune == 101)
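#Treat Kommune Numbers: restore the leading zero (kommune is numeric here)
    eiendommer$kommune <- formatC(eiendommer$kommune, digits = 0, format = "f", width = 4, flag = "0")
    #Alternative to the formatC line; run one or the other, because sprintf("%04d") fails on character input
    #eiendommer$kommune <- sprintf("%04d", as.integer(eiendommer$kommune))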
    str(eiendommer)
    filter2 <- subset(eiendommer, eiendommer$kommune == "0101")
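
A possible alternative, sketched here with made-up rows, is to never convert kommune at all: treat it as an identifier, leave it as character, and convert only the area columns. The value "1001" is just a second hypothetical code so that the filter has something to exclude.

```r
# Made-up sample; kommune is an identifier, so it stays character
eiendommer <- data.frame(kommune = c("0101", "0101", "1001"),
                         jordbruksareal = c("5", "4", "16"),
                         skogareal = c("0", "8", "415"),
                         stringsAsFactors = FALSE)

# Convert only the columns that really are quantities
areal_cols <- c("jordbruksareal", "skogareal")
eiendommer[areal_cols] <- lapply(eiendommer[areal_cols], as.numeric)

# Filtering on the zero-padded code works without any re-padding step
halden <- eiendommer[eiendommer$kommune == "0101", ]
str(halden)
```

This avoids the round trip through numeric and back, at the cost of having to name the columns to convert.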

Hope this helps you a bit, Ha det bra!

【Comments】:

【Solution 3】:

This entry in my file was causing the problem:

    ;"BLOMSTERHAGEN\"";


Changing it to:

    ;"BLOMSTERHAGEN";


solved the problem. Now I can read the .csv like this:

    eiendommer <- read.csv("eiendommer.csv",  sep = ";", encoding="UTF-8", stringsAsFactors = FALSE)
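
If editing the file by hand is not practical, the same repair could be done in R before parsing. The sketch below uses three made-up lines in place of the real file; with the actual data the lines would presumably come from something like readLines("eiendommer.csv", encoding = "UTF-8"), which is an assumption on my part and not tested against that file.

```r
# Made-up lines standing in for the file; the second one has the stray \"
lines <- c('kommune;bruksnavn;skogareal',
           '"0101";"BLOMSTERHAGEN\\"";"12"',
           '"0101";"MO";"7"')

# Drop every backslash-quote pair, leaving the ordinary closing quote in place
fixed_lines <- gsub('\\"', "", lines, fixed = TRUE)

# Parse the repaired text with default quoting; kommune stays character
repaired <- read.csv(text = fixed_lines, sep = ";",
                     colClasses = c(kommune = "character"),
                     stringsAsFactors = FALSE)
repaired
```

Removing the pair outright matches the manual fix above. If the embedded quote is meant to be kept in the name, this is not the right repair.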
    

Thanks

【Comments】:
