[Posted]: 2016-11-26 22:26:00
[Question]:
I have a huge CSV file, about 9 GB in size, and 16 GB of RAM. I followed the advice on this page and implemented it as below:
If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field:
--max-vsize=500M
I still get the following error and warnings. How should I read a 9 GB file into R? I have 64-bit R 3.3.1 and I am running the commands below in RStudio 0.99.903. The OS is Windows Server 2012 R2 Standard, 64-bit.
> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
2: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
3: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
4: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
5: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
6: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
7: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
8: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
9: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
10: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
11: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
12: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
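The help page `?read.table` (section "Memory usage") notes that `read.csv` uses less memory when `colClasses` is given as atomic vector classes, and that supplying `nrows`, even as a mild over-estimate, helps too. A minimal sketch of how one might act on that, assuming the first `sample_rows` rows are representative of the whole file: read a sample, scale its `object.size()` to the full row count for a rough memory estimate, and reuse the sample's column classes for the full read. The function name `estimate_csv_memory` is hypothetical, and a column whose type changes after the sample (for example integers that later contain decimals) would make the full read fail.

```r
# Rough memory estimate for a large CSV, based on a sample of its first rows.
# Assumes the sample is representative of the rest of the file.
estimate_csv_memory <- function(path, total_rows, sample_rows = 10000L) {
  sample_df <- read.csv(path, nrows = sample_rows, stringsAsFactors = FALSE)
  # Bytes per row in the sample; fixed per-column overhead inflates this
  # for small samples, so use a reasonably large sample_rows.
  bytes_per_row <- as.numeric(object.size(sample_df)) / nrow(sample_df)
  list(
    est_gib     = bytes_per_row * total_rows / 1024^3,
    # First class of each column, usable as read.csv(colClasses = ...)
    col_classes = vapply(sample_df, function(col) class(col)[1], character(1))
  )
}

# Hypothetical usage with the path from the question and the row count
# from the comments:
# info <- estimate_csv_memory("C:/Users/a-vs/results_20160291.csv",
#                             total_rows = 47368186)
# info$est_gib
# answer <- read.csv("C:/Users/a-vs/results_20160291.csv",
#                    colClasses = info$col_classes, nrows = 47368186)
```

If `est_gib` alone is close to the `memory.limit()` value, no choice of reader will fit the whole table as a data frame, and reading fewer columns or rows is the remaining lever.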
------- Update 1
My first attempt, based on a suggested answer:
> thefile=fread("C:/Users/a-vs/results_20160291.csv", header = T)
Read 44099243 rows and 36 (of 36) columns from 9.399 GB file in 00:13:34
Warning messages:
1: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
Reached total allocation of 16383Mb: see help(memory.size)
2: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
Reached total allocation of 16383Mb: see help(memory.size)
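The `fread` run above finished but hit the 16383 Mb allocation limit along the way. If only some of the 36 columns are needed, `data.table::fread` can skip the rest at read time through its `select` argument, and `nrows` reads just a first slice for exploration. A small sketch on a toy file; the column names `basket_id`, `item`, `price` and `note` are hypothetical stand-ins for the real header.

```r
library(data.table)

# Toy stand-in for the real file; the column names are hypothetical.
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(basket_id = c(1L, 1L, 2L, 3L),
                  item      = c("a", "b", "a", "c"),
                  price     = c(1.5, 2.0, 1.5, 3.0),
                  note      = c("x", "y", "z", "w")),
       tmp)

# Read only the columns the analysis needs; the others are never materialised.
baskets <- fread(tmp, select = c("basket_id", "item"))

# Read just the first rows (all columns) to explore the structure cheaply.
peek <- fread(tmp, nrows = 2)
```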
------- Update 2
Based on a suggested answer, my second attempt was as follows:
thefile2 <- read.csv.ffdf(file="C:/Users/a-vs/results_20160291.csv", header=TRUE, VERBOSE=TRUE,
+ first.rows=-1, next.rows=50000, colClasses=NA)
read.table.ffdf 1..
Error: cannot allocate vector of size 125.0 Mb
In addition: There were 14 warnings (use warnings() to see them)
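The log shows the failure happening while the first chunk is read (`read.table.ffdf 1..`) with `first.rows=-1`, so it may be worth checking in `?read.table.ffdf` how a negative `first.rows` is handled. Independently of `ff`, the chunk-by-chunk idea can be sketched in base R: stream the file through a connection, parse `chunk_size` lines at a time, and keep only a per-chunk result. `process_csv_in_chunks` and its `fun` argument are hypothetical names; the sketch assumes no quoted field contains an embedded newline, and it only helps when the analysis can be expressed as per-chunk summaries that are combined afterwards.

```r
# Stream a CSV through a connection and apply `fun` to each chunk of rows,
# so that only one chunk is held in memory at a time.
# Assumes no quoted field contains an embedded newline.
process_csv_in_chunks <- function(path, chunk_size = 50000L, fun) {
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)

  # Parse the header line once; scan() strips surrounding quotes.
  col_names <- scan(text = readLines(con, n = 1L), what = "",
                    sep = ",", quiet = TRUE)

  results <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break  # end of file reached
    chunk <- read.csv(text = lines, header = FALSE, col.names = col_names,
                      stringsAsFactors = FALSE)
    results[[length(results) + 1L]] <- fun(chunk)
  }
  results
}

# Hypothetical usage: sum a numeric column chunk by chunk.
# partial <- process_csv_in_chunks("C:/Users/a-vs/results_20160291.csv",
#                                  chunk_size = 50000L,
#                                  fun = function(d) sum(d$some_numeric_column))
# total <- sum(unlist(partial))
```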
How can I read this file into a single object so that I can analyze the whole data set at once?
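-----------------Update 3
We bought an expensive machine with 10 cores and 256 GB of RAM. It is not the most efficient solution, but at least it will work for the near future. I looked at the answers below, but I don't think they solve my problem :( I do appreciate the answers. I want to perform market basket analysis, and I don't think there is any way to do that other than keeping my data in RAM.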
[Discussion]:
-
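Can you specify what you plan to do with the data? In particular, is your first step to aggregate it, or to use only some of the variables?
ff is one solution, but how relevant it is depends on what you will do. Another option is, for example, to read with ff and then store the data in a database; you may be interested in MonetDB for this, which is incorporated into the MonetDBLite package -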
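A minimal sketch of the read-in-chunks-then-store-in-a-database route suggested above. It uses `DBI` with `RSQLite` as the backend, an assumption swapped in for MonetDBLite only because it is a widely available DBI driver, and base-R chunked reading rather than `ff`. Once the table is on disk, aggregates can be computed in SQL so that only the summary comes back into R. `csv_to_db`, the table name `results` and the toy column names are hypothetical.

```r
library(DBI)

# Load a CSV into a database table chunk by chunk. RSQLite is an assumed
# stand-in for MonetDBLite; other DBI backends should work similarly.
# Assumes no quoted field contains an embedded newline.
csv_to_db <- function(csv_path, con, table, chunk_size = 50000L) {
  src <- file(csv_path, open = "r")
  on.exit(close(src), add = TRUE)
  col_names <- scan(text = readLines(src, n = 1L), what = "",
                    sep = ",", quiet = TRUE)
  first <- TRUE
  repeat {
    lines <- readLines(src, n = chunk_size)
    if (length(lines) == 0L) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = col_names,
                      stringsAsFactors = FALSE)
    # The first chunk creates (or replaces) the table; later chunks append.
    dbWriteTable(con, table, chunk, overwrite = first, append = !first)
    first <- FALSE
  }
  invisible(table)
}

# Hypothetical usage: the database lives on disk, not in RAM.
# con <- dbConnect(RSQLite::SQLite(), "C:/Users/a-vs/results.sqlite")
# csv_to_db("C:/Users/a-vs/results_20160291.csv", con, "results")
# dbGetQuery(con, "SELECT COUNT(*) AS n FROM results")
# dbDisconnect(con)
```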
Please tell us the number of rows and columns in the file.
-
@EricLecoutre I plan to explore the data. Once I have plotted it and understand it better, I may drop some rows and/or columns.
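For this explore-then-trim plan, base `read.csv` can already skip columns and rows at read time: a `"NULL"` entry in `colClasses` drops that column entirely, and `nrows` limits how many rows are read. A sketch on a toy four-column file whose names (`id`, `item`, `qty`, `note`) are hypothetical.

```r
# Toy file standing in for the real one; the column names are hypothetical.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, item = letters[1:5],
                     qty = c(2L, 1L, 3L, 1L, 4L), note = "x"),
          tmp, row.names = FALSE)

# One class per column, matched by name; "NULL" means the column is skipped.
classes <- c(id = "integer", item = "character", qty = "integer", note = "NULL")

# Read only the first 3 rows of the three kept columns.
explore <- read.csv(tmp, colClasses = classes, nrows = 3)
str(explore)
```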
-
@user1436187 36 columns and 47,368,186 rows...
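With those counts, a back-of-the-envelope estimate of the data frame's size, assuming 8 bytes per cell (the cost of a double, or of the pointer array behind a character column in 64-bit R, ignoring the strings themselves; integer columns would take 4 bytes):

```latex
\[
47\,368\,186 \times 36 \times 8\ \text{bytes} = 13\,642\,037\,568\ \text{bytes} \approx 13.6\ \text{GB} \approx 12.7\ \text{GiB}
\]
```

That is already most of the 16383 Mb reported by `memory.limit()`, before counting any temporary copies made while parsing.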