Is it possible to increase the amount of RAM a Python process is using?

【Question title】: Is it possible to increase the amount of RAM a python process is using
【Posted】: 2011-09-15 00:39:24
【Question】:

I am running a classification/feature extraction task on a Windows server with 64 GB of RAM, and somehow Python thinks I am running out of memory:

misiti@fff /cygdrive/c/NaiveBayes
$ python run_classify_comments.py > tenfoldcrossvalidation.txt
Traceback (most recent call last):
  File "run_classify_comments.py", line 70, in <module>
    run_classify_comments()
  File "run_classify_comments.py", line 51, in run_classify_comments
    NWORDS = get_all_words("./data/HUGETEXTFILE.txt")
  File "run_classify_comments.py", line 16, in get_all_words
    def get_all_words(path): return words(file(path).read())
  File "run_classify_comments.py", line 15, in words
    def words(text): return re.findall('[a-z]+', text.lower())
  File "C:\Program Files (x86)\Python26\lib\re.py", line 175, in findall
    return _compile(pattern, flags).findall(string)
MemoryError

So the re module is crashing with 64 GB of RAM... I don't think so... Why is this happening, and how can I configure Python to use all of the available RAM on my machine?

【Comments】:

Is your Windows installation 64-bit? Is your Python build 64-bit? Have you checked how much memory the process is actually using?
The Program Files (x86) path suggests that Windows is 64-bit but Python is not.
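
The point in the comment above matters because a 32-bit process can address at most 4 GB, and by default only 2 GB on Windows, no matter how much RAM is installed. A minimal sketch for checking which kind of interpreter is running; the helper name interpreter_bits is made up for this example:

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python is %d-bit" % interpreter_bits())
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2**32))
```

If this reports 32 on a 64-bit Windows machine, installing a 64-bit Python build is the way to let the process use more of the installed memory.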

Tags: python regex nltk

【Answer 1】:

Just rewrite your program to read the huge text file one line at a time. Change get_all_words(path) to:

def get_all_words(path):
    # The [] start value is required: sum() otherwise starts from 0 and cannot add lists.
    return sum((words(line) for line in open(path)), [])

Note the use of a generator expression in parentheses: it is lazy and is evaluated on demand by the sum function.
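
One caveat with this answer: adding lists with sum creates a new list at every step, so the running time grows quadratically with the number of lines, and the final result still holds every word in memory. If the caller only needs to iterate over the words once, a generator function avoids both costs. This is a sketch under that assumption, not the original poster's code; iter_all_words is a hypothetical name, and words is the helper from the question:

```python
import re


def words(text):
    # Same helper as in the question: lowercase runs of the letters a-z.
    return re.findall('[a-z]+', text.lower())


def iter_all_words(path):
    """Yield words one at a time, keeping only one line of the file in memory."""
    with open(path) as handle:
        for line in handle:
            for word in words(line):
                yield word
```

Calling list(iter_all_words(path)) would rebuild the full list, so the memory benefit only holds if the words are consumed as they are produced.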

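【Comments】:

【Answer 2】:

It looks to me like the problem is the use of re.findall() to read the entire text into memory as a list of words. Are you reading more than 64 GB of text this way? Depending on how your NaiveBayes algorithm is implemented, you would probably do better to build the frequency dictionary incrementally, so that only the dictionary (rather than the whole text) is held in memory. More information about your implementation would help answer your question more directly.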
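
The incremental approach suggested here could look roughly like the following. It assumes the classifier only needs word counts, which the question does not confirm; count_words is a hypothetical name, and words is the helper from the question:

```python
import re
from collections import defaultdict


def words(text):
    # Same helper as in the question: lowercase runs of the letters a-z.
    return re.findall('[a-z]+', text.lower())


def count_words(path):
    """Build a word -> frequency mapping while reading the file line by line."""
    counts = defaultdict(int)
    with open(path) as handle:
        for line in handle:
            for word in words(line):
                counts[word] += 1
    return counts
```

Memory use then scales with the number of distinct words rather than with the size of the file.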

【Comments】:

I actually fixed it by calling 'del' inside the loop that generates the features (during cross-validation).
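
The poster's code is not shown, so the following is only a guess at the shape of that fix. The names cross_validate, build_features and evaluate are placeholders invented for this sketch. The idea is that without del, the previous fold's features stay alive while the next fold's features are being built, so peak memory can be about twice what one fold needs:

```python
def cross_validate(folds, build_features, evaluate):
    """Score each fold, dropping its feature data before the next one is built."""
    scores = []
    for fold in folds:
        features = build_features(fold)  # potentially very large
        scores.append(evaluate(features))
        # Remove the only reference so the memory can be reclaimed
        # before the next iteration allocates a new feature set.
        del features
    return scores
```

Note that del only removes the name; the memory is released only if nothing else still refers to the same object.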