MemoryError in TensorFlow: answers

【Question title】: MemoryError in TensorFlow
【Posted】: 2017-04-25 16:36:58
【Question description】:

I am running this model on an AWS P2.xlarge instance. It fails with the following error:

Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/tensorflow/models/summarization/textsum/batch_reader.py", line 136, in _FillInputQueue
    (article, abstract) = input_gen.next()
  File "/home/ubuntu/tensorflow/models/summarization/textsum/batch_reader.py", line 245, in _TextGenerator
    e = example_gen.next()
  File "/home/ubuntu/tensorflow/models/summarization/textsum/data.py", line 109, in ExampleGen
    example_str = struct.unpack('%ds' % str_len, reader.read(str_len))[0]
MemoryError

The filesystem usage on the machine is:

Filesystem      Size  Used Avail Use% Mounted on
udev             30G     0   30G   0% /dev
tmpfs           6.0G  8.9M  6.0G   1% /run
/dev/xvda1       30G   12G   18G  39% /
tmpfs            30G     0   30G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            30G     0   30G   0% /sys/fs/cgroup
tmpfs           6.0G     0  6.0G   0% /run/user/1000

NVIDIA status:

ubuntu@ip-172-31-28-161:~$ lspci | grep -i nvidia

00:1e.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

What is the solution?

If I replace str_len = struct.unpack('q', len_bytes)[0] with str_len = struct.unpack('Bi', len_bytes)[0], this error goes away and a new one appears:

Exception in thread Thread-15:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/mindstix/bazel/models/Summarizer/textsum/batch_reader.py", line 136, in _FillInputQueue
    (article, abstract) = input_gen.next()
  File "/home/mindstix/bazel/models/Summarizer/textsum/batch_reader.py", line 248, in _TextGenerator
    article_text = self._GetExFeatureText(e, self._article_key)
  File "/home/mindstix/bazel/models/Summarizer/textsum/batch_reader.py", line 265, in _GetExFeatureText
    return ex.features.feature[key].bytes_list.value[0]
IndexError: list index (0) out of range

If I print example_str, its value is shown on screen. But when I try to print ex.features.feature[key].bytes_list.value, it comes back empty.

What should I do to fix all of this?

Here are the steps I am following:

>>> import tensorflow as tf
>>> import struct
>>> from tensorflow.core.example import example_pb2
>>> reader = open('data/training-1', 'rb')
>>> len_bytes = reader.read(8)
>>> str_len = struct.unpack('q', len_bytes)[0]
>>> str_len
2335523720558635124
>>> example_str = struct.unpack('%ds' % str_len, reader.read(str_len))[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

>>> str_len = struct.unpack('Bi', len_bytes)[0]
>>> str_len
116

>>> example_str = struct.unpack('%ds' % str_len, reader.read(str_len))[0]
>>> e = example_pb2.Example.FromString(example_str)
>>> e.features.feature['article'].bytes_list.value
<google.protobuf.pyext._message.RepeatedScalarContainer object at 0x7fc25c9325a8>

>>> e.features.feature['article'].bytes_list.value[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index (0) out of range

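【Question comments】:

Without the rest of the code as context, it is hard to say anything. Can you condense this into a minimal but runnable example?
@AllenLavoie I have updated the question with the sample code I am trying to run with TensorFlow.
So the article feature is empty? Is there a reason to think it shouldn't be? Just printing the whole example (print(e)) to see what got parsed might be useful. I'm also not sure what is going on with the use of struct: maybe the TFRecord format would be a more stable storage format?
Can you tell me what would be a better way to handle this?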

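Tags: python python-2.7 amazon-web-services tensorflow p2

【Solution 1】:

I ran into the same problem. In my case the cause was that I was testing with the original text file; the converted binary file should be used instead. I don't know whether your situation is the same as mine.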
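
The length printed in the question is consistent with this explanation. As a minimal check using only Python's struct module, and assuming a little-endian machine (as on x86 AWS instances), packing that value back into 8 bytes recovers the start of the file, which turns out to be ordinary ASCII text and not a length header:

```python
import struct

# The value the question obtained from struct.unpack('q', len_bytes)[0].
bogus_len = 2335523720558635124

# Re-pack it as a little-endian signed 64-bit integer to recover the
# first 8 bytes that were read from the file.
first_bytes = struct.pack('<q', bogus_len)
print(repr(first_bytes))  # b'the sri ' on Python 3

# With native alignment, 'Bi' is one byte, three pad bytes, then an int,
# so element [0] is just the first byte of the file: ord('t').
first_byte = struct.unpack('<B', first_bytes[:1])[0]
print(first_byte)  # 116
```

So switching to 'Bi' does not fix anything: it only reads 116 bytes of text, which is presumably why the parsed Example ends up with no 'article' feature.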

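【Comments】:

My error is resolved. It occurred because the input file given to tensorflow/textsum was not in the correct binary format. In example_str = struct.unpack('%ds' % str_len, reader.read(str_len))[0], read() needs a valid number of bytes to read, and I was passing an invalid size. As a result, e.features.feature['article'].bytes_list.value contained nothing; the object was empty at that point. I converted the text into the format TensorFlow accepts using github.com/surmenok/TextSum/blob/master/….
@LeenaBharambe Could you offer any help? I am working from the same code and am stuck on the same line. Thanks in advance.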
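
For readers stuck at the same line, the binary layout that the struct calls in the question imply is an 8-byte length ('q') followed by that many payload bytes. The following is a minimal sketch of a writer and a defensive reader for that framing, not the textsum code itself. The names write_record, read_records and max_len are hypothetical, and the payload is assumed to be a serialized tf.Example in real use; plain bytes stand in for it here so the sketch runs without TensorFlow.

```python
import io
import struct


def write_record(stream, payload):
    """Write one record: an 8-byte length ('q'), then the payload bytes."""
    stream.write(struct.pack('q', len(payload)))
    stream.write(struct.pack('%ds' % len(payload), payload))


def read_records(stream, max_len=1 << 30):
    """Yield payloads, rejecting implausible lengths before reading them."""
    while True:
        len_bytes = stream.read(8)
        if not len_bytes:
            return  # clean end of file
        if len(len_bytes) < 8:
            raise ValueError('truncated length header')
        str_len = struct.unpack('q', len_bytes)[0]
        # max_len is an arbitrary illustrative cap, not a textsum setting.
        if str_len < 0 or str_len > max_len:
            raise ValueError('implausible record length %d; '
                             'is this a raw text file?' % str_len)
        data = stream.read(str_len)
        if len(data) != str_len:
            raise ValueError('truncated record')
        yield struct.unpack('%ds' % str_len, data)[0]


# Round trip through an in-memory file.
buf = io.BytesIO()
write_record(buf, b'stand-in for a serialized tf.Example')
buf.seek(0)
records = list(read_records(buf))
print(records)
```

With a check like this, feeding a raw text file to the reader fails immediately with a descriptive ValueError where the original code would try to allocate an enormous buffer and raise MemoryError.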