[Question title]: Why does the size of TensorFlow model files depend on the size of the dataset?
[Posted]: 2019-07-26 22:14:50
[Question description]:

After training on a dataset of 10K sentences, the .index, .meta and .data files of my saved model are 3KB, 58MB and 375MB respectively.

Keeping the network architecture unchanged and training on a dataset of 100K sentences, the file sizes are 3KB, 139MB and 860MB respectively.

I think this shows that the size depends on the size of the dataset. According to this answer, however, the file sizes should be independent of the dataset size, since the network architecture is the same.

Why do the sizes differ so much?

I would also like to know what these files contain besides what is mentioned in the linked answer.

Do these files contain information related to the training history, such as the loss value at each step?

[Comments]:

  • This cannot be explained without code. In general, a model's size is independent of the number of data points in the training set, but perhaps your code couples the two somehow (e.g. a vocabulary learned from the training set).
  • Thank you very much! You are right: the "embeddings" variable stores the word embeddings. Is there any way to get the training loss at each step from these files? I have not written summaries to an event file.

Tags: tensorflow


[Solution 1]:
import tensorflow as tf
from tensorflow.python.training import checkpoint_utils as cp

# List the (name, shape) of every variable stored in the checkpoint
cp.list_variables('./model.ckpt-12520')

Running the snippet above produces the following output:

[('Variable', []), ('decoder/attention_wrapper/attention_layer/kernel', [600, 300]), ('decoder/attention_wrapper/attention_layer/kernel/Adam', [600, 300]), ('decoder/attention_wrapper/attention_layer/kernel/Adam_1', [600, 300]), ('decoder/attention_wrapper/bahdanau_attention/attention_b', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_b/Adam', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_b/Adam_1', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_g', []), ('decoder/attention_wrapper/bahdanau_attention/attention_g/Adam', []), ('decoder/attention_wrapper/bahdanau_attention/attention_g/Adam_1', []), ('decoder/attention_wrapper/bahdanau_attention/attention_v', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_v/Adam', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_v/Adam_1', [300]), ('decoder/attention_wrapper/bahdanau_attention/query_layer/kernel', [300, 300]), ('decoder/attention_wrapper/bahdanau_attention/query_layer/kernel/Adam', [300, 300]), ('decoder/attention_wrapper/bahdanau_attention/query_layer/kernel/Adam_1', [300, 300]), ('decoder/attention_wrapper/basic_lstm_cell/bias', [1200]), ('decoder/attention_wrapper/basic_lstm_cell/bias/Adam', [1200]), ('decoder/attention_wrapper/basic_lstm_cell/bias/Adam_1', [1200]), ('decoder/attention_wrapper/basic_lstm_cell/kernel', [900, 1200]), ('decoder/attention_wrapper/basic_lstm_cell/kernel/Adam', [900, 1200]), ('decoder/attention_wrapper/basic_lstm_cell/kernel/Adam_1', [900, 1200]), ('decoder/dense/kernel', [300, 49018]), ('decoder/dense/kernel/Adam', [300, 49018]), ('decoder/dense/kernel/Adam_1', [300, 49018]), ('decoder/memory_layer/kernel', [300, 300]), ('decoder/memory_layer/kernel/Adam', [300, 300]), ('decoder/memory_layer/kernel/Adam_1', [300, 300]), ('embeddings', [49018, 300]), ('embeddings/Adam', [49018, 300]), ('embeddings/Adam_1', [49018, 300]), ('loss/beta1_power', []), ('loss/beta2_power', []), 
('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam_1', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam_1', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam_1', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam_1', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam_1', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam_1', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam_1', [600]), 
('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam_1', [450, 600])]

I realized that the embeddings variable stores the word embeddings; its first dimension (49018) is the vocabulary size, which grows with the dataset and is what makes these files larger.

# Load the stored embedding matrix from the checkpoint as a numpy array
cp.load_variable('./model.ckpt-12520', 'embeddings')
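A back-of-the-envelope calculation (assuming float32 weights, as is the TensorFlow default) shows how much of the .data file the vocabulary-sized tensors account for. Each trainable variable is stored three times in the checkpoint: the variable itself plus its two Adam slot variables (`.../Adam` and `.../Adam_1`), and both `embeddings` [49018, 300] and `decoder/dense/kernel` [300, 49018] scale with the vocabulary:

```python
# Rough checkpoint-size estimate for the vocabulary-sized tensors.
vocab, dim = 49018, 300        # shape of 'embeddings' in the listing above
bytes_per_float = 4            # float32
copies = 3                     # variable + Adam first/second moments

per_tensor = vocab * dim * bytes_per_float * copies
print(f"embeddings + Adam slots:        {per_tensor / 1e6:.1f} MB")  # 176.5 MB

# 'decoder/dense/kernel' [300, 49018] is vocabulary-sized too:
print(f"both vocab-sized tensor groups: {2 * per_tensor / 1e6:.1f} MB")  # 352.9 MB
```

That is roughly 353 MB of the 375 MB .data file, so almost all of the growth between the two datasets comes from the larger learned vocabulary, not from the fixed architecture.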

[Discussion]:

[Solution 2]:

The training summaries are contained in your event files.

[Discussion]:

• I save the model with saver.save(sess, "./saved_model/model.ckpt", global_step=step). So far I don't have any event files. Do I have to write the loss to an event file manually, or is it written implicitly?