【Question title】: Different result using deep learning in Keras
【Posted】: 2019-02-05 07:25:50
【Question】:

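Following a tutorial, I am using a deep neural network in Keras for text classification, but when I run the code below multiple times, I get different results.

For example, the test loss is 0.88815 on the first run and 0.89030 on the second, which is slightly higher. Where does the randomness come from?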

import keras
from keras.datasets import reuters


(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None, test_split=0.2)
word_index = reuters.get_word_index(path="reuters_word_index.json")



print('# of Training Samples: {}'.format(len(x_train)))
print('# of Test Samples: {}'.format(len(x_test)))

num_classes = max(y_train) + 1
print('# of Classes: {}'.format(num_classes))

index_to_word = {}
for key, value in word_index.items():
    index_to_word[value] = key

print(' '.join([index_to_word[x] for x in x_train[0]]))
print(y_train[0])


from keras.preprocessing.text import Tokenizer

max_words = 10000

tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)


print(x_train[0])
print(len(x_train[0]))

print(y_train[0])
print(len(y_train[0]))


from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))



model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.metrics_names)

batch_size = 32
epochs = 3

history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.1)
score = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

【Question discussion】:

    Tags: machine-learning keras deep-learning


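    【Solution 1】:

    I did not test on GPU, but on CPU, fixing the seeds as described in the other answers does not seem to work with TensorFlow 1 as the Keras backend. Switching from TensorFlow 1 to TensorFlow 2 makes seeding work. For example, this works for me: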

    import os
    import numpy as np
    import random as rn
    import tensorflow as tf
    
    os.environ['PYTHONHASHSEED']= '0'
    np.random.seed(1)
    rn.seed(1)
    tf.random.set_seed(1)  # TensorFlow 2 API; tf.set_random_seed is the TensorFlow 1 name
    

    【Discussion】:

      【Solution 2】:

      As mentioned in the Keras FAQ, add the following code:

      import numpy as np
      import tensorflow as tf
      import random as rn
      
      # The below is necessary in Python 3.2.3 onwards to
      # have reproducible behavior for certain hash-based operations.
      # See these references for further details:
      # https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
      # https://github.com/keras-team/keras/issues/2280#issuecomment-306959926
      
      import os
      os.environ['PYTHONHASHSEED'] = '0'
      
      # The below is necessary for starting Numpy generated random numbers
      # in a well-defined initial state.
      
      np.random.seed(42)
      
      # The below is necessary for starting core Python generated random numbers
      # in a well-defined state.
      
      rn.seed(12345)
      
      # Force TensorFlow to use single thread.
      # Multiple threads are a potential source of
      # non-reproducible results.
      # For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
      
      session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                                    inter_op_parallelism_threads=1)
      
      from keras import backend as K
      
      # The below tf.set_random_seed() will make random number generation
      # in the TensorFlow backend have a well-defined initial state.
      # For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
      
      tf.set_random_seed(1234)
      
      sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
      K.set_session(sess)
      
      # Rest of code follows ...
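
One caveat about the `PYTHONHASHSEED` line above: Python reads this variable when the interpreter starts, so assigning it through `os.environ` inside a script that is already running does not change string hashing in that process. It only affects interpreters started afterwards. A minimal sketch that checks the effect by launching child interpreters (the helper name `child_hash` is hypothetical and exists only for this illustration):

```python
import os
import subprocess
import sys


def child_hash(hash_seed):
    """Return hash('keras') as computed by a freshly started interpreter.

    `hash_seed` is passed to the child process as PYTHONHASHSEED.
    """
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('keras'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


# With a fixed PYTHONHASHSEED, two separate interpreter runs agree.
first = child_hash("0")
second = child_hash("0")
print(first == second)
```

In practice this means exporting the variable in the shell before launching the training script, rather than relying on the assignment inside the script.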
      

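      【Discussion】:

        【Solution 3】:

        This is normal behavior for Keras. See this discussion in the issue list of the Keras repository on GitHub.

        For example, the 9th parameter of the fit function, `shuffle`, controls shuffling and defaults to True. The training data is therefore shuffled before every epoch, which changes the results from run to run.

        Setting the random seeds helps, but the results may still not match exactly.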
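
To see why shuffling alone changes the outcome, here is a pure-Python sketch of per-epoch shuffling. It is not Keras's actual implementation; `epoch_batches` is a hypothetical helper that only mimics the idea of reordering the samples before splitting them into batches. With `seed=None` the batch order differs between runs, while a fixed seed repeats it.

```python
import random


def epoch_batches(n_samples, batch_size, epochs, seed=None):
    """Return the batches of sample indices seen in each epoch.

    Illustrative only: mimics shuffling the data before every epoch.
    """
    rng = random.Random(seed)  # seed=None draws entropy from the OS
    indices = list(range(n_samples))
    schedule = []
    for _ in range(epochs):
        rng.shuffle(indices)  # reorder the samples before this epoch
        batches = [indices[i:i + batch_size]
                   for i in range(0, n_samples, batch_size)]
        schedule.append(batches)
    return schedule


run_a = epoch_batches(n_samples=8, batch_size=4, epochs=2, seed=1234)
run_b = epoch_batches(n_samples=8, batch_size=4, epochs=2, seed=1234)
print(run_a == run_b)  # same seed, same batch order in every epoch
```

In Keras itself, passing `shuffle=False` to `model.fit` turns this reordering off, although shuffling is generally helpful for training.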

        【Discussion】:

          【Solution 4】:

          If you want to get the same result every time, you need to set a random seed. See also https://machinelearningmastery.com/reproducible-results-neural-networks-keras/

          This can be done by adding:

          from numpy.random import seed
          seed(42)
          

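          If you are using the TensorFlow backend, you also need to add the following (this is the TensorFlow 1.x name; in TensorFlow 2 the equivalent call is `tf.random.set_seed`):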

          from tensorflow import set_random_seed
          set_random_seed(42)
          

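          42 is just an arbitrary number; you can pick any value. It serves as a constant random seed, so your weights always get the same random initialization, which in turn gives you the same results.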
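
The effect of the seed on weight initialization can be checked with NumPy alone. This is a minimal sketch, assuming weights drawn from a uniform distribution; `init_weights` is a hypothetical stand-in for an initializer, not a Keras function.

```python
import numpy as np


def init_weights(shape, seed):
    """Draw a weight matrix after resetting NumPy's global seed."""
    np.random.seed(seed)
    return np.random.uniform(-0.05, 0.05, size=shape)


w1 = init_weights((3, 2), seed=42)
w2 = init_weights((3, 2), seed=42)
w3 = init_weights((3, 2), seed=7)

print(np.array_equal(w1, w2))  # same seed: identical matrices
print(np.array_equal(w1, w3))  # different seed: other values
```

With the TensorFlow backend the initial weights come from TensorFlow's own random generator, which is why the TensorFlow seed has to be set in addition to the NumPy one.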

          【Discussion】:
