【Posted】: 2020-11-04 12:09:46
【Problem description】:
I am learning TensorFlow 2.0 and trying to convert a Keras model into an estimator model with keras.estimator.model_to_estimator.
As an example I use the Titanic dataset from https://storage.googleapis.com/tf-datasets/titanic/train.csv and https://storage.googleapis.com/tf-datasets/titanic/eval.csv.
In the Keras model I use a DenseFeatures layer, keras.layers.DenseFeatures(), to convert the categorical features into one-hot encodings automatically. Training the model directly with model.fit() works fine.
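For intuition, the one-hot expansion that an indicator_column + DenseFeatures pipeline performs is analogous to pandas' get_dummies. A minimal sketch with a toy frame (the 'sex' column and its values mirror the Titanic data; the frame itself is made up for illustration):

```python
import pandas as pd

# Toy frame with a categorical column like the Titanic 'sex' feature.
df = pd.DataFrame({'sex': ['male', 'female', 'male']})

# One-hot encode the column; columns come out in sorted vocabulary order.
one_hot = pd.get_dummies(df['sex']).astype(int)
print(one_hot.columns.tolist())  # ['female', 'male']
print(one_hot.values.tolist())   # [[0, 1], [1, 0], [0, 1]]
```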
However, when I convert the model with estimator = keras.estimator.model_to_estimator(model) and train it with estimator.train(), the program raises ValueError: Unexpectedly found an instance of type `<class 'dict'>`. Expected a symbolic tensor instance.
I suspect the error is caused by the dict() call in make_dataset(), but I don't know how to modify the code to fix it.
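For reference, dict(data_df) turns the DataFrame into a mapping from column name to column values, so each element that tf.data.Dataset.from_tensor_slices yields is a dict of features rather than a single tensor. A small illustration with a made-up two-column frame:

```python
import pandas as pd

# dict(df) maps each column name to its Series; this is the
# structure from_tensor_slices receives in make_dataset() above.
df = pd.DataFrame({'age': [22.0, 38.0], 'fare': [7.25, 71.28]})
features = dict(df)

print(sorted(features.keys()))  # ['age', 'fare']
print(list(features['age']))    # [22.0, 38.0]
```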
The full code is as follows:
import tensorflow as tf
import matplotlib as mpl
import numpy as np
import sklearn
import pandas as pd
import os
import sys
from tensorflow import keras
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
print(sys.version_info)
for module in mpl, np, pd, sklearn, tf, keras:
    print(module.__name__, module.__version__)
# https://storage.googleapis.com/tf-datasets/titanic/train.csv
# https://storage.googleapis.com/tf-datasets/titanic/eval.csv
train_file = './data/titanic/train.csv'
eval_file = './data/titanic/eval.csv'
train_df = pd.read_csv(train_file)
eval_df = pd.read_csv(eval_file)
train_y = train_df.pop('survived')
eval_y = eval_df.pop('survived')
categorical_columns = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck', 'embark_town', 'alone']
numeric_columns = ['age', 'fare']
feature_columns = []
for categorical_column in categorical_columns:
    vocab = train_df[categorical_column].unique()
    feature_columns.append(
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(categorical_column, vocab)
        )
    )
for numeric_column in numeric_columns:
    feature_columns.append(tf.feature_column.numeric_column(numeric_column))
def make_dataset(data_df, label_df, epochs=10, shuffle=True, batch_size=32):
    dataset = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
    if shuffle:
        dataset = dataset.shuffle(10000)
    dataset = dataset.repeat(epochs).batch(batch_size)
    return dataset
train_dataset = make_dataset(train_df, train_y, epochs=100, batch_size=5)
for x, y in train_dataset.take(1):
    print(keras.layers.DenseFeatures(feature_columns, dtype=tf.float32)(x).numpy())
model = keras.models.Sequential([
    keras.layers.DenseFeatures(feature_columns, dtype=tf.float32),
    keras.layers.Dense(100, activation='relu', dtype=tf.float64),
    keras.layers.Dense(100, activation='relu', dtype=tf.float64),
    keras.layers.Dense(2, activation='softmax', dtype=tf.float64)
])
model.compile(loss='sparse_categorical_crossentropy', optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
batch_size = 32
train_dataset = make_dataset(train_df, train_y, epochs=100, batch_size=batch_size)
eval_dataset = make_dataset(eval_df, eval_y, epochs=1, shuffle=False, batch_size=batch_size)
# 1. model.fit()
history = model.fit(
    train_dataset,
    validation_data=eval_dataset,
    steps_per_epoch=627 // batch_size,
    validation_steps=264 // batch_size,
    epochs=100
)
# 2. model -> estimator -> train
estimator = keras.estimator.model_to_estimator(model)
estimator.train(input_fn=lambda: make_dataset(train_df, train_y, epochs=100))
The library versions are as follows:
matplotlib 3.2.0
numpy 1.16.3
pandas 1.0.1
sklearn 0.23.1
tensorflow 2.2.0
tensorflow.keras 2.3.0-tf
【Discussion】:
Tags: python tensorflow keras deep-learning tensorflow2.0