TensorFlow-hub Text-Module Preprocessing: Answers

【Question title】: Tensorflow-hub Text-Module Preprocessing
【Posted】: 2018-04-30 16:55:25
【Question description】:

I'm using the new modules available on tensorflow-hub (which I really like, thanks).

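What I'm unclear about is the preprocessing that should be done when feeding in sentences. The module documentation says that, in the preprocessing step, the input sentences are split on spaces.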
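The splitting step can be pictured with a small pure-Python sketch. The vocabulary, the single out-of-vocabulary id and the choice to drop empty tokens are all assumptions made for illustration; the module's real vocabulary and unknown-word handling live inside the module.

```python
# Hypothetical toy vocabulary; the real module ships its own, much larger one
# and handles unknown words internally.
VOCAB = {"Cat": 0, "sat": 1, "on": 2, "mat": 3}
OOV_ID = len(VOCAB)  # assumption: every unknown token maps to one extra id


def tokenize(sentence):
    """Split on single spaces and drop empty tokens (e.g. from double spaces)."""
    return [tok for tok in sentence.split(" ") if tok]


def to_ids(tokens):
    """Map each token to its vocabulary id, or OOV_ID if it is unknown."""
    return [VOCAB.get(tok, OOV_ID) for tok in tokens]


for sentence in ["Cat sat on mat", "Dog  sat"]:
    tokens = tokenize(sentence)
    print(tokens, to_ids(tokens))
```

This prints `['Cat', 'sat', 'on', 'mat'] [0, 1, 2, 3]` and then `['Dog', 'sat'] [4, 1]`: each input string becomes a variable-length list of token ids, which still has to be reduced to a fixed-size vector somehow.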

However, when I run the following program, I get only a single vector:

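import tensorflow as tf  # TensorFlow 1.x API (hub.Module, MonitoredTrainingSession)
import tensorflow_hub as hub

with tf.device("/cpu:0"):
  embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim128/1")

global_step1 = tf.train.get_or_create_global_step()
with tf.device("/cpu:0"):
  # A batch containing a single sentence string.
  embeddings = embed({"default": ["Cat sat on mat"]})

# MonitoredTrainingSession runs the variable and table initializers the module needs.
with tf.train.MonitoredTrainingSession(is_chief=True) as sess:
  message_embeddings_cat = sess.run(embeddings)
  print(message_embeddings_cat.shape)  # result: (1, 128)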

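How do I get an embedding for each word, and what does the single vector represent? A fixed-dimensional representation of the sentence, an unknown-word embedding, or something else?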

Thanks in advance!

Edit: The result appears to be a combined embedding created with tf.nn.embedding_lookup_sparse (thanks to @svsgoogle for confirming).
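For intuition about why "Cat sat on mat" comes back as a single row, here is a pure-Python sketch of the per-row reduction that tf.nn.embedding_lookup_sparse performs. It assumes every token has weight 1.0 and defaults to the "sqrtn" combiner (weighted sum divided by the square root of the sum of squared weights); which combiner the nnlm module actually uses should be checked against its documentation. The 3-dimensional table is invented.

```python
import math

# Made-up embedding table: 4 token ids, 3 dimensions (the real module uses 128).
TABLE = [
    [1.0, 0.0, 0.0],  # id 0
    [0.0, 1.0, 0.0],  # id 1
    [0.0, 0.0, 1.0],  # id 2
    [1.0, 1.0, 1.0],  # id 3
]
DIM = len(TABLE[0])


def combine(ids, combiner="sqrtn"):
    """Reduce the rows TABLE[i] for i in ids to a single vector.

    Every token is given weight 1.0, so the three combiners reduce to:
      "sum"   -> sum of the rows
      "mean"  -> sum / len(ids)
      "sqrtn" -> sum / sqrt(len(ids))
    """
    if not ids:
        return [0.0] * DIM  # this sketch returns zeros for an empty sentence
    total = [sum(TABLE[i][d] for i in ids) for d in range(DIM)]
    if combiner == "sum":
        scale = 1.0
    elif combiner == "mean":
        scale = 1.0 / len(ids)
    elif combiner == "sqrtn":
        scale = 1.0 / math.sqrt(len(ids))
    else:
        raise ValueError(f"unknown combiner: {combiner}")
    return [x * scale for x in total]


# One sentence with four token ids collapses to one 3-dim vector.
print(combine([0, 1, 2, 3]))  # [1.0, 1.0, 1.0]
```

With the same ids, "mean" would return `[0.5, 0.5, 0.5]` instead; the combiner only changes the scaling, not the fact that there is one output row per input string.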

【Question discussion】:

Tags: tensorflow tensorflow-hub

【Solution 1】:

Yes, the output is a fixed-dimensional representation of the whole sentence.

You can also embed individual words to get their vectors. In your case:

embeddings = embed({"default": ["Cat", "sat", "on", "mat"]})

should give you a result of shape (4, 128).
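Why the word-by-word call works can be seen in another pure-Python sketch: each string in the batch produces exactly one output row, so a batch of four one-word strings produces four rows. The 2-dimensional word vectors are hypothetical, unknown words are not handled (they raise KeyError), and the "sqrtn" combiner is an assumption about the module. Under that assumption a one-word input is divided by sqrt(1) = 1, so its row equals the word's own vector.

```python
import math

# Hypothetical 2-dim word vectors; the real module has 128 dims and its own vocabulary.
WORD_VECTORS = {
    "Cat": [1.0, 0.0],
    "sat": [0.0, 1.0],
    "on": [1.0, 1.0],
    "mat": [2.0, 0.0],
}
DIM = 2


def embed(batch):
    """Return one vector per input string: split on spaces, look up, sqrtn-combine.

    Assumption: every word is in WORD_VECTORS (unknown words raise KeyError).
    """
    rows = []
    for text in batch:
        vecs = [WORD_VECTORS[tok] for tok in text.split(" ") if tok]
        if not vecs:
            rows.append([0.0] * DIM)
            continue
        scale = 1.0 / math.sqrt(len(vecs))
        rows.append([scale * sum(v[d] for v in vecs) for d in range(DIM)])
    return rows


sentence = embed(["Cat sat on mat"])     # one string  -> one row
words = embed(["Cat", "sat", "on", "mat"])  # four strings -> four rows
print(len(sentence), len(words))          # 1 4
print(sentence[0])                        # [2.0, 1.0]
print(words[0] == WORD_VECTORS["Cat"])    # True
```

In TensorFlow terms this corresponds to the batch dimension of the module's output: `["Cat sat on mat"]` gives shape (1, 128), while `["Cat", "sat", "on", "mat"]` gives (4, 128).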

【Discussion】: