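【Posted】: 2019-10-08 09:04:36
【Question】:
I am trying to apply a UDF to a DataFrame column of strings. The function uses TensorFlow's Google Universal Sentence Encoder (GUSE) to convert each string into an array of floats.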
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import tf_sentencepiece  # registers the SentencePiece ops the module needs

# Graph set up.
g = tf.Graph()
with g.as_default():
    text_input = tf.placeholder(dtype=tf.string, shape=[None])
    embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/1")
    embedded_text = embed(text_input)
    init_op = tf.group([tf.global_variables_initializer(), tf.tables_initializer()])
g.finalize()

# Initialize session.
session = tf.Session(graph=g)
session.run(init_op)

def embed_mail(x):
    embedding = session.run(embedded_text, feed_dict={text_input: [x]})
    embedding = np.ravel(embedding)  # flatten the (1, dim) result to 1-D
    result = [np.float32(i).item() for i in embedding]
    return result
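But whenever I try to run this function as follows:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, FloatType

embed_mail_udf = udf(embed_mail, ArrayType(FloatType()))
df = df.withColumn('embedding', embed_mail_udf(df.text))
I keep getting the error: Could not serialize object: TypeError: can't pickle SwigPyObject objects. What am I doing wrong?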
【Comments】:
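A likely cause: Spark pickles the UDF together with everything it references, and `embed_mail` refers to the module-level `session`, which wraps native (SWIG) handles that pickle rejects. A common workaround is to build the graph and session lazily inside the worker process, so that nothing unpicklable exists when the function is serialized. The sketch below shows only that pattern with the standard library: `LazyEmbedder`, `_build`, and `can_pickle` are hypothetical names, and a `threading.Lock` stands in for the TensorFlow session because pickle rejects it in the same way.

```python
import pickle
import threading


class LazyEmbedder:
    """Callable that creates its unpicklable resource on first use.

    Hypothetical stand-in: in the real UDF, _build would set up the
    tf.Graph, load the hub.Module and open the tf.Session.
    """

    def __init__(self):
        self._handle = None  # nothing unpicklable exists at pickling time

    def _build(self):
        # Stand-in resource; pickle rejects locks just as it rejects SWIG objects.
        return threading.Lock()

    def __getstate__(self):
        # Never ship the handle; each process rebuilds its own on first call.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def __call__(self, text):
        if self._handle is None:
            self._handle = self._build()
        with self._handle:
            # Stand-in for session.run(...); returns a list of Python floats.
            return [float(len(text))]


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


embedder = LazyEmbedder()
vector = embedder("hello")                    # the handle is created here
raw_handle_ok = can_pickle(threading.Lock())  # the bare resource is rejected
shipped_state = embedder.__getstate__()       # what pickle would serialize
```

In the Spark case the same idea would mean moving the graph and session creation from module level into a function that the UDF calls on first use, and caching the result per executor process. Whether that is sufficient for this particular module is an assumption that would need to be checked on a cluster.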
Tags: apache-spark pyspark user-defined-functions pickle