Converting from Pandas dataframe to TensorFlow tensor object: answers

[Question title]: Converting from Pandas dataframe to TensorFlow tensor object
[Posted]: 2017-07-06 07:59:25
[Question]:

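I'm still new to Python, machine learning and TensorFlow, but I'm doing my best to dive straight in. I could use some help, though.

My data is currently in a Pandas dataframe. How can I convert it to a TensorFlow object? I tried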

dataVar_tensor = tf.constant(dataVar)
depth_tensor = tf.constant(depth)

However, I get the error [15780 rows x 9 columns] - got shape [15780, 9], but wanted [].

I'm sure this is probably a simple question, but I could really use some help.

Many thanks

P.S. I'm running TensorFlow 0.12 with Anaconda Python 3.5 on Windows 10.

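[Comments]:

What do you want to do with this data? Is it the input to a neural network you want to train? From the error message, it seems that constant only expects a constant, i.e. an integer or a float, not a matrix.
@rAyyy Yes, my plan is to eventually feed it into a neural network. For now, I'm just trying to take the MNIST example from the tutorial and make it work with my own data. I'm reading from a CSV file with pandas.read_csv().

Tags: python pandas tensorflow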

[Solution 1]:

I converted my Pandas dataframe to a NumPy array using df.values.

Now, using

dataVar_tensor = tf.constant(dataVar, dtype = tf.float32, shape=[15780,9])
depth_tensor = tf.constant(depth, 'float32',shape=[15780,1])

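seems to work. I can't say for sure that it does, because I still have other hurdles to clear before my code works, but hopefully this is a step in the right direction. Thanks for all your help.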

By the way, I continue trying to get the tutorial working with my own data in my next question, Converting TensorFlow tutorial to work with my own data.
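For readers on TensorFlow 2.x, here is a minimal sketch of the same idea, assuming eager execution and small hypothetical stand-ins for the question's 15780 x 9 feature frame and depth column (the names features and depth and the random data are illustrative only). Converting to NumPy first and letting tf.constant infer the shape avoids hard-coding [15780, 9]; DataFrame.to_numpy() is the newer spelling of .values.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical stand-ins for the question's data (5 rows instead of 15780)
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.random((5, 9)), columns=[f"f{i}" for i in range(9)])
depth = pd.Series(rng.random(5), name="depth")

# DataFrame/Series -> NumPy, casting to float32 on the pandas side
features_np = features.to_numpy(dtype=np.float32)
depth_np = depth.to_numpy(dtype=np.float32).reshape(-1, 1)  # column vector

# tf.constant infers the shape from the array, so no shape= argument is needed
dataVar_tensor = tf.constant(features_np)  # shape (5, 9), float32
depth_tensor = tf.constant(depth_np)       # shape (5, 1), float32
```

Letting the array carry its own shape means the same code keeps working if the CSV gains or loses rows.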

[Comments]:

I converted a pandas Series (y_train) of integers to a tensor and then to one-hot, like this: dataVar_tensor = tf.Variable(y_train.as_matrix(), dtype = tf.int32) result = tf.one_hot(dataVar_tensor, depth)
pandas.DataFrame.values is indeed what the TensorFlow tutorial at tensorflow.org/tutorials/load_data/… recommends.
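as_matrix() has been removed from pandas (as of 1.0), so the Series-to-one-hot comment no longer runs as written on current pandas. A hedged TensorFlow 2.x sketch of the same conversion, using to_numpy() and a plain tf.constant instead of a tf.Variable; the labels and depth = 3 are made-up example values.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical integer class labels in the range [0, depth)
y_train = pd.Series([0, 2, 1, 2])
depth = 3  # assumed number of classes

# Series -> int32 NumPy array -> tensor (to_numpy replaces the removed as_matrix)
labels = tf.constant(y_train.to_numpy(dtype=np.int32))
result = tf.one_hot(labels, depth)  # float32 tensor of shape (4, 3)
print(result.numpy())
```

A plain tensor is enough for labels because nothing trains them; tf.Variable is meant for values an optimizer updates.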

[Solution 2]:

Here is a solution I found that works in Google Colab:

import pandas as pd
import tensorflow as tf
#Read the file to a pandas object
data=pd.read_csv('filedir')
#convert the pandas object to a tensor
data=tf.convert_to_tensor(data)
type(data)

This will print something like:

tensorflow.python.framework.ops.Tensor
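tf.convert_to_tensor produces one tensor with a single dtype, so a CSV that contains text (object-dtype) columns will generally fail at that step. A minimal sketch, assuming a hypothetical frame with one string column, that keeps only the numeric columns before converting. Under TensorFlow 2.x eager execution, type(data) is typically reported as an EagerTensor rather than ops.Tensor.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical frame mixing numeric and text columns (as read_csv often returns)
df = pd.DataFrame({"age": [30, 42, 25],
                   "fare": [7.25, 71.3, 8.05],
                   "city": ["A", "B", "A"]})

# Keep numeric columns only and give them one common dtype
numeric = df.select_dtypes(include="number").astype("float32")

data = tf.convert_to_tensor(numeric)  # shape (3, 2), dtype float32
print(type(data))
```

Text columns such as city would need their own encoding (for example one-hot) before they can join the numeric tensor.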

[Solution 3]:

The following works without trouble with NumPy array input data:

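import tensorflow as tf  # TensorFlow 1.x API (tf.Session was removed in 2.x)
import numpy as np
a = np.array([1,2,3])
with tf.Session() as sess:
    # Not strictly needed here: the graph contains no tf.Variable objects
    tf.global_variables_initializer().run()

    dataVar = tf.constant(a)
    print(dataVar.eval())  # eval() uses the default session opened above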

-> [1 2 3]

Don't forget to start a session for your tensor object and call run() or eval() to see its contents; otherwise you only get its generic description.
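The session advice applies to TensorFlow 1.x graph mode. In TensorFlow 2.x, eager execution is on by default, so a rough equivalent of this answer's snippet needs no session at all: the tensor's values are available immediately through .numpy().

```python
import numpy as np
import tensorflow as tf

a = np.array([1, 2, 3])

# TF 2.x: tensors are evaluated eagerly, no Session or initializer required
dataVar = tf.constant(a)
print(dataVar.numpy())  # prints [1 2 3]
```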

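I suspect that, because your data is in a DataFrame rather than a plain array, you may need to pass the shape parameter, which you currently don't specify, so that it understands the DataFrame's dimensions and deals with its index, etc.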

[Comments]:

Thanks. I'm running an InteractiveSession, and I've tried several variations of dataVar_tensor = tf.constant(dataVar, dtype = tf.float32, shape=[15780,9]), but no luck so far.

[Solution 4]:

You can convert a dataframe column to a tensor object as follows:

tf.constant((df['column_name']))

This should return a tensor that looks something like this:

<tf.Tensor: id=275634, shape=(48895,), dtype=float64, numpy=
array([1, 2, ...])>
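The tensor's dtype follows the column's dtype (float64 in the output above), while Keras layers default to float32. A small sketch, assuming a hypothetical column of floats, that casts on the pandas side before converting:

```python
import pandas as pd
import tensorflow as tf

# Hypothetical single-column frame
df = pd.DataFrame({"column_name": [1.5, 2.0, 3.25]})

# dtype is inherited from the column: float64
col64 = tf.constant(df["column_name"])

# Cast to float32 in pandas/NumPy first if the model expects float32
col32 = tf.constant(df["column_name"].to_numpy(dtype="float32"))
```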

You can also include as many dataframe columns as you need, like this:

tf.constant([df['column1'], df['column2']])
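Note the orientation of the list form: the columns are stacked row-wise, so two columns of length n give a tensor of shape (2, n), one row per column, whereas models usually expect one row per sample, shape (n, 2). A sketch with hypothetical data; NumPy arrays are used in the list form to keep the conversion unambiguous.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical two-column frame with three samples
df = pd.DataFrame({"column1": [1.0, 2.0, 3.0], "column2": [4.0, 5.0, 6.0]})

# List of columns: stacked row-wise -> shape (2, 3)
by_column = tf.constant([df["column1"].to_numpy(), df["column2"].to_numpy()])

# Selecting the columns as a sub-frame keeps one row per sample -> shape (3, 2)
by_row = tf.constant(df[["column1", "column2"]].to_numpy())

# The two layouts are transposes of each other
assert np.array_equal(tf.transpose(by_column).numpy(), by_row.numpy())
```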

Hope this helps.

[Solution 5]:

hottbox.pdtools.utils (the Pandas integration tools of the HOTTBOX API) provides the functions

   pd_to_tensor(df[, keep_index])
   tensor_to_pd(tensor[, col_name])

for converting in both directions.

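[Solution 6]:

You can use tf.estimator.inputs.pandas_input_fn inside a make_input_fn(X, y, num_epochs) function. However, I haven't managed to get it working with a MultiIndex; I worked around this by converting it to a standard integer index with df.reset_index(drop=True).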
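The tf.estimator API, including pandas_input_fn, is deprecated and has been dropped from recent TensorFlow releases. The sketch below swaps in a different technique, tf.data.Dataset.from_tensor_slices, and also applies the answer's reset_index(drop=True) step (harmless here, since to_numpy() discards the index anyway). The make_input_fn body, its batch_size parameter and the data are hypothetical, not the answerer's code.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical features and labels indexed by a two-level MultiIndex
idx = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["group", "step"])
X = pd.DataFrame({"x1": [0.1, 0.2, 0.3], "x2": [1.0, 2.0, 3.0]}, index=idx)
y = pd.Series([0, 1, 0], index=idx)

# The answer's workaround: replace the MultiIndex with a plain 0..n-1 index
X_flat = X.reset_index(drop=True)
y_flat = y.reset_index(drop=True)

def make_input_fn(X, y, num_epochs, batch_size=2):
    """Hypothetical helper: builds a tf.data pipeline instead of pandas_input_fn."""
    def input_fn():
        # One tensor per feature column, keyed by column name
        features = {name: col.to_numpy() for name, col in X.items()}
        ds = tf.data.Dataset.from_tensor_slices((features, y.to_numpy()))
        return ds.repeat(num_epochs).batch(batch_size)
    return input_fn

ds = make_input_fn(X_flat, y_flat, num_epochs=1)()
```

Each element of ds is a (dict of feature batches, label batch) pair, which Keras Model.fit can consume directly.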
