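【Posted】: 2017-07-10 07:45:41
【Question】:
I am trying to replicate Chevalier's LSTM Human Activity Recognition algorithm, but I ran into a problem when trying to use my own data in CSV format. The format used in the git repository is txt. My CSV data looks like this: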
0.000995,8
0.020801,8
0.040977,8
0.060786,8
0.080970,8
... ...
The original file can be found here. The x values (time) are in column 0 (-80.060003, etc.) and the y values (values) are in column 1 (8, 8, etc.). I tried using pandas:
pandas.read_csv(DATASET_PATH + TRAIN + "data_train.csv", skiprows=1, header=None, sep=',', usecols=[0, 1])
but it seems to be incompatible with the data format in the "Preparing dataset" section (and possibly others):
TRAIN = "train/"
TEST = "test/"

# Load "X" (the neural network's training and testing inputs)
def load_X(X_signals_paths):
    X_signals = []

    for signal_type_path in X_signals_paths:
        file = open(signal_type_path, 'r')
        # Read dataset from disk, dealing with text files' syntax
        X_signals.append(
            [np.array(serie, dtype=np.float32) for serie in [
                row.replace('  ', ' ').strip().split(' ') for row in file
            ]]
        )
        file.close()

    return np.transpose(np.array(X_signals), (1, 2, 0))

X_train_signals_paths = [
    DATASET_PATH + TRAIN + "Inertial Signals/" + signal + "train.txt" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + "Inertial Signals/" + signal + "test.txt" for signal in INPUT_SIGNAL_TYPES
]

X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)

# Load "y" (the neural network's training and testing outputs)
def load_y(y_path):
    file = open(y_path, 'r')
    # Read dataset from disk, dealing with text file's syntax
    y_ = np.array(
        [elem for elem in [
            row.replace('  ', ' ').strip().split(' ') for row in file
        ]],
        dtype=np.int32
    )
    file.close()

    # Subtract 1 from each output class for friendly 0-based indexing
    return y_ - 1

y_train_path = DATASET_PATH + TRAIN + "y_train.txt"
y_test_path = DATASET_PATH + TEST + "y_test.txt"

y_train = load_y(y_train_path)
y_test = load_y(y_test_path)
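To make the target format concrete, the array shape that the loaders above produce can be checked on a tiny in-memory example. This is only a sketch under stated assumptions, not the repository's code: `parse_signal_text` is a hypothetical helper, and the two "files" are made-up strings holding 3 windows of 4 time steps each.

```python
import numpy as np

def parse_signal_text(text):
    """Parse whitespace-delimited rows into a 2-D float32 array (hypothetical helper)."""
    rows = [[float(v) for v in row.split()] for row in text.strip().splitlines()]
    return np.array(rows, dtype=np.float32)

# Made-up data: each string stands in for one signal file,
# with one window per row and one time step per column.
signal_a = """
0.1 0.2 0.3 0.4
0.5 0.6 0.7 0.8
0.9 1.0 1.1 1.2
"""
signal_b = """
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
9.0 10.0 11.0 12.0
"""

# Stacking the per-signal arrays gives (n_signals, n_windows, n_steps).
stacked = np.array([parse_signal_text(signal_a), parse_signal_text(signal_b)])

# The same transpose as in load_X moves the signal axis last:
# (n_windows, n_steps, n_signals).
X = np.transpose(stacked, (1, 2, 0))
print(X.shape)
```

Under these assumptions, each row of a text file is one window, and the final axis indexes the signal type, so any CSV-based loader would need to end up with the same three-axis layout.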
This is what I have implemented so far in IPython 3:
In [0]:
TRAIN = "train/"
TEST = "test/"

def load_X(X_signals_paths):
    X_signals = []

    for signal_type_path in X_signals_paths:
        file = pandas.read_csv(DATASET_PATH + TRAIN + "data_train.csv", skiprows=1, header=None, sep=',', usecols=[0])
        X_signals.append(
            [np.array(serie, dtype=np.float32) for serie in [
                str(row).replace('  ', ' ').strip().split(' ') for row in file
            ]]
        )

    return np.transpose(np.array(X_signals), (1, 2, 0))

X_train_signals_paths = [
    DATASET_PATH + TRAIN + signal + "train.csv" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + signal + "test.csv" for signal in INPUT_SIGNAL_TYPES
]

X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)
print(X_train, X_test)
Out[0]:
[[[ 0.]]] [[[ 0.]]]
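A likely reason for the single `0.` in the output above is that iterating over a pandas DataFrame yields its column labels rather than its rows. The sketch below illustrates this on a made-up sample (the header line `time,value` is an assumption, added only so that `skiprows=1` has something to skip):

```python
import io
import pandas as pd

# Made-up sample in the same two-column layout as the CSV in the question.
csv_text = "time,value\n0.000995,8\n0.020801,8\n0.040977,8\n"

# Same read_csv options as in the question, restricted to column 0.
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None, sep=",", usecols=[0])

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [row for row in df]
print(labels)

# The underlying values are available as a NumPy array instead.
values = df.to_numpy()
print(values.shape)
```

With `usecols=[0]` the only label is `0`, so a loop of the form `for row in file` runs once, and `str(row)` becomes `'0'`. That would be consistent with an array holding a single zero.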
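I would appreciate some help with formatting my data correctly so that it works seamlessly with this algorithm. Please let me know if you have any questions.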
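One possible way to get a two-column CSV into the three-axis layout that the network expects is to cut it into fixed-length windows. The following is a minimal sketch under several assumptions that are not confirmed by the question: column 0 is treated as the only input signal, column 1 as an integer class label, windows do not overlap, and each window takes the label of its first row. `window_csv` and `n_steps` are hypothetical names.

```python
import io
import numpy as np
import pandas as pd

def window_csv(csv_source, n_steps):
    """Split a two-column CSV into fixed-length, non-overlapping windows (hypothetical helper).

    Assumes column 0 is the input signal and column 1 is an integer class label.
    """
    df = pd.read_csv(csv_source, header=None, sep=",", usecols=[0, 1])
    n_windows = len(df) // n_steps  # rows of an incomplete last window are dropped
    usable = n_windows * n_steps

    signal = df[0].to_numpy(dtype=np.float32)[:usable]
    labels = df[1].to_numpy(dtype=np.int32)[:usable]

    # X: (n_windows, n_steps, n_signals) with a single signal
    X = signal.reshape(n_windows, n_steps, 1)
    # y: one label per window, taken from the window's first row, shape (n_windows, 1)
    y = labels.reshape(n_windows, n_steps)[:, :1]
    return X, y

# Made-up sample; a file path could be passed in place of the StringIO object.
sample = io.StringIO(
    "0.000995,8\n0.020801,8\n0.040977,8\n0.060786,8\n0.080970,8\n"
)
X, y = window_csv(sample, n_steps=2)
print(X.shape, y.shape)
```

Note that the original `load_y` subtracts 1 from every label to obtain 0-based classes, so labels such as 8 may need to be remapped to a contiguous 0-based range before training.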
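【Comments】:
-
Please provide the error message.
-
@Paddy Let me add something to my question; it should help.
-
Your pandas_read() works here with the data you provided: [892 rows x 2 columns]
-
@tripleee It "works". But look at the output.
-
@Paddy The edit is at the end.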
Tags: python python-3.x csv machine-learning tensorflow