【发布时间】:2019-07-29 03:36:44
【问题描述】:
我想在我的一个应用程序中使用 Sagemaker KMeans 内置算法。我在 S3 中有一个大型 CSV 文件(原始数据),我将其分成几个部分以便于清理。在我清理之前,我尝试使用它作为 Kmeans 的输入来执行训练工作,但是它不起作用。
我的清单文件:
[
{"prefix": "s3://<BUCKET_NAME>/kmeans_data/KMeans-2019-28-07-13-40-00-001/"},
"file1.csv",
"file2.csv"
]
我遇到的错误:
Failure reason: ClientError: Unable to read data channel 'train'. Requested content-type is 'application/x-recordio-protobuf'. Please verify the data matches the requested content-type. (caused by MXNetError) Caused by: [16:47:31] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.1620.0/AL2012/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100: (Input Error) The header of the MXNet RecordIO record at position 0 in the dataset does not start with a valid magic number. Stack trace returned 10 entries: [bt] (0) /opt/amazon/lib/libaialgs.so(+0xb1f0) [0x7fb5674c31f0] [bt] (1) /opt/amazon/lib/libaialgs.so(+0xb54a) [0x7fb5674c354a] [bt] (2) /opt/amazon/lib/libaialgs.so(aialgs::iterator_base::Next()+0x4a6) [0x7fb5674cc436] [bt] (3) /opt/amazon/lib/libmxnet.so(MXDataIterNext+0x21) [0x7fb54ecbcdb1] [bt] (4) /opt/amazon/python2.7/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7fb567a1e858] [bt] (5) /opt/amazon/python2.7/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x15f) [0x7fb567a1d95f
我的问题是:是否可以仅在 GUI 中使用多个 CSV 文件作为 Sagemaker KMeans BuilIn 算法中的输入?如果可能,我应该如何格式化我的清单?
【问题讨论】:
标签: amazon-web-services amazon-sagemaker