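【Posted】: 2021-11-04 10:02:44
【Question】:
I built a model in Amazon SageMaker; the code is attached below. I now want to upload new data to S3 and get predictions from this model without retraining it each time.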
import boto3
import sagemaker
from sagemaker import get_execution_role

sess = sagemaker.Session()
bucket = "innogy-bda-germany-dev-landing-dc3-retailpl"
prefix = "sagemaker/xgboost-upsell"
role = get_execution_role()
# Built-in XGBoost container image for the current region
container = sagemaker.image_uris.retrieve("xgboost", boto3.Session().region_name, "latest")
display(container)
train_path = 's3://innogy-bda-germany-dev-landing-dc3-retailpl/UPSELL/LIST/train.csv'
test_path = 's3://innogy-bda-germany-dev-landing-dc3-retailpl/UPSELL/LIST/validation.csv'
s3_input_train = sagemaker.inputs.TrainingInput(s3_data=train_path, content_type='csv')
s3_input_test = sagemaker.inputs.TrainingInput(s3_data=test_path, content_type='csv')
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m5.4xlarge",
    output_path="s3://innogy-bda-germany-dev-landing-dc3-retailpl/UPSELL/LIST/output",
    sagemaker_session=sess,
)
xgb.set_hyperparameters(
    alpha=1.340343927865692,
    colsample_bytree=0.525162855476281,
    eta=0.06451533130134757,
    gamma=0.9683995477068462,
    max_depth=10,
    min_child_weight=3.851108988963441,
    num_round=987,
    subsample=0.8725573749114485,
    silent=0,
    objective="binary:logistic",
    early_stopping_rounds=50,
)
# The validation channel uses s3_input_test, the input defined above for validation.csv
xgb.fit({"train": s3_input_train, "validation": s3_input_test})
I would like a code example showing how to pull this model from S3 into a new notebook and use it to predict on new data.
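One possible approach, sketched here under assumptions rather than as a confirmed answer: the training job writes its artifact to `<output_path>/<training-job-name>/output/model.tar.gz`, and a batch transform job can score a CSV in S3 against that artifact without retraining. The helper names `model_artifact_uri` and `batch_predict`, the `ml.m5.xlarge` instance type, and all arguments are illustrative and not part of the original code.

```python
def model_artifact_uri(output_path: str, training_job_name: str) -> str:
    """Build the S3 URI where SageMaker stores the trained model for a job."""
    return f"{output_path.rstrip('/')}/{training_job_name}/output/model.tar.gz"


def batch_predict(model_data: str, input_s3_uri: str, output_s3_uri: str,
                  instance_type: str = "ml.m5.xlarge") -> None:
    """Score a CSV in S3 with an existing model artifact via batch transform."""
    # Imported here because these calls need the SageMaker SDK and AWS credentials.
    import boto3
    import sagemaker
    from sagemaker.model import Model

    sess = sagemaker.Session()
    # Assumption: the same container image and version used for training.
    image_uri = sagemaker.image_uris.retrieve(
        "xgboost", boto3.Session().region_name, "latest"
    )
    model = Model(
        image_uri=image_uri,
        model_data=model_data,
        role=sagemaker.get_execution_role(),
        sagemaker_session=sess,
    )
    transformer = model.transformer(
        instance_count=1,
        instance_type=instance_type,
        output_path=output_s3_uri,
    )
    # Input rows: features only, in the training column order, no header.
    transformer.transform(input_s3_uri, content_type="text/csv", split_type="Line")
    transformer.wait()  # predictions are written under output_s3_uri
```

In the notebook that ran the training, `xgb.model_data` should return the same artifact URI, so the helper is only needed when starting from a fresh notebook with the job name in hand. If low-latency, row-by-row predictions are needed instead, deploying the same `Model` object to an endpoint with `model.deploy(...)` is the usual alternative.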
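I would also like to know why the target variable is not dropped when using SageMaker's built-in XGBoost model, given that I will not know the target when predicting on a new dataset.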
train_data, validation_data, test_data = np.split(
    df_smote.sample(frac=1, random_state=1729),
    [int(0.7 * len(df_smote)), int(0.9 * len(df_smote))],
)
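As background to the question about the target variable: for CSV input, the built-in XGBoost algorithm expects training files with no header and the target in the first column, while inference input carries the features only. The sketch below illustrates that difference with pandas. The column names `age`, `usage`, and `upsell` and both helper functions are made up for illustration.

```python
import pandas as pd


def to_training_csv(df: pd.DataFrame, target: str) -> str:
    """Serialize rows for training: target first, no header, no index."""
    ordered = df[[target] + [c for c in df.columns if c != target]]
    return ordered.to_csv(header=False, index=False)


def to_inference_csv(df: pd.DataFrame, target: str) -> str:
    """Serialize rows for prediction: same feature order, target removed."""
    features = df.drop(columns=[target], errors="ignore")
    return features.to_csv(header=False, index=False)


# Hypothetical data; "upsell" plays the role of the target variable.
df = pd.DataFrame({"age": [31, 45], "usage": [1.5, 2.0], "upsell": [0, 1]})

train_csv = to_training_csv(df, "upsell")    # rows look like "0,31,1.5"
predict_csv = to_inference_csv(df, "upsell")  # rows look like "31,1.5"
```

The target is kept during training because the algorithm needs it to learn from. New data without a known target is simply written in the inference layout, with the feature columns in the same order as in the training file.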
【Discussion】:
Tags: python amazon-web-services machine-learning amazon-sagemaker aws-code-deploy