【Title】: Export DynamoDB to S3 with AWS Data Pipeline in us-east-2
【Posted】: 2023-03-30 07:40:01
【Question】:

I want to back up (and later import) a DynamoDB table to S3. The table lives in us-east-2, which is a region AWS Data Pipeline does not support. The AWS documentation suggests this shouldn't be a problem, but I can't get Data Pipeline to look up the table in us-east-2.

Here is the export of my Data Pipeline definition. When I run it, I get a "resource not found" error while looking up the DynamoDB table. If I temporarily create a table with the same name in us-west-2 (where this pipeline runs), the job works, but it pulls data from the us-west-2 table rather than the one in us-east-2. Is there any way to make this job pull from the region specified in the configuration?

{
  "objects": [
    {
      "readThroughputPercent": "#{myDDBReadThroughputRatio}",
      "name": "DDBSourceTable",
      "id": "DDBSourceTable",
      "type": "DynamoDBDataNode",
      "region": "us-east-2",
      "tableName": "#{myDDBTableName}"
    },
    {
      "period": "6 Hours",
      "name": "Every 6 hours",
      "id": "DefaultSchedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "bootstrapAction": "s3://us-west-2.elasticmapreduce/bootstrap-actions/configure-hadoop, --yarn-key-value,yarn.nodemanager.resource.memory-mb=11520,--yarn-key-value,yarn.scheduler.maximum-allocation-mb=11520,--yarn-key-value,yarn.scheduler.minimum-allocation-mb=1440,--yarn-key-value,yarn.app.mapreduce.am.resource.mb=2880,--mapred-key-value,mapreduce.map.memory.mb=5760,--mapred-key-value,mapreduce.map.java.opts=-Xmx4608M,--mapred-key-value,mapreduce.reduce.memory.mb=2880,--mapred-key-value,mapreduce.reduce.java.opts=-Xmx2304m,--mapred-key-value,mapreduce.map.speculative=false",
      "name": "EmrClusterForBackup",
      "coreInstanceCount": "1",
      "coreInstanceType": "m3.xlarge",
      "amiVersion": "3.9.0",
      "masterInstanceType": "m3.xlarge",
      "id": "EmrClusterForBackup",
      "region": "us-west-2",
      "type": "EmrCluster",
      "terminateAfter": "1 Hour"
    },
    {
      "directoryPath": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "S3BackupLocation",
      "id": "S3BackupLocation",
      "type": "S3DataNode"
    },
    {
      "output": {
        "ref": "S3BackupLocation"
      },
      "input": {
        "ref": "DDBSourceTable"
      },
      "maximumRetries": "2",
      "name": "TableBackupActivity",
      "step": "s3://dynamodb-emr-us-west-2/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}",
      "id": "TableBackupActivity",
      "runsOn": {
        "ref": "EmrClusterForBackup"
      },
      "type": "EmrActivity",
      "resizeClusterBeforeRunning": "true"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "resourceRole": "data_pipeline_etl_role",
      "pipelineLogUri": "s3://MY_S3_BUCKET/",
      "role": "data_pipeline_pipeline_role",
      "scheduleType": "cron",
      "name": "Default",
      "id": "Default"
    }
  ],
  "parameters": [
    {
      "description": "Output S3 folder",
      "id": "myOutputS3Loc",
      "type": "AWS::S3::ObjectKey"
    },
    {
      "description": "Source DynamoDB table name",
      "id": "myDDBTableName",
      "type": "String"
    },
    {
      "default": "0.25",
      "watermark": "Enter value between 0.1-1.0",
      "description": "DynamoDB read throughput ratio",
      "id": "myDDBReadThroughputRatio",
      "type": "Double"
    },
    {
      "default": "us-east-1",
      "watermark": "us-east-1",
      "description": "Region of the DynamoDB table",
      "id": "myDDBRegion",
      "type": "String"
    }
  ],
  "values": {
    "myDDBRegion": "us-east-2",
    "myDDBTableName": "prod--users",
    "myDDBReadThroughputRatio": "0.25",
    "myOutputS3Loc": "s3://MY_S3_BUCKET"
  }
}
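As a quick sanity check (this sketch is not part of the pipeline definition, and assumes you have AWS credentials configured), you can confirm which region the table actually resolves in with DescribeTable. boto3 clients are region-scoped, so a client bound to us-west-2 raises ResourceNotFoundException for a table that exists only in us-east-2, which is the same symptom the pipeline shows:

```python
# Sketch: confirm which region a DynamoDB table resolves in.
# A client bound to the wrong region raises ResourceNotFoundException,
# mirroring the "resource not found" error from the pipeline run.

def regional_endpoint(region: str) -> str:
    """Standard DynamoDB endpoint for a region in the public AWS partition."""
    return f"https://dynamodb.{region}.amazonaws.com"

def table_exists(table_name: str, region: str) -> bool:
    """True if DescribeTable succeeds for the table in the given region."""
    import boto3  # imported lazily; the actual call needs AWS credentials
    from botocore.exceptions import ClientError
    client = boto3.client(
        "dynamodb",
        region_name=region,
        endpoint_url=regional_endpoint(region),
    )
    try:
        client.describe_table(TableName=table_name)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ResourceNotFoundException":
            return False
        raise

# Usage (requires credentials; table name taken from the pipeline's values):
#   table_exists("prod--users", "us-east-2")
#   table_exists("prod--users", "us-west-2")
```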

【Question comments】:

Tags: amazon-web-services amazon-s3 amazon-dynamodb


【Solution 1】:

Is this a one-off, or something you want to do on an ongoing basis? Could you use DynamoDB Global Tables to replicate the table into a supported region, and then drop the replica once the backup completes?

Global table replication itself is free; you only pay for the capacity of the replica table while it is up and running.

    https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GlobalTables.html
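A minimal sketch of that workaround with boto3's `create_global_table` call (this drives the legacy 2017.11.29 Global Tables version, which requires tables with the same name to already exist in each region, be empty, and have streams enabled; table and region names here are illustrative):

```python
# Sketch of the suggested workaround: add a replica in a Data
# Pipeline-supported region via DynamoDB Global Tables, run the backup
# against the replica, then delete it.

def global_table_request(table_name: str, regions: list) -> dict:
    """Build the CreateGlobalTable request parameters for a set of regions."""
    return {
        "GlobalTableName": table_name,
        "ReplicationGroup": [{"RegionName": r} for r in regions],
    }

def create_replica(table_name: str, source_region: str, replica_region: str):
    """Issue CreateGlobalTable (needs credentials; tables must pre-exist,
    be empty, and have streams enabled in both regions)."""
    import boto3  # imported lazily so the pure helper above has no dependency
    client = boto3.client("dynamodb", region_name=source_region)
    return client.create_global_table(
        **global_table_request(table_name, [source_region, replica_region])
    )

# Usage (hypothetical names, matching the question's setup):
#   create_replica("prod--users", "us-east-2", "us-west-2")
```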

【Discussion】:

• That's an interesting workaround. I wonder whether what I'm trying to do can be accomplished in a less roundabout way.
• Note that you won't be able to export an existing table that already contains data this way, because a table must be empty before it can be configured as global.