【Question title】: Amazon Athena - S3 location error
【Posted】: 2021-03-07 02:02:47
【Question】:

I get an error when I run an Amazon Athena query against an S3 bucket.

I am running this query against CloudFront access logs.

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront.cf_logs (
  `date` date,
  `time` string,
  `location` string,
  `bytes` int,
  `requestip` string,
  `method` string,
  `host` string,
  `uri` string,
  `status` int,
  `referrer` string,
  `os` string,
  `browser` string,
  `browserversion` string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://cloudfront-access/test-sh/'
TBLPROPERTIES ('has_encrypted_data'='false');

It returns this error:

Your query has the following error(s):

The S3 location provided to save your query results is invalid. Please
check your S3 location is correct and is in the same region and try
again. If you continue to see the issue, contact customer support for
further assistance. (Service: AmazonAthena; Status Code: 400; Error
Code: InvalidRequestException; Request ID:
f8cd2762-1e7-a2f9-e5eb1d865406)

【Question discussion】:

    Tags: amazon-web-services amazon-s3 amazon-athena


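    【Solution 1】:

    Amazon Athena saves the output of every query to an Amazon S3 bucket. The error message means that Athena cannot use the bucket configured for query results.

    • Click the Settings link at the top of the screen
    • Verify that a bucket name is shown (feel free to change it if you wish)
    • In the Amazon S3 management console, verify that a bucket with that name exists in the same region. If not, create the bucket.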
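
The same checks can be scripted with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured; `bucket_region` and `check_output_bucket` are hypothetical helper names, and the bucket and region in the usage comment are placeholders rather than values from the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the region of an S3 bucket.
# With --output text, get-bucket-location prints "None" for buckets
# in us-east-1, so that value is mapped to the region name.
bucket_region() {
  local region
  region=$(aws s3api get-bucket-location --bucket "$1" \
             --query 'LocationConstraint' --output text) || return 1
  if [ "$region" = "None" ] || [ "$region" = "null" ]; then
    region="us-east-1"
  fi
  printf '%s\n' "$region"
}

# Hypothetical helper: succeeds only when the bucket exists (and is
# readable) and lives in the region where Athena queries are run.
check_output_bucket() {
  local bucket=$1 athena_region=$2 actual
  actual=$(bucket_region "$bucket") || {
    echo "bucket $bucket not found or not accessible" >&2
    return 2
  }
  if [ "$actual" != "$athena_region" ]; then
    echo "bucket $bucket is in $actual, Athena runs in $athena_region" >&2
    return 1
  fi
  echo "ok: $bucket is in $athena_region"
}

# Usage (placeholder names):
#   check_output_bucket my-athena-results us-east-1
```

If the check fails with a region mismatch, either create a results bucket in the Athena region or point the query result location in Settings at a bucket that is already there.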


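      【Solution 2】:

      When you run a Data Definition Language (DDL) statement, Athena still writes output to S3, just as it does when you run DML.

      The following example uses the AWS CLI to illustrate this:

      1. Create a bucket for this example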

        $ aws s3 mb s3://athena-covid/

      2. Get some data (source: Covid Tracking Project)

        $ wget -O daily.csv https://api.covidtracking.com/v1/states/tx/daily.csv

      3. Upload this data to S3

        $ aws s3 cp daily.csv s3://athena-covid/src/daily.csv

      4. Now run some DDL

        $ aws athena start-query-execution --result-configuration OutputLocation=s3://athena-covid/out/ --query-string "$(cat ddl.sql)"

        Returns:

        { "QueryExecutionId": "427fd5d0-02cf-49e6-82eb-0c25aae46e80" }

      5. Even though the query in ddl.sql returns no result set, it still produced an empty text file at the output location specified in --result-configuration above.

        $ aws s3 ls s3://athena-covid/out/

        Returns:

        2021-03-06 12:52:25 0 427fd5d0-02cf-49e6-82eb-0c25aae46e80.txt

        Note the 0, which is the size of the object in S3.

      6. Of course, if we run ordinary DML, we get an actual result set.

        $ aws athena start-query-execution --result-configuration OutputLocation=s3://athena-covid/out/ --query-string "SELECT data_date, state, positive, negative FROM default.tx_covid LIMIT 10"

        Returns:

        { "QueryExecutionId": "77b548ee-4724-4716-9b3a-95acbb8bb275" }

        There is also a CSV file containing some data.

        $ aws s3 ls s3://athena-covid/out/77b548ee-4724-4716-9b3a-95acbb8bb275.csv

        Returns:

         2021-03-06 12:57:00        312 77b548ee-4724-4716-9b3a-95acbb8bb275.csv
         2021-03-06 12:57:00        213 77b548ee-4724-4716-9b3a-95acbb8bb275.csv.metadata
        

      I hope the above illustrates how Athena works: every query, DDL or DML, has an OutputLocation.
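
Because start-query-execution returns only a QueryExecutionId, a script usually has to wait for the query to finish before looking in the output location. The following is a minimal sketch, assuming the AWS CLI is configured; `wait_for_query`, `result_location`, and the `POLL_SECONDS` variable are hypothetical names introduced here, not part of the AWS CLI.

```shell
#!/usr/bin/env bash

# Hypothetical helper: polls a query until it reaches a terminal state.
# Prints SUCCEEDED and returns 0 on success; returns 1 if the query
# failed, was cancelled, or the CLI call itself failed.
wait_for_query() {
  local id=$1 state
  while :; do
    state=$(aws athena get-query-execution --query-execution-id "$id" \
              --query 'QueryExecution.Status.State' --output text) || return 1
    case $state in
      SUCCEEDED) echo "$state"; return 0 ;;
      FAILED|CANCELLED) echo "$state" >&2; return 1 ;;
    esac
    # Still QUEUED or RUNNING: wait before asking again.
    sleep "${POLL_SECONDS:-2}"
  done
}

# Hypothetical helper: prints the S3 path of the result object that
# Athena recorded for this query execution.
result_location() {
  aws athena get-query-execution --query-execution-id "$1" \
    --query 'QueryExecution.ResultConfiguration.OutputLocation' --output text
}

# Usage, with the ID returned in step 6:
#   wait_for_query 77b548ee-4724-4716-9b3a-95acbb8bb275 &&
#     aws s3 cp "$(result_location 77b548ee-4724-4716-9b3a-95acbb8bb275)" -
```

Polling with a fixed delay keeps the sketch short; a longer-running script might prefer a backoff and a timeout.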

      FYI, the DDL is below:

      CREATE EXTERNAL TABLE default.tx_covid (
        data_date STRING,
        state STRING,
        positive INTEGER,
        probableCases INTEGER,
        negative INTEGER,
        pending INTEGER,
        totalTestResultsSource STRING,
        totalTestResults INTEGER,
        hospitalizedCurrently INTEGER,
        hospitalizedCumulative INTEGER,
        inIcuCurrently INTEGER,
        inIcuCumulative INTEGER,
        onVentilatorCurrently INTEGER,
        onVentilatorCumulative INTEGER,
        recovered INTEGER,
        lastUpdateEt INTEGER,
        dateModified INTEGER,
        checkTimeEt INTEGER,
        death INTEGER,
        hospitalized INTEGER,
        hospitalizedDischarged INTEGER,
        dateChecked STRING,
        totalTestsViral INTEGER,
        positiveTestsViral INTEGER,
        negativeTestsViral INTEGER,
        positiveCasesViral INTEGER,
        deathConfirmed INTEGER,
        deathProbable INTEGER,
        totalTestEncountersViral INTEGER,
        totalTestsPeopleViral INTEGER,
        totalTestsAntibody INTEGER,
        positiveTestsAntibody INTEGER,
        negativeTestsAntibody INTEGER,
        totalTestsPeopleAntibody INTEGER,
        positiveTestsPeopleAntibody INTEGER,
        negativeTestsPeopleAntibody INTEGER,
        totalTestsPeopleAntigen INTEGER,
        positiveTestsPeopleAntigen INTEGER,
        totalTestsAntigen INTEGER,
        positiveTestsAntigen INTEGER,
        fips STRING,
        positiveIncrease INTEGER,
        negativeIncrease INTEGER,
        total INTEGER,
        totalTestResultsIncrease INTEGER,
        posNeg INTEGER,
        dataQualityGrade INTEGER,
        deathIncrease INTEGER,
        hospitalizedIncrease INTEGER,
        hash STRING,
        commercialScore INTEGER,
        negativeRegularScore INTEGER,
        negativeScore INTEGER,
        positiveScore INTEGER,
        score INTEGER,
        grade INTEGER
      )
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        ESCAPED BY '\\'
        LINES TERMINATED BY '\n'
        LOCATION 's3://athena-covid/src/'
        TBLPROPERTIES ('skip.header.line.count'='1')
      
