How to get the most recent data from DynamoDB for each primary partition key? Answers

【Question Title】: How to get most recent data from DynamoDB for each primary partition key?
【Posted】: 2019-05-14 18:10:37
【Question】:

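I have a table in DynamoDB that stores account statistics. An account's statistics may be updated several times a day, so the table records might look like this: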

+------------+--------------+-------+-------+
| account_id | record_id    | views | stars |
+------------+--------------+-------+-------+
| 3          | 2019/03/16/1 | 29    | 3     |
+------------+--------------+-------+-------+
| 2          | 2019/03/16/2 | 130   | 21    |
+------------+--------------+-------+-------+
| 1          | 2019/03/16/3 | 12    | 2     |
+------------+--------------+-------+-------+
| 2          | 2019/03/16/1 | 57    | 12    |
+------------+--------------+-------+-------+
| 1          | 2019/03/16/2 | 8     | 2     |
+------------+--------------+-------+-------+
| 1          | 2019/03/16/1 | 3     | 0     |
+------------+--------------+-------+-------+

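account_id is the partition key and record_id is the sort key of the table's primary key.

How can I get only the most recent record for each account_id? From the example above, I would expect to get: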

+------------+--------------+-------+-------+
| account_id | record_id    | views | stars |
+------------+--------------+-------+-------+
| 3          | 2019/03/16/1 | 29    | 3     |
+------------+--------------+-------+-------+
| 2          | 2019/03/16/2 | 130   | 21    |
+------------+--------------+-------+-------+
| 1          | 2019/03/16/3 | 12    | 2     |
+------------+--------------+-------+-------+

I need this data for reporting.
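To pin down the requirement, here is a small in-memory sketch (plain Python, no DynamoDB involved) of what "latest record per account" means for the sample rows above: group by account_id and keep the row with the largest record_id. The function name `latest_per_account` is hypothetical, and the sketch assumes, as in the sample, that record_id strings compare correctly as text.

```python
# Sample rows from the question: (account_id, record_id, views, stars)
rows = [
    (3, "2019/03/16/1", 29, 3),
    (2, "2019/03/16/2", 130, 21),
    (1, "2019/03/16/3", 12, 2),
    (2, "2019/03/16/1", 57, 12),
    (1, "2019/03/16/2", 8, 2),
    (1, "2019/03/16/1", 3, 0),
]


def latest_per_account(rows):
    """Return {account_id: row}, keeping the row with the greatest record_id."""
    latest = {}
    for row in rows:
        account_id, record_id = row[0], row[1]
        current = latest.get(account_id)
        # Plain string comparison, like the ordering of string sort keys
        if current is None or record_id > current[1]:
            latest[account_id] = row
    return latest


# Print in the same order as the expected output table (account 3, 2, 1)
for account_id, row in sorted(latest_per_account(rows).items(), reverse=True):
    print(row)
```

This prints exactly the three rows of the expected output table.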

【Question Discussion】:

Tags: amazon-web-services amazon-dynamodb

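【Solution 1】:

If you know the list of account_ids already stored in the table, this can be done very efficiently.

In that case, all you need to do is query each partition key in turn, using ScanIndexForward=False to return the items in descending sort-key order and Limit=1 to restrict the result to a single item.

Here is the code in Python: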

import boto3
import json

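# Low-level client: attribute values carry DynamoDB type descriptors such as {'N': ...}
client = boto3.client('dynamodb')

account_ids = ['1', '2', '3']
results = []

for aid in account_ids:
    result = client.query(
        TableName='test-table',
        KeyConditionExpression="#aid = :aid",
        ExpressionAttributeNames={
            '#aid': 'account_id'
        },
        ExpressionAttributeValues={
            ':aid': {
                'N': aid  # numbers are sent as strings in the low-level API
            }
        },
        ScanIndexForward=False,  # descending by sort key (record_id), newest first
        Limit=1,                 # return only the first item, i.e. the newest
    )
    # 'Items' is a list with at most one element; extend keeps results flat
    results.extend(result['Items'])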

print(json.dumps(results, indent=2))
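If the list of account IDs is known, the loop above can be wrapped in a function that returns one item per account and skips accounts that have no rows. This is a sketch: the function name `latest_items` is hypothetical, and the client is passed in (for example `boto3.client('dynamodb')`) rather than created inside, so it can be exercised with a stand-in object.

```python
from typing import Any, Dict, Iterable


def latest_items(client: Any, table_name: str, account_ids: Iterable[str]) -> Dict[str, dict]:
    """Return {account_id: newest item}, issuing one Query per partition key.

    `client` is assumed to be a low-level DynamoDB client such as
    boto3.client("dynamodb"); items come back in the low-level typed format.
    """
    latest = {}
    for aid in account_ids:
        response = client.query(
            TableName=table_name,
            KeyConditionExpression="#aid = :aid",
            ExpressionAttributeNames={"#aid": "account_id"},
            ExpressionAttributeValues={":aid": {"N": aid}},
            ScanIndexForward=False,  # descending by sort key (record_id)
            Limit=1,                 # stop after the first (newest) item
        )
        items = response.get("Items", [])
        if items:  # an account with no rows yields an empty list
            latest[aid] = items[0]
    return latest
```

Usage would look like `latest_items(boto3.client('dynamodb'), 'test-table', ['1', '2', '3'])`. Returning a dict keyed by account ID makes it easy to spot accounts that had no data at all.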

【Discussion】:

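Querying on account_id (in the KeyConditionExpression) with ScanIndexForward=False and Limit=1 does solve the problem, but I have run into performance issues: as the table grows over time, the query reads everything that matches the partition key and then filters it down to the latest record, which can hit DynamoDB's RCU limits. Has anyone run into or dealt with this?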
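One way to check whether this concern applies to a given table is to ask DynamoDB to report the read capacity each query consumed, using the `ReturnConsumedCapacity` parameter of `Query`, and to watch that number as the partition grows. According to the DynamoDB API reference, `Limit` is the maximum number of items to evaluate, so a key-condition query with `Limit=1` and no `FilterExpression` would be expected to read a single item; measuring is how to confirm that on a real table. The function name `newest_with_capacity` is hypothetical, and the client is passed in as in `boto3.client('dynamodb')`.

```python
from typing import Any, Tuple


def newest_with_capacity(client: Any, table_name: str, account_id: str) -> Tuple[dict, float]:
    """Fetch the newest item for one account and report the capacity units used."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression="#aid = :aid",
        ExpressionAttributeNames={"#aid": "account_id"},
        ExpressionAttributeValues={":aid": {"N": account_id}},
        ScanIndexForward=False,          # newest record first
        Limit=1,                         # evaluate at most one item
        ReturnConsumedCapacity="TOTAL",  # include ConsumedCapacity in the response
    )
    items = response.get("Items", [])
    capacity = response.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
    # Return an empty dict when the account has no rows
    return (items[0] if items else {}), capacity
```

Logging the second return value over time shows whether the cost of this query actually grows with the size of the partition.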

【Solution 2】:

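Items with the same partition key are stored together in the same partition, ordered by their sort key. So if you query the items in reverse order and set the limit to 1, you get the item with the requested account_id and the largest record_id.

So issue a query for the account_id in question, but specify Limit=1 and ScanIndexForward=False (or Reverse=True, depending on which SDK/API you use).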
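The ordering this relies on can be modelled in plain Python: within a partition, items come back in sort-key order, and string sort keys are compared character by character (DynamoDB uses UTF-8 byte order). The sketch below, with the hypothetical helper `newest`, mimics a reverse query with a limit and shows one consequence for record_id values shaped like the ones in the question: once a day has ten or more updates, `2019/03/16/10` sorts before `2019/03/16/2`, so the "largest" record_id is no longer the newest unless the counter is zero-padded.

```python
def newest(sort_keys, limit=1):
    """Mimic ScanIndexForward=False plus Limit: sort descending, keep `limit` keys."""
    return sorted(sort_keys, reverse=True)[:limit]


unpadded = ["2019/03/16/1", "2019/03/16/2", "2019/03/16/10"]
padded = ["2019/03/16/01", "2019/03/16/02", "2019/03/16/10"]

print(newest(unpadded))  # ['2019/03/16/2']  ("1" < "2", so ".../10" sorts before ".../2")
print(newest(padded))    # ['2019/03/16/10']
```

Zero-padding the counter (or using a timestamp with a fixed-width format) keeps text order and chronological order aligned.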

【Discussion】: