Reading rows using AllRowsReader but starting from a specific row

【Question title】: Reading rows using AllRowsReader but starting from a specific row
【Posted】: 2015-08-27 06:06:11
【Question】:

I have a batch job that reads about 33 million rows from Cassandra using AllRowsReader, as described in the Astyanax wiki:

new AllRowsReader.Builder<>(getKeyspace(), columnFamily)
            .withPageSize(100)
            .withIncludeEmptyRows(false)
            .withConcurrencyLevel(1)
            .forEachRow(
                row -> {
                    try {
                        return processRow(row);
                    } catch (Exception e) {
                        LOG.error("Error while processing row!", e);
                        return false;
                    }
                }
            )
            .build()
            .call();

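If some error stops the batch job, I would like to resume reading from the row where it stopped, so that I don't have to start again from the first row. Is there a quick and simple way to do this?

Or is AllRowsReader not a good fit for this kind of task?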

【Comments】:

Tags: java cassandra astyanax

【Solution 1】:

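Since nobody has answered, let me give this a try. Cassandra uses a partitioner to determine which node a row should be placed on. There are two main types of partitioner: 1) ordered 2) unordered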

https://docs.datastax.com/en/cassandra/2.2/cassandra/architecture/archPartitionerAbout.html

With an ordered partitioner, rows are placed according to the lexical order of their keys. With an unordered partitioner, however, you cannot know the order from the keys.
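To make the difference concrete, here is a minimal sketch that sorts the same keys once by key (as an ordered partitioner would) and once by a hash of the key (as an unordered partitioner would). The class `PartitionOrderSketch` and the `token` helper are hypothetical names used for illustration. The MD5-as-integer token is only a stand-in for a hash partitioner and is not claimed to match what Cassandra computes.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PartitionOrderSketch {

    // Hypothetical token function: the MD5 digest of the key, read as a
    // non-negative integer. An assumption for illustration only.
    static BigInteger token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Scan order under an ordered partitioner: keys in lexical order.
    static List<String> orderedScan(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Scan order under a hash partitioner: keys in order of their token.
    static List<String> hashedScan(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(PartitionOrderSketch::token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-003", "user-001", "user-004", "user-002");
        System.out.println("key order:   " + orderedScan(keys));
        System.out.println("token order: " + hashedScan(keys));
    }
}
```

The token order is deterministic but generally unrelated to the key order, so "the row after key X" has no meaning in terms of the keys themselves.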

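Ordered partitioners are considered an anti-pattern in Cassandra because they make it quite difficult to distribute data evenly across the cluster. https://docs.datastax.com/en/cassandra/2.2/cassandra/planning/planPlanningAntiPatterns.html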

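I assume you are using an unordered partitioner in your code, as you should be. In that case there is currently no way to tell Cassandra to start reading from a particular row.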

I hope this answers your question.

【Comments】: