Timestamp Based Scans in HBase? Answers

[Question Title]: Timestamp Based Scans in HBase?
[Posted]: 2014-11-12 05:50:38
[Question]:

Take the HBase table 'test_table' as an example, with these values inserted:

Row1 - Val1 => t
Row1 - Val2 => t + 3
Row1 - Val3 => t + 5

Row2 - Val1 => t
Row2 - Val2 => t + 3
Row2 - Val3 => t + 5

Scanning 'test_table' with version = t + 4 should return:

Row1 - Val2 => t + 3
Row2 - Val2 => t + 3

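How can I do a timestamp-based scan in HBase, i.e. return for each row the latest value whose timestamp is less than or equal to the given timestamp?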
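To pin down the requested semantics, here is a minimal plain-Java sketch (no HBase involved) that models one cell's version history as a TreeMap keyed by timestamp. The class name AsOfLookup and the base value t = 1000 are hypothetical. TreeMap.floorEntry returns the entry with the greatest key less than or equal to the requested timestamp, which is exactly "the latest value as of T":

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: models one HBase cell's version history as
// timestamp -> value and answers "newest value with ts <= asOf".
final class AsOfLookup {

    static String valueAsOf(NavigableMap<Long, String> versions, long asOf) {
        // floorEntry: greatest key <= asOf, or null if every version is newer
        Map.Entry<Long, String> e = versions.floorEntry(asOf);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        long t = 1000L; // arbitrary base timestamp standing in for "t"
        NavigableMap<Long, String> row1 = new TreeMap<>();
        row1.put(t, "Val1");
        row1.put(t + 3, "Val2");
        row1.put(t + 5, "Val3");
        System.out.println("Row1 => " + valueAsOf(row1, t + 4)); // Row1 => Val2
    }
}
```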

[Comments]:

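Do you want to do this in the HBase shell, or write a program for it? See here for how to get the list of keys from a table: stackoverflow.com/questions/5218085/… For each key you can then issue a get and fetch the timestamped entries (stackoverflow.com/questions/8321741/…), and filter them according to your criteria.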

Tags: hadoop hbase

[Answer 1]:

Consider this table:

hbase(main):009:0> create 't1', { NAME => 'f1', VERSIONS => 100 }
hbase(main):010:0> put 't1', 'key1', 'f1:a', 'value1'
hbase(main):011:0> put 't1', 'key1', 'f1:a', 'value2'
hbase(main):012:0> put 't1', 'key1', 'f1:a', 'value3'
hbase(main):013:0> put 't1', 'key2', 'f1:a', 'value4'
hbase(main):014:0> put 't1', 'key2', 'f1:a', 'value5'
hbase(main):015:0> put 't1', 'key1', 'f1:a', 'value6'

Here is a shell scan returning all of its versions:

hbase(main):003:0> scan 't1', {VERSIONS => 100 }
ROW              COLUMN+CELL
 key1            column=f1:a, timestamp=1416083314098, value=value6
 key1            column=f1:a, timestamp=1416083294981, value=value3
 key1            column=f1:a, timestamp=1416083293273, value=value2
 key1            column=f1:a, timestamp=1416083291009, value=value1
 key2            column=f1:a, timestamp=1416083305050, value=value5
 key2            column=f1:a, timestamp=1416083299840, value=value4

For your requirement, here is a scan limited to a given timestamp:

hbase(main):002:0> scan 't1', { TIMERANGE => [0, 1416083300000] }
ROW              COLUMN+CELL
 key1            column=f1:a, timestamp=1416083294981, value=value3
 key2            column=f1:a, timestamp=1416083299840, value=value4
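The result can be reproduced by hand from the full-version listing. The plain-Java sketch below (hypothetical names, no HBase dependency) models what the shell scan returns for a single column: for each row, the newest version whose timestamp lies in the half-open range [min, max). It is a model of the observed behaviour, not HBase's implementation:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of `scan 't1', { TIMERANGE => [min, max] }` on one column.
final class TimeRangeModel {

    static final class Cell {
        final long ts;
        final String value;
        Cell(long ts, String value) { this.ts = ts; this.value = value; }
    }

    // Returns row -> newest value whose timestamp satisfies min <= ts < max.
    static Map<String, String> scan(Map<String, List<Cell>> table, long minTs, long maxTs) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Cell>> row : table.entrySet()) {
            Cell best = null;
            for (Cell c : row.getValue()) {
                boolean inRange = c.ts >= minTs && c.ts < maxTs; // upper bound exclusive
                if (inRange && (best == null || c.ts > best.ts)) {
                    best = c;
                }
            }
            if (best != null) {
                out.put(row.getKey(), best.value); // rows with nothing in range are skipped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Timestamps copied from the full-version scan of 't1'
        Map<String, List<Cell>> t1 = new LinkedHashMap<>();
        t1.put("key1", List.of(new Cell(1416083314098L, "value6"), new Cell(1416083294981L, "value3"),
                new Cell(1416083293273L, "value2"), new Cell(1416083291009L, "value1")));
        t1.put("key2", List.of(new Cell(1416083305050L, "value5"), new Cell(1416083299840L, "value4")));
        System.out.println(scan(t1, 0L, 1416083300000L)); // {key1=value3, key2=value4}
    }
}
```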

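The same thing in Java (written against the HBase 1.0+ Connection API; HBase 0.98 and earlier used the HTable constructor instead):

package org.example.test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        Scan s = new Scan();
        s.setMaxVersions(1);                  // newest version inside the range only
        s.setTimeRange(0L, 1416083300000L);   // [min, max): max is exclusive
        // try-with-resources closes the scanner, table and connection
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("t1"));
             ResultScanner scanner = table.getScanner(s)) {
            for (Result rr : scanner) {
                System.out.println(Bytes.toString(rr.getRow()) + " => " +
                        Bytes.toString(rr.getValue(Bytes.toBytes("f1"), Bytes.toBytes("a"))));
            }
        }
    }
}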

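Note that the upper bound of the time range is exclusive: to get the latest value of every key whose timestamp is at most T, specify T+1 as the upper bound of the range.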
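A small plain-Java illustration of that off-by-one, using key1's value3 timestamp from the scan output. The class and method names are hypothetical; inRange models the half-open check min <= ts < max, and newestAtOrBefore gets "at most T" semantics by passing T + 1 as the exclusive bound:

```java
// Hypothetical demo of the exclusive upper bound: to include a cell stamped
// exactly T, the range has to be [0, T + 1).
final class ExclusiveBoundDemo {

    // Models the half-open time range: min <= ts < max.
    static boolean inRange(long ts, long minTs, long maxTs) {
        return ts >= minTs && ts < maxTs;
    }

    // Newest value with ts <= asOf, obtained by using asOf + 1 as the exclusive bound.
    static String newestAtOrBefore(long[] timestamps, String[] values, long asOf) {
        String best = null;
        long bestTs = Long.MIN_VALUE;
        for (int i = 0; i < timestamps.length; i++) {
            if (inRange(timestamps[i], 0L, asOf + 1) && timestamps[i] > bestTs) {
                bestTs = timestamps[i];
                best = values[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        long t = 1416083294981L; // timestamp of key1's value3
        System.out.println(inRange(t, 0L, t));     // false: upper bound is exclusive
        System.out.println(inRange(t, 0L, t + 1)); // true
        long[] ts = {1416083291009L, 1416083293273L, 1416083294981L, 1416083314098L};
        String[] vals = {"value1", "value2", "value3", "value6"};
        System.out.println(newestAtOrBefore(ts, vals, t)); // value3
    }
}
```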

[Comments]:

It doesn't work in Scala. See stackoverflow.com/questions/38887556/…
It works fine. We can clean up the output further with the shell or any other scripting language.