【Title】: Failing to read from Bigtable in Dataflow
【Posted】: 2016-07-12 12:57:50
【Question】:

At work I use Dataflow to write some data into Bigtable.
My current task is to read rows back out of Bigtable.
However, whenever I try to read rows from Bigtable using bigtable-hbase-dataflow, the job fails with the error below.

 Error:   (3218070e4dd208d3): java.lang.IllegalArgumentException: b <= a
at org.apache.hadoop.hbase.util.Bytes.iterateOnSplits(Bytes.java:1720)
at org.apache.hadoop.hbase.util.Bytes.split(Bytes.java:1683)
at org.apache.hadoop.hbase.util.Bytes.split(Bytes.java:1664)
at com.google.cloud.bigtable.dataflow.CloudBigtableIO$AbstractSource.split(CloudBigtableIO.java:512)
at com.google.cloud.bigtable.dataflow.CloudBigtableIO$AbstractSource.getSplits(CloudBigtableIO.java:358)
at com.google.cloud.bigtable.dataflow.CloudBigtableIO$Source.splitIntoBundles(CloudBigtableIO.java:593)
at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:413)
at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:171)
at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:149)
at com.google.cloud.dataflow.sdk.runners.worker.SourceOperationExecutor.execute(SourceOperationExecutor.java:58)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:288)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
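
For context on what the exception means: the `b <= a` check is thrown by HBase's `Bytes.iterateOnSplits`, which refuses to split a key range whose end key does not sort strictly after its start key (lexicographic order over unsigned bytes). A minimal, stdlib-only sketch of that ordering invariant (the `unsignedCompare` helper below is illustrative, not HBase's actual implementation):

```java
import java.nio.charset.StandardCharsets;

public class SplitInvariant {
    // Compare two byte arrays lexicographically, treating each byte as
    // unsigned -- the same ordering HBase/Bigtable use for row keys.
    static int unsignedCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF, y = b[i] & 0xFF;
            if (x != y) return Integer.compare(x, y);
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] start = "row-a".getBytes(StandardCharsets.UTF_8);
        byte[] stop  = "row-z".getBytes(StandardCharsets.UTF_8);
        // A splittable range: the stop key sorts strictly after the start key.
        System.out.println(unsignedCompare(start, stop) < 0);
        // A degenerate range (stop <= start) is what trips the "b <= a" check.
        System.out.println(unsignedCompare(stop, start) < 0);
    }
}
```

So the source-splitting code ended up with a range whose computed end key was not greater than its start key, which usually points at something going wrong before the split, not at the scan configuration itself.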

I am currently using 'com.google.cloud.dataflow:google-cloud-dataflow-java-sdk-all:1.6.0' and 'com.google.cloud.bigtable:bigtable-hbase-dataflow:0.9.0'.

Here is my code:

CloudBigtableScanConfiguration config = new CloudBigtableScanConfiguration.Builder()
    .withProjectId("project-id")
    .withInstanceId("instance-id")
    .withTableId("table")
    .build();
pipeline.apply(Read.<Result>from(CloudBigtableIO.read(config)))
    .apply(ParDo.of(new Test()));

FYI, I simply read from Bigtable and count the rows with an aggregator in the Test DoFn.

static class Test extends DoFn<Result, Result> {
    private static final long serialVersionUID = 0L;
    private final Aggregator<Long, Long> rowCount = createAggregator("row_count", new Sum.SumLongFn());

    @Override
    public void processElement(ProcessContext c) {
        rowCount.addValue(1L);
        c.output(c.element());
    }
}

I just followed the tutorial in the Dataflow documentation, but it fails. Can anyone help me?

【Comments】:

  • Just checking the basics - in your actual code, you replaced `project-id`, `instance-id`, and `table` with real values, right?
  • Yes, of course I did :)
  • This looks like a bug in the Cloud Bigtable client. I filed a GitHub issue to track it: github.com/GoogleCloudPlatform/cloud-bigtable-client/issues/912
  • Thanks @Solomon! I'll keep an eye on it.

Tags: java google-cloud-dataflow google-cloud-bigtable


【Solution 1】:

The root cause was a dependency problem.

Previously, our build file omitted this dependency:

compile 'io.netty:netty-tcnative-boringssl-static:1.1.33.Fork22'

Today I added the dependency and it resolved everything. I double-checked that the problem reappears when the dependency is removed from the build file.

From https://github.com/GoogleCloudPlatform/cloud-bigtable-client/issues/912#issuecomment-249999380
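
For reference, the relevant part of the Gradle build file might look like the following. The versions are the ones stated in the question; the surrounding `dependencies` block layout is an assumption about a standard Gradle build:

```groovy
dependencies {
    compile 'com.google.cloud.dataflow:google-cloud-dataflow-java-sdk-all:1.6.0'
    compile 'com.google.cloud.bigtable:bigtable-hbase-dataflow:0.9.0'
    // In this setup, omitting the line below caused CloudBigtableIO's
    // source splitting to fail with "java.lang.IllegalArgumentException: b <= a".
    compile 'io.netty:netty-tcnative-boringssl-static:1.1.33.Fork22'
}
```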

【Discussion】:
