【Question Title】: EMR Hive ODBC Connection Error: HiveSQLException: Expected states: [FINISHED], but found RUNNING
【Posted】: 2023-05-05 03:50:02
【Question】:

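I am trying to connect Power BI to AWS EMR Hive and retrieve the list of tables and the data in a table.

Retrieving the table list works fine. However, when I click on a specific table, I get the following exception in the Power BI UI: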

DataSource.Error: ODBC: ERROR [HY000] [Amazon][Hardy] (35) Error from server: error code: '0' error message: 'Expected states: [FINISHED], but found RUNNING'. Details:
    DataSourceKind=Odbc
    DataSourcePath=dsn=test emr aws
    OdbcErrors=[Table]

The Hive log shows the following errors (Error closing operation: java.nio.BufferUnderflowException):

2019-09-22T09:43:25,731 WARN  [HiveServer2-Handler-Pool: Thread-44([])]: thrift.ThriftCLIService (ThriftCLIService.java:GetResultSetMetadata(735)) - Error getting result set metadata:
org.apache.hive.service.cli.HiveSQLException: Expected states: [FINISHED], but found RUNNING
        at org.apache.hive.service.cli.operation.Operation.assertState(Operation.java:203) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.operation.GetPrimaryKeysOperation.getResultSetSchema(GetPrimaryKeysOperation.java:110) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.operation.OperationManager.getOperationResultSetSchema(OperationManager.java:302) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.session.HiveSessionImpl.getResultSetMetadata(HiveSessionImpl.java:866) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_222]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_222]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ~[hadoop-common-2.8.5-amzn-4.jar:?]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at com.sun.proxy.$Proxy41.getResultSetMetadata(Unknown Source) ~[?:?]
        at org.apache.hive.service.cli.CLIService.getResultSetMetadata(CLIService.java:540) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.GetResultSetMetadata(ThriftCLIService.java:731) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1697) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1682) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
2019-09-22T09:43:25,820 WARN  [HiveServer2-Handler-Pool: Thread-44([])]: thrift.ThriftCLIService (ThriftCLIService.java:CloseOperation(720)) - Error closing operation:
java.nio.BufferUnderflowException
        at java.nio.Buffer.nextGetIndex(Buffer.java:506) ~[?:1.8.0_222]
        at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:412) ~[?:1.8.0_222]
        at org.apache.hive.service.cli.HandleIdentifier.<init>(HandleIdentifier.java:46) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.Handle.<init>(Handle.java:38) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.OperationHandle.<init>(OperationHandle.java:41) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.OperationHandle.<init>(OperationHandle.java:37) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:717) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

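What could be the problem? The same data is retrieved fine in HUE, in under 15 seconds.

Strangely, running a SQL query through the ODBC driver works fine. The error above appears only when I select a table.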
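One way to narrow this down outside Power BI is to call the driver directly. The HiveServer2 trace fails inside GetPrimaryKeysOperation, which suggests that selecting a table triggers a primary-key metadata request (the ODBC SQLPrimaryKeys catalog call) rather than only a plain query. The sketch below is hypothetical: it assumes pyodbc is installed, reuses the DSN name from the Power BI error message, and uses my_table as a placeholder table name.

```python
"""Hypothetical diagnostic: compare a plain query with a primary-key metadata call.

Assumptions: pyodbc is installed, the DSN "test emr aws" (taken from the
Power BI error message) exists on this machine, and "my_table" is a placeholder.
"""


def run_checks(conn, table):
    """Run a plain query, then a primary-key catalog call; record which one fails."""
    results = {}
    cursor = conn.cursor()
    try:
        # Plain SQL: the question reports that this path works through the driver.
        cursor.execute(f"SELECT * FROM {table} LIMIT 10")
        cursor.fetchall()
        results["query"] = "ok"
    except Exception as exc:  # pyodbc.Error in practice
        results["query"] = f"failed: {exc}"
    try:
        # ODBC SQLPrimaryKeys catalog call; the server-side trace fails inside
        # GetPrimaryKeysOperation, so this is the call worth watching.
        cursor.primaryKeys(table)
        cursor.fetchall()
        results["primary_keys"] = "ok"
    except Exception as exc:
        results["primary_keys"] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    import pyodbc  # imported here so run_checks can be exercised without the driver

    # Hive has no transactions, so autocommit avoids driver-side commit calls.
    connection = pyodbc.connect("DSN=test emr aws", autocommit=True)
    print(run_checks(connection, "my_table"))
```

If the query succeeds but primaryKeys fails with the same "Expected states" message, the problem is isolated to the metadata path rather than to Power BI itself.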

【Question Discussion】:

    Tags: amazon-web-services hive odbc powerbi thrift


    【Solution 1】:

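    I have a theory about this... Reading the stack trace, Hive seems to be reading an ID through HandleIdentifier (per the trace, the identifier of an OperationHandle), and the next call shows it being read as a Long (java.nio.HeapByteBuffer.getLong). Have you tried increasing Hadoop's buffer size? The default is 4 KB ("io.file.buffer.size": "4096"); try at least 8 KB, since getLong reads a Java long, an 8-byte primitive.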
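The theory hinges on the getLong frame. In Java, ByteBuffer.getLong() reads 8 bytes at the buffer's current position and throws BufferUnderflowException if fewer than 8 bytes remain in that buffer. The Python sketch below is an analogy, not Hive's code: it assumes, as the HandleIdentifier and getLong frames suggest, that the identifier is decoded as two consecutive big-endian longs (Java's default byte order) forming a UUID. BufferUnderflow, read_long and decode_handle_id are hypothetical names.

```python
import struct
import uuid


class BufferUnderflow(Exception):
    """Hypothetical stand-in for java.nio.BufferUnderflowException."""


def read_long(buf, pos):
    """Mimic ByteBuffer.getLong(): read a big-endian signed 64-bit value at pos."""
    remaining = len(buf) - pos
    if remaining < 8:
        raise BufferUnderflow(f"need 8 bytes at offset {pos}, have {remaining}")
    return struct.unpack_from(">q", buf, pos)[0], pos + 8


def decode_handle_id(guid):
    """Decode an identifier as two longs (assumed layout: UUID high/low halves)."""
    high, pos = read_long(guid, 0)
    low, _ = read_long(guid, pos)
    # uuid.UUID expects an unsigned 128-bit int, so mask the signed halves.
    mask = 0xFFFFFFFFFFFFFFFF
    return uuid.UUID(int=((high & mask) << 64) | (low & mask))
```

With 16 zero bytes this decodes to the nil UUID; with an empty or 4-byte identifier the first read_long call raises. Note that in this model the failure depends on how many bytes the identifier itself contains.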

    I got the buffer-size insight from here: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emrconfiguration.html

    And here is the official EMR application configuration guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

    Hope this helps!
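For reference, the EMR configuration format described in that guide is a JSON list of objects with Classification and Properties keys. The minimal Python sketch below builds one: io.file.buffer.size is a Hadoop setting and goes in the core-site classification, while the optional hive.rpc.query.plan workaround mentioned in the comments goes in hive-site. The function name and its defaults are illustrative assumptions; values are emitted as strings because EMR properties are string-valued.

```python
import json


def emr_configurations(buffer_size_bytes=8192, rpc_query_plan=None):
    """Build an EMR "Configurations" list (hypothetical helper).

    core-site carries io.file.buffer.size; hive-site is added only when
    rpc_query_plan is given.
    """
    configs = [
        {
            "Classification": "core-site",
            "Properties": {"io.file.buffer.size": str(buffer_size_bytes)},
        }
    ]
    if rpc_query_plan is not None:
        configs.append(
            {
                "Classification": "hive-site",
                # str(True).lower() -> "true", matching Hive's boolean syntax.
                "Properties": {"hive.rpc.query.plan": str(rpc_query_plan).lower()},
            }
        )
    return configs


if __name__ == "__main__":
    # Supply this JSON when creating the cluster (assumption: via the console's
    # software settings or the AWS CLI's --configurations option).
    print(json.dumps(emr_configurations(65536, rpc_query_plan=True), indent=2))
```

As the asker found in the comments, the value only showed up in the cluster configuration when it was provided before launch.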

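    【Discussion】:

    • Thanks for the quick reply. I changed the parameter property { key: "io.file.buffer.size" value: "65536" } from 65536 to 195536 and restarted Hive. Still the same error.
    • I figured out how to update it: the value has to be provided in the AWS UI before the cluster is launched. The configuration now shows the updated value, but unfortunately the Power BI problem is not solved.
    • Hmm, did you also try this larger value through the AWS UI? I'm not a Power BI expert; does it expose any buffer-size setting for the Hive CLI?
    • I have also seen some people hit a buffer underflow, but on inserts. It doesn't quite make sense, but their workaround was to set "hive.rpc.query.plan" to true. If any of this helps, I'll update the answer.
    • One more thing, @Fabio Manzano: SQL queries through the ODBC driver work fine; the error above occurs only when selecting a table. I suspect the problem is in the ODBC driver. Let me check an older version.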