【问题标题】:hive alter table add column fails due to large number of partitions由于大量分区,hive 更改表添加列失败
【发布时间】:2020-07-21 23:50:50
【问题描述】:

我有一个分区超过 300k 的表。当我尝试添加如下所示的新列时,它会运行数小时然后失败。 Metastor rds 在 mysql 上,分区表有超过 500 万行。有人遇到过吗?

alter table tablea add columns(col1 string) cacade;

错误信息:

        at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:638)
        at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3590)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:390)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1689)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1673)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:375)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:322)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
        at com.sun.proxy.$Proxy34.alter_table_with_environmentContext(Unknown Source)

【问题讨论】:

    标签: hive hive-metastore metastore


    【解决方案1】:

    我最终编写了一个 for 循环遍历每个分区并执行

    alter table tablea add columns(col1 string)

    这似乎是最安全的方法。 考虑到尝试在表级别执行级联的分区数量会导致不可预测的行为,更不用说完成所需的时间了。

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2014-12-05
      • 1970-01-01
      • 2021-12-24
      • 2011-08-19
      • 2017-03-09
      • 1970-01-01
      相关资源
      最近更新 更多