HBase CopyTable: copying specific columns from different column families to a new table

【Question title】: Hbase CopyTable copy specific column from different columnfamilies to new table
【Posted】: 2017-01-19 08:06:32
【Question description】:

I have a table "aks:myprofiles" in HBase.

It has two column families, i and s.

Column family i has 5 columns: {ic1,ic2,ic3,ic4,ic5}

Column family s has 5 columns: {sc1,sc2,sc3,sc4,sc5}

Describe "aks:myprofiles"

{NAME => 'i', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 's', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

I want to copy all versions of ic1, ic2 and sc1, sc2 from this table into a new table.

I do not want all columns, only all versions of specific columns.

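【Discussion】:

Possible duplicate of Copy Data from one hbase table to another
@usersx: check my answer. Feel free to ask questions.
@Rijulsahu That is completely different. I don't want to copy all versions of all columns; I want all versions of specific columns.

Tags: hadoop mapreduce hbase hadoop2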

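【Solution 1】:

Below is how you can use `CopyTable`. If you want to customize which versions are copied for specific columns, you would need to write a custom mapreduce program by extending `CopyTable`; the stock `CopyTable` mapreduce job cannot do this as it is.

If you dig into the code, you will find several options. See the `printUsage` method of `CopyTable`.

Example usage: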

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable

/*
   * @param errorMsg Error message.  Can be null.
   */
  private static void printUsage(final String errorMsg) {
    if (errorMsg != null && errorMsg.length() > 0) {
      System.err.println("ERROR: " + errorMsg);
    }
    System.err.println("Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] " +
        "[--new.name=NEW] [--peer.adr=ADR] <tablename>");
    System.err.println();
    System.err.println("Options:");
    System.err.println(" rs.class     hbase.regionserver.class of the peer cluster");
    System.err.println("              specify if different from current cluster");
    System.err.println(" rs.impl      hbase.regionserver.impl of the peer cluster");
    System.err.println(" startrow     the start row");
    System.err.println(" stoprow      the stop row");
    System.err.println(" starttime    beginning of the time range (unixtime in millis)");
    System.err.println("              without endtime means from starttime to forever");
    System.err.println(" endtime      end of the time range.  Ignored if no starttime specified.");
    System.err.println(" versions     number of cell versions to copy");
    System.err.println(" new.name     new table's name");
    System.err.println(" peer.adr     Address of the peer cluster given in the format");
    System.err.println("              hbase.zookeeper.quorum:hbase.zookeeper.client"
        + ".port:zookeeper.znode.parent");
    System.err.println(" families     comma-separated list of families to copy");
    System.err.println("              To copy from cf1 to cf2, give sourceCfName:destCfName. ");
    System.err.println("              To keep the same name, just give \"cfName\"");
    System.err.println(" all.cells    also copy delete markers and deleted cells");
    System.err.println(" bulkload     Write input into HFiles and bulk load to the destination "
        + "table");
    System.err.println();
    System.err.println("Args:");
    System.err.println(" tablename    Name of the table to copy");
    System.err.println();
    System.err.println("Examples:");
    System.err.println(" To copy 'TestTable' to a cluster that uses replication for a 1 hour window:");
    System.err.println(" $ hbase " +
        "org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 " +
        "--peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable ");
    System.err.println("For performance consider the following general option:\n"
        + "  It is recommended that you set the following to >=100. A higher value uses more memory but\n"
        + "  decreases the round trip time to the server and may increase performance.\n"
        + "    -Dhbase.client.scanner.caching=100\n"
        + "  The following should always be set to false, to prevent writing data twice, which may produce \n"
        + "  inaccurate results.\n"
        + "    -Dmapreduce.map.speculative=false");
  }
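
To make the options above concrete for the table in the question, here is a small Java sketch that assembles a CopyTable argument list. The class `CopyTableArgs` and its `build` method are hypothetical helpers, not part of HBase; only the option names (`--versions`, `--families`, `--new.name`) come from the usage text above. The destination name `aks:myprofiles_copy` and the version count are made up for illustration. Because `--families` accepts column families only, the narrowest selection this can express is the whole families `i` and `s`, not individual columns such as `ic1`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of HBase): builds the argument list for
// org.apache.hadoop.hbase.mapreduce.CopyTable from options listed in printUsage.
final class CopyTableArgs {

    // familyMap: source family -> destination family (same name means "keep it").
    static List<String> build(String sourceTable, String newName,
                              int versions, Map<String, String> familyMap) {
        List<String> cmd = new ArrayList<>();
        cmd.add("org.apache.hadoop.hbase.mapreduce.CopyTable");
        cmd.add("--versions=" + versions);

        // Usage text: "sourceCfName:destCfName", or just "cfName" to keep the name.
        StringJoiner families = new StringJoiner(",");
        for (Map.Entry<String, String> e : familyMap.entrySet()) {
            families.add(e.getKey().equals(e.getValue())
                    ? e.getKey()
                    : e.getKey() + ":" + e.getValue());
        }
        if (families.length() > 0) {
            cmd.add("--families=" + families);
        }
        cmd.add("--new.name=" + newName);
        cmd.add(sourceTable); // the table name is the final positional argument
        return cmd;
    }

    public static void main(String[] ignored) {
        Map<String, String> families = new LinkedHashMap<>();
        families.put("i", "i");
        families.put("s", "s");
        // Assumed destination name; the version count 5 is only illustrative.
        List<String> cmd = build("aks:myprofiles", "aks:myprofiles_copy", 5, families);
        System.out.println("hbase " + String.join(" ", cmd));
    }
}
```

The printed line can be pasted into a shell on a machine with the `hbase` launcher; whether the destination table must already exist is not covered by this sketch.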

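【Discussion】:

Yes, the number of versions was never the problem; the problem is getting all versions of certain specific columns.
I understood the question, but gave a pointer to the existing versions parameter. Also see my first line: "If you want to customize versions for specific columns, you can create a custom mapreduce program by extending CopyTable"; it is not possible with the CopyTable mapreduce job as it is. Hope that clarifies.
Yes, your answer is quite right for this question. One more thing: there is no hard and fast rule that I must use CopyTable only; any other possible or good solution would help.
You can take the CopyTable mapreduce job code I pointed to in the link and adapt it to your requirements. At the moment I don't have a mapreduce/HBase environment to write the code and demo it. If you are fine with the answer above, please vote/accept as owner. Cheers!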
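
As a rough illustration of what such an adapted job would have to do for each row, the sketch below models the selection logic in plain Java, without the HBase API. The `Cell` class and the `selectColumns` method are hypothetical stand-ins invented for this example. In a real job the same restriction would more likely be placed on the `Scan` itself (for example with `Scan.addColumn(family, qualifier)` and all versions requested) than filtered by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of per-row column selection; this is NOT the HBase API.
final class ColumnSelection {

    // Hypothetical stand-in for one stored version of one column.
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        String column() {
            return family + ":" + qualifier;
        }
    }

    // Keeps every version of the wanted "family:qualifier" columns, drops the rest.
    static List<Cell> selectColumns(List<Cell> rowCells, Set<String> wanted) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : rowCells) {
            if (wanted.contains(c.column())) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The four columns named in the question.
        Set<String> wanted = new HashSet<>(
                Arrays.asList("i:ic1", "i:ic2", "s:sc1", "s:sc2"));
        // Made-up sample row with two versions of i:ic1.
        List<Cell> row = Arrays.asList(
                new Cell("i", "ic1", 2L, "new"),
                new Cell("i", "ic1", 1L, "old"),
                new Cell("i", "ic3", 1L, "skipped"),
                new Cell("s", "sc2", 1L, "kept"));
        for (Cell c : selectColumns(row, wanted)) {
            System.out.println(c.column() + " @" + c.timestamp + " = " + c.value);
        }
    }
}
```

Both versions of `i:ic1` survive while `i:ic3` is dropped, which is the behavior the question asks for and which `--families` alone cannot express.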

【Solution 2】:

We can copy all versions of specific columns (for example ic1, ic2, ic3) from table a to table b as follows:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --versions=vers --families=ic1,ic2 --new.name=b a

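where vers is the maximum number of versions you need to copy.

For all other columns, the --versions option can be omitted, i.e. you can run the following command: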

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=b a

【Discussion】:

There are two column families, i and s. ic1 and ic2 are the columns I need to copy; ic1 is not a column family.

【Solution 3】:

You can use CopyTable:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=new_table_name myprofiles

【Discussion】:

You have to specify the name of the column family to copy: --families=colFam
