[Question title]: HBase filter data based on column values
[Posted]: 2016-09-27 00:35:03
[Question description]:

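I want to filter an HBase table scan based on a list of values in a specific column.

For example, for the Employee table shown below, I want to fetch the records of employees whose ID is in (123, 789).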

 ROW                   COLUMN+CELL

 row1                 column=emp:name, timestamp=1321296699190, value=TestName1
 row1                 column=emp:id, timestamp=1321296715892, value=123

 row2                 column=emp:name, timestamp=1321296699190, value=TestName2
 row2                 column=emp:id, timestamp=1321296715892, value=456

 row3                 column=emp:name, timestamp=1321296699190, value=TestName3
 row3                 column=emp:id, timestamp=1321296715892, value=789

 row4                 column=emp:name, timestamp=1321296699190, value=TestName4
 row4                 column=emp:id, timestamp=1321296715892, value=101

 row5                 column=emp:name, timestamp=1321296699190, value=TestName5
 row5                 column=emp:id, timestamp=1321296715892, value=102

I tried using SingleColumnValueFilter, but it only fetches one record from the table. My code is given below. Please let me know where I am going wrong:

HTableInterface empTableObj = service.openTable("employee");
Scan scan = new Scan(startRow, endRow);

// MUST_PASS_ONE ORs the filters: a row passes if any one of them matches.
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);

Integer[] idArray = {123, 789};
for (int i = 0; i < idArray.length; i++) {
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("emp"), Bytes.toBytes("id"),
        CompareOp.EQUAL, Bytes.toBytes(idArray[i].toString()));
    filterList.addFilter(filter);
}
scan.setFilter(filterList);
ResultScanner rs = empTableObj.getScanner(scan);

Thanks

[Comments]:

  • It may not make much difference, but you could try Scan scan = new Scan() and then set the start and stop rows explicitly with public Scan setStartRow(byte[] startRow) and public Scan setStopRow(byte[] stopRow).

Tags: hbase


[Solution 1]:

Try the other constructor:

SingleColumnValueFilter filter = new SingleColumnValueFilter(family, qualifier, compareOp, empBytes); 

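where

compareOp = CompareFilter.CompareOp.EQUAL;

and family and qualifier are byte arrays, and empBytes (the value to compare against) is Bytes.toBytes("emp").

Alternatively, you can create two filters that bound a range (note that a range from 123 to 789 also matches values in between, such as 456):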

SingleColumnValueFilter filterLower = setFilterByCol(CompareOp.GREATER_OR_EQUAL,123);  
SingleColumnValueFilter filterUpper = setFilterByCol(CompareOp.LESS_OR_EQUAL,789);  

and a helper function:

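private static SingleColumnValueFilter setFilterByCol(CompareOp compareOp, int emp) {
    byte[] family = Bytes.toBytes("col_fam_name");
    byte[] qualifier = Bytes.toBytes("col_qualifier");
    // Assumption: ids are stored as strings, as in the question's code.
    // Encode emp the same way the stored values were written.
    byte[] empByte = Bytes.toBytes(String.valueOf(emp));

    SingleColumnValueFilter filter = new SingleColumnValueFilter(family, qualifier, compareOp, empByte);
    // Drop rows that do not contain the column at all.
    filter.setFilterIfMissing(true);
    return filter;
}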

Note that there is also SingleColumnValueExcludeFilter, which lets you exclude the column used for filtering from the scan results.
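To illustrate how a two-filter range differs from the ID list in the question, here is a minimal plain-Java sketch with no HBase dependency. It assumes the ids are stored as strings (as the question's code writes them) and mimics the unsigned lexicographic byte ordering that the default BinaryComparator applies when a filter is given a raw byte[] value. The class and method names (RangeVsInListSketch, matchRange, matchAny) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeVsInListSketch {

    // Unsigned, byte-by-byte lexicographic comparison.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Mimics GREATER_OR_EQUAL(lower) AND LESS_OR_EQUAL(upper), i.e. MUST_PASS_ALL.
    public static List<String> matchRange(List<String> values, String lower, String upper) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (compareBytes(utf8(v), utf8(lower)) >= 0
                    && compareBytes(utf8(v), utf8(upper)) <= 0) {
                out.add(v);
            }
        }
        return out;
    }

    // Mimics one EQUAL filter per wanted value, ORed together, i.e. MUST_PASS_ONE.
    public static List<String> matchAny(List<String> values, List<String> wanted) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            for (String w : wanted) {
                if (Arrays.equals(utf8(v), utf8(w))) {
                    out.add(v);
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("123", "456", "789", "101", "102");
        System.out.println(matchRange(ids, "123", "789"));              // [123, 456, 789]
        System.out.println(matchAny(ids, Arrays.asList("123", "789"))); // [123, 789]
    }
}
```

The range lets 456 through as well, and with string-encoded values it would also treat "20" as inside the range and "1000" as below it. For an explicit list of ids, ORing EQUAL filters with MUST_PASS_ONE, as the question already does, is the closer match.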

[Discussion]:

[Solution 2]:

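Since the scanner returns results lazily, one row per call, I guess you have to keep calling next() to read all the matching rows.

If you know you have two values, try

rs.next(); // for the first matching row (row1)
rs.next(); // again for the second matching row (row3)

If you are not sure how many you will get, run it in a loop.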

[Discussion]:

[Solution 3]:
public void testFilterList() {
    LOG.info("Entering testFilterList.");

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
    // The column being tested must be part of the scan; otherwise the
    // filter treats it as missing and lets every row through.
    scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("EmpId"));

    // Filters in this FilterList are combined with "and".
    FilterList list = new FilterList(Operator.MUST_PASS_ALL);
    // Keep rows with EmpId >= 200 (assumes EmpId was written as an 8-byte long).
    list.addFilter(new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("EmpId"), CompareOp.GREATER_OR_EQUAL, Bytes.toBytes(200L)));
    // Keep rows with EmpId <= 1000.
    list.addFilter(new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("EmpId"), CompareOp.LESS_OR_EQUAL, Bytes.toBytes(1000L)));
    scan.setFilter(list);

    // try-with-resources closes the scanner and the table, even on failure.
    try (Table table = conn.getTable(tableName);
         ResultScanner rScanner = table.getScanner(scan)) {
        // Print every cell of every matching row.
        for (Result r : rScanner) {
            for (Cell cell : r.rawCells()) {
                LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                    + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
        LOG.info("Filter list successfully.");
    } catch (IOException e) {
        LOG.error("Filter list failed ", e);
    }
    LOG.info("Exiting testFilterList.");
}
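The filter values in this answer are encoded from long values, while the question stores its ids as strings. The following plain-Java sketch (no HBase dependency; the class and method names are hypothetical) shows why a filter value and a stored value in different encodings never compare equal. It assumes the documented layouts of the Bytes helpers: 8 big-endian bytes for a long and UTF-8 bytes for a string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingMismatchSketch {

    // Same layout as Bytes.toBytes(long): 8 bytes, big-endian.
    public static byte[] longBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Same layout as Bytes.toBytes(String): the UTF-8 bytes of the text.
    public static byte[] stringBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asLong = longBytes(200L);
        byte[] asString = stringBytes("200");

        System.out.println(Arrays.toString(asLong));           // [0, 0, 0, 0, 0, 0, 0, -56]
        System.out.println(Arrays.toString(asString));         // [50, 48, 48]
        System.out.println(Arrays.equals(asLong, asString));   // false
    }
}
```

Whichever encoding was used when the cells were written is the one the filter value needs; for the question's table that would be the string form.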
      

[Discussion]:
