【Question Title】: Scan with filter using HBase shell
【Posted】: 2011-08-31 11:10:55
【Question】:

Does anyone know how to scan records based on a scan filter, e.g.:

column:something = "somevalue"

Similar to this, but from the HBase shell?

【Question Comments】:

    Tags: nosql hbase


    【Solution 1】:

    Try this. It is a bit ugly, but it works for me.

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.util.Bytes
    scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
        SingleColumnValueFilter.new(
            Bytes.toBytes('family'),
            Bytes.toBytes('qualifier'),
            CompareFilter::CompareOp.valueOf('EQUAL'),
            SubstringComparator.new('somevalue'))
    }
    

    The HBase shell will include everything in your ~/.irbrc, so you can put something like this in there (I'm no Ruby expert, improvements welcome):

    # imports like above
    def scan_substr(table, family, qualifier, substr, *cols)
        scan table, { COLUMNS => cols, FILTER =>
            SingleColumnValueFilter.new(
                Bytes.toBytes(family), Bytes.toBytes(qualifier),
                CompareFilter::CompareOp.valueOf('EQUAL'),
                SubstringComparator.new(substr)) }
    end
    

    Then, in the shell, you can simply say:

    scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'
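
    If you need an exact match rather than a substring match, the same pattern works with a BinaryComparator instead of the SubstringComparator (a sketch; the table, family, and qualifier names below are placeholders):

        import org.apache.hadoop.hbase.filter.BinaryComparator
        scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
            SingleColumnValueFilter.new(
                Bytes.toBytes('family'), Bytes.toBytes('qualifier'),
                CompareFilter::CompareOp.valueOf('EQUAL'),
                BinaryComparator.new(Bytes.toBytes('somevalue'))) }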
    

    【Comments】:

    • That really is super ugly. Thanks anyway, though; I couldn't find any examples in the HBase docs/book/O'Reilly book.
    • Hi, I saw you mentioned Ruby. I didn't know what was going on, then I looked it up and found that HBase accepts Ruby scripts? Is that right?
    【Solution 2】:

    scan 'test', {COLUMNS => ['F'], FILTER => \
        "(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \
        (SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}
    

    More information can be found here. Note that the attached Filter Language.docx file contains multiple examples.
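
    The filter-language strings also support OR and nesting with parentheses, so a scan can match either condition instead of requiring both (a hedged sketch; the table, family, and qualifier names follow the hypothetical example above):

        scan 'test', {COLUMNS => ['F'], FILTER => \
            "(SingleColumnValueFilter('F','u',=,'substring:pdf')) OR \
            (SingleColumnValueFilter('F','s',=,'binary:2'))"}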

    【Comments】:

    【Solution 3】:

    Use the FILTER parameter of scan, as shown in the usage help:

    hbase(main):002:0> scan
    
    ERROR: wrong number of arguments (0 for 1)
    
    Here is some help for this command:
    Scan a table; pass table name and optionally a dictionary of scanner
    specifications.  Scanner specifications may include one or more of:
    TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
    or COLUMNS. If no columns are specified, all columns will be scanned.
    To scan all members of a column family, leave the qualifier empty as in
    'col_family:'.
    
    Some examples:
    
      hbase> scan '.META.'
      hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
      hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
    
    For experts, there is an additional option -- CACHE_BLOCKS -- which
    switches block caching for the scanner on (true) or off (false).  By
    default it is enabled.  Examples:
    
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
    

    【Comments】:

      【Solution 4】:
      Scan scan = new Scan();
      FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

      // In case you have multiple SingleColumnValueFilters,
      // decide whether a row must pass MUST_PASS_ALL conditions
      // or MUST_PASS_ONE condition.

      SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter(
                         Bytes.toBytes("SOME COLUMN FAMILY"),
                         Bytes.toBytes("SOME COLUMN NAME"),
                         CompareOp.EQUAL,
                         Bytes.toBytes("SOME VALUE"));

      // Set this if you don't want the rows that have the column missing.
      // Remember that adding the column filter doesn't mean that rows
      // missing the column will be kept out of the result set: they will
      // be included unless you add this statement.
      filter_by_name.setFilterIfMissing(true);

      list.addFilter(filter_by_name);

      scan.setFilter(list);

      【Comments】:

      • This code is Java; the question is about the HBase shell.
      【Solution 5】:

      One of the filters is ValueFilter, which can be used to filter on all column values.

      hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

      binary is one of the comparators used in the filter. You can use different comparators in the filter according to your needs.
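
      For example, the substring comparator matches any cell whose value contains the given string (a sketch reusing the hypothetical table name from above):

          hbase(main):068:0> scan 'dummytable', {FILTER => "ValueFilter(=,'substring:2016-01')"}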

      You can refer to the following URL: http://www.hadooptpoint.com/filters-in-hbase-shell/. It provides good examples of how to use different filters in the HBase shell.

      【Comments】:

      • Link-only answers are not good answers. Post some code and explain it to be helpful.
      • The link is dead. It takes you to a spam site.
      【Solution 6】:

      Add setFilterIfMissing(true) at the end of the query:

      hbase(main):009:0> import org.apache.hadoop.hbase.util.Bytes;
       import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
       import org.apache.hadoop.hbase.filter.BinaryComparator;
       import org.apache.hadoop.hbase.filter.CompareFilter;
       import org.apache.hadoop.hbase.filter.Filter;
      
       scan 'test:test8', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('account'),
            Bytes.toBytes('ACCOUNT_NUMBER'), CompareFilter::CompareOp.valueOf('EQUAL'),
            BinaryComparator.new(Bytes.toBytes('0003000587'))).setFilterIfMissing(true)}
      

      【Comments】:
