HBase 不仅提供了这些简单的查询,而且提供了更加高级的过滤器(Filter)来查询。
过滤器可以根据列族、列、版本等更多的条件来对数据进行过滤,基于 HBase 本身提供的三维有序(行键,列,版本有序),这些过滤器可以高效地完成查询过滤的任务,带有过滤器条件的 RPC 查询请求会把过滤器分发到各个 RegionServer(这是一个服务端过滤器),这样也可以降低网络传输的压力。
a,查看自带filters
hbase(main):017:0> show_filters DependentColumnFilter KeyOnlyFilter ColumnCountGetFilter SingleColumnValueFilter PrefixFilter SingleColumnValueExcludeFilter FirstKeyOnlyFilter ColumnRangeFilter TimestampsFilter FamilyFilter QualifierFilter ColumnPrefixFilter RowFilter MultipleColumnPrefixFilter InclusiveStopFilter PageFilter ValueFilter ColumnPaginationFilter
b,准备测试数据
hbase(main):021:0> scan 'test_ns:user' ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 1000220190926 column=contact:tel, timestamp=1569461617469, value=02122345678 1000220190926 column=info:age, timestamp=1569461617408, value=35 1000220190926 column=login:password, timestamp=1569461617362, value=pass1 1000220190926 column=login:username, timestamp=1569468833657, value=test1_111 1000320190926 column=contact:mobile, timestamp=1569461617577, value=15822345678 1000320190926 column=contact:tel, timestamp=1569461617594, value=02132345678 1000320190926 column=info:age, timestamp=1569461617561, value=36 1000320190926 column=info:sex, timestamp=1569461617538, value=male2 1000320190926 column=login:password, timestamp=1569461617521, value=pass2 1000320190926 column=login:username, timestamp=1569463065916, value=newtest2 1000420190926 column=contact:mobile, timestamp=1569461617701, value=15833345678 1000420190926 column=contact:tel, timestamp=1569461617717, value=02142345678 1000420190926 column=info:age, timestamp=1569461617683, value=37 1000420190926 column=info:sex, timestamp=1569461617666, value=male3 1000420190926 column=login:password, timestamp=1569461617649, value=pass3 1000420190926 column=login:username, timestamp=1569461617628, value=test3 1000520190926 column=contact:mobile, timestamp=1569461709251, value=15803345678 1000520190926 column=contact:tel, timestamp=1569461709916, value=02152345678 1000520190926 column=info:age, timestamp=1569461709231, value=38 1000520190926 column=info:sex, timestamp=1569461709213, value=none 1000520190926 column=login:password, timestamp=1569461709200, value=admin 1000520190926 column=login:username, timestamp=1569461709184, value=zhang 1000620191009 column=info:sex, timestamp=1570592324271, value= 1000620191009 column=login:password, timestamp=1570592323845, value=test 1000620191009 column=login:username, timestamp=1570592323820, value=tank 6 row(s) in 0.0190 seconds
一,keyOnlyFilter返回的列值全部为空
1,语法
KeyOnlyFilter()
2,样例
hbase(main):023:0* import org.apache.hadoop.hbase.filter.KeyOnlyFilter; hbase(main):024:0* scan 'test_ns:user', {FILTER=>KeyOnlyFilter.new()} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value= 1000120190925 column=contact:tel, timestamp=1570591006670, value= 1000120190925 column=info:age, timestamp=1570591004901, value= 1000120190925 column=info:sex, timestamp=1570591004880, value= 1000120190925 column=login:password, timestamp=1570591004852, value= 1000120190925 column=login:username, timestamp=1570591004809, value= 。。。。。。。。。。。。。。。。。。。。。。。。省略。。。。。。。。。。。。。。。。。。。。
在这里要注意哦,如果有10条数据,8条有值,2条没值。keyOnlyFilter返回10条空数据,而不是2条。
二,FirstKeyOnlyFilter返回每行的第一列
1,语法
FirstKeyOnlyFilter()
2,样例
hbase(main):025:0> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter; hbase(main):026:0* scan 'test_ns:user',{FILTER=>FirstKeyOnlyFilter.new()} ROW COLUMN+CELL videv 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 1000320190926 column=contact:mobile, timestamp=1569461617577, value=15822345678 1000420190926 column=contact:mobile, timestamp=1569461617701, value=15833345678 1000520190926 column=contact:mobile, timestamp=1569461709251, value=15803345678 1000620191009 column=info:sex, timestamp=1570592324271, value=
三,PrefixFilter根据rowkey的前缀过滤行
1,语法
PrefixFilter (‘<row_prefix>’)
2,样例,rowkey以10001开头
hbase(main):009:0> import org.apache.hadoop.hbase.filter.PrefixFilter; hbase(main):010:0* import org.apache.hadoop.hbase.util.Bytes; hbase(main):011:0* scan 'test_ns:user',{FILTER=>PrefixFilter.new(Bytes.toBytes('10001'))} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1 row(s) in 0.2900 seconds hbase(main):012:0> scan 'test_ns:user',{FILTER=>PrefixFilter.new(Bytes.toBytes('20191009'))} ROW COLUMN+CELL 0 row(s) in 0.0130 seconds hbase(main):013:0> scan 'test_ns:user',{FILTER=>PrefixFilter.new(Bytes.toBytes('1000620191009'))} ROW COLUMN+CELL 1000620191009 column=info:sex, timestamp=1570592324271, value= 1000620191009 column=login:password, timestamp=1570592323845, value=test 1000620191009 column=login:username, timestamp=1570592323820, value=tank 1 row(s) in 0.0110 seconds
注意哦:可模糊匹配
四,ColumnPrefixFilter返回满足条件的列
1,语法
ColumnPrefixFilter (‘<column_prefix>’)
2,样例,以u开头,或者sex开头的列
hbase(main):015:0> import org.apache.hadoop.hbase.filter.ColumnPrefixFilter => Java::OrgApacheHadoopHbaseFilter::ColumnPrefixFilter hbase(main):016:0> import org.apache.hadoop.hbase.util.Bytes hbase(main):017:0> scan 'test_ns:user',{STARTROW=>'1000120190925',STOPROW=>'1000620190926',FILTER=>ColumnPrefixFilter.new(Bytes.toBytes('u'))} ROW COLUMN+CELL 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1000220190926 column=login:username, timestamp=1569468833657, value=test1_111 1000320190926 column=login:username, timestamp=1569463065916, value=newtest2 1000420190926 column=login:username, timestamp=1569461617628, value=test3 1000520190926 column=login:username, timestamp=1569461709184, value=zhang hbase(main):027:0> scan 'test_ns:user',{STARTROW=>'1000120190925',STOPROW=>'1000620190926',FILTER=>"ColumnPrefixFilter('sex')"} ROW COLUMN+CELL 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000320190926 column=info:sex, timestamp=1569461617538, value=male2 1000420190926 column=info:sex, timestamp=1569461617666, value=male3 1000520190926 column=info:sex, timestamp=1569461709213, value=none
五,multipleColumnPrefixFilter根ColumnPrefixFilter差不多,只不过是多列
1,语法
MultipleColumnPrefixFilter (‘<column_prefix>’, ‘<column_prefix>’, …, ‘<column_prefix>’)
2,样例,取a,s开头的列
hbase(main):036:0> scan 'test_ns:user',{STARTROW=>'1000120190925',STOPROW=>'1000620190926',FILTER=>"MultipleColumnPrefixFilter('a','s')"} ROW COLUMN+CELL 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000220190926 column=info:age, timestamp=1569461617408, value=35 1000320190926 column=info:age, timestamp=1569461617561, value=36 1000320190926 column=info:sex, timestamp=1569461617538, value=male2 1000420190926 column=info:age, timestamp=1569461617683, value=37 1000420190926 column=info:sex, timestamp=1569461617666, value=male3 1000520190926 column=info:age, timestamp=1569461709231, value=38 1000520190926 column=info:sex, timestamp=1569461709213, value=none
六,ColumnCountGetFilter返回多少列
1,语法
ColumnCountGetFilter (‘<limit>’)
2,样例,取每行前二列
hbase(main):043:0> scan 'test_ns:user',{FILTER=>"ColumnCountGetFilter(2)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 1000220190926 column=contact:tel, timestamp=1569461617469, value=02122345678 1000320190926 column=contact:mobile, timestamp=1569461617577, value=15822345678 1000320190926 column=contact:tel, timestamp=1569461617594, value=02132345678 1000420190926 column=contact:mobile, timestamp=1569461617701, value=15833345678 1000420190926 column=contact:tel, timestamp=1569461617717, value=02142345678 1000520190926 column=contact:mobile, timestamp=1569461709251, value=15803345678 1000520190926 column=contact:tel, timestamp=1569461709916, value=02152345678 5 row(s) in 0.0100 seconds
七,PageFilter返回多少行
1,语法
PageFilter (‘<page_size>’)
2,样例,根据条件取一行
hbase(main):046:0> scan 'test_ns:user',{STARTROW=>'1000120190925',STOPROW=>'1000620190926',FILTER=>"PageFilter(1)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1 row(s) in 0.0150 seconds
八,ColumnPaginationFilter根据limit和offset得到数据,
1,语法
ColumnPaginationFilter (‘<limit>’, ‘<offset>’)
2,样例,第二列开始,取一列
hbase(main):048:0> scan 'test_ns:user',{FILTER=>"ColumnPaginationFilter(1,2)"} ROW COLUMN+CELL 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000220190926 column=info:age, timestamp=1569461617408, value=35 1000320190926 column=info:age, timestamp=1569461617561, value=36 1000420190926 column=info:age, timestamp=1569461617683, value=37 1000520190926 column=info:age, timestamp=1569461709231, value=38 1000620191009 column=login:username, timestamp=1570592323820, value=tank 6 row(s) in 0.0110 seconds
九,InclusiveStopFilter设置停止的行
1,语法
InclusiveStopFilter (‘<stop_row_key>’)
2,样例,取数据截至到rowkey为1000220190926
hbase(main):051:0> scan 'test_ns:user',{FILTER=>"InclusiveStopFilter('1000220190926')"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 1000220190926 column=contact:tel, timestamp=1569461617469, value=02122345678 1000220190926 column=info:age, timestamp=1569461617408, value=35 1000220190926 column=login:password, timestamp=1569461617362, value=pass1 1000220190926 column=login:username, timestamp=1569468833657, value=test1_111 2 row(s) in 0.0070 seconds
十,TimeStampsFilter取得指定时间戳的数据
1,语法
TimeStampsFilter (<timestamp>, <timestamp>, ... ,<timestamp>)
2,样例,取得时间戳为1569461617446,1570591004852的数据
hbase(main):052:0> scan 'test_ns:user',{FILTER=>"TimestampsFilter(1569461617446,1570591004852)"} ROW COLUMN+CELL 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 2 row(s) in 0.0200 seconds
十一,RowFilter根据rowkey的值过滤
1,语法
RowFilter (<compareOp>, ‘<row_comparator>’)
2,样例,rowkey小于1000220190926,并且取2列
hbase(main):067:0> scan 'test_ns:user',{FILTER=>"RowFilter(<=,'binary:1000220190926') AND ColumnCountGetFilter(2)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 1000220190926 column=contact:tel, timestamp=1569461617469, value=02122345678 2 row(s) in 0.0090 seconds hbase(main):068:0> scan 'test_ns:user', FILTER=>"RowFilter(<=,'binary:1000220190926') AND ColumnCountGetFilter(2)" ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 1000220190926 column=contact:tel, timestamp=1569461617469, value=02122345678 2 row(s) in 0.0090 seconds
十二,FamilyFilter根据列族过滤
1,语法
FamilyFilter (<compareOp>, ‘<family_comparator>’)
2,样例,列族为c开头,取一条数据
hbase(main):071:0> scan 'test_ns:user',{FILTER=>"FamilyFilter(=,'substring:c') AND PageFilter(1)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1 row(s) in 0.0070 seconds
注意:AND要大写,不然报错
十三,QualifierFilter根据列名过滤
1,语法
QualifierFilter (<compareOp>, ‘<qualifier_comparator>’)
2,样例,列匹配,取一条数据
hbase(main):075:0> scan 'test_ns:user',{FILTER=>"QualifierFilter(=,'regexstring:.obil.') AND PageFilter(2)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 2 row(s) in 0.0070 seconds
十四,ValueFilter根据值过滤,只返回匹配的列
1,语法
ValueFilter (<compareOp>, ‘<value_comparator>’)
2,样例,取得值为15854654215的列
hbase(main):077:0> scan 'test_ns:user',{FILTER=>"ValueFilter(=,'binary:15854654215')"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1 row(s) in 0.0080 seconds
十五. SingleColumnValueFilter根据列值返回行
1,语法
SingleColumnValueFilter (‘<family>’, ‘<qualifier>’, <compare operator>, ‘<comparator>’, <filterIfColumnMissing_boolean>, <latest_version_boolean>)
SingleColumnValueFilter (‘<family>’, ‘<qualifier>’, <compare operator>, ‘<comparator>’)
2,样例,列族contact下的列mobile,大于15812345678的数据,取一行
hbase(main):081:0> scan 'test_ns:user',{FILTER=>"SingleColumnValueFilter('contact','mobile',>=,'binary:15812345678') AND PageFilter(1)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1 row(s) in 0.0070 seconds
十六,SingleColumnValueExcludeFilter根据列值返回行,排队条件列
1,语法
SingleColumnValueExcludeFilter (<family>, <qualifier>, <compare operators>, <comparator>, <latest_version_boolean>, <filterIfColumnMissing_boolean>)
SingleColumnValueExcludeFilter (<family>, <qualifier>, <compare operator> <comparator>)
2,样例
//有mobile hbase(main):107:0> scan 'test_ns:user',{FILTER=>"SingleColumnValueFilter('contact','mobile',>=,'binary:15833345678') AND PageFilter(1)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1 row(s) in 0.0070 seconds //没有mobile hbase(main):108:0> scan 'test_ns:user',{FILTER=>"SingleColumnValueExcludeFilter('contact','mobile',>=,'binary:15833345678') AND PageFilter(1)"} ROW COLUMN+CELL 1000120190925 column=contact:tel, timestamp=1570591006670, value=02112345678 1000120190925 column=info:age, timestamp=1570591004901, value=35 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000120190925 column=login:username, timestamp=1570591004809, value=Tank 1 row(s) in 0.0070 seconds
十七,ColumnRangeFilter,取得二列这间的所有列数据
1,语法
ColumnRangeFilter (‘<minColumn >’, <minColumnInclusive_bool>, ‘<maxColumn>’, <maxColumnInclusive_bool>)
2,样例
hbase(main):101:0> scan 'test_ns:user',{FILTER=>"ColumnRangeFilter('mobile',false,'tel',false) AND PageFilter(2)"} ROW COLUMN+CELL 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000220190926 column=login:password, timestamp=1569461617362, value=pass1 2 row(s) in 0.0060 seconds hbase(main):102:0> scan 'test_ns:user',{FILTER=>"ColumnRangeFilter('mobile',true,'tel',false) AND PageFilter(2)"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000120190925 column=info:sex, timestamp=1570591004880, value=Male 1000120190925 column=login:password, timestamp=1570591004852, value=Pass1 1000220190926 column=contact:mobile, timestamp=1569461617446, value=15812345678 1000220190926 column=login:password, timestamp=1569461617362, value=pass1 2 row(s) in 0.0060 seconds
注意:true表示,包含本列,false表示,不包含本列
十八,DependentColumnFilter一种允许用户指定一个参考列或引用列来过滤其他列的过滤器,过滤的原则是基于参考列的时间戳来进行筛选
1,语法
DependentColumnFilter (‘<family>’, ‘<qualifier>’, <boolean>, <compare operator>, ‘<value comparator’)
DependentColumnFilter (‘<family>’, ‘<qualifier>’, <boolean>)
DependentColumnFilter (‘<family>’, ‘<qualifier>’)
2,样例
hbase(main):018:0> scan 'test_ns:user',{FILTER=>"DependentColumnFilter('contact','mobile',false,>=,'binary:15813345678')"} ROW COLUMN+CELL 1000120190925 column=contact:mobile, timestamp=1570591004923, value=15854654215 1000320190926 column=contact:mobile, timestamp=1569461617577, value=15822345678 1000420190926 column=contact:mobile, timestamp=1569461617701, value=15833345678 3 row(s) in 0.0130 seconds
转载请注明
作者:海底苍鹰
地址:http://blog.51yip.com/hadoop/2189.html