Monday, May 6, 2013

HBase - advanced configuration on column family level

Block Size
HFile block size. The default is 64 KB. If you want good sequential scan performance, it's better to have a larger block size.
        The setting is applied at table creation:
        hbase(main):002:0> create 'mytable',   {NAME => 'colfam1', BLOCKSIZE => '65536'}
        Or in code:
        HColumnDescriptor has a method: setBlocksize(int)
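
        To get a feel for the trade-off, you can estimate how many data blocks (and so how many block index entries) an HFile splits into for a given block size. This is a back-of-the-envelope sketch in plain Java, not HBase code. The 1 GiB file size and the class name BlockCountSketch are assumptions for illustration; real HFiles differ because blocks are cut at cell boundaries and may be compressed.

```java
// Rough estimate only: how many data blocks an HFile of a given size
// splits into for a given BLOCKSIZE. Not part of the HBase API.
class BlockCountSketch {
    // Ceiling division: a partially filled last block still counts as a block.
    static long estimateBlocks(long fileBytes, int blockSizeBytes) {
        return (fileBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 1L << 30; // assumed 1 GiB HFile, for illustration only
        // Default 64 KB blocks
        System.out.println(estimateBlocks(fileBytes, 64 * 1024));
        // Larger 256 KB blocks: fewer blocks to index, more data read per block
        System.out.println(estimateBlocks(fileBytes, 256 * 1024));
    }
}
```

        Fewer, larger blocks mean less index overhead for a scan, but a single-row read has to load more data per block.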

Block Cache
        You can disable the block cache for a specific column family, for example to leave more cache space for other column families.

hbase(main):002:0> create 'mytable',
{NAME => 'colfam1', BLOCKCACHE => 'false'}

Aggressive caching

       You can give selected column families higher priority in the block cache.

       hbase(main):002:0> create 'mytable',
       {NAME => 'colfam1', IN_MEMORY => 'true'}

Bloom filters

       You enable bloom filters on the column family, like this:
       hbase(main):007:0> create 'mytable',
       {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}

       A row-level bloom filter is enabled with ROW, and a qualifier-level (row plus column qualifier) bloom filter is enabled with ROWCOL.
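
       The difference between ROW and ROWCOL comes down to which key goes into the filter. The following is a conceptual sketch in plain Java, not HBase's actual implementation. The class BloomSketch, its hashing scheme and the key layout are all assumptions made for illustration.

```java
import java.util.BitSet;

// Conceptual bloom filter: it can say "definitely not present" or "maybe present".
// Hypothetical illustration only; HBase uses its own implementation and hashing.
class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Double hashing: the i-th probe position is derived from two base hashes.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step, so it is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false means the key was never added; true means it may have been added.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    // ROW: the filter key is the row key alone.
    static String rowKey(String row) {
        return row;
    }

    // ROWCOL: the filter key combines row key and column qualifier.
    static String rowColKey(String row, String qualifier) {
        return row + '\u0000' + qualifier;
    }

    public static void main(String[] args) {
        BloomSketch rowCol = new BloomSketch(1024, 3);
        rowCol.add(rowColKey("row1", "q1"));
        // An added key is always reported as possibly present.
        System.out.println(rowCol.mightContain(rowColKey("row1", "q1")));
        // A key that was never added is usually, but not always, rejected.
        System.out.println(rowCol.mightContain(rowColKey("row1", "q2")));
    }
}
```

       A ROW filter helps reads that ask for a whole row. A ROWCOL filter helps reads that ask for a specific column of a row, at the cost of more entries in the filter.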

TTL
       Defining a Time To Live (in seconds) on a column family makes HBase delete data once it is older than that.
       Example:
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}

     Data in colfam1 that is older than 5 hours is deleted during the next major compaction.
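
     The TTL value is given in seconds, so 18000 corresponds to 5 hours. A small sketch in plain Java of that conversion and of a simplified expiry check follows. TtlSketch and its methods are hypothetical helpers, not HBase API, and HBase's exact boundary handling may differ.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helpers for reasoning about TTL values; not part of HBase.
class TtlSketch {
    // Convert a retention period in hours to the seconds value TTL expects.
    static long hoursToTtlSeconds(long hours) {
        return TimeUnit.HOURS.toSeconds(hours);
    }

    // Simplified model: a cell is expired once its age exceeds the TTL.
    static boolean isExpired(long cellTimestampMs, long nowMs, long ttlSeconds) {
        long ageMs = nowMs - cellTimestampMs;
        return ageMs > TimeUnit.SECONDS.toMillis(ttlSeconds);
    }

    public static void main(String[] args) {
        long ttl = hoursToTtlSeconds(5);
        System.out.println(ttl); // prints 18000
        long now = System.currentTimeMillis();
        long sixHoursAgo = now - TimeUnit.HOURS.toMillis(6);
        System.out.println(isExpired(sixHoursAgo, now, ttl)); // prints true
    }
}
```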

Compression

    The compression setting determines how HFile data is compressed on disk. This saves disk I/O at the cost of higher CPU utilization.
    hbase(main):002:0> create 'mytable',
    {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}

Cell versioning

    By default, 3 versions of each value are kept (newer HBase releases default to 1). This can be changed:

    hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 5,
    MIN_VERSIONS => '1'}
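
    To illustrate what VERSIONS => 5 means for a single cell, here is a simplified model in plain Java. VersionSketch is a hypothetical class, not HBase code. In particular, this sketch trims on every write, whereas HBase hides excess versions on read and physically removes them during compaction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Simplified model of one cell that keeps only the newest maxVersions values.
class VersionSketch {
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    private final int maxVersions;

    VersionSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // With newest-first ordering, the last entry is the oldest version.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Values from newest to oldest.
    List<String> values() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionSketch cell = new VersionSketch(5);
        for (long ts = 1; ts <= 7; ts++) {
            cell.put(ts, "v" + ts);
        }
        System.out.println(cell.values()); // prints [v7, v6, v5, v4, v3]
    }
}
```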
