Wednesday, May 22, 2013

We don't need Big Data?!


A typical relational database story begins like this:
1. At the proof-of-concept stage we use a local DB installation.
2. Then we go into production with a remote RDBMS on a separate machine.
3. Then we grow and see that many read operations occur. To optimize this we add a memcache solution to cache queries. Here we lose ACID on read operations.
4. We grow some more and realize that many write operations hit our system... Then we add additional servers and spend money on new hosts.
5. Then we start adding features that increase query complexity. Many joins occur. The solution for this: denormalization of the data.
6. Then popularity grows further and things are getting slow... To solve this we try to reduce server-side computation tasks and parametrize some unneeded ones...
7. Then we stabilize read operations, but write operations are still getting slow... To improve this we drop indexes and triggers... So what do we get in the end?
Not fully ACID, many machines, denormalized data with no indexes...
Let's think together...

Material downloaded from the Internet from different sources.

Wednesday, May 8, 2013

HBase internals investigation

Let's first investigate what we have in HDFS:

$HADOOP_HOME/bin/hadoop dfs -lsr /hbase

This gives us the following:

drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/-ROOT-
-rw-r--r--   3 ramim supergroup        727 2013-04-24 20:09 /hbase/-ROOT-/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs
-rw-r--r--   3 ramim supergroup        421 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs/hlog.1366823395834
-rw-r--r--   3 ramim supergroup        109 2013-04-24 20:09 /hbase/-ROOT-/70236052/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/info
-rw-r--r--   3 ramim supergroup       1958 2013-05-07 21:35 /hbase/-ROOT-/70236052/info/e07b870165804cbabd690799a2de856c
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/.META.
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs/hlog.1366823395985
-rw-r--r--   3 ramim supergroup        111 2013-04-24 20:09 /hbase/.META./1028785192/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/info
-rw-r--r--   3 ramim supergroup       8539 2013-05-07 21:35 /hbase/.META./1028785192/info/ace47eeb4dbb4de9ba5f2ec03534be31
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.logs
drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:50 /hbase/.oldlogs
-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table
-rw-r--r--   3 ramim supergroup        697 2013-04-28 14:50 /hbase/key_table/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs/hlog.1367149833692
-rw-r--r--   3 ramim supergroup        234 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1
-rw-r--r--   3 ramim supergroup        998 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/recovered.edits

Let's discuss the following groups:
Root level:

drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634

The .logs directory contains a subdirectory for every region server in the cluster. All regions on a region server share the same HLog file, which is rolled periodically; old files are moved to the /hbase/.oldlogs folder.
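If you want to see what is actually inside one of those HLog files, HBase ships a small dump tool. The invocation below is from memory for the 0.94 line, so treat the class name and the --dump option as an assumption and check the tool's usage output on your version:

$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317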


-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version
These files hold the cluster ID and the file format version.
drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt
This folder is used to store corrupted logs.
/hbase/splitlog/ - used during the log-splitting process.


Table Level
There is a folder for every table in the cluster.
/hbase/key_table/.tableinfo
This file holds the table descriptor in serialized form.
/hbase/key_table/.tmp
This is used for temporary data, for example during a .tableinfo file update.

Region Level
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
Contains information about the region; it is represented by the HRegionInfo class.
This is the folder structure of all the data in the region:
/<hbase-root-dir>/<tablename>/<encoded-regionname>/<column-family>/<filename>
Example:
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
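Just as a sanity check, the encoded region name in that path can also be listed from Java. Here is a rough sketch against the 0.94-era client API (HBaseAdmin.getTableRegions and HRegionInfo.getEncodedName are assumed from that version; 'key_table' is the table from the listing above):

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ListRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      List<HRegionInfo> regions = admin.getTableRegions(Bytes.toBytes("key_table"));
      for (HRegionInfo region : regions) {
        // getEncodedName() should match the region folder under /hbase/key_table/
        System.out.println(region.getEncodedName() + " : "
            + Bytes.toStringBinary(region.getStartKey()) + " -> "
            + Bytes.toStringBinary(region.getEndKey()));
      }
    } finally {
      admin.close();
    }
  }
}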

All the data is stored in HFiles, which can be read with
"$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile"

illin793!ramim:~/Hbase2/hadoop/runtime/hbase-0.94.0> bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -f /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad -v -p
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
13/05/08 18:23:30 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available. 
Scanning -> /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad
13/05/08 18:23:30 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 241.7m
K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1
K: r2/cf1:c1/1366823602288/Put/vlen=2/ts=0 V: v2
K: r3/cf1:c1/1366823606557/Put/vlen=2/ts=0 V: v3
Scanned kv count -> 3


This is how you can retrieve the key/value pairs that exist in the file.
For more details about the HFile structure:

HFile structure on cloudera
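The same kind of dump can also be done programmatically. Below is a rough sketch against the 0.94-era HFile reader API (the class and method names are assumed from that version; the HFile path is passed in as an argument):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class HFileDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]); // an HFile path under /hbase/<table>/<region>/<cf>/

    HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf));
    try {
      reader.loadFileInfo();
      // getScanner(cacheBlocks, pread): don't pollute the block cache, no positional read
      HFileScanner scanner = reader.getScanner(false, false);
      if (scanner.seekTo()) {
        do {
          KeyValue kv = scanner.getKeyValue();
          System.out.println(kv);
        } while (scanner.next());
      }
    } finally {
      reader.close();
    }
  }
}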


Monday, May 6, 2013

HBase - advanced configuration on column family level

Block Size
HFile block size. The default is 64 KB. If you want good sequential scan performance, it's better to use a larger block size.
        The setting is applied during table creation:
        hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKSIZE => '65536'}
        Or with code:
        HColumnDescriptor has a setBlocksize(int) method (see the combined Java sketch after the Cell versioning section below).

Block Cache
        You can disable the block cache for a specific column family, for example to leave more cache room for other column families.

hbase(main):002:0> create 'mytable',
{NAME => 'colfam1', BLOCKCACHE => 'false'}

Aggressive caching

       You can mark column families for higher priority in the block cache.

       hbase(main):002:0> create 'mytable',
       {NAME => 'colfam1', IN_MEMORY => 'true'}

Bloom filters

       You can enable Bloom filters on a column family like this:
       hbase(main):007:0> create 'mytable',
       {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}

       A row-level Bloom filter is enabled with ROW, and a qualifier-level Bloom filter is enabled with ROWCOL.

 TTL
       Defining a Time To Live (TTL) on a column family deletes its data after the given amount of time.
       Example:
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}

     Data in colfam1 that is older than 5 hours is deleted during the next major compaction.

Compression

    The compression setting applies to HFiles and their data. It can save disk I/O at the cost of higher CPU utilization.
    hbase(main):002:0> create 'mytable',
    {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}

Cell versioning

    By default, 3 versions of each value are kept. This can be changed:

    hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 5,
    MIN_VERSIONS => '1'}
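All of the settings above can also be applied from Java via HColumnDescriptor when creating the table. Here is a rough sketch against the 0.94-era API (method names such as setCompressionType and setBloomFilterType are assumed from that version; 'mytable' and 'colfam1' are just the example names used above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class CreateTunedTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HColumnDescriptor cf = new HColumnDescriptor("colfam1");
    cf.setBlocksize(64 * 1024);                          // BLOCKSIZE
    cf.setBlockCacheEnabled(true);                       // BLOCKCACHE
    cf.setInMemory(true);                                // IN_MEMORY (aggressive caching)
    cf.setBloomFilterType(StoreFile.BloomType.ROWCOL);   // BLOOMFILTER
    cf.setTimeToLive(18000);                             // TTL, in seconds
    cf.setCompressionType(Compression.Algorithm.SNAPPY); // COMPRESSION
    cf.setMaxVersions(5);                                // VERSIONS
    cf.setMinVersions(1);                                // MIN_VERSIONS

    HTableDescriptor table = new HTableDescriptor("mytable");
    table.addFamily(cf);
    admin.createTable(table);
    admin.close();
  }
}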

What the hell is a Bloom filter, and why is it needed in HBase?

We define hash functions that map a key to a set of bits.
Put: we run the hash functions over the key and, as a result, get a bit mask that identifies the key; those bits are set in a global bit vector.
Get: we run the key through the hash functions, get the resulting bit mask, and check whether all of those bits are set in the global bit vector.

                                                       
As an example:
x, y, z - exist in the set
w - does not exist
Basically, a Bloom filter helps us reduce scans when testing for key existence:
If the mask doesn't exist - the key definitely doesn't exist in the set.
If it does exist - we still have to do the full scan to check whether the key really exists.
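To make the idea concrete, here is a toy Bloom filter sketch. It is purely illustrative and not HBase's implementation (the real thing uses proper independent hash functions, folding, and so on):

import java.util.BitSet;

// Toy Bloom filter: illustrative only.
public class ToyBloomFilter {
  private final BitSet bits;
  private final int size;
  private final int numHashes;

  public ToyBloomFilter(int size, int numHashes) {
    this.bits = new BitSet(size);
    this.size = size;
    this.numHashes = numHashes;
  }

  // Derive the i-th bit position for a key. Real implementations use
  // stronger, independent hash functions (e.g. murmur hash with different seeds).
  private int position(String key, int i) {
    int h = key.hashCode() * 31 + i * 0x9E3779B9;
    return (h & 0x7fffffff) % size;
  }

  // Put: set the bits of the key's mask in the global bit vector.
  public void put(String key) {
    for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
  }

  // Get: false means the key is definitely absent;
  // true means it *may* be present and a full lookup is still required.
  public boolean mightContain(String key) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(position(key, i))) return false;
    }
    return true;
  }
}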
And why do we need it in HBase?

"..They are stored in the meta data of each HFile when it is written and then never need to be updated because HFiles are immutable. While I have no empirical data as to how much extra space they require (this also depends on the error rate you choose etc.) they do add some overhead obviously. When a HFile is opened, typically when a region is deployed to a RegionServer, the bloom filter is loaded into memory and used to determine if a given key is in that store file. They can be scoped on a row key or column key level, where the latter needs more space as it has to store many more keys compared to just using the row keys (unless you only have exactly one column per row). 

In terms of computational overhead the bloom filters in HBase are very efficient, they employ folding to keep the size down and combinatorial generation to speed up their creation. They add about 1 byte per entry and are mainly useful when your entry size is on the larger end, say a few kilobytes. Otherwise the size of filter compared to the size of the data is prohibitive. .."


Bloom filters can be enabled per column family: use HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL).

Sunday, May 5, 2013

Retrieving Row Keys from HBase

When performing a table scan where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a MUST_PASS_ALL operator to the scanner using setFilter. The filter list should include both a FirstKeyOnlyFilter and a KeyOnlyFilter. Using this filter combination will result in a worst case scenario of a RegionServer reading a single value from disk and minimal network traffic to the client for a single row.

For example:



import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

// Build a filter list that strips everything except the row keys.
List<Filter> filters = new ArrayList<Filter>();
filters.add(new FirstKeyOnlyFilter());
filters.add(new KeyOnlyFilter());
FilterList flist = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);

Scan scanner = new Scan();
scanner.setCaching(500);
scanner.setFilter(flist);

// 'environment' is assumed to provide an HTableInterface
// (e.g. a table pool or coprocessor environment).
HTableInterface table = environment.getTable("ARTICLES");
ResultScanner rs = table.getScanner(scanner);
try {
    for (Result r : rs) {
        byte[] rowKey = r.getRow();
        // ... process the row key ...
    }
} finally {
    rs.close();
}