Wednesday, May 8, 2013

HBase internals investigation

Let's first look at what HBase keeps in HDFS:

$HADOOP_HOME/bin/hadoop dfs -lsr /hbase

This gives us the following listing:

drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/-ROOT-
-rw-r--r--   3 ramim supergroup        727 2013-04-24 20:09 /hbase/-ROOT-/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs
-rw-r--r--   3 ramim supergroup        421 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs/hlog.1366823395834
-rw-r--r--   3 ramim supergroup        109 2013-04-24 20:09 /hbase/-ROOT-/70236052/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/info
-rw-r--r--   3 ramim supergroup       1958 2013-05-07 21:35 /hbase/-ROOT-/70236052/info/e07b870165804cbabd690799a2de856c
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/.META.
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs/hlog.1366823395985
-rw-r--r--   3 ramim supergroup        111 2013-04-24 20:09 /hbase/.META./1028785192/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/info
-rw-r--r--   3 ramim supergroup       8539 2013-05-07 21:35 /hbase/.META./1028785192/info/ace47eeb4dbb4de9ba5f2ec03534be31
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.logs
drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:50 /hbase/.oldlogs
-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table
-rw-r--r--   3 ramim supergroup        697 2013-04-28 14:50 /hbase/key_table/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs/hlog.1367149833692
-rw-r--r--   3 ramim supergroup        234 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1
-rw-r--r--   3 ramim supergroup        998 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/recovered.edits

Let's go through the following groups:
Root level:

drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634

The .logs directory contains a subdirectory for every region server in the cluster. All regions on a region server share the same HLog (write-ahead log) file, which is rolled periodically. Old files are moved to the /hbase/.oldlogs folder.
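The names under /hbase/.logs follow a readable pattern. As a rough sketch, assuming the naming seen in the listing above (a directory named host,port,startcode, and log files named with the same server name URL-encoded, followed by a dot and a millisecond timestamp that appears to mark when the file was created), a small shell function can split a log file name into its parts. The function name parse_hlog_name is made up for this example and is not part of HBase.

```shell
# Sketch: split an HLog file name into host, port, start code and timestamp.
# parse_hlog_name is a hypothetical helper, not an HBase tool.
parse_hlog_name() {
  name=$(basename "$1")
  # The commas of the server name are URL-encoded as %2C in the file name.
  decoded=$(printf '%s\n' "$name" | sed 's/%2C/,/g')
  # Everything after the last dot is the timestamp suffix (milliseconds).
  created_ms=${decoded##*.}
  server=${decoded%.*}
  # The server name itself is host,port,startcode.
  host=${server%%,*}
  rest=${server#*,}
  port=${rest%%,*}
  startcode=${rest#*,}
  echo "host=$host port=$port startcode=$startcode created_ms=$created_ms"
}

parse_hlog_name '/hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317'
# prints: host=illin793 port=12200 startcode=1367941725993 created_ms=1367941758317
```

The start code and the file suffix are both epoch timestamps in milliseconds, which is why the log files in the listing sort in the order they were rolled.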


-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version

These files hold the cluster ID and the version of the file format.

drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt

This folder is used to store corrupted logs.

/hbase/splitlog/ is used during the log splitting process.


Table level:
Every table in the cluster has its own folder.
/hbase/key_table/.tableinfo
This file holds the table descriptor in serialized form.
/hbase/key_table/.tmp
This folder is used for temporary data, for example during a .tableinfo file update.

Region level:
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
This file contains information about the region; the HRegionInfo class represents it.
This is the folder structure of all the data in the region:
/<hbase-root-dir>/<tablename>/<encoded-regionname>/<column-family>/<filename>
Example:
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
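
To make the layout concrete, here is a minimal sketch that takes a store file path and splits it into the components named in the pattern above. It assumes the path has exactly the five-part shape shown (root dir, table, encoded region name, column family, file name); describe_store_file is a hypothetical helper written for this post, not an HBase command.

```shell
# Sketch: split an HFile path laid out as
# /<hbase-root-dir>/<tablename>/<encoded-regionname>/<column-family>/<filename>
# describe_store_file is a hypothetical helper, not an HBase tool.
describe_store_file() {
  path=$1
  # Peel components off the end of the path, one at a time.
  file=${path##*/}
  dir=${path%/*}
  family=${dir##*/}
  dir=${dir%/*}
  region=${dir##*/}
  dir=${dir%/*}
  table=${dir##*/}
  root=${dir%/*}
  echo "root=$root table=$table region=$region family=$family file=$file"
}

describe_store_file /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
# prints: root=/hbase table=key_table region=ad1630980cc715c4ce499dabd27bf0b4 family=cf1 file=d838a37633834361883a283cc28f4c7b
```

Working from the end of the path means the sketch does not care how deep the HBase root directory itself is.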

All the data is stored in HFiles, which can be inspected with
"$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile"

illin793!ramim:~/Hbase2/hadoop/runtime/hbase-0.94.0 /bin/ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad -v -p
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
13/05/08 18:23:30 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available. 
Scanning -> /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad
13/05/08 18:23:30 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 241.7m
K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1
K: r2/cf1:c1/1366823602288/Put/vlen=2/ts=0 V: v2
K: r3/cf1:c1/1366823606557/Put/vlen=2/ts=0 V: v3
Scanned kv count -> 3


This is how you can retrieve the key-values stored in the file.
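
The "K: ... V: ..." lines printed by the tool can be picked apart with plain shell. The sketch below assumes the key format seen in the output above (row/family:qualifier/timestamp/type/...) and that row keys and values are printable and contain no "/" or " V: " sequences, which will not hold for arbitrary binary data. parse_kv_line is a made-up name for this example.

```shell
# Sketch: split one "K: ... V: ..." line from the HFile tool output.
# parse_kv_line is a hypothetical helper; it assumes simple printable keys.
parse_kv_line() {
  line=$1
  # Separate the key part from the value part.
  key=${line#K: }
  key=${key%% V: *}
  value=${line##* V: }
  # The key fields are separated by slashes.
  row=${key%%/*}
  rest=${key#*/}
  column=${rest%%/*}
  rest=${rest#*/}
  timestamp=${rest%%/*}
  rest=${rest#*/}
  type=${rest%%/*}
  echo "row=$row column=$column timestamp=$timestamp type=$type value=$value"
}

parse_kv_line 'K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1'
# prints: row=r1 column=cf1:c1 timestamp=1366823596580 type=Put value=v1
```

To run it over real output, one option is to filter the tool's output with grep '^K: ' and pass each remaining line to the function.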
For more details about the HFile structure:

HFile structure on Cloudera












