
Wednesday, May 8, 2013

HBase internals investigation

Let's first look at what HBase keeps in HDFS:

$HADOOP_HOME/bin/hadoop dfs -lsr /hbase

This gives us the following listing:

drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/-ROOT-
-rw-r--r--   3 ramim supergroup        727 2013-04-24 20:09 /hbase/-ROOT-/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs
-rw-r--r--   3 ramim supergroup        421 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs/hlog.1366823395834
-rw-r--r--   3 ramim supergroup        109 2013-04-24 20:09 /hbase/-ROOT-/70236052/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/info
-rw-r--r--   3 ramim supergroup       1958 2013-05-07 21:35 /hbase/-ROOT-/70236052/info/e07b870165804cbabd690799a2de856c
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/.META.
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs/hlog.1366823395985
-rw-r--r--   3 ramim supergroup        111 2013-04-24 20:09 /hbase/.META./1028785192/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/info
-rw-r--r--   3 ramim supergroup       8539 2013-05-07 21:35 /hbase/.META./1028785192/info/ace47eeb4dbb4de9ba5f2ec03534be31
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.logs
drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:50 /hbase/.oldlogs
-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table
-rw-r--r--   3 ramim supergroup        697 2013-04-28 14:50 /hbase/key_table/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs/hlog.1367149833692
-rw-r--r--   3 ramim supergroup        234 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1
-rw-r--r--   3 ramim supergroup        998 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/recovered.edits
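
A listing like this gets long quickly, so it can help to summarize it with a small script. The following is a minimal sketch, assuming the eight-column output format shown above (permissions, replication, owner, group, size, date, time, path); the function names are my own and not part of Hadoop or HBase. It adds up the file sizes under each top-level entry of /hbase.

```python
from collections import defaultdict


def parse_lsr_line(line):
    """Split one `hadoop dfs -lsr` line into a dict of its eight columns."""
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return {
        "is_dir": perms.startswith("d"),
        # Directories show "-" instead of a replication factor.
        "replication": None if repl == "-" else int(repl),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }


def bytes_per_top_level(lines, root="/hbase"):
    """Sum file sizes per first path component below `root`."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        entry = parse_lsr_line(line)
        if entry["is_dir"]:
            continue  # directories are listed with size 0
        relative = entry["path"][len(root):].lstrip("/")
        totals[relative.split("/", 1)[0]] += entry["size"]
    return dict(totals)


# A few lines copied from the listing above.
listing = """\
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table
-rw-r--r--   3 ramim supergroup        697 2013-04-28 14:50 /hbase/key_table/.tableinfo.0000000001
-rw-r--r--   3 ramim supergroup        998 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
"""

if __name__ == "__main__":
    print(bytes_per_top_level(listing.splitlines()))
```

Feeding it the full output of the command (for example through a pipe into a script that reads standard input) would give one total per table plus the system directories.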

Let's discuss the following groups:
Root level

drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634

The .logs directory contains a subdirectory for every region server in the cluster. All regions on a region server share the same HLog file, which is rolled periodically. Old files are moved to the /hbase/.oldlogs folder.
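
The names in this listing carry information themselves. The directory name is the region server name in the form host,port,startcode, and each log file name is the same server name URL-encoded (%2C is a comma) followed by a dot and a millisecond timestamp. The sketch below decodes these names; it assumes exactly the naming seen in the listing above, and the function names are mine rather than anything from the HBase code base.

```python
from datetime import datetime, timezone
from urllib.parse import unquote


def parse_server_name(name):
    """Split 'host,port,startcode' into its three parts."""
    host, port, startcode = name.split(",")
    return {"host": host, "port": int(port), "startcode": int(startcode)}


def parse_hlog_file_name(file_name):
    """Decode '<url-encoded server name>.<timestamp in ms>'."""
    decoded = unquote(file_name)
    # Use the last dot, so that host names containing dots still work.
    server, _, created_ms = decoded.rpartition(".")
    info = parse_server_name(server)
    info["created_ms"] = int(created_ms)
    return info


def millis_to_utc(ms):
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    info = parse_hlog_file_name("illin793%2C12200%2C1367941725993.1367941758317")
    print(info)
    print("server started:", millis_to_utc(info["startcode"]))
    print("log created:   ", millis_to_utc(info["created_ms"]))
```

The printed times are in UTC, so they will differ from the local times in the listing by the time zone offset of the cluster.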


-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version
These files hold the cluster ID and the version of the file format.
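
The sizes of these two files (38 and 3 bytes) are a hint about how they are stored. As far as I can tell, this HBase version writes both with Java's DataOutputStream.writeUTF, which puts a 2-byte big-endian length in front of the string; treat that as an assumption and not a confirmed fact. Under that assumption, a 36-character UUID takes 38 bytes and a one-character version string takes 3 bytes. The sketch below reproduces the encoding for plain ASCII strings; the UUID and the version value in it are placeholders, not the real content of my cluster's files.

```python
import struct


def write_java_utf(text):
    """Encode an ASCII string the way DataOutputStream.writeUTF would.

    Java uses "modified UTF-8", which is identical to UTF-8 for ASCII.
    """
    payload = text.encode("utf-8")
    return struct.pack(">H", len(payload)) + payload


def read_java_utf(data):
    """Decode a 2-byte length prefix followed by the string bytes."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes for the length prefix")
    (length,) = struct.unpack(">H", data[:2])
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated string payload")
    return payload.decode("utf-8")


if __name__ == "__main__":
    # Placeholder values, used only to show the resulting sizes.
    fake_cluster_id = "123e4567-e89b-12d3-a456-426614174000"
    fake_version = "7"
    print(len(write_java_utf(fake_cluster_id)), "bytes for a UUID string")
    print(len(write_java_utf(fake_version)), "bytes for a 1-char version")
    print(read_java_utf(write_java_utf(fake_version)))
```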
drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt
This folder is used to store corrupted logs.
/hbase/splitlog/ is used during the log splitting process.


Table level
There is a folder for every table in the cluster.
/hbase/key_table/.tableinfo
This file holds the table descriptor in serialized form.
/hbase/key_table/.tmp
This folder is used for temporary data, for example during a .tableinfo file update.
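
In the listing, the table descriptor file actually appears as .tableinfo.0000000001, with a zero-padded numeric suffix. My understanding is that the suffix is a sequence number that grows each time the descriptor is rewritten and that the file with the highest number is the current one; this is an assumption based on the naming, so verify it against your HBase version. A minimal sketch of picking the current file from a directory listing (the helper name is hypothetical):

```python
import re

# Matches names such as ".tableinfo.0000000001" and captures the number.
TABLEINFO_RE = re.compile(r"^\.tableinfo\.(\d+)$")


def latest_tableinfo(file_names):
    """Return the .tableinfo file with the highest sequence number, or None."""
    best = None
    for name in file_names:
        match = TABLEINFO_RE.match(name)
        if match is None:
            continue  # skip .tmp, region folders and anything else
        seq = int(match.group(1))
        if best is None or seq > best[0]:
            best = (seq, name)
    return None if best is None else best[1]


if __name__ == "__main__":
    names = [".tmp", ".tableinfo.0000000001", "ad1630980cc715c4ce499dabd27bf0b4"]
    print(latest_tableinfo(names))
```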

Region level
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
This file contains information about the region, which is represented in code by the HRegionInfo class.
All the data in a region is laid out in the following folder structure:
/<hbase-root-dir>/<tablename>/<encoded-regionname>/<column-family>/<filename>
Example:
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
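
To make the layout concrete, here is a small sketch that splits such a path back into its parts. It assumes the four-level structure described above and an HBase root directory of /hbase; the class and function names are made up for this example.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class StoreFilePath(NamedTuple):
    root: str
    table: str
    region: str
    family: str
    file_name: str


def parse_store_file_path(path, root="/hbase"):
    """Split /<root>/<table>/<encoded-region>/<family>/<file> into fields.

    Raises ValueError if the path is not under `root` or does not have
    exactly four components below it.
    """
    relative = PurePosixPath(path).relative_to(root)
    if len(relative.parts) != 4:
        raise ValueError(f"not a store file path: {path}")
    table, region, family, file_name = relative.parts
    return StoreFilePath(root, table, region, family, file_name)


if __name__ == "__main__":
    parsed = parse_store_file_path(
        "/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/"
        "d838a37633834361883a283cc28f4c7b"
    )
    print(parsed)
```

Paths such as /hbase/key_table/.tableinfo.0000000001 have fewer components and are rejected, which is a simple way to filter a listing down to the actual data files.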

All the data is stored in HFiles, which can be read with
"$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile"

illin793!ramim:~/Hbase2/hadoop/runtime/hbase-0.94.0 bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -f /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad -v -p
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
13/05/08 18:23:30 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available. 
Scanning -> /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad
13/05/08 18:23:30 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 241.7m
K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1
K: r2/cf1:c1/1366823602288/Put/vlen=2/ts=0 V: v2
K: r3/cf1:c1/1366823606557/Put/vlen=2/ts=0 V: v3
Scanned kv count -> 3


This is how you can retrieve the key-values that exist in the file.
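
Each "K: ... V: ..." line printed by the tool has the shape row/family:qualifier/timestamp/type/vlen=N/ts=N followed by the value. If you want to post-process that output, a parser along these lines could be a starting point. It is a sketch that assumes exactly the format shown in the output above; the names are mine, and I keep the trailing ts= field as printed without interpreting it (I believe it is an internal sequence number and not the cell timestamp, but that is an assumption).

```python
from typing import NamedTuple


class PrintedKeyValue(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int      # cell timestamp in milliseconds
    kv_type: str        # e.g. "Put"
    value_length: int   # the vlen= field
    trailing_ts: int    # the ts= field, kept as printed
    value: str


def parse_kv_line(line):
    """Parse one 'K: <key> V: <value>' line of the HFile tool output."""
    if not line.startswith("K: "):
        raise ValueError("not a key-value line")
    key_text, sep, value = line[3:].partition(" V: ")
    if not sep:
        raise ValueError("missing ' V: ' separator")
    # Split from the right so that a '/' inside the row key survives.
    row, column, timestamp, kv_type, vlen, ts = key_text.rsplit("/", 5)
    family, _, qualifier = column.partition(":")
    return PrintedKeyValue(
        row, family, qualifier, int(timestamp), kv_type,
        int(vlen.split("=", 1)[1]), int(ts.split("=", 1)[1]), value,
    )


if __name__ == "__main__":
    print(parse_kv_line("K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1"))
```

Splitting from the right keeps row keys with slashes intact, but a qualifier that itself contains a slash would still confuse this parser.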
For more details about the HFile structure:

HFile structure on Cloudera