Showing posts with label internals. Show all posts

Wednesday, May 8, 2013

HBase internals investigation

Let's first look at what HBase keeps in HDFS:

$HADOOP_HOME/bin/hadoop dfs -lsr /hbase

This gives us the following listing:

drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/-ROOT-
-rw-r--r--   3 ramim supergroup        727 2013-04-24 20:09 /hbase/-ROOT-/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs
-rw-r--r--   3 ramim supergroup        421 2013-04-24 20:09 /hbase/-ROOT-/70236052/.oldlogs/hlog.1366823395834
-rw-r--r--   3 ramim supergroup        109 2013-04-24 20:09 /hbase/-ROOT-/70236052/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/-ROOT-/70236052/info
-rw-r--r--   3 ramim supergroup       1958 2013-05-07 21:35 /hbase/-ROOT-/70236052/info/e07b870165804cbabd690799a2de856c
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/-ROOT-/70236052/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 16:21 /hbase/.META.
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192
drwxr-xr-x   - ramim supergroup          0 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-24 20:09 /hbase/.META./1028785192/.oldlogs/hlog.1366823395985
-rw-r--r--   3 ramim supergroup        111 2013-04-24 20:09 /hbase/.META./1028785192/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 21:35 /hbase/.META./1028785192/info
-rw-r--r--   3 ramim supergroup       8539 2013-05-07 21:35 /hbase/.META./1028785192/info/ace47eeb4dbb4de9ba5f2ec03534be31
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.META./1028785192/recovered.edits
drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/.logs
drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:50 /hbase/.oldlogs
-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table
-rw-r--r--   3 ramim supergroup        697 2013-04-28 14:50 /hbase/key_table/.tableinfo.0000000001
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4
drwxr-xr-x   - ramim supergroup          0 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs
-rw-r--r--   3 ramim supergroup        134 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.oldlogs/hlog.1367149833692
-rw-r--r--   3 ramim supergroup        234 2013-04-28 14:50 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.tmp
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1
-rw-r--r--   3 ramim supergroup        998 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
drwxr-xr-x   - ramim supergroup          0 2013-05-07 18:49 /hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/recovered.edits

Let's discuss the following groups:
Root level:

drwxr-xr-x   - ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993
-rw-r--r--   3 ramim supergroup        305 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941758317
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772050
-rw-r--r--   3 ramim supergroup        407 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772232
-rw-r--r--   3 ramim supergroup        957 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772250
-rw-r--r--   3 ramim supergroup        618 2013-05-07 18:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367941772325
-rw-r--r--   3 ramim supergroup        383 2013-05-07 19:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1367945372388
-rw-r--r--   3 ramim supergroup          0 2013-05-08 17:49 /hbase/.logs/illin793,12200,1367941725993/illin793%2C12200%2C1367941725993.1368024573634

The .logs directory contains a subdirectory for every region server in the cluster. All regions on a region server share the same HLog file, which is rolled periodically. Old log files are moved to the /hbase/.oldlogs folder.
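The names in the listing above can be taken apart with a few lines of Python. This is only a sketch based on how I read the listing: I am assuming the directory name is host, port and server start code separated by commas, and that the number after the last dot in the file name is the time the log file was created. The function name parse_hlog_path is made up for this example.

```python
from urllib.parse import unquote


def parse_hlog_path(path):
    """Split /hbase/.logs/<server-dir>/<log-file> into its visible parts."""
    server_dir, file_name = path.rstrip("/").split("/")[-2:]
    # Assumption: the directory name is "<host>,<port>,<start code>".
    host, port, start_code = server_dir.split(",")
    # The file name is the URL-encoded server name plus a numeric suffix.
    server_name, _, suffix = unquote(file_name).rpartition(".")
    return {
        "host": host,
        "port": int(port),
        "start_code": int(start_code),
        "server_name": server_name,
        "log_suffix": int(suffix),
    }


hlog = parse_hlog_path(
    "/hbase/.logs/illin793,12200,1367941725993/"
    "illin793%2C12200%2C1367941725993.1367941758317"
)
print(hlog)
```

The %2C in the file name is just a URL-encoded comma, so the decoded file name starts with the same server name as the directory.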


-rw-r--r--   3 ramim supergroup         38 2013-04-24 20:09 /hbase/hbase.id
-rw-r--r--   3 ramim supergroup          3 2013-04-24 20:09 /hbase/hbase.version
These files hold the cluster ID and the version of the file format.

drwxr-xr-x   - ramim supergroup          0 2013-04-25 11:22 /hbase/.corrupt
This folder is used to store corrupted logs.

/hbase/splitlog/ is used during the log split process.


Table Level
There is a folder for every table in the cluster.
/hbase/key_table/.tableinfo.0000000001
This file holds the table descriptor in serialized form.
/hbase/key_table/.tmp
This folder is used for temporary data, for example during a .tableinfo file update.

Region Level
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/.regioninfo
This file contains information about the region; the HRegionInfo class represents it.
This is the folder structure of all the data in the region:
/<hbase-root-dir>/<tablename>/<encoded-regionname>/<column-family>/<filename>
Example:
/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/d838a37633834361883a283cc28f4c7b
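
The layout above is regular enough to split mechanically. Here is a small Python sketch that breaks an HFile path into the pieces named in the pattern. The names HFilePath and parse_hfile_path are my own, and the sketch assumes the path always ends with table, region, column family and file name in that order.

```python
from collections import namedtuple

HFilePath = namedtuple("HFilePath", "root table region family hfile")


def parse_hfile_path(path):
    """Split /<root>/<table>/<region>/<family>/<file> into named fields."""
    parts = path.strip("/").split("/")
    if len(parts) < 5:
        raise ValueError("not an HFile path: %s" % path)
    # Everything before the last four components is the HBase root dir.
    *root, table, region, family, hfile = parts
    return HFilePath("/" + "/".join(root), table, region, family, hfile)


parsed = parse_hfile_path(
    "/hbase/key_table/ad1630980cc715c4ce499dabd27bf0b4/cf1/"
    "d838a37633834361883a283cc28f4c7b"
)
print(parsed)
```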

All the data is stored in HFiles, which can be read with
"$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile"

$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -f /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad -v -p
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
13/05/08 18:23:30 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
13/05/08 18:23:30 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available. 
Scanning -> /hbase/tableName6/01666734621bc7e2028f8915a4b8e3e4/cf1/6c6dfee1626043f29148cd25f04c3dad
13/05/08 18:23:30 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 241.7m
K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1
K: r2/cf1:c1/1366823602288/Put/vlen=2/ts=0 V: v2
K: r3/cf1:c1/1366823606557/Put/vlen=2/ts=0 V: v3
Scanned kv count -> 3
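
If you want to post-process this output, the K:/V: lines are easy to parse. The Python sketch below is based only on the three lines printed above; the regular expression and the name parse_kv_lines are mine, and it assumes row keys contain no "/" and column families contain no ":".

```python
import re

# Matches lines such as: K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1
KV_LINE = re.compile(
    r"^K: (?P<row>[^/]+)/(?P<family>[^:]+):(?P<qualifier>[^/]*)/"
    r"(?P<timestamp>\d+)/(?P<type>\w+)/vlen=(?P<vlen>\d+)\S* V: (?P<value>.*)$"
)


def parse_kv_lines(output):
    """Return one dict per key-value line, skipping log and summary lines."""
    cells = []
    for line in output.splitlines():
        match = KV_LINE.match(line.strip())
        if match:
            cell = match.groupdict()
            cell["timestamp"] = int(cell["timestamp"])
            cell["vlen"] = int(cell["vlen"])
            cells.append(cell)
    return cells


sample = """\
13/05/08 18:23:30 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 241.7m
K: r1/cf1:c1/1366823596580/Put/vlen=2/ts=0 V: v1
K: r2/cf1:c1/1366823602288/Put/vlen=2/ts=0 V: v2
K: r3/cf1:c1/1366823606557/Put/vlen=2/ts=0 V: v3
Scanned kv count -> 3
"""

cells = parse_kv_lines(sample)
for cell in cells:
    print(cell["row"], cell["family"], cell["qualifier"], cell["value"])
```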


This is how you can retrieve the key-value pairs that exist in the file.
For more details about the HFile structure:

HFile structure on Cloudera

Monday, April 22, 2013

HBase internals

Tables in HBase are organized into rows and columns. HBase treats columns a little differently than a relational database. Columns in HBase are organized into groups called column families.

Architecture

Table: HBase organizes data into tables. Table names are Strings and composed of characters that are safe for use in a file system path.
Row: Within a table, data is stored according to its row. Rows are identified uniquely by their rowkey. Rowkeys don’t have a data type and are always treated as a byte[].
Column family: Data within a row is grouped by column family. Column families also impact the physical arrangement of data stored in HBase. For this reason, they must be defined up front and aren’t easily modified. Every row in a table has the same column families, although a row need not store data in all its families. Column family names are Strings and composed of characters that are safe for use in a file system path.
Column qualifier: Data within a column family is addressed via its column qualifier, or column. Column qualifiers need not be specified in advance. Column qualifiers need not be consistent between rows. Like rowkeys, column qualifiers don’t have a data type and are always treated as a byte[].
Cell: A combination of rowkey, column family, and column qualifier uniquely identifies a cell. The data stored in a cell is referred to as that cell’s value. Values also don’t have a data type and are always treated as a byte[].
Version: Values within a cell are versioned. Versions are identified by their timestamp, a long. When a version isn’t specified, the current timestamp is used as the basis for the operation. The number of cell value versions retained by HBase is configured via the column family. The default number of cell versions is three.
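
To make the terms above concrete, here is a toy model of the data layout in Python. It is not how HBase is implemented, just nested maps that follow the definitions: row, then column family, then qualifier, then a list of timestamped versions. The class name TinyTable and the constant MAX_VERSIONS are invented for this sketch.

```python
import time
from collections import defaultdict

MAX_VERSIONS = 3  # the default number of cell versions mentioned above


class TinyTable:
    """Toy model: row -> family -> qualifier -> [(timestamp, value), ...]."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed up front
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError("unknown column family: %r" % family)
        if ts is None:
            ts = int(time.time() * 1000)  # current time when no version given
        versions = self.rows[row][family][qualifier]
        # Writing the same timestamp again replaces that version.
        versions[:] = [v for v in versions if v[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        del versions[MAX_VERSIONS:]  # keep only the newest versions

    def get_versions(self, row, family, qualifier):
        return list(self.rows.get(row, {}).get(family, {}).get(qualifier, []))

    def get(self, row, family, qualifier):
        versions = self.get_versions(row, family, qualifier)
        return versions[0][1] if versions else None


table = TinyTable(families=[b"cf1"])
for ts in (1, 2, 3, 4):
    table.put(b"r1", b"cf1", b"c1", b"v%d" % ts, ts=ts)
print(table.get(b"r1", b"cf1", b"c1"))
print(table.get_versions(b"r1", b"cf1", b"c1"))
```

Rowkeys, qualifiers and values are bytes here on purpose, to mirror the byte[] treatment described above. Qualifiers are created on first write, while an unknown column family is rejected.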
 

Commands:

How Write Works

HBase receives the command and writes the change to two destinations:
1] Write-ahead log (WAL) - HLog.
2] MemStore - which is written to disk in the form of an HFile; every flush creates a new file.
Each HFile belongs to one column family, and there is one MemStore per column family.
If a failure occurs, data can be recovered from the WAL, of which there is a single one per region server, shared by all regions on that server. You can disable the WAL for a command, but then there is no recovery for that command: if the system crashes before the MemStore is flushed, you lose your data.
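
The write path described above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not HBase code: the class TinyRegionServer, the count-based flush_threshold and the write_to_wal flag are all invented for illustration, and recovery simply replays the whole log.

```python
class TinyRegionServer:
    """Toy write path: append to the WAL, then update the MemStore."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # one log shared by everything on this server
        self.memstores = {}  # column family -> {(row, qualifier): value}
        self.hfiles = {}     # column family -> list of immutable sorted files
        self.flush_threshold = flush_threshold  # hypothetical, count based

    def put(self, row, family, qualifier, value, write_to_wal=True):
        if write_to_wal:
            self.wal.append((row, family, qualifier, value))
        memstore = self.memstores.setdefault(family, {})
        memstore[(row, qualifier)] = value
        if len(memstore) >= self.flush_threshold:
            self.flush(family)

    def flush(self, family):
        """Write the MemStore out as one new HFile and start a fresh one."""
        memstore = self.memstores.get(family)
        if not memstore:
            return
        hfile = tuple(sorted(memstore.items()))  # tuple: never modified again
        self.hfiles.setdefault(family, []).append(hfile)
        self.memstores[family] = {}

    def crash_and_recover(self):
        """Drop all in-memory state, then rebuild it from the WAL.

        Simplification: the whole log is replayed, including edits that
        were already flushed.
        """
        self.memstores = {}
        for row, family, qualifier, value in self.wal:
            self.memstores.setdefault(family, {})[(row, qualifier)] = value


server = TinyRegionServer()
server.put("r1", "cf1", "c1", "v1")
server.put("r2", "cf1", "c1", "v2", write_to_wal=False)  # skips the WAL
server.crash_and_recover()
recovered = server.memstores["cf1"]
print(sorted(recovered))

flusher = TinyRegionServer(flush_threshold=2)
flusher.put("r1", "cf1", "c1", "v1")
flusher.put("r2", "cf1", "c1", "v2")  # second entry triggers a flush
print(flusher.hfiles["cf1"])
```

After the simulated crash only the write that went through the WAL comes back, which is exactly the trade-off of disabling the WAL.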

How Read Works:

HBase has an LRU cache for reads, the BlockCache. The BlockCache is designed to keep frequently accessed data from the HFiles in memory so as to avoid disk reads. Each column family has its own BlockCache. The “Block” in BlockCache is the unit of data that HBase reads from disk in a single pass.
A read first checks the MemStore for pending modifications that have not been flushed yet. It then checks the BlockCache to see whether the block containing the requested row is already cached. Only if it is not does the read go to the HFiles to load the data.
Note: the data of a single row can be spread over many HFiles.
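
Here is a toy version of that lookup order in Python, for a single column family. It is a sketch under heavy simplifications that are mine, not HBase's: every HFile is treated as one block, HFiles with a higher id are assumed to hold newer data, and the lookup stops at the first hit instead of merging versions. The names LruBlockCache, read and disk_reads are used only for this example.

```python
from collections import OrderedDict


class LruBlockCache:
    """Tiny LRU cache: the least recently used block is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, block):
        self.blocks[block_id] = block
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used


def read(row, memstore, hfiles, cache, disk_reads):
    """Look in the MemStore, then the cache, then the HFiles 'on disk'."""
    if row in memstore:  # unflushed modifications win
        return memstore[row]
    for file_id in sorted(hfiles, reverse=True):  # newest file first
        block = cache.get(file_id)
        if block is None:
            block = hfiles[file_id]     # simulated disk read
            disk_reads.append(file_id)
            cache.put(file_id, block)
        if row in block:
            return block[row]
    return None


memstore = {"r3": "v3-new"}
hfiles = {1: {"r1": "v1", "r3": "v3-old"}, 2: {"r2": "v2"}}
cache = LruBlockCache(capacity=2)
disk_reads = []

first = read("r3", memstore, hfiles, cache, disk_reads)   # MemStore hit
second = read("r1", memstore, hfiles, cache, disk_reads)  # touches both files
third = read("r1", memstore, hfiles, cache, disk_reads)   # served from cache
print(first, second, third, disk_reads)
```

The second read has to open two files to find one row, which is the point of the note above about a row being spread over many HFiles.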

 

How Delete Works

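The Delete command does not remove the record right away. It marks the record for deletion, which indicates that neither a scan nor a get should return those values. HFiles are immutable, meaning they are never updated in place, so the deleted record stays on disk until a major compaction occurs.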
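
A small Python sketch of the idea: a delete only adds a marker, reads hide anything covered by the marker, and the old file is left untouched. Everything here is a toy under my own assumptions (one value per row, HFiles as read-only dicts ordered oldest first); the names TOMBSTONE, delete, get and scan are just for this example.

```python
from types import MappingProxyType

TOMBSTONE = object()  # stand-in for a delete marker


def delete(memstore, row):
    """A delete is just another write: it records a marker for the row."""
    memstore[row] = TOMBSTONE


def get(memstore, hfiles, row):
    """Return the newest value, or None if the row is deleted or absent."""
    # Newest data first: the MemStore, then HFiles from newest to oldest.
    for store in [memstore, *reversed(hfiles)]:
        if row in store:
            value = store[row]
            return None if value is TOMBSTONE else value
    return None


def scan(memstore, hfiles):
    """Return all visible rows in sorted order, skipping deleted ones."""
    rows = set(memstore)
    for hfile in hfiles:
        rows.update(hfile)
    result = {}
    for row in sorted(rows):
        value = get(memstore, hfiles, row)
        if value is not None:
            result[row] = value
    return result


# MappingProxyType gives a read-only view, mimicking an immutable HFile.
hfiles = [MappingProxyType({"r1": "v1", "r2": "v2"})]
memstore = {}
delete(memstore, "r1")

print(get(memstore, hfiles, "r1"))
print(scan(memstore, hfiles))
print(hfiles[0]["r1"])  # the old value is still physically in the file
```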
 

Compaction


Minor and Major compaction

A minor compaction folds HFiles together, creating a larger HFile from multiple smaller HFiles. HBase reads the content of the existing HFiles, writing records into a new one. Then, it swaps in the new HFile as the current active one and deletes the old ones that formed the new one. Minor compactions are designed to be minimally detrimental to HBase performance, so there is an upper limit on the number of HFiles involved. All of these settings are configurable.
When a compaction operates over all HFiles in a column family in a given region, it’s called a major compaction. Upon completion of a major compaction, all HFiles in the column family are merged into a single file. Major compactions can also be triggered for the entire table (or a particular region) manually from the shell. This is a relatively expensive operation and isn’t done often. Minor compactions, on the other hand, are relatively lightweight and happen more frequently. Major compactions are the only chance HBase has to clean up deleted records. Resolving a delete requires removing both the deleted record and the deletion marker. There’s no guarantee that both the record and marker are in the same HFile. A major compaction is the only time when HBase is guaranteed to have access to both of these entries at the same time.
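
The difference between the two kinds of compaction can be shown with a toy model in Python. The assumptions are mine: an HFile is a sorted list of (row, timestamp, value) entries, a delete marker hides every entry of the same row with an equal or older timestamp, and the minor compaction simply takes the first max_files files (the real selection logic is more involved). The function names are made up for this sketch.

```python
TOMBSTONE = object()  # stand-in for a delete marker


def sort_key(entry):
    row, ts, _value = entry
    return (row, ts)


def minor_compact(hfiles, max_files=2):
    """Fold the first max_files HFiles into one file, keeping every entry."""
    chosen, rest = hfiles[:max_files], hfiles[max_files:]
    merged = sorted((e for hfile in chosen for e in hfile), key=sort_key)
    return [merged] + rest


def major_compact(hfiles):
    """Merge all HFiles into one, dropping deleted records and markers."""
    entries = sorted((e for hfile in hfiles for e in hfile), key=sort_key)
    deleted_at = {}  # row -> newest delete marker timestamp
    for row, ts, value in entries:
        if value is TOMBSTONE:
            deleted_at[row] = max(ts, deleted_at.get(row, ts))
    merged = [
        (row, ts, value)
        for row, ts, value in entries
        if value is not TOMBSTONE and ts > deleted_at.get(row, -1)
    ]
    return [merged]


hfiles = [
    [("r1", 1, "v1"), ("r2", 1, "v2")],
    [("r3", 2, "v3")],
    [("r1", 3, TOMBSTONE)],  # the marker sits in a different file than r1
]

after_minor = minor_compact(hfiles, max_files=2)
after_major = major_compact(after_minor)
print(len(after_minor), after_major)
```

After the minor compaction the record for r1 and its marker still live in two different files, so nothing can be cleaned up. Only the major compaction sees both and drops them together.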

All examples and details are taken from the book HBase in Action.