Monday, April 22, 2013

HBase internals

Tables in HBase are organized into rows and columns. HBase treats columns a little differently than a relational database. Columns in HBase are organized into groups called column families.

Architecture

Table: HBase organizes data into tables. Table names are Strings and composed of characters that are safe for use in a file system path.
Row: Within a table, data is stored according to its row. Rows are identified uniquely by their rowkey. Rowkeys don’t have a data type and are always treated as a byte[].
Column family: Data within a row is grouped by column family. Column families also impact the physical arrangement of data stored in HBase. For this reason, they must be defined up front and aren’t easily modified. Every row in a table has the same column families, although a row need not store data in all its families. Column family names are Strings and composed of characters that are safe for use in a file system path.
Column qualifier: Data within a column family is addressed via its column qualifier, or column. Column qualifiers need not be specified in advance. Column qualifiers need not be consistent between rows. Like rowkeys, column qualifiers don’t have a data type and are always treated as a byte[].
Cell: A combination of rowkey, column family, and column qualifier uniquely identifies a cell. The data stored in a cell is referred to as that cell’s value. Values also don’t have a data type and are always treated as a byte[].
Version: Values within a cell are versioned. Versions are identified by their timestamp, a long. When a version isn’t specified, the current timestamp is used as the basis for the operation. The number of cell value versions retained by HBase is configured via the column family. The default number of cell versions is three.
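
The definitions above can be pictured as a nested map: rowkey, then column family, then column qualifier, then timestamp, then value. The following is a minimal sketch of that picture in plain Python, not the HBase client API. The class name ToyTable and its methods are hypothetical, and keeping three versions simply mirrors the default mentioned above.

```python
import time


class ToyTable:
    """Toy in-memory model of the HBase logical data model (not the HBase API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # column families are fixed up front
        self.max_versions = max_versions  # versions retained per cell
        self.rows = {}  # rowkey -> family -> qualifier -> {timestamp: value}

    def put(self, rowkey, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)  # no version given: use the current time (ms)
        cell = (self.rows.setdefault(rowkey, {})
                         .setdefault(family, {})
                         .setdefault(qualifier, {}))
        cell[ts] = value
        # Keep only the newest max_versions timestamps for this cell.
        for old_ts in sorted(cell, reverse=True)[self.max_versions:]:
            del cell[old_ts]

    def get(self, rowkey, family, qualifier, versions=1):
        cell = self.rows.get(rowkey, {}).get(family, {}).get(qualifier, {})
        newest = sorted(cell, reverse=True)[:versions]
        return [(ts, cell[ts]) for ts in newest]


t = ToyTable(families=["info"])
for ts in (1, 2, 3, 4):
    t.put(b"row1", "info", b"name", b"v%d" % ts, ts=ts)

# Four versions were written, but only the newest three are retained.
print(t.get(b"row1", "info", b"name", versions=10))
```

Rowkeys, qualifiers and values are passed as bytes here to echo the byte[] treatment described above, while the family name is an ordinary string.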
 

Commands:

How Write Works

HBase receives the command and writes the change to two destinations:
1] Write-ahead log (WAL), also called the HLog.
2] MemStore, an in-memory buffer that is written to disk in the form of an HFile. Every flush creates a new file.
An HFile belongs to a single column family, and there is one MemStore per column family.
If a failure occurs, data can be recovered from the WAL. There is a single WAL per server, shared by all regions hosted on that server. You can disable the WAL for a write, but that write is then no longer protected against failure: if the system crashes, you lose your data.
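
As a rough illustration of the write path described above, here is a toy model in plain Python. It is a sketch under simplifying assumptions, not HBase code: the class ToyRegionServer, the flush_threshold parameter and the write_to_wal flag are all hypothetical names, and pruning the log at flush time is a shortcut rather than how HBase itself manages its log.

```python
class ToyRegionServer:
    """Toy write path: every edit goes to the WAL and to a per-family MemStore."""

    def __init__(self, flush_threshold=2):
        self.wal = []        # one append-only log for the whole server
        self.memstores = {}  # column family -> {(rowkey, qualifier): value}
        self.hfiles = {}     # column family -> list of immutable flushed files
        self.flush_threshold = flush_threshold

    def put(self, family, rowkey, qualifier, value, write_to_wal=True):
        if write_to_wal:
            self.wal.append((family, rowkey, qualifier, value))
        store = self.memstores.setdefault(family, {})
        store[(rowkey, qualifier)] = value
        if len(store) >= self.flush_threshold:
            self.flush(family)

    def flush(self, family):
        store = self.memstores.get(family)
        if not store:
            return
        # Every flush creates a new, sorted, immutable file for this family.
        self.hfiles.setdefault(family, []).append(tuple(sorted(store.items())))
        self.memstores[family] = {}
        # Toy shortcut: flushed edits no longer need the log.
        self.wal = [entry for entry in self.wal if entry[0] != family]

    def crash_and_recover(self):
        # A crash loses everything held in memory; only the WAL can be replayed.
        self.memstores = {}
        for family, rowkey, qualifier, value in self.wal:
            self.memstores.setdefault(family, {})[(rowkey, qualifier)] = value


rs = ToyRegionServer(flush_threshold=2)
rs.put("info", b"r1", b"name", b"alice")                   # logged
rs.put("stats", b"r1", b"hits", b"7", write_to_wal=False)  # not logged
rs.crash_and_recover()

# Only the logged edit survives the crash.
print(rs.memstores)
```

The edit written without the WAL is gone after the simulated crash, which is the trade-off described above.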

How Read Works

HBase has an LRU cache for reads, the BlockCache. The BlockCache is designed to keep frequently accessed data from the HFiles in memory so as to avoid disk reads. Each column family has its own BlockCache. The “Block” in BlockCache is the unit of data that HBase reads from disk in a single pass.
A read first checks the MemStore for pending modifications that have not been flushed yet. It then checks the BlockCache to see whether the block containing the requested row is already cached. Only if it is not does HBase go to the HFiles to load the data.
Note: The data for a single row can be spread across many HFiles.
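
The lookup order above (MemStore, then BlockCache, then HFiles) can be sketched with a small LRU cache in plain Python. This is a toy model, not HBase code: ToyReadPath, cache_capacity and disk_reads are hypothetical names, each HFile is reduced to a single block, and each row holds a single value, whereas a real read may have to combine cells from several sources.

```python
from collections import OrderedDict


class ToyReadPath:
    """Toy read path for one column family: MemStore, BlockCache, then HFiles."""

    def __init__(self, hfiles, cache_capacity=2):
        self.memstore = {}          # unflushed edits: rowkey -> value
        self.hfiles = hfiles        # list of {rowkey: value}, oldest first
        self.cache = OrderedDict()  # LRU cache: HFile index -> block
        self.cache_capacity = cache_capacity
        self.disk_reads = 0         # counts simulated disk reads

    def _load_block(self, index):
        if index in self.cache:
            self.cache.move_to_end(index)   # mark as most recently used
            return self.cache[index]
        self.disk_reads += 1                # cache miss: go to "disk"
        block = self.hfiles[index]
        self.cache[index] = block
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return block

    def get(self, rowkey):
        if rowkey in self.memstore:         # 1. pending, unflushed edits
            return self.memstore[rowkey]
        for index in range(len(self.hfiles) - 1, -1, -1):  # newest file first
            block = self._load_block(index)  # 2. cache, 3. disk on a miss
            if rowkey in block:
                return block[rowkey]
        return None


store = ToyReadPath(hfiles=[{b"r1": b"old"}, {b"r2": b"x"}])
store.memstore[b"r3"] = b"pending"

print(store.get(b"r3"), store.disk_reads)  # served from the MemStore
print(store.get(b"r1"), store.disk_reads)  # both files had to be read
print(store.get(b"r1"), store.disk_reads)  # now served from the cache
```

The second lookup of the same row costs no further disk reads, which is the point of keeping recently used blocks in memory.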

 

How Delete Works

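The Delete command does not remove the record right away; it marks the record for deletion. The marker indicates that neither a scan nor a get should return those values. HFiles are immutable, meaning they are never updated in place, so the deleted data stays on disk until a major compaction occurs.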
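
A small Python sketch can make the marker idea concrete. It is only a toy model under stated assumptions: ToyDeleteStore and the TOMBSTONE sentinel are hypothetical names, and a read-only mapping stands in for an immutable HFile.

```python
from types import MappingProxyType

TOMBSTONE = object()  # hypothetical stand-in for a delete marker


class ToyDeleteStore:
    """Toy model: a delete writes a marker; flushed files are never modified."""

    def __init__(self):
        self.memstore = {}
        self.hfiles = []  # read-only snapshots, oldest first

    def put(self, rowkey, value):
        self.memstore[rowkey] = value

    def delete(self, rowkey):
        # Nothing is removed; a marker is recorded as a new write.
        self.memstore[rowkey] = TOMBSTONE

    def flush(self):
        # MappingProxyType gives a read-only view, mimicking an immutable file.
        self.hfiles.append(MappingProxyType(dict(self.memstore)))
        self.memstore = {}

    def get(self, rowkey):
        # Newest data first: MemStore, then files from newest to oldest.
        for source in [self.memstore] + self.hfiles[::-1]:
            if rowkey in source:
                value = source[rowkey]
                return None if value is TOMBSTONE else value
        return None


store = ToyDeleteStore()
store.put(b"r1", b"v1")
store.flush()
store.delete(b"r1")

print(store.get(b"r1"))        # the marker hides the value from reads
print(store.hfiles[0][b"r1"])  # the old value is still in the flushed file
```

Reads stop returning the value as soon as the marker exists, even though the flushed file still contains it.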
 

Compaction


Minor and Major Compaction

A minor compaction folds HFiles together, creating a larger HFile from multiple smaller HFiles. HBase reads the content of the existing HFiles, writing records into a new one. Then, it swaps in the new HFile as the current active one and deletes the old ones that formed the new one. Minor compactions are designed to be minimally detrimental to HBase performance, so there is an upper limit on the number of HFiles involved. All of these settings are configurable.
When a compaction operates over all HFiles in a column family in a given region, it’s called a major compaction. Upon completion of a major compaction, all HFiles in the column family are merged into a single file. Major compactions can also be triggered for the entire table (or a particular region) manually from the shell. This is a relatively expensive operation and isn’t done often. Minor compactions, on the other hand, are relatively lightweight and happen more frequently. Major compactions are the only chance HBase has to clean up deleted records. Resolving a delete requires removing both the deleted record and the deletion marker. There’s no guarantee that both the record and marker are in the same HFile. A major compaction is the only time when HBase is guaranteed to have access to both of these entries at the same time.
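
The difference between the two kinds of compaction can be sketched in a few lines of Python. This is a toy model, not HBase's file selection algorithm: the function names, the max_files parameter, the TOMBSTONE sentinel and the choice to merge the oldest files first are all assumptions made for illustration.

```python
TOMBSTONE = object()  # hypothetical stand-in for a delete marker


def _merge(files):
    """Merge files given oldest first, so newer entries overwrite older ones."""
    merged = {}
    for hfile in files:
        merged.update(hfile)
    return merged


def minor_compact(hfiles, max_files=2):
    """Fold a limited number of files into one; delete markers are kept."""
    picked, rest = hfiles[:max_files], hfiles[max_files:]
    return [_merge(picked)] + rest


def major_compact(hfiles):
    """Merge all files into one; deleted records and their markers are dropped."""
    merged = _merge(hfiles)
    return [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


# Oldest file first. The delete marker for r1 sits in a different file
# than the value it deletes.
hfiles = [{"r1": "a", "r2": "b"}, {"r1": TOMBSTONE}, {"r3": "c"}]

minor = minor_compact(hfiles, max_files=2)
major = major_compact(hfiles)

print(len(minor), minor[0]["r1"] is TOMBSTONE)  # marker survives a minor compaction
print(major)                                    # single file, r1 fully removed
```

The minor compaction has to keep the marker because an older file outside the merge might still hold the deleted record. Only the major compaction sees every file at once and can drop both.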

All examples and details are taken from the book HBase in Action.


















