Monday, April 29, 2013

HBase row key design rules


Monotonically increasing row keys are bad.

Under a heavy load of events, all writes land in a single region. That region eventually splits, but newly arriving entries still go to only one of the resulting regions, and so on. So think about a key with a good distribution that will not cause region imbalance.
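One common way to get such a distribution is to prefix the key with a small "salt" bucket derived from a hash. The sketch below is plain Python, not HBase client code; `NUM_BUCKETS`, `salted_row_key` and the key layout are hypothetical names chosen for illustration.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical bucket count; tune it to the number of regions


def salted_row_key(event_id: str, timestamp_ms: int) -> bytes:
    """Build bucket byte + 8-byte big-endian timestamp + event id."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS  # same event id always maps to the same bucket
    # Fixed-width big-endian timestamps keep keys inside one bucket in time order.
    return bytes([bucket]) + timestamp_ms.to_bytes(8, "big") + event_id.encode("utf-8")


# 1000 events with increasing timestamps (arbitrary example start value).
keys = [salted_row_key(f"event-{i}", 1367193600000 + i) for i in range(1000)]
buckets_used = {k[0] for k in keys}
print(sorted(buckets_used))  # the leading bytes that occur across the 1000 keys
```

The trade-off is that a scan over a time range now has to be issued once per bucket, so the bucket count should stay small.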
(The next two rules matter less in recent HBase versions; see Data Block Encodings.)

Use the shortest column family names possible.

This is because of the KeyValue structure: the column family name is stored inside every KeyValue.

Use the shortest row keys possible.

This is because of the KeyValue structure: the row key is stored inside every KeyValue.
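To get a feel for the cost behind the two rules above, here is a rough estimate of the size of one unencoded KeyValue. It is a sketch that assumes the classic layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, value) and ignores tags, compression and block encoding. The function name and the sample keys are made up for illustration.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size in bytes of a single cell."""
    key_len = (
        2 + len(row)        # row length prefix + row key
        + 1 + len(family)   # family length prefix + column family name
        + len(qualifier)    # column qualifier
        + 8                 # timestamp
        + 1                 # key type
    )
    return 4 + 4 + key_len + len(value)  # key length + value length + key + value


long_cell = keyvalue_size(b"customer-0000000001-2013-04-29", b"measurements", b"temperature", b"21")
short_cell = keyvalue_size(b"c1", b"m", b"t", b"21")
print(long_cell, short_cell)
```

The row key and family name are repeated for every cell, so the difference is paid once per stored value, not once per row.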

Construct the row key properly and check the byte array that actually gets constructed.

Example:
[image]
The result is:
[image]
As you can see, the resulting arrays are different.
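The original screenshots are not reproduced here, so the following is only a sketch of the kind of difference this rule is about: the same number encoded as text and as an 8-byte big-endian long gives different byte arrays, and those arrays also sort differently. It uses plain Python `bytes`, which compare byte by byte as unsigned values.

```python
import struct

numbers = (9, 10, 100)  # non-negative only; negative longs would sort after positives

as_text = [str(n).encode("ascii") for n in numbers]   # e.g. the string "10"
as_binary = [struct.pack(">q", n) for n in numbers]   # e.g. the 8-byte long 10

print(as_text[1].hex())    # 3130
print(as_binary[1].hex())  # 000000000000000a

sorted_text = sorted(as_text)
sorted_binary = sorted(as_binary)
print(sorted_text)  # [b'10', b'100', b'9']
print([struct.unpack(">q", b)[0] for b in sorted_binary])  # [9, 10, 100]
```

Printing the bytes in hex before writing to the table is a cheap way to catch a key that was built from the wrong type.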
Row keys are scoped to column families. Thus, the same row key can exist in each column family of a table without collision.

Always keep the ASCII table in mind to see the correct order of the characters.

[image: full ASCII table]
According to the ASCII table, the digits [0-9] come first, followed by [A-Z], and only then [a-z]. Remember this when you split tables.
When possible, it is better to use only capital letters in keys.
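A quick way to see this ordering is to sort a few sample keys as raw bytes. This is a plain Python sketch with made-up keys and a made-up split point `b"M"`.

```python
keys = [b"user", b"USER", b"42", b"Zebra", b"apple"]

ordered = sorted(keys)  # byte-wise order: digits, then A-Z, then a-z
print(ordered)  # [b'42', b'USER', b'Zebra', b'apple', b'user']

# With a single split point at b"M", every lowercase key lands in the upper region.
split_point = b"M"
upper_region = [k for k in ordered if k >= split_point]
print(upper_region)  # [b'USER', b'Zebra', b'apple', b'user']
```

A split point picked with only uppercase keys in mind can therefore leave one region with almost all of the data once lowercase keys appear.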
