Friday, January 10, 2014

Ordering the mess: ACID, CAP theorem and NoSQL



Database transaction
Let's define a transaction as a unit of work that is reliable and recoverable. By definition, a transaction must support the ACID properties:
Atomic - if a transaction contains several actions, they happen in an all-or-nothing manner: either all of them take effect or none of them do.
Consistent - a transaction moves the database from one valid state to another, so every rule defined on the data (constraints, keys, etc.) still holds after the commit.
Isolated - 2 transactions that run on the same data do not see each other's uncommitted changes; each works on its own view of the data until one of them commits.
Example
One transaction reads some value while another updates it. Until transaction B commits, transaction A will see the old value.
Durable - once a transaction is committed, its results will not get lost even if the system crashes. This is usually achieved by managing transaction logs that can be replayed.
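To make the "all-or-nothing" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the names and the `transfer` helper are assumptions made up for the illustration; they are not from any particular system.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Both updates above are undone by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Illustrative data: an in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back
```

After the failed transfer neither account has changed, even though both UPDATE statements had already run inside the transaction.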

Great! We have 4 properties of a transaction.

All of these properties are achieved in different ways.


CAP theorem
It's a set of requirements for a distributed system:
Consistency - all servers in the distributed system are in the same state, so every read returns the most recent write.
Availability - you will always be able to get data out of the system (even if it isn't consistent).
Partition tolerance - the system keeps working even if the network between some parts of the system fails and they can't talk to each other.

It has been proved that all 3 of them can't be guaranteed at the same time: a distributed system can provide at most 2 out of 3.
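The trade-off can be sketched with a toy model: two replicas and a flag that simulates a network partition. This is only an illustration under simplified assumptions; the `Cluster` and `Replica` classes and the "CP"/"AP" modes are hypothetical names, not a real database API.

```python
class Replica:
    """One server holding a single value (toy model)."""
    def __init__(self):
        self.value = None

class Cluster:
    """Two replicas; mode is 'CP' or 'AP' (hypothetical names)."""
    def __init__(self, mode):
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True means a and b cannot talk to each other

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: update both replicas, they stay in the same state.
            replica.value = other.value = value
            return True
        if self.mode == "CP":
            # Keep consistency: refuse the write, giving up availability.
            return False
        # Keep availability: accept the write locally, replicas now diverge.
        replica.value = value
        return True

cp = Cluster("CP")
cp.partitioned = True
cp_accepted = cp.write(cp.a, "x")  # rejected, both replicas still agree

ap = Cluster("AP")
ap.partitioned = True
ap_accepted = ap.write(ap.a, "x")  # accepted, but replica b has not seen it
```

During the partition the CP cluster stays consistent but stops answering writes, while the AP cluster keeps answering but its replicas no longer hold the same data.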

So let's make some order:
1. CAP describes properties of a distributed system.
2. ACID describes features of a transaction.

Transactions are a feature of relational databases such as MySQL, Oracle etc.

Big Data era.

The amount of data generated in recent years has made scale and performance more important than some of the ACID features. Providing consistency, for example, can make the system much slower. And sometimes we can live with the idea that the data will not be consistent at the very same second (but eventually it will be!).
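Here is a minimal sketch of what "not consistent right now, but eventually" can look like. It assumes a last-write-wins rule with a version counter per key, which is only one of several possible reconciliation strategies; the `merge` and `sync` helpers and the sample data are made up for the illustration.

```python
def merge(local, remote):
    """Last-write-wins: for each key keep the entry with the larger version."""
    merged = dict(local)
    for key, (version, value) in remote.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged

def sync(a, b):
    """Exchange state between two replicas; both end up with the same copy."""
    merged = merge(a, b)
    return dict(merged), dict(merged)

# Each replica maps key -> (version, value). Replica b received a newer write.
replica_a = {"cart": (1, "book")}
replica_b = {"cart": (2, "book,pen"), "user": (1, "dana")}

stale_read = replica_a["cart"][1]  # replica a still serves the old cart

replica_a, replica_b = sync(replica_a, replica_b)
fresh_read = replica_a["cart"][1]  # after the sync both replicas agree
```

Between the write and the sync a reader of replica a gets stale data; once the replicas exchange state they converge to the same values.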

That made other kinds of DB systems popular. Some of them provide only parts of ACID, and some of them don't have SQL as the main interface. We call them NoSQL databases.
Some of them avoid the ideas of relational storage altogether.
But as a main concept they provide 2 out of 3 CAP requirements, relax ACID, and solve some predefined problem in data storage, like graph relations, columnar storage etc.

This means the main focus is now on the data instead of the system. You look at the data and investigate it, and only then look for a storage solution, instead of taking some relational storage and starting to model the data to fit it.




http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
http://en.wikipedia.org/wiki/Eventual_consistency
http://www.cs.berkeley.edu/~istoica/classes/cs268/06/notes/20-BFTx2.pdf



