Friday, January 10, 2014
Order the mess: ACID, CAP theorem and NoSQL
Database transaction
Let's define a transaction as a unit of work that is reliable and recoverable. By definition, a transaction must support the ACID properties:
Atomic - if a transaction contains several actions, they happen in an all-or-nothing manner: either all of them take effect or none of them do.
Consistent - a transaction takes the database from one valid state to another: integrity rules such as keys and constraints hold before it starts and after it commits.
Isolated - concurrent transactions don't see each other's uncommitted changes: two transactions that run on the same data see different copies until one of them commits.
Example
Transaction A reads some value while transaction B updates it. Until transaction B commits, transaction A sees the old value.
Durable - once a transaction is committed, its results are not lost, even if the system crashes. This is achieved by keeping transaction logs that can be replayed.
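Atomicity and isolation can be watched in action with Python's built-in sqlite3 module. This is a minimal sketch that assumes SQLite in WAL mode as the example engine (the post mentions MySQL and Oracle, whose isolation levels differ in detail); the `account` table and the `transfer` and `balance` helpers are hypothetical. Connection `a` plays transaction A and connection `b` plays transaction B from the example above.

```python
import os
import sqlite3
import tempfile


def balance(conn, name):
    # fetchall() drains the cursor so no read transaction is left open
    return conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchall()[0][0]


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if any statement fails
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised by the CHECK constraint
        return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        a = sqlite3.connect(path)  # "transaction A"
        a.execute("PRAGMA journal_mode=WAL").fetchall()  # readers see the last committed state
        a.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
                  "balance INTEGER NOT NULL CHECK (balance >= 0))")
        with a:
            a.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
        b = sqlite3.connect(path)  # "transaction B"

        # Atomicity: crediting bob succeeds, debiting alice violates the CHECK
        # constraint, so the whole transfer (including the credit) is rolled back.
        ok = transfer(a, "alice", "bob", 150)
        after_failed = (balance(a, "alice"), balance(a, "bob"))

        # Isolation: B changes bob's balance but has not committed yet.
        b.execute("UPDATE account SET balance = 40 WHERE name = 'bob'")
        seen_before_commit = balance(a, "bob")
        b.commit()
        seen_after_commit = balance(a, "bob")

        a.close()
        b.close()
    return ok, after_failed, seen_before_commit, seen_after_commit


print(demo())  # (False, (100, 0), 0, 40)
```

The failed transfer leaves both balances untouched, and A keeps seeing bob's old balance until B commits.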
Great! We have four properties of a transaction.
Each of these properties is achieved through different mechanisms.
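One such mechanism, the replayable transaction log behind durability, can be sketched in a few lines. This is a simplified illustration, not how any particular database does it: the `LoggedStore` class and its JSON-lines log format are hypothetical, and real write-ahead logs also deal with checkpoints, undo records, and page-level recovery.

```python
import json
import os
import tempfile


class LoggedStore:
    """Toy key-value store whose committed writes survive a 'crash' by log replay."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state from every committed record in the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                if not line.endswith("\n"):
                    break  # torn last record from a crash mid-write: ignore it
                self.data.update(json.loads(line)["changes"])

    def commit(self, changes):
        # Write-ahead: the record is forced to disk before memory is updated.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"changes": changes}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data.update(changes)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tx.log")
    store = LoggedStore(path)
    store.commit({"alice": 100, "bob": 0})
    store.commit({"alice": 60, "bob": 40})
    store.data["bob"] = 999  # in-memory change that was never committed
    del store                # simulate a crash: in-memory state is gone
    recovered = LoggedStore(path)
    print(recovered.data)    # {'alice': 60, 'bob': 40}
```

Only what reached the log comes back after the "crash"; the uncommitted change is lost, as it should be.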
CAP theorem
It's a set of properties of a distributed system:
Consistency - all servers in the distributed system are in the same state: every read sees the most recent write.
Availability - every request gets a response; you can always get data out of the system, even if it isn't the most recent data.
Partition tolerance - the system keeps working even if the network splits and messages between some of its parts are lost.
Gilbert and Lynch proved that all three can't be guaranteed at the same time, so a distributed system has to settle for two out of three.
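A toy model makes the trade-off concrete. The sketch below (a hypothetical `TinyCluster` with two replicas, not a proof and not any real product's behavior) shows what each choice means once the link between the replicas is cut: an "AP" cluster keeps accepting writes and the replicas disagree, while a "CP" cluster refuses writes it cannot replicate and stays consistent.

```python
class TinyCluster:
    """Two replicas that copy every write to each other while the link is up."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = {"n1": {}, "n2": {}}
        self.partitioned = False  # True means the link between n1 and n2 is cut

    def write(self, node, key, value):
        if self.partitioned and self.mode == "CP":
            # Choose consistency: refuse a write that cannot reach the other replica.
            raise RuntimeError("unavailable: cannot reach peer during partition")
        # Replicate synchronously while connected; otherwise write locally only.
        targets = [node] if self.partitioned else list(self.nodes)
        for target in targets:
            self.nodes[target][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)


ap = TinyCluster("AP")
ap.write("n1", "x", 1)
ap.partitioned = True
ap.write("n1", "x", 2)                         # accepted: still available
print(ap.read("n1", "x"), ap.read("n2", "x"))  # 2 1  (replicas disagree)

cp = TinyCluster("CP")
cp.write("n1", "x", 1)
cp.partitioned = True
try:
    cp.write("n1", "x", 2)
except RuntimeError as err:
    print(err)                                 # unavailable: cannot reach peer during partition
print(cp.read("n1", "x"), cp.read("n2", "x"))  # 1 1  (still consistent)
```

Without a partition both modes behave identically; the choice only matters when the network fails.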
So let's put things in order:
1. CAP describes properties of distributed systems.
2. ACID describes properties of transactions.
Transactions are a feature of relational databases such as MySQL, Oracle, etc.
Big Data era.
The amount of data generated in recent years has made scale and performance more important than some ACID features. Providing strong consistency, for example, can make a system much slower. And sometimes we can live with the idea that the data will not be consistent at the same second (but eventually it will be!).
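"Not consistent at the same second, but eventually yes" can be illustrated with replicas that accept writes independently and later exchange their state. This sketch uses last-writer-wins, which is only one of several conflict-resolution strategies; the `LwwReplica` class and its integer timestamps are hypothetical, and the timestamps are assumed to be unique.

```python
class LwwReplica:
    """Replica storing (timestamp, value) per key; the newest timestamp wins."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        # Assumes unique timestamps (e.g. clock plus node id) so ties cannot occur.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy: pull every entry from a peer and keep the newer version.
        for key, (ts, value) in other.store.items():
            self.write(key, value, ts)


r1, r2 = LwwReplica(), LwwReplica()
r1.write("cart", ["book"], ts=1)
r2.write("cart", ["book", "pen"], ts=2)
print(r1.read("cart"), r2.read("cart"))  # ['book'] ['book', 'pen']  (inconsistent for now)
r1.sync_from(r2)
r2.sync_from(r1)
print(r1.read("cart"), r2.read("cart"))  # ['book', 'pen'] ['book', 'pen']  (converged)
```

The price of this simple rule is that the older write is silently discarded; systems that cannot afford that keep both versions and merge them instead.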
This made other kinds of database systems popular. Some of them provide only parts of ACID, and some don't use SQL as their main interface. We call them NoSQL databases.
Some of them avoid the relational storage model altogether.
As a general concept, they provide two out of three CAP properties, relax ACID guarantees, and solve a specific data-storage problem, such as graph relations or columnar storage.
This means the focus is now on the data instead of the system: you first investigate the data and then look for a storage solution, instead of taking a relational database and starting to model the data in it.
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
http://en.wikipedia.org/wiki/Eventual_consistency
http://www.cs.berkeley.edu/~istoica/classes/cs268/06/notes/20-BFTx2.pdf
Thursday, January 9, 2014
Couchbase: an almost document-oriented database
After investigating several NoSQL databases in depth, Couchbase looks like one of the best options:
1. It has JSON support
Like a document-oriented DB, but it doesn't index every field at every level of the document (good and bad!).
It is something between a document-oriented DB and a key-value store, with the option to create custom indexes (Views), which are actually simple map-reduce jobs that run automatically.
2. Auto-Sharding
You can add servers with a single click.
3. Cross-cluster replication
4. Object-level cache based on memcached
5. Built-in management and monitoring tool
6. Asynchronous persistence.
7. Benchmarks:
http://www.couchbase.com/presentations/benchmarking-couchbase
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
8. Best practices
10. Supports GeoSearch (not officially)
11. Good APIs
Bulk modes (on Views)
CAS operations (optimistic locking)
and much more
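The CAS operations listed above implement optimistic locking: every document carries a CAS value that changes on each mutation, and a write that presents an outdated CAS value is rejected. The sketch below uses a hypothetical in-memory `CasStore` instead of the Couchbase SDK, whose method names and exception types differ between SDK versions; only the read-modify-retry pattern is the point.

```python
import threading


class CasMismatch(Exception):
    """Hypothetical error: the document changed since it was read."""


class CasStore:
    """In-memory stand-in for a bucket with CAS values (hypothetical API)."""

    def __init__(self):
        self._lock = threading.Lock()  # makes check-and-set atomic
        self._items = {}               # key -> (value, cas)
        self._next_cas = 1

    def get(self, key):
        with self._lock:
            return self._items[key]

    def upsert(self, key, value):
        with self._lock:
            self._items[key] = (value, self._next_cas)
            self._next_cas += 1

    def replace(self, key, value, cas):
        with self._lock:
            if self._items[key][1] != cas:
                raise CasMismatch(key)  # someone else changed the document
            self._items[key] = (value, self._next_cas)
            self._next_cas += 1


def increment_with_retry(store, key, max_attempts=10):
    """Optimistic locking: read, modify, write back only if nobody changed it."""
    for _ in range(max_attempts):
        doc, cas = store.get(key)
        updated = dict(doc, count=doc["count"] + 1)
        try:
            store.replace(key, updated, cas)
            return updated
        except CasMismatch:
            continue  # lost the race: re-read the fresh version and retry
    raise RuntimeError("too much contention on " + key)


store = CasStore()
store.upsert("counter", {"count": 0})
_, stale_cas = store.get("counter")     # client 1 reads the document
increment_with_retry(store, "counter")  # client 2 updates it first
try:
    store.replace("counter", {"count": 99}, stale_cas)  # client 1 writes back
    rejected = False
except CasMismatch:
    rejected = True
print(rejected, store.get("counter")[0])  # True {'count': 1}
```

No lock is held while the client works on the document; conflicts are detected at write time and resolved by retrying, which suits workloads where conflicts are rare.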
It looks like it fits most average needs, but while working with it you need to pay attention to a few things:
1. You work with keys and values instead of indexes (indexes defined by map-reduce jobs are updated incrementally, and you cannot control that very much).
2. Compaction exists for both views and buckets and can be controlled.
3. Don't define more than a couple of buckets (read the attached best practices paper).
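The incrementally updated map-reduce indexes from the first point can be illustrated in Python. Real Couchbase views are JavaScript map/reduce functions stored in design documents and refreshed by the server; here `IncrementalView`, `by_city_map`, and the `(revision, document)` input shape are hypothetical, and the sketch only mimics the idea that the index is refreshed per changed document instead of being rebuilt.

```python
from collections import defaultdict


def by_city_map(doc):
    """Map function: emit (city, 1) for every user document (hypothetical schema)."""
    if doc.get("type") == "user" and "city" in doc:
        yield doc["city"], 1


class IncrementalView:
    """Toy view index that re-runs the map function only for changed documents."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc id -> rows the map function emitted for it
        self.indexed_rev = {}  # doc id -> revision that produced those rows

    def update(self, docs):
        """docs: {doc_id: (revision, document)}. Returns how many docs were re-mapped."""
        remapped = 0
        for doc_id, (rev, doc) in docs.items():
            if self.indexed_rev.get(doc_id) == rev:
                continue  # unchanged since the last update: keep its rows
            self.rows_by_doc[doc_id] = list(self.map_fn(doc))
            self.indexed_rev[doc_id] = rev
            remapped += 1
        for doc_id in list(self.rows_by_doc):
            if doc_id not in docs:  # deleted documents lose their rows
                del self.rows_by_doc[doc_id]
                del self.indexed_rev[doc_id]
        return remapped

    def query_count(self):
        """Reduce step: count emitted rows per key."""
        counts = defaultdict(int)
        for rows in self.rows_by_doc.values():
            for key, _value in rows:
                counts[key] += 1
        return dict(sorted(counts.items()))


docs = {
    "u1": (1, {"type": "user", "city": "Paris"}),
    "u2": (1, {"type": "user", "city": "Berlin"}),
    "u3": (1, {"type": "user", "city": "Paris"}),
}
view = IncrementalView(by_city_map)
print(view.update(docs))   # 3  (first build maps every document)
docs["u2"] = (2, {"type": "user", "city": "Paris"})
print(view.update(docs))   # 1  (only the changed document is re-mapped)
print(view.query_count())  # {'Paris': 3}
```

Because the index only catches up when it is updated, a query can return slightly stale results; that is the same trade-off of freshness for speed discussed in the ACID/CAP post.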
Have fun with Couchbase!