Tuesday, September 24, 2013

Typical NoSQL Big Data solution (part 1)

Big data components

In flow

This is the data that enters the system. It can be files, events of any kind, or web pages; at this stage we don't care which.

Distributor

When we receive the incoming data, we need to distribute it. Distribution can mean replicating the data to several destinations, or routing each record according to some detail of its content.
Example: if a log record contains the word "event", send it to HDFS only.
Examples: Apache Flume, Logstash, Fluentd
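
To make the two distribution modes concrete, here is a minimal sketch in plain Python. It is not how Flume, Logstash or Fluentd are configured; the Distributor class and its method names are hypothetical, and the destinations are simple callables standing in for real sinks such as HDFS.

```python
from typing import Callable, List, Tuple

Record = str
Sink = Callable[[Record], None]


class Distributor:
    """Hypothetical distributor: content-based routing first, replication otherwise."""

    def __init__(self) -> None:
        self.replicas: List[Sink] = []                                # default destinations
        self.routes: List[Tuple[Callable[[Record], bool], Sink]] = []  # (rule, destination)

    def replicate_to(self, sink: Sink) -> None:
        """Register a destination that receives every record no rule claims."""
        self.replicas.append(sink)

    def route(self, rule: Callable[[Record], bool], sink: Sink) -> None:
        """Register a rule: records matching it go to this destination only."""
        self.routes.append((rule, sink))

    def handle(self, record: Record) -> None:
        # The first matching rule wins and the record goes nowhere else.
        for rule, sink in self.routes:
            if rule(record):
                sink(record)
                return
        # No rule matched: replicate the record to all default destinations.
        for sink in self.replicas:
            sink(record)
```

With a rule such as `lambda r: "event" in r` pointing at the HDFS sink, a record containing "event" reaches HDFS only, while every other record is copied to all registered destinations.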


Storage - long term, short term

Then we save the data to storage. There are several types of storage, and each has its pros and cons.


Long term 

We need it to hold all of the data and to analyze it with batch processing. In most cases this will be Hadoop's HDFS, with Map-Reduce / Hive / Pig jobs that run over the data and produce reports.
As you can imagine, this is a heavy and slow process.
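
To show what such a batch job does, here is a small in-memory sketch of the map, shuffle and reduce steps in plain Python. It does not use the Hadoop API; the function names are hypothetical, and we assume each log line starts with its level (for example ERROR or INFO).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (log_level, 1) for every non-empty log line."""
    parts = line.split()
    if parts:
        yield parts[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over all lines and return the report."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

The report only exists after the whole input has been read. On a real cluster that input is the complete long-term data set, which is why this kind of processing is slow.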


Short term 

If we need our data to be easily and quickly accessible, we use a highly scalable database. There are several types:
Key-value stores. Examples: Redis, Riak, Dynamo, GemFire
Column stores. Examples: Vertica, MonetDB
Document stores. Examples: MongoDB, CouchDB (plus Cassandra, which is strictly a wide-column store)
Graph databases. Examples: Neo4j

Data Model           Performance  Scalability      Flexibility  Complexity  Functionality
Key–value Stores     high         high             high         none        variable (none)
Column Store         high         high             moderate     low         minimal
Document Store       high         variable (high)  high         low         variable (low)
Graph Database       variable     variable         high         high        graph theory
Relational Database  variable     variable         low          moderate    relational algebra
Compared with long-term storage, the data here is accessible much faster and is much more structured.
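
To get a feeling for how the four NoSQL data models differ, here is the same toy data shaped for each of them using plain Python structures. This is only an illustration: the records, field names and helper functions are made up, and no real database client is involved.

```python
import json

# Key-value: the store sees an opaque value and can only fetch it by key.
key_value = {"user:42": '{"name": "Ann", "city": "Oslo"}'}

# Document: a nested structure whose individual fields the store can query.
document = {"_id": 42, "name": "Ann", "address": {"city": "Oslo"}}

# Column: all values of one column are kept together, which suits aggregation.
column = {"id": [42, 7], "name": ["Ann", "Bob"], "city": ["Oslo", "Rome"]}

# Graph: nodes plus explicit relationships between them.
graph_nodes = {42: {"name": "Ann"}, 7: {"name": "Bob"}}
graph_edges = [(42, "FOLLOWS", 7)]


def kv_get(key: str) -> dict:
    """Fetch by key; decoding the value is the application's job."""
    return json.loads(key_value[key])


def followed_names(user_id: int) -> list:
    """Walk the FOLLOWS edges that start at the given node."""
    return [
        graph_nodes[dst]["name"]
        for src, relation, dst in graph_edges
        if src == user_id and relation == "FOLLOWS"
    ]
```

The key-value form is the simplest and offers the least functionality (lookup by key only), while the graph form needs more modelling work but can answer relationship questions directly. This matches the trade-offs listed in the table.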



Real time processing

In most cases this component will be Storm (http://storm-project.net/). It pulls the data (in our case from Kafka, http://kafka.apache.org/) and processes it using the short-term, fast-access data.
Its decisions will probably need to be sent to external systems that notify the end user.
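
The flow described above (pull an event, look something up in the fast storage, decide, notify) can be sketched as follows. This is not Storm or Kafka code: a standard library queue stands in for the Kafka topic, a dict stands in for the short-term storage, and the event fields (user, amount) and the threshold rule are assumptions made for the illustration.

```python
import queue
from typing import Callable, Dict


def process_stream(
    events: queue.Queue,
    thresholds: Dict[str, float],
    notify: Callable[[str], None],
) -> int:
    """Drain the queue and notify when an event exceeds the user's stored threshold.

    Returns the number of events handled.
    """
    handled = 0
    while True:
        try:
            event = events.get_nowait()       # pull the next event, if any
        except queue.Empty:
            return handled                    # nothing left to process
        limit = thresholds.get(event["user"])  # fast lookup in short-term data
        if limit is not None and event["amount"] > limit:
            notify(f"user {event['user']} exceeded {limit}: {event['amount']}")
        handled += 1
```

A real topology keeps running and waits for new events instead of returning when the queue is empty. The sketch returns so that it can be tried out in isolation.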


End User 

The end user will use some stack for visualizing the data.
This layer can also contain a service for querying the data. In most cases the queries will run against the short-term storage.

The next part is coming...
