Rami On The Web: Big Data analytics

All the traditional players such as SAS, IBM SPSS, KXEN, Matlab, Statsoft, Tableau, Pentaho, and others are working toward Hadoop-based Big Data analytics. However, each of these software vendors has to balance its current technology and customer portfolio against the incredible pace of innovation occurring in the open-source community. Most of the tools offer high-speed connectors to move data back and forth between Hadoop and their own environment. With Big Data, the objective is to keep the data in place and bring the analytics processing to the data, avoiding the bottlenecks and constraints associated with data movement. Over time, each vendor will develop a strategy and approach to keep data in place and move its analytics processing to the data.
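
To make the "bring the processing to the data" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any of the vendors above implement it: the in-memory lists stand in for data blocks stored on different nodes, and the names `PartialStats`, `summarize_partition`, and `combine` are hypothetical. Each partition is scanned where it lives, and only a tiny summary (count, sum, sum of squares) travels over the network.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PartialStats:
    """Small summary computed next to one partition of the data."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def merge(self, other: "PartialStats") -> "PartialStats":
        # Summaries are additive, so they can be combined in any order.
        return PartialStats(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )


def summarize_partition(values: Iterable[float]) -> PartialStats:
    """Scan one partition locally and return only its summary."""
    stats = PartialStats()
    for v in values:
        stats.count += 1
        stats.total += v
        stats.total_sq += v * v
    return stats


def combine(summaries: Iterable[PartialStats]) -> PartialStats:
    """Merge the per-partition summaries on a single coordinating node."""
    result = PartialStats()
    for s in summaries:
        result = result.merge(s)
    return result


def mean_and_variance(stats: PartialStats) -> Tuple[float, float]:
    """Population mean and variance from the merged summary (count must be > 0)."""
    mean = stats.total / stats.count
    variance = stats.total_sq / stats.count - mean * mean
    return mean, variance


# Hypothetical partitions standing in for blocks held on different nodes.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Only three small summaries are moved, never the raw values.
summaries = [summarize_partition(p) for p in partitions]
overall = combine(summaries)
mean, variance = mean_and_variance(overall)
```

The raw values never leave their partition; the amount of data moved depends on the number of partitions rather than the number of records, which is the bottleneck the paragraph above describes.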

In the meantime, there are new commercial vendors and open-source projects evolving to address the voracious appetite for Big Data analytics. Karmasphere (https://karmasphere.com/) is a native Hadoop-based tool for data exploration and visualization. Datameer (http://www.datameer.com/) is a spreadsheet-like presentation tool. Alpine Data Miner (http://www.alpinedatalabs.com/) has a cross-platform analytic workbench.

R (http://cran.r-project.org/) is by far the dominant analytics tool in the Big Data space. R is an open-source statistical language with constructs that make it easy for data scientists to explore data and build models. R is also renowned for the plethora of available analytics. There are libraries focused on industry problems (e.g., clinical trials, genetics, and finance) as well as general-purpose libraries (e.g., econometrics, natural language processing, optimization, and time series). At this point, there are reportedly over two million R users around the globe, and a commercial distribution is available via Revolution Analytics.

Open-source technologies include:

Apache Mahout, a scalable, Hadoop machine learning library, http://mahout.apache.org
Apache Lucene, a high-performance text search library, http://lucene.apache.org/core
Sofia ML, a fast machine learning library, http://code.google.com/p/sofia-ml
Vowpal Wabbit, a Yahoo! Research project for fast, parallel-learning algorithms, http://hunch.net/~vw
Libocas, a library of support vector machine solvers, http://cmp.felk.cvut.cz/~xfrancv/ocas/html
Apache Hamster, an MPI for Hadoop, https://issues.apache.org/jira/browse/MAPREDUCE-2911
Julia, a high-performance language for parallel and distributed technical computing, http://julialang.org/

Reference:

Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses

Thursday, September 5, 2013

Big Data analytics - tools
