All the traditional players such as SAS, IBM SPSS, KXEN, Matlab, Statsoft, Tableau, Pentaho, and others are working toward Hadoop-based Big Data analytics. However, each of these software vendors has to balance its current technology and customer portfolio against the incredible pace of innovation occurring in the open-source community. Most of the tools have high-speed connectors to move data back and forth between Hadoop and their own environment. With Big Data, the objective is to keep the data in place and bring the analytics processing to the data, avoiding the bottleneck and constraints associated with data movement. Over time, each vendor will develop a strategy and approach to keep data in place and move their analytics processing to the data.
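To make the idea of bringing the processing to the data more concrete, here is a minimal sketch in plain Python. It is not tied to Hadoop or to any vendor's product: the in-memory lists stand in for data blocks living on separate nodes, and the names `Summary`, `local_summary`, `merge`, and `global_mean_and_variance` are hypothetical, chosen only for illustration. Each "node" reduces its own block to a tiny summary, and only those summaries travel to be combined.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Summary:
    """Small, fixed-size statistics computed where the data lives."""
    count: int
    total: float
    total_sq: float


def local_summary(values: Iterable[float]) -> Summary:
    # Runs next to the data: one pass over the local block, raw rows never leave.
    count, total, total_sq = 0, 0.0, 0.0
    for v in values:
        count += 1
        total += v
        total_sq += v * v
    return Summary(count, total, total_sq)


def merge(a: Summary, b: Summary) -> Summary:
    # Combining two summaries is associative, so merge order does not matter.
    return Summary(a.count + b.count, a.total + b.total, a.total_sq + b.total_sq)


def global_mean_and_variance(summaries: List[Summary]) -> Tuple[float, float]:
    # Only the small summaries are gathered here, not the underlying records.
    s = reduce(merge, summaries, Summary(0, 0.0, 0.0))
    if s.count == 0:
        raise ValueError("no data in any partition")
    mean = s.total / s.count
    variance = s.total_sq / s.count - mean * mean  # population variance
    return mean, variance


# Hypothetical partitions standing in for blocks stored on three nodes.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
summaries = [local_summary(p) for p in partitions]
mean, variance = global_mean_and_variance(summaries)
print(mean, variance)
```

However large each partition grows, what crosses the network stays at three numbers per partition. That is the bottleneck the connector-based approach runs into when it ships full data sets to an external tool.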
In the meantime, there are new commercial vendors and open-source projects evolving to address the voracious appetite for Big Data analytics. Karmasphere (https://karmasphere.com/) is a native Hadoop-based tool for data exploration and visualization. Datameer (http://www.datameer.com/) is a spreadsheet-like presentation tool. Alpine Data Miner (http://www.alpinedatalabs.com/) has a cross-platform analytic workbench.
R (http://cran.r-project.org/) is by far the dominant analytics tool in the Big Data space. R is an open-source statistical language with constructs that make it easy for data scientists to explore and build models. R is also renowned for the plethora of analytics packages available for it. There are libraries focused on industry problems (e.g., clinical trials, genetics, and finance) as well as general-purpose libraries (e.g., econometrics, natural language processing, optimization, and time series). At this point, there are reportedly over two million R users around the globe, and a commercial distribution is available via Revolution Analytics.
Open-source technologies include:
- Apache Mahout, a scalable machine learning library for Hadoop, http://mahout.apache.org
- Apache Lucene, a high-performance text search library, http://lucene.apache.org/core
- Sofia ML, a fast machine learning library, http://code.google.com/p/sofia-ml
- Vowpal Wabbit, a Yahoo! Research project for fast, parallel-learning algorithms, http://hunch.net/~vw
- Libocas, a library of support vector machine solvers, http://cmp.felk.cvut.cz/~xfrancv/ocas/html
- Apache Hamster, an MPI for Hadoop, https://issues.apache.org/jira/browse/MAPREDUCE-2911
- Julia, a high-performance language for parallel and distributed analytics computing, http://julialang.org/