There are also some interesting ideas emerging around other languages and Big Data on the JVM, especially in functional programming. People are finding ways to use FP languages such as Clojure and Scala for Big Data processing, taking advantage of features that support parallelism, streaming, and so on. The MapReduce model is itself inspired by well-established concepts in FP. So I think there are lots of interesting opportunities to make good use of the JVM in Big Data applications, even beyond the current Java/Hadoop mainstream.
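For instance, the JVM's own standard library now supports a functional, parallel style through the Streams API. Here is a minimal sketch of a parallel word count in plain Java (the input string is just an illustration):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelWordCount {
    public static void main(String[] args) {
        String text = "big data on the jvm big data in parallel";

        // Functional-style word count: split the text into words,
        // then group identical words and count them in parallel.
        Map<String, Long> counts = Arrays.stream(text.split("\\s+"))
                .parallel()                       // spread the work across cores
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        Collectors.counting()));

        System.out.println(counts.get("big"));   // prints 2
    }
}
```

The single `parallel()` call is all it takes to fan the work out across cores; the same `groupingBy` collector merges the per-thread partial results.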
Nevertheless, Big Data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make businesses more agile, and to answer questions that were previously considered beyond our reach. These are the major open-source, Java-based tools available today that support Big Data:
1. HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. HDFS is specifically designed for storing vast amounts of data, so it is optimized for storing and accessing a relatively small number of very large files, in contrast to traditional file systems, which are optimized to handle large numbers of relatively small files.
2. Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
3. Apache HBase is the Hadoop database: a distributed, scalable big data store. It provides random, real-time read/write access to Big Data and is optimized for hosting very large tables — billions of rows by millions of columns — atop clusters of commodity hardware. At its core, Apache HBase is a distributed, versioned, column-oriented store modeled after Google’s Bigtable, described in “Bigtable: A Distributed Storage System for Structured Data” by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
4. Apache Cassandra is a performant, linearly scalable, and highly available database that can run on commodity hardware or cloud infrastructure, making it a strong platform for mission-critical data.
5. Apache Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL.
6. Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating those programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. Pig’s infrastructure layer consists of a compiler that produces sequences of MapReduce programs. Pig’s language layer currently consists of a textual language called Pig Latin, which was designed with ease of programming, optimization opportunities, and extensibility in mind.
7. Apache Chukwa is an open-source data collection system for monitoring large distributed systems. It is built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework and inherits Hadoop’s scalability and robustness.
8. Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. It includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop.
9. Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
10. Apache HCatalog is a table and storage management service for data created using Apache Hadoop. This includes:
Providing a shared schema and data type mechanism.
Providing a table abstraction so that users need not be concerned with where or how their data is stored.
Providing interoperability across data processing tools such as Pig, MapReduce, and Hive.
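The MapReduce model from item 2 is easy to illustrate in miniature: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. Here is a single-process, plain-Java sketch of those three phases — the real framework, of course, distributes them across thousands of nodes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapReduceSketch {
    public static void main(String[] args) {
        List<String> lines = List.of("to be or not", "to be");

        // Map phase: emit a (word, 1) pair for every word in every input line.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                emitted.add(Map.entry(word, 1));
            }
        }

        // Shuffle phase: group the emitted values by key.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce phase: sum the values for each key.
        Map<String, Integer> counts = new HashMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));

        System.out.println(counts.get("to")); // prints 2
    }
}
```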
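The Bigtable-style store behind HBase (item 3) can be pictured as a sorted, multi-dimensional map: row key, then column ("family:qualifier"), then value. A rough in-memory analogy using the JDK's sorted maps — this is purely conceptual, not the actual HBase client API:

```java
import java.util.TreeMap;

public class SortedMapModel {
    public static void main(String[] args) {
        // row key -> ("family:qualifier" -> cell value), both levels kept sorted
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

        // Writes: create the row on first touch, then put cells into it
        table.computeIfAbsent("row-001", k -> new TreeMap<>())
             .put("info:name", "Ada");
        table.computeIfAbsent("row-001", k -> new TreeMap<>())
             .put("info:city", "London");

        // Random read access by row key and column, like an HBase Get
        System.out.println(table.get("row-001").get("info:name")); // prints Ada
    }
}
```

Because rows are kept sorted by key, range scans over contiguous row keys are cheap — the same property HBase exploits at cluster scale.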
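As an example of item 5's HiveQL, a query over a hypothetical log table reads almost exactly like SQL (the table and column names here are made up for illustration):

```sql
-- Hypothetical table of web-server logs stored in HDFS
CREATE TABLE logs (ip STRING, url STRING, ts TIMESTAMP)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Ad-hoc summarization; Hive compiles this into MapReduce jobs
SELECT url, COUNT(*) AS hits
FROM logs
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```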
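And here is a small Pig Latin sketch for item 6, counting words in a hypothetical input file (the paths and aliases are illustrative):

```pig
-- Word count in Pig Latin; paths are illustrative
lines   = LOAD '/data/input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
STORE counts INTO '/data/wordcounts';
```

Pig compiles a script like this into a sequence of MapReduce jobs, so it parallelizes across the cluster without the author writing any mapper or reducer code.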
All of these are open-source, Java-based tools, and with them Big Data can be handled easily and effectively.