JavaRanch » Java Forums » Databases » Other Big Data

big data and Java

Rakesh Kannan
Greenhorn

Joined: May 15, 2013
Posts: 3
How is Java going to be used in Big Data technologies? Please share any real-world examples.
Hussein Baghdadi
clojure forum advocate
Bartender

Joined: Nov 08, 2003
Posts: 3479

What do you mean exactly?
Cassandra, HBase, Hadoop, Mahout, SenseiDB, Storm (and many others) are written in the Java programming language.
Rakesh Kannan
Greenhorn

Joined: May 15, 2013
Posts: 3
Hi Hussein,

My question is: what Java-based programming frameworks are available that support the processing of large data sets in a distributed computing environment?

Cheers,
Rakesh
Sumit Bisht
Ranch Hand

Joined: Jul 02, 2008
Posts: 330

Rakesh, did you do a Google search for the tools/frameworks/databases listed by Hussein?
Hussein Baghdadi
clojure forum advocate
Bartender

Joined: Nov 08, 2003
Posts: 3479

Rakesh Kannan wrote: Hi Hussein,

My question is: what Java-based programming frameworks are available that support the processing of large data sets in a distributed computing environment?

Cheers,
Rakesh


Hadoop, Storm, Cascading, Cascalog and Mahout (targeted toward machine learning) are some frameworks to mention.
Rakesh Kannan
Greenhorn

Joined: May 15, 2013
Posts: 3
Thanks a lot, guys, for your replies.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 2234
    

There are also some interesting ideas coming up around other languages and Big Data on the JVM, especially in functional programming. People are finding ways to use FP languages like Clojure and Scala for Big Data processing, taking advantage of features that support parallelism, streaming, etc. And the Map-Reduce model is itself inspired by well-established concepts in FP. So I think there are lots of interesting opportunities to make good use of the JVM in Big Data applications, even beyond the current Java/Hadoop mainstream.
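This FP style has even reached Java itself: since Java 8, the Stream API lets you write a pipeline of pure functions that the runtime can split across cores for you. A minimal sketch, not tied to any Big Data framework:

```java
import java.util.stream.LongStream;

public class ParallelSketch {
    // Declarative, side-effect-free pipeline: because map and sum are pure,
    // the runtime is free to partition the range across CPU cores.
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .map(x -> x * x)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 385
    }
}
```

The same declarative, data-parallel shape is what Scala collections and Clojure reducers offer, which is part of why those languages map so naturally onto Big Data workloads.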


No more Blub for me, thank you, Vicar.
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9967
    

I'm with Chris on this. The ideas and concepts that are prevalent in the FP domain fit Big Data like a glove.


SCJP 1.4, SCWCD 1.4 - Hints for you, Certified Scrum Master
Did a rm -R / to find out that I lost my entire Linux installation!
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 2234
    

Joe Harry wrote: I'm with Chris on this. The ideas and concepts that are prevalent in the FP domain fit Big Data like a glove.

Dean Wampler (of Typesafe) described "copious" data as the killer app for functional programming last year.
ratnesh singh
Greenhorn

Joined: Jun 11, 2015
Posts: 3
Nevertheless, Big Data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make businesses more agile, and to answer questions that were previously considered beyond reach. Major open-source, Java-based tools that support Big Data today include:

1. HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. HDFS is specifically designed for storing vast amounts of data, so it is optimized for storing and accessing a relatively small number of very large files, in contrast to traditional file systems, which are optimized to handle large numbers of relatively small files.

2. Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

3. Apache HBase is the Hadoop database: a distributed, scalable big data store. It provides random, real-time read/write access to Big Data and is optimized for hosting very large tables (billions of rows × millions of columns) atop clusters of commodity hardware. At its core, Apache HBase is a distributed, versioned, column-oriented store modeled after Google's Bigtable ("Bigtable: A Distributed Storage System for Structured Data" by Chang et al.). Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

4. Apache Cassandra is a performant, linearly scalable, and highly available database that can run on commodity hardware or cloud infrastructure, making it well suited to mission-critical data.

5. Apache Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.

6. Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs. Pig's language layer currently consists of a textual language called Pig Latin, which is designed with ease of programming, optimization opportunities, and extensibility in mind.

7. Apache Chukwa is an open-source data collection system for monitoring large distributed systems. It is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop's scalability and robustness.

8. Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop.

9. Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

10. Apache HCatalog is a table and storage management service for data created using Apache Hadoop. This includes:
Providing a shared schema and data type mechanism.
Providing a table abstraction so that users need not be concerned with where or how their data is stored.
Providing interoperability across data processing tools such as Pig, Map Reduce, and Hive.

These are all Java-based open-source tools in real-world use, and with them Big Data can be handled easily and effectively. For more real-world examples and project-based training, you can refer to the following link: http://alturl.com/make_url.php?action=a12220265
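To make the MapReduce model from item 2 concrete, here is a minimal sketch of the two phases (map emits key/value pairs, then values are grouped by key and reduced) in plain Java. This illustrates the programming model only, not the actual Hadoop API; in a real cluster the map calls run on many nodes and the framework handles the shuffle:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniMapReduce {
    // Map phase: each input line is turned into (word, 1) pairs.
    static List<SimpleEntry<String, Integer>> map(String line) {
        List<SimpleEntry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Shuffle + reduce phase: pairs are grouped by key and their values summed.
    static Map<String, Integer> reduce(List<SimpleEntry<String, Integer>> pairs) {
        Map<String, Integer> totals = new HashMap<>();
        for (SimpleEntry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] lines = {"big data and java", "hadoop is written in java"};
        List<SimpleEntry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line)); // in Hadoop, mappers run in parallel across nodes
        }
        Map<String, Integer> counts = reduce(emitted);
        System.out.println(counts.get("java")); // prints 2
    }
}
```

This is the classic word-count shape that Hadoop MapReduce, Pig, and Hive queries all ultimately compile down to.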
 