
What exactly is Big Data?

Mrinal Singh
Greenhorn

Joined: May 30, 2014
Posts: 2
In the roughly 1.5 years I have been working, I have heard a lot about Hadoop, Big Data, and related topics. I'm fascinated by how many people have taken training in it, including some of my friends, so I thought I'd ask here: where is this trend actually going, and how far can it go? I'm thinking about taking a course myself, so if anyone has already taken one or works in this field, any suggestions would be great!
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
Welcome to the Ranch, Mrinal Singh!

I was going to post a link to the Gartner report about the "hype cycle" with respect to Big Data, but then I saw they were charging $1,995 for their report and I was pretty sure that wouldn't be what you had in mind.

In case you haven't heard of the "hype cycle", it's a pattern that Gartner has observed in many technical developments. Interest in a subject starts from nothing, then rises to a peak of "inflated expectations". After that, people start losing interest and the level drops into a "trough of disillusionment". By then, everybody who seized on the idea but found they had just wasted their time has left the field, and interest slowly recovers until it levels off on a plateau. Here's a link which explains the methodology in more detail: Hype Cycle Research Methodology

As for the pricey report itself, here's a link to a brief summary of it online: Big data is reaching the peak of its hype, Gartner says. As you can see, there are a great many topics under the heading of "Big Data"; most of them are still on the rising part of the curve (i.e. over-hyped), but a lot of them are already on the downward slope (disillusionment setting in).

And as for where the trend will go: what the methodology doesn't mention is that there is no guarantee any of the items on the curve will actually survive and reach that plateau where things become established technology. So if you're looking for a job which you can be assured will still exist five years from now, Big Data probably isn't it. But if you're willing to go into the field and accept that you may need to look for something else in two or three years, then by all means go for it. And if you do go into the field, you should expect it to be considerably transformed over the next few years anyway, which will require continuous training.
Joe Areeda
Ranch Hand

Joined: Apr 15, 2011
Posts: 307
Well, I don't want to argue with Paul, but let me be a little less cynical.

I would define Big Data as the problem, and Hadoop, NoSQL, MongoDB and the like as nascent possible solutions for some subsets of it. The hype that these approaches will solve THE problem is marketing.

The Big Data problem is exemplified by the NSA collecting the metadata of (probably) every phone call made in the world. Just think of the questions you could answer with it. It could tell my wife that I lied when I said I called but nobody picked up. But it is so much data that nobody knows what questions to ask, let alone how to get the answers. And how would my wife use it to get a quick and cheap answer to her specific question? What I expect happens is that some big shot asks "did these people contact those people?" and an army of programmers writes new code to check. Meanwhile, another army of theorists writes code that asks "using the statistical buzzword of the day, what is the most significant thing happening right now?". Useful, yes, but let's just say the user experience is lacking for those with less than a billion-dollar budget.

The tools being hyped seem to address specific subsets of the problem, and some do pretty well at subsets of those subsets, like "given all this data on SSD, spinning disk and some archive like Blu-Ray or tape robots, how can I get through the section of it I need as efficiently as possible to run my specialized algorithm to answer this specific question?"

The problem, as my incomplete knowledge sees it, is that nobody knows a priori what the questions are or what the data looks like, so nobody knows how to look for the answers or how to structure queries into unstructured data. We do know that the kind of data we are collecting is not suited to being COMPLETELY stored in relational databases. I emphasize again: completely. I think the NoSQL buzzword is also over-hyped.

Enough from me.

Joe

It's not what your program can do, it's what your users do with the program.
Andrew Purpos
Greenhorn

Joined: May 27, 2014
Posts: 2
It is probably the most trending technology right now, which is why there is so much hype about it. As the amount of data grows exponentially, we need technology that can handle it, and that need drove the development of technologies like Hadoop, MongoDB etc. For example, social media data gives companies remarkable insight into consumer behavior and sentiment, which can be integrated with CRM data for analysis: 230 million tweets are posted on Twitter per day, 2.7 billion Likes and comments are added to Facebook every day, and 60 hours of video are uploaded to YouTube every minute (this is what we mean by the velocity of data).
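To put those "velocity" numbers in perspective, here is a small Java sketch that converts them into average per-second rates. It takes the figures exactly as quoted in this post (they are not independently checked) and assumes the load is spread evenly over the day, which real traffic is not; the class and method names are made up for this example.

```java
// Converts the daily/per-minute volumes quoted above into average per-second rates.
// Assumption: events are spread evenly over time; real traffic is bursty,
// so peak rates will be considerably higher than these averages.
public class DataVelocity {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average events per second, given a number of events per day.
    static double perSecond(long eventsPerDay) {
        return (double) eventsPerDay / SECONDS_PER_DAY;
    }

    // Hours of video per second, given hours of video uploaded per minute.
    static double hoursPerSecond(double hoursPerMinute) {
        return hoursPerMinute / 60;
    }

    public static void main(String[] args) {
        long tweetsPerDay = 230_000_000L;            // figure quoted in the post
        long facebookActionsPerDay = 2_700_000_000L; // likes + comments, as quoted
        double youtubeHoursPerMinute = 60.0;         // as quoted

        System.out.printf("Tweets per second: %.0f%n", perSecond(tweetsPerDay));
        System.out.printf("Facebook likes and comments per second: %.0f%n",
                perSecond(facebookActionsPerDay));
        System.out.printf("Hours of YouTube video uploaded per second: %.0f%n",
                hoursPerSecond(youtubeHoursPerMinute));
    }
}
```

With those inputs it works out to roughly 2,662 tweets and 31,250 Facebook likes/comments every second, plus about one hour of new video every second, on average.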

I wouldn't call it a new trend, as Big Data has been around for a long time; the only difference is that Big Data is now accessible to regular BI users. Studies also project that Big Data adoption will continue to grow, reaching a $16.9B market by 2015. While doing some research on it, I also read a blog post titled "Fat Paychecks awaits Hadoop experts".

If you already work in data analysis or know Java/OOP, you can pick this up fairly easily. The training takes around 4-5 weeks; the rest comes with practice and time.

Hope it helps you!
Roger Sterling
Ranch Hand

Joined: Apr 06, 2012
Posts: 426

Wikipedia has a nice article. http://en.wikipedia.org/wiki/Big_data

In short, a collection of data can become so large that traditional ways of handling it stop being useful. For example, SQL table joins time out.

The solution, then, is to find creative ways to process the mountain so that you can still get the business value you need. The term "Big Data" is associated with the techniques people use to access that mountain.
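One well-known example of such a technique is map-reduce, the programming model Hadoop is built around: split the data into chunks, process each chunk independently (the "map" step), then combine the partial results (the "reduce" step). The sketch below runs that pattern on a single machine using plain Java streams to count words. It only illustrates the idea and is not Hadoop's API; the class and method names are hypothetical, and `List.of` needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Single-machine sketch of the map-reduce pattern (NOT the Hadoop API).
public class WordCountSketch {

    // Map step: count the words in one chunk, independently of every other chunk.
    static Map<String, Long> mapChunk(String chunk) {
        return Arrays.stream(chunk.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty()) // drop empty tokens from leading punctuation
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    // Reduce step: merge the partial counts by summing the values per word.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Long::sum));
        }
        return total;
    }

    static Map<String, Long> wordCount(List<String> chunks) {
        // parallelStream() stands in for handing chunks to different machines.
        List<Map<String, Long>> partials = chunks.parallelStream()
                .map(WordCountSketch::mapChunk)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> chunks = List.of(
                "big data is big",
                "data about data",
                "Hadoop processes big data");
        // prints {about=1, big=3, data=4, hadoop=1, is=1, processes=1}
        System.out.println(wordCount(chunks));
    }
}
```

Because each chunk is mapped on its own, the chunks could in principle live on different machines, and only the much smaller partial counts need to be brought together for the reduce step. That is roughly why this style of processing keeps working at sizes where a single big SQL join does not.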
Mrinal Singh
Greenhorn

Joined: May 30, 2014
Posts: 2
Thank you, everyone! It helped, especially the blog, and it makes sense that these Big Data technologies are taking off. I'll be signing up for a course soon. I have seen a couple of institutes providing online training for it; is there a list of institutes you'd recommend I try?
Andrew Purpos
Greenhorn

Joined: May 27, 2014
Posts: 2
You might try Wiziq, Lynda or Udemy. You can find video courses at Lynda or Udemy, or live online classes at Wiziq, which will cost you around 149 USD: http://www.wiziq.com/course/21308-hadoop-big-data-training
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1615
As noted above, "big data" covers a range of technologies such as Hadoop and NoSQL databases, as well as the more important question of what makes "big data" different from regular data. Here are some free options for finding out more about these different topics (be sure to select the "free courseware" option if you sign up for one of these):

  • Coursera Introduction to Data Science - free course (starts 30 June) on "data science", looks at a number of ideas around "big data", including map-reduce/Hadoop, and includes some basic programming exercises with Python.
  • Udacity Introduction to Data Science - another free online course on data science and related technologies. Looks like it might go into a bit more depth than the Coursera course.
  • Udacity Beginner Hadoop and MapReduce - introductory tutorial to using Hadoop, based on Cloudera's Hadoop distribution.
  • MongoDB University offers regular free online courses in programming for the popular NoSQL database MongoDB, as well as MongoDB for DBAs. Highly recommended.


  • And here's a sensible talk from MongoDB's VP of Corporate Strategy, Matt Asay, on some of the practical questions and options around "big data" technology:

    http://www.infoq.com/presentations/big-data-nosql-comparison

    No more Blub for me, thank you, Vicar.