What exactly is Big Data?

 
Greenhorn
Posts: 2
I have been working for about a year and a half, and in that time I have heard a lot about Hadoop, big data, and related things. I am fascinated by how many people, including some of my friends, have gone for this training, so I thought I would ask here: where is this trend actually heading, and how far can it go? I am thinking about taking the course myself. If anyone has already taken it or works in this field, any suggestions would be great!
 
Marshal
Posts: 28177
Welcome to the Ranch, Mrinal Singh!

I was going to post a link to the Gartner report about the "hype cycle" with respect to Big Data, but then I saw they were charging $1,995 for their report and I was pretty sure that wouldn't be what you had in mind.

In case you haven't heard of the "hype cycle", it's a process which the Gartner group has observed to happen with many technical developments. The interest in a subject starts from nothing, then rises to a peak of "inflated expectations". After that, people start losing interest and the level drops to a "trough of disillusionment". At that point everybody who seized on the idea but found they just wasted their time has left the field and the interest in the subject reaches a plateau. Here's a link which explains the methodology in more detail: Hype Cycle Research Methodology

As for the pricey report itself, here's a link to a brief summary of it online: Big data is reaching the peak of its hype, Gartner says. As you can see there's a very large number of topics under the heading of "Big Data"; most of them are still early on the upwards curve (i.e. over-hyped) but a lot of them are on the downwards curve (disillusionment setting in).

And as for where the trend will go: what the methodology doesn't mention is that there is no guarantee that any of the items on the curve will actually survive and go on to that plateau where things are established technology. So if you're looking for a job which you can be assured will still exist five years from now, Big Data probably isn't that. But if you're willing to go into the field and accept you may need to look for something else in two or three years, then by all means go for it. And if you do go into the field, you should expect it to be considerably transformed over the next few years anyway, requiring continuous training.
 
Ranch Hand
Posts: 334
Well, I don't want to argue with Paul, but let me be a little less cynical.

I would define Big Data as the problem, and Hadoop, NoSQL, MongoDB and the like as nascent possible solutions for some subsets of it. The claim that these approaches will solve THE problem is marketing hype.

The Big Data problem is exemplified by the NSA collection of all the metadata of all the phone calls made in the world (probably). Just think of the questions you could answer with this. It could tell my wife that I lied when I said I called but nobody picked up. But it is so much data that nobody knows what questions to ask, let alone how to get the answer. And how would my wife be able to use it to get a quick and cheap answer to her specific question? What I expect happens is that some big shot says "did these people contact those people" and an army of programmers writes new code to check. Meanwhile, another army of theorists is writing code that says "using the statistical buzzword of the day, what is the most significant thing happening right now?". Useful, yes, but let's just say the user experience is lacking for those with less than a billion-dollar budget.

The tools being hyped seem to address specific subsets of the problem, and some do pretty well at subsets of those subsets, like "given all this data on SSD, spinning disk and some archive like Blu-Ray or tape robots, how can I get through the section of it I need as efficiently as possible to run my specialized algorithm to answer this specific question?"

The problem, as my incomplete knowledge sees it, is that nobody knows a priori what the questions are and what the data looks like, so nobody knows how to look for the answers or how to structure queries into unstructured data. We do know that the kind of data we are collecting is not optimal to be COMPLETELY stored in relational databases. I emphasize again: completely. I think the NoSQL buzzword is also over-hyped.

Enough from me.

Joe
 
Greenhorn
Posts: 3

It is probably the most talked-about technology right now, which is why there is so much hype about it. As the amount of data grows exponentially, technology is needed to handle it, and that need drove the evolution of technologies like Hadoop, MongoDB etc. For example, social media data gives companies remarkable insight into consumer behavior and sentiment that can be integrated with CRM data for analysis, with 230 million tweets posted on Twitter per day, 2.7 billion Likes and comments added to Facebook every day, and 60 hours of video uploaded to YouTube every minute (this is what we mean by the velocity of data).

I wouldn't call it a new trend, as big data has been around for a long time; the only difference is that big data is now accessible to regular BI users. Studies project that big data adoption will continue to grow, reaching a $16.9B market by 2015. While doing some research on it, I even came across a blog post titled "Fat Paychecks awaits Hadoop experts".

If you work in data analysis or have some knowledge of Java/OOP, you can pick this up easily. The training takes around 4-5 weeks; the rest is practice and time.

Hope it helps you!
 
Ranch Hand
Posts: 426
Wikipedia has a nice article. http://en.wikipedia.org/wiki/Big_data

In short, a collection of data can become so large that traditional ways of addressing it are no longer useful. For example, SQL table joins time out.

The solution, then, is to find creative ways to process the mountain so that you can get the business value you need. The term "Big Data" is associated with the techniques that people use to access the mountain.
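
To make the idea concrete, here is a minimal sketch of one such technique, map-reduce (the model behind Hadoop, which comes up elsewhere in this thread). It is plain Python on a toy in-memory list, not Hadoop itself, and the names `map_chunk`, `reduce_counts`, `word_count` and the sample `chunks` are made up for illustration. The only point is that each chunk is processed independently, so in a real system the chunks could sit on different machines and the partial results would be merged afterwards.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(chunks):
    """Run the map step on every chunk, then fold the partial counts together."""
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical toy data: each inner list stands in for a block of a much
# larger file that would be stored on a separate node.
chunks = [
    ["big data is big", "data about data"],
    ["hadoop processes big data"],
]
totals = word_count(chunks)
print(totals.most_common(2))
```

Because `map_chunk` never looks outside its own chunk, the list comprehension in `word_count` could in principle be swapped for a pool of workers without changing the result; that independence is what lets this style of processing scale where a single large join does not.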
 
Mrinal Singh
Greenhorn
Posts: 2
Thank you, everyone, for the replies! They helped, especially the blog, and it makes sense that these big data technologies are taking off. I'll be joining a course soon. I have seen a couple of institutes providing online training for it; are there any in particular that you would recommend I try?
 
Andrew Purpos
Greenhorn
Posts: 3

You might try Wiziq, Lynda or Udemy. You can find video courses at Lynda or Udemy, or live online classes at Wiziq, which will cost you around 149 USD: http://www.wiziq.com/course/21308-hadoop-big-data-training
 
Bartender
Posts: 2407

As noted above, "big data" covers a range of technologies such as Hadoop and NoSQL databases, as well as the more important question of what makes "big data" different from regular data. Here are some free options for finding out more about these different topics (be sure to select the "free courseware" option if you sign up for one of these):

  • Coursera Introduction to Data Science - free course (starts 30 June) on "data science", looks at a number of ideas around "big data", including map-reduce/Hadoop, and includes some basic programming exercises with Python.
  • Udacity Introduction to Data Science - another free online course on data science and related technologies. Looks like it might go into a bit more depth than the Coursera course.
  • Udacity Beginner Hadoop and MapReduce - introductory tutorial to using Hadoop, based on Cloudera's Hadoop distribution.
  • MongoDB University offer regular free online courses in programming for the popular NoSQL database MongoDB, as well as MongoDB for DBAs. Highly recommended.


And here's a sensible talk from MongoDB's VP of Corporate Strategy, Matt Asay, on some of the practical questions and options around "big data" technology:

http://www.infoq.com/presentations/big-data-nosql-comparison
     
Andrew Purpos
Greenhorn
Posts: 3
You can learn all about Hadoop and big data here: Big data with Big career opportunities

It is an exclusive LinkedIn group, "BIG DATA WITH BIG CAREER OPPORTUNITIES", covering the latest blogs, enhancements, trending discussions, job opportunities and free webinars in the big data domain.
     