I have a question. What would you suggest as the most important concept for beginners who are just starting to work with Big Data? Do you have suggestions for skills that would help a new programmer or developer?
My area of expertise is graph visualization, which isn't truly about big data, if I'm being honest. You can get value out of visualizing tens of thousands of nodes, but you're limited by the number of pixels on the screen. But graph visualization is distinct from other types of business intelligence in one important respect: most of the time with graphs, you're aiming to identify an individual data record from among a larger set for scrutiny, which happens in domains like anti-fraud. Whereas if I'm looking at, say, sales data for a large merchant, the objective isn't to drill down to individual customers. In that scenario, I'm trying to understand patterns among groups of customers and only care about the data in the aggregate. That calls for a very different visualization technique, something more like what Tableau provides.
So from a visualization perspective (which is very different from a storage or processing perspective, on which I can't really comment), I think the most important thing to think about is what the end goal is. What decisions do you expect to be able to make based on the data you've collected? What do you want to know that you don't know now? That will help guide how you structure, present, and interact with the data in useful ways, as opposed to just throwing the coolest visualizations you can think of on the screen and hoping that something interesting happens.