I am new to the Hadoop world, though I am an experienced Java developer. I am doing a POC (proof of concept) in Hadoop.
The requirement for the POC is to do predictive analysis on Apache (or any other) server logs using Hadoop and report the results in a dashboard.
Here are the steps I have completed so far:
A. A single-node Hadoop cluster is installed and configured properly.
B. Ran a few examples in the Hadoop environment, including WordCount.
C. Found that every time, I have to load the data into HDFS and fetch the results from HDFS manually.
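The manual cycle in step C (upload to HDFS, run the job, copy the result back) can be driven from a single Java program without Flume or Sqoop. A minimal sketch, assuming the `hdfs` and `hadoop` command-line tools are on the PATH; the jar name `loganalyser.jar`, the use of `LogAnalyser` as the job's main class, and all paths are hypothetical placeholders:

```java
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical driver that scripts the manual HDFS steps:
 * clear old output, upload logs, run the MapReduce job, pull the merged result back.
 * Assumes the `hdfs` and `hadoop` CLIs are installed and on the PATH.
 */
public class LogPipeline {

    // MapReduce refuses to write into an existing output directory, so remove it first.
    // -f suppresses the error when the directory does not exist yet.
    static List<String> cleanCommand(String hdfsOut) {
        return List.of("hdfs", "dfs", "-rm", "-r", "-f", hdfsOut);
    }

    // Copy local log files into HDFS (what is otherwise typed by hand).
    static List<String> putCommand(String localDir, String hdfsDir) {
        return List.of("hdfs", "dfs", "-put", localDir, hdfsDir);
    }

    // Launch the job jar; jar and main-class names are placeholders.
    static List<String> jobCommand(String jar, String mainClass, String in, String out) {
        return List.of("hadoop", "jar", jar, mainClass, in, out);
    }

    // Concatenate all part-* files of the job output into one local file.
    static List<String> fetchCommand(String hdfsOut, String localFile) {
        return List.of("hdfs", "dfs", "-getmerge", hdfsOut, localFile);
    }

    // Run one command, streaming its output to this console; stop on the first failure.
    static void run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IllegalStateException(
                "Command failed (exit " + exit + "): " + String.join(" ", cmd));
        }
    }

    public static void main(String[] args) throws Exception {
        // All paths below are hypothetical examples.
        run(cleanCommand("/logs/out"));
        run(putCommand("/var/log/apache2", "/logs/in"));
        run(jobCommand("loganalyser.jar", "LogAnalyser", "/logs/in", "/logs/out"));
        run(fetchCommand("/logs/out", "/tmp/report.tsv"));
        // A dashboard could then read /tmp/report.tsv (or a database loaded from it).
    }
}
```

Shelling out is used here only because it mirrors the manual commands one-to-one; Hadoop's Java `FileSystem` API (`copyFromLocalFile`, `copyToLocalFile`) could do the same transfers in-process. Scheduling `main` with cron would make the whole flow unattended.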
Here are my questions:
1. I want the full flow, from running the program to displaying the report in the dashboard, to be automated without using tools like Flume and Sqoop. If these tools are required, how should they be used?
Ex: if I run the LogAnalyser class in Hadoop, the entire process should be automated and the report should appear in the dashboard.
2. I have multiple log files, each written with a different pattern. How can I read multiple log files with different patterns in a single Driver class? What is the best practice?
3. What is the best GUI option for reporting the results in the dashboard?
4. Please point me to any resources related to this requirement so that I can go through them.
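For question 2, one common approach is for a single Mapper to try each known pattern in turn and normalize every line to one record shape (alternatively, if each format lives in its own HDFS directory, Hadoop's `MultipleInputs.addInputPath` lets the driver assign a separate Mapper per path). A minimal sketch of the single-Mapper parsing logic in plain Java, with no Hadoop dependency so it can be unit-tested; the Apache Common Log Format regex follows the standard `host ident user [time] "request" status bytes` layout, while the second "app" format and the `LogRecord` fields are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiFormatLogParser {

    // Normalized form that every supported log format is converted to.
    static final class LogRecord {
        final String format;
        final String timestamp;
        final String category; // HTTP status for access logs, level for app logs

        LogRecord(String format, String timestamp, String category) {
            this.format = format;
            this.timestamp = timestamp;
            this.category = category;
        }

        @Override
        public String toString() {
            return format + "\t" + timestamp + "\t" + category;
        }
    }

    // One known format: a regex plus a mapping from its groups to a LogRecord.
    private static final class Format {
        final Pattern pattern;
        final Function<Matcher, LogRecord> toRecord;

        Format(Pattern pattern, Function<Matcher, LogRecord> toRecord) {
            this.pattern = pattern;
            this.toRecord = toRecord;
        }
    }

    private final List<Format> formats = new ArrayList<>();

    MultiFormatLogParser register(String regex, Function<Matcher, LogRecord> toRecord) {
        formats.add(new Format(Pattern.compile(regex), toRecord));
        return this;
    }

    // Try each registered format in order; empty if no pattern matches the whole line.
    Optional<LogRecord> parse(String line) {
        for (Format f : formats) {
            Matcher m = f.pattern.matcher(line);
            if (m.matches()) {
                return Optional.of(f.toRecord.apply(m));
            }
        }
        return Optional.empty();
    }

    static MultiFormatLogParser withDefaults() {
        return new MultiFormatLogParser()
            // Apache Common Log Format: host ident user [time] "request" status bytes
            .register("^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$",
                m -> new LogRecord("apache", m.group(4), m.group(6)))
            // Hypothetical application log: "yyyy-MM-dd HH:mm:ss LEVEL [thread] message"
            .register("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\w+) \\[([^\\]]+)\\] (.*)$",
                m -> new LogRecord("app", m.group(1), m.group(2)));
    }

    public static void main(String[] args) {
        MultiFormatLogParser parser = withDefaults();
        String[] lines = {
            "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326",
            "2013-05-01 10:15:00 ERROR [main] Connection refused",
            "some line in an unknown format"
        };
        for (String line : lines) {
            System.out.println(parser.parse(line).map(LogRecord::toString).orElse("UNPARSED\t" + line));
        }
    }
}
```

Inside a Mapper's `map()` method, the same parser could be called on `value.toString()`, emitting for example the category as the output key; lines that match no pattern could be tallied with `context.getCounter(group, name).increment(1)` instead of silently dropped.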