I want to write an application that uses Hadoop to generate visualizations of a website at scale. I am new to cloud computing and have only a basic understanding of Hadoop. Could you give me some advice?
1. Which articles or books should I read?
2. What techniques would this require? If possible, sample source code would be appreciated.
3. How can I find broken hyperlinks on the website?
4. How can I build a "what's new" meta-site?
Any suggestions or advice would be appreciated.