| Author |
Extract data from Hadoop file system using Nutch
|
syruss kumar
Ranch Hand
Joined: Jul 23, 2009
Posts: 87
|
|
Hi,
I'm a newbie to Nutch. I have installed and configured Nutch to crawl a site. I want to extract the data from the crawl db. Is there any way to get the data programmatically?
Thanks in advance,
|
All search starts with beginner's luck and all search ends with victor's severely tested.
|
 |
syruss kumar
Ranch Hand
Joined: Jul 23, 2009
Posts: 87
|
|
Hi all,
Here is the solution. Use the Nutch API to extract the data. Under the crawl/segments folder, Nutch stores the fetched content, parsed text, parsed data, etc.
Sample code to read data from the Hadoop file system using the Nutch 1.6 API
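As a minimal sketch of the first step (not the poster's original listing), the code below only locates the segment data files. It assumes the crawl directory sits on the local file system (Nutch running in local mode) and that each segment keeps its fetched content under `content/part-NNNNN/data`. The class name `SegmentDataLocator` and its method are hypothetical, chosen for illustration, and only the Java standard library is used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds the "data" files inside Nutch segment directories.
final class SegmentDataLocator {

    // Returns every <segmentsDir>/<segment>/<subDir>/part-NNNNN/data file, sorted.
    // subDir is a segment subdirectory such as "content" or "parse_text" (assumed layout).
    static List<Path> findDataFiles(Path segmentsDir, String subDir) {
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(segmentsDir)) {
            return result; // nothing crawled yet, or wrong path
        }
        for (Path segment : listSorted(segmentsDir)) {
            Path dir = segment.resolve(subDir);
            if (!Files.isDirectory(dir)) {
                continue; // this segment has no such subdirectory
            }
            for (Path part : listSorted(dir)) {
                Path data = part.resolve("data");
                boolean isPart = part.getFileName().toString().startsWith("part-");
                if (isPart && Files.isRegularFile(data)) {
                    result.add(data);
                }
            }
        }
        return result;
    }

    // Lists a directory's entries in sorted order, closing the stream afterwards.
    private static List<Path> listSorted(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "crawl/segments" is an assumed default; pass the real path as the first argument.
        Path segments = Paths.get(args.length > 0 ? args[0] : "crawl/segments");
        for (Path dataFile : findDataFiles(segments, "content")) {
            System.out.println(dataFile);
        }
    }
}
```

Each path found this way is a Hadoop SequenceFile. Reading its records would then go through Hadoop's `SequenceFile.Reader` with a `Text` key (the URL) and a Nutch `Content` value, which needs the Nutch and Hadoop jars on the classpath.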
|
 |
parin jogani
Greenhorn
Joined: Apr 06, 2013
Posts: 1
|
|
Thank you! This was a great help.
Is there any way to extract only a particular file format (e.g. PDF)?
|