Hadoop unziping to processing xml files

Rahul Mahindrakar
Ranch Hand

Joined: Jul 28, 2000
Posts: 1864

I have tar files that contain text files with XML, like:

I am just starting out with Hadoop and need some high-level guidance on how to go about doing this:

1) How do I scp the files over to a location where I can provide them to Hadoop? Is there some component or framework for this?
2) How do I untar a file once it is received? I have googled and found some components, but does anyone here have prior experience with this?
3) How do I convert multi-line text + XML into a single line that I can process, like:
4) How do I then process this line? Should I process it as text or as XML? I guess text is OK for beginners.
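For questions 2 and 3, here is a minimal sketch in plain Java (no Hadoop dependency). It assumes a classic ustar-style tar, either uncompressed or gzip-compressed, with 512-byte headers and an octal size field. It reads each regular file and collapses that file's lines into one output line, prefixed by the entry name and a tab. The class and method names (`TarXmlFlattener`, `readTarEntries`, `toSingleLine`) are hypothetical. The sketch ignores long-name extensions and entries over 2 GB, so for real archives a library such as Apache Commons Compress (`TarArchiveInputStream`) is probably the safer choice.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class TarXmlFlattener {
    private static final int BLOCK = 512;

    /** Reads every regular-file entry of an uncompressed tar stream into a name -> text map. */
    public static Map<String, String> readTarEntries(InputStream in) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                data.readFully(header);
            } catch (EOFException e) {
                break; // stream ended without the usual end-of-archive blocks
            }
            if (isAllZero(header)) {
                break; // a zero block marks the end of the archive
            }
            String name = field(header, 0, 100);                          // bytes 0-99: file name
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // bytes 124-135: octal size
            byte type = header[156];                                      // '0' or NUL = regular file
            byte[] body = new byte[(int) size];                           // assumes entries < 2 GB
            data.readFully(body);
            data.readFully(new byte[(int) ((BLOCK - size % BLOCK) % BLOCK)]); // skip block padding
            if (type == '0' || type == 0) {
                entries.put(name, new String(body, StandardCharsets.UTF_8));
            }
        }
        return entries;
    }

    /** Replaces each line break (and the whitespace around it) with a single space. */
    public static String toSingleLine(String text) {
        return text.strip().replaceAll("\\s*\\R\\s*", " ");
    }

    /** Decodes a NUL-terminated header field. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.UTF_8);
    }

    private static boolean isAllZero(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TarXmlFlattener <file.tar | file.tar.gz>");
            return;
        }
        try (InputStream raw = new BufferedInputStream(new FileInputStream(args[0]));
             InputStream in = args[0].endsWith(".gz") ? new GZIPInputStream(raw) : raw) {
            // One output line per archived file: "<entry name>\t<flattened content>"
            for (Map.Entry<String, String> e : readTarEntries(in).entrySet()) {
                System.out.println(e.getKey() + "\t" + toSingleLine(e.getValue()));
            }
        }
    }
}
```

A usage example with a hypothetical file name: `java TarXmlFlattener records.tar.gz > records.txt`. Writing one record per line matters because Hadoop's default `TextInputFormat` hands each line to the mapper as a separate record. A plain text job can therefore consume the output once it has been copied into HDFS, for example with `hdfs dfs -put`.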

I just need some ideas.
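For question 4, if each line is well-formed XML, parsing it with the JDK's built-in DOM parser is usually less fragile than treating it as plain text and matching strings. The sketch below assumes each line holds exactly one XML document whose root element has simple child elements. The names `XmlLineParser` and `parseRecord` are hypothetical, as are the element names `record`, `id` and `name`. Inside a Hadoop mapper, the same `parseRecord` call could be applied to each input line.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlLineParser {
    private static final DocumentBuilderFactory FACTORY = DocumentBuilderFactory.newInstance();

    /** Parses one single-line XML record into root child element name -> trimmed text. */
    public static Map<String, String> parseRecord(String line) throws Exception {
        DocumentBuilder builder = FACTORY.newDocumentBuilder();
        Element root = builder.parse(new InputSource(new StringReader(line))).getDocumentElement();
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes; keep only element children
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical record layout; the real element names depend on your files
        String line = "<record> <id>42</id> <name>widget</name> </record>";
        System.out.println(parseRecord(line)); // prints {id=42, name=widget}
    }
}
```

This sketch creates a fresh `DocumentBuilder` for each call because builders are not thread-safe. If a line turns out not to be well-formed, `parse` throws, and the caller can then decide whether to skip or count the bad record.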

Rahul M.
