Using HADOOP to index Big Data

Hewlit Willa
Greenhorn

Joined: Dec 12, 2012
Posts: 1
Hello everybody,


Introduction
As a new member of this forum I must first say that it seems you have quite a professional community going on here!
After searching the web for answers (including this forum), I decided to ask you directly for help on a HADOOP topic.

Before I go any further, I must say I'm a beginner in HADOOP and Big Data, which didn't stop my company from giving me an important project to handle.
For security reasons (imposed by my employer), I cannot share all the details of my work or other specific technical details. But if finding the help I need depends on these details, I might make an exception or two (just don't tell my boss...).


Environment & Problem Description
I work in a company where the Engineering Department guys produce an amazing amount of CAD (Computer-Aided Design) files. Over the years we have ended up with hundreds of thousands, if not millions, of files hosted on different Filer Systems. Quite often, the engineers need to access those files to modify, evolve, or consult the information inside. The problem is that even though the engineers know the precise name of the file they want, it takes quite a while (sometimes more than an hour) for the Filer System to actually find it and send it back to the engineer's PC. That is because no indexing system exists on the Filer Hosting System (the system tests every single inode until the correct one is found). The files are not very big (a couple of dozen MB each), but there are so many of them...

So the project I've been given is to study whether HADOOP could help us index those files and send them to the engineers faster.
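
To make clear what I mean by "indexing": the idea is to walk the storage once, record where each file name lives, and then answer lookups from that record instead of scanning inodes every time. Here is a minimal sketch in Python, assuming the filers are reachable under a single mounted root directory. The function names and the in-memory dictionary are purely illustrative, not something we have in place:

```python
import os
from collections import defaultdict


def build_name_index(root):
    """Walk `root` once and map each file name to every full path with that name."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[name].append(os.path.join(dirpath, name))
    return index


def find_file(index, name):
    """Return all known paths for `name` without scanning the file system again."""
    return index.get(name, [])
```

A real solution would have to persist this mapping somewhere and keep it up to date as files are added or moved, which is the part I am hoping a product can handle for us.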


The Question(s)
Since HADOOP has its own file system (HDFS), importing the data into HADOOP would at least double the disk space we use. From what I understood, though, HADOOP can skip this step if the data is hosted on certain Linux distributions. The only problem is that I don't think one can install HADOOP on a Filer System. Does anybody know whether that is even possible?

Whatever the answer to my previous question, the main question I would like to ask is the following.
The only need I have is to index that data. Once the data is indexed by HADOOP, no manipulation or processing will be done to it through HADOOP. The data is there only to be found very fast and sent back to a client PC. From my understanding, HADOOP is intended for data processing: it is made to create new "result" files based on the existing ones, not to send back the data it already hosts. Would you agree with this statement?

All in all, should one use HADOOP to index this kind of data?
Would HADOOP do a better job at indexing files than other products?
What other products would you advise me to look at more closely in order to solve the problem?

If more details are needed in order to express an opinion, please let me know and I'll give as many as possible.




Thank you in advance for your time and answers!
Any opinion is greatly appreciated!


 