JavaRanch » Java Forums » Databases » Hadoop

Why Hadoop needs its own file system?

Hussein Baghdadi
clojure forum advocate
Bartender

Joined: Nov 08, 2003
Posts: 3479

Hi Chuck,
Why does Hadoop need its own file system (HDFS)? Why can't a Unix/Linux file system be used?
Thanks.
Tibi Kiss
Ranch Hand

Joined: Jun 11, 2009
Posts: 47
Hadoop provides many interfaces to its filesystems, and it generally uses the URI scheme to pick the correct filesystem instance to communicate with.
Although it is possible (and sometimes very convenient) to run MapReduce programs that access any of these filesystems, when you are processing large volumes of data, you should choose a distributed filesystem that has the data locality optimization, such as HDFS or KFS.
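In Hadoop itself, this lookup happens in `FileSystem.get(URI, Configuration)`, which uses the URI's scheme (`hdfs://`, `file://`, and so on) to choose the implementation; a path with no scheme resolves against the configured default filesystem (`fs.defaultFS`). The following is only a minimal sketch of the idea in plain Java, without Hadoop on the classpath: `SchemeDispatch`, `SimpleFs`, `LocalFs`, and `DistributedFs` are hypothetical names, and falling back to `file` for scheme-less URIs is a simplification made for this sketch.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: pick a filesystem implementation from a URI's scheme,
// in the spirit of Hadoop's FileSystem.get(URI, Configuration). Not Hadoop code.
public class SchemeDispatch {

    // Minimal stand-in for a filesystem implementation (hypothetical interface).
    public interface SimpleFs {
        String describe();
    }

    static final class LocalFs implements SimpleFs {
        @Override
        public String describe() { return "local filesystem"; }
    }

    static final class DistributedFs implements SimpleFs {
        @Override
        public String describe() { return "distributed filesystem"; }
    }

    // Registry from URI scheme to a factory for the matching implementation.
    private static final Map<String, Supplier<SimpleFs>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("file", LocalFs::new);
        REGISTRY.put("hdfs", DistributedFs::new);
    }

    // Resolve a URI to a filesystem. A URI without a scheme falls back to "file"
    // (an assumption of this sketch; Hadoop uses its configured default filesystem).
    public static SimpleFs resolve(URI uri) {
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme().toLowerCase(Locale.ROOT);
        Supplier<SimpleFs> factory = REGISTRY.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("No filesystem registered for scheme: " + scheme);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(resolve(URI.create("hdfs://namenode:8020/data/input.txt")).describe());
        System.out.println(resolve(URI.create("file:///tmp/input.txt")).describe());
        System.out.println(resolve(URI.create("/tmp/input.txt")).describe());
    }
}
```

The same calling code works against either backend; only the URI changes, which is what makes it possible to run a MapReduce job against a local filesystem for testing and against HDFS for large data.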

If you choose to give up the data locality optimization, there is still the requirement of a shared filesystem: every cluster member must see the same, single filesystem.
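A toy placement routine can make this trade-off concrete: a task goes to a free node that already stores a replica of its block when one exists, and otherwise to any free node, which only works if that node can still reach the block through a shared filesystem. This is a hypothetical sketch, not Hadoop's scheduler; `LocalityScheduler`, `Placement`, and the block and node names are made up, rack awareness is ignored, and the record syntax needs Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of locality-aware task placement. Not Hadoop's scheduler.
public class LocalityScheduler {

    // Result of placing one task: the chosen node and whether it holds the data.
    public record Placement(String node, boolean nodeLocal) {}

    // Pick a node for a task that reads `block`:
    // 1. prefer a free node that stores a replica of the block (data locality);
    // 2. otherwise fall back to the first free node, which must then read the
    //    block remotely through the shared filesystem.
    public static Placement place(String block,
                                  Map<String, Set<String>> replicasByBlock,
                                  List<String> freeNodes) {
        if (freeNodes.isEmpty()) {
            throw new IllegalStateException("no free nodes");
        }
        Set<String> holders = replicasByBlock.getOrDefault(block, Set.of());
        for (String node : freeNodes) {
            if (holders.contains(node)) {
                return new Placement(node, true);
            }
        }
        return new Placement(freeNodes.get(0), false);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> replicas = Map.of(
                "blk_1", Set.of("node-a", "node-c"),
                "blk_2", Set.of("node-b"));
        List<String> free = List.of("node-c", "node-d");

        System.out.println(place("blk_1", replicas, free)); // node-local on node-c
        System.out.println(place("blk_2", replicas, free)); // remote read on node-c
    }
}
```

Without a shared filesystem, the fallback branch would have nowhere to read `blk_2` from, which is the point made above.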

The MapReduce philosophy differs from the von Neumann computing model precisely in this respect: when you think in MapReduce, you have to forget about individual nodes with their own separate filesystems. Otherwise, the architecture ends up burdened with designing "which data can be accessed from where, and how it gets transferred". A MapReduce cluster should be viewed as one entity, which is why it is so important to use a shared filesystem.


Gasan Guseynov
Ranch Hand

Joined: Jan 03, 2006
Posts: 67
Because Hadoop works in a distributed environment, it needs all the machines to be represented as a single unit, and that is exactly what HDFS provides.
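A small in-memory model can illustrate what "all machines represented as a single unit" means: a file is split into blocks spread over several simulated machines, but a client reads it by path through one namespace and never needs to know where the blocks live. This is a hypothetical sketch loosely inspired by the HDFS split between a namespace and block storage; `SingleNamespace` and its methods are not Hadoop APIs, and unlike HDFS it keeps a single copy of each block with no replication. It needs Java 16 or later for the record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch: several simulated machines presented as one
// filesystem. Loosely inspired by HDFS (a namespace plus block storage on many
// machines); none of these classes or methods are Hadoop APIs.
public class SingleNamespace {

    // Where one block of a file lives.
    private record BlockRef(String blockId, String node) {}

    private final List<String> nodes;
    // Per-machine block storage: node name -> (block id -> block content).
    private final Map<String, Map<String, String>> storage = new HashMap<>();
    // The single namespace every client sees: path -> ordered block locations.
    private final Map<String, List<BlockRef>> namespace = new HashMap<>();
    private final int blockSize;
    private int nextBlockId = 0;

    public SingleNamespace(List<String> nodes, int blockSize) {
        if (nodes.isEmpty() || blockSize <= 0) {
            throw new IllegalArgumentException("need at least one node and a positive block size");
        }
        this.nodes = List.copyOf(nodes);
        this.blockSize = blockSize;
        for (String node : this.nodes) {
            storage.put(node, new HashMap<>());
        }
    }

    // Split content into fixed-size blocks and place them round-robin across nodes.
    public void write(String path, String content) {
        List<BlockRef> refs = new ArrayList<>();
        for (int start = 0; start < content.length(); start += blockSize) {
            String chunk = content.substring(start, Math.min(start + blockSize, content.length()));
            int id = nextBlockId++;
            String node = nodes.get(id % nodes.size());
            String blockId = "blk_" + id;
            storage.get(node).put(blockId, chunk);
            refs.add(new BlockRef(blockId, node));
        }
        namespace.put(path, refs);
    }

    // Read by path only: the namespace says where each block is, and the caller
    // gets the reassembled content without knowing which machines were involved.
    public String read(String path) {
        List<BlockRef> refs = namespace.get(path);
        if (refs == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        StringBuilder out = new StringBuilder();
        for (BlockRef ref : refs) {
            out.append(storage.get(ref.node()).get(ref.blockId()));
        }
        return out.toString();
    }

    // Machines holding each block of a file, in block order (for inspection only).
    public List<String> nodesFor(String path) {
        List<String> result = new ArrayList<>();
        for (BlockRef ref : namespace.getOrDefault(path, List.of())) {
            result.add(ref.node());
        }
        return result;
    }

    public static void main(String[] args) {
        SingleNamespace fs = new SingleNamespace(List.of("node-a", "node-b", "node-c"), 4);
        fs.write("/logs/day1.txt", "hello distributed world");
        System.out.println(fs.nodesFor("/logs/day1.txt")); // blocks spread over all three nodes
        System.out.println(fs.read("/logs/day1.txt"));     // prints "hello distributed world"
    }
}
```

The caller only ever uses a path such as `/logs/day1.txt`; which machines hold the pieces is bookkeeping inside the namespace.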
Lanny Gilbert
Ranch Hand

Joined: Jun 11, 2002
Posts: 103
Sorry if this is an ignorant question.

Could you use something like Terracotta's Ehcache in place of HDFS?
Hussein Baghdadi
clojure forum advocate
Bartender

Joined: Nov 08, 2003
Posts: 3479

Lanny Gilbert wrote: Sorry if this is an ignorant question.

Could you use something like Terracotta's Ehcache in place of HDFS?

Terracotta Ehcache is distributed caching software; how is it related to file systems?