
Why does Hadoop need its own file system?

 
Hussein Baghdadi
clojure forum advocate
Bartender
Posts: 3479
Clojure Mac Objective C
Hi Chuck,
Why does Hadoop need its own file system (HDFS)? Why can't a Unix/Linux file system be used?
Thanks.
 
Tibi Kiss
Ranch Hand
Posts: 47
Hadoop provides many interfaces to its filesystems, and it generally uses the URI scheme to pick the correct filesystem instance to communicate with.
Although it is possible (and sometimes very convenient) to run MapReduce programs that access any of these filesystems, when you are processing large volumes of data, you should choose a distributed filesystem that has the data locality optimization, such as HDFS or KFS.
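
To make the URI-scheme idea concrete, here is a minimal Python sketch of scheme-based filesystem selection. It is not Hadoop's actual code or API: the `FILESYSTEMS` table, the `pick_filesystem` function, and the host `namenode:8020` are hypothetical names chosen only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URI schemes to filesystem kinds.
# Hadoop keeps its real mapping in its configuration; these entries
# are illustrative only.
FILESYSTEMS = {
    "file": "local filesystem",
    "hdfs": "HDFS (distributed, data-local)",
    "kfs": "KFS (distributed, data-local)",
}


def pick_filesystem(uri, default_scheme="file"):
    """Return the filesystem kind selected by the URI's scheme.

    A path without a scheme falls back to default_scheme.
    """
    scheme = urlparse(uri).scheme or default_scheme
    if scheme not in FILESYSTEMS:
        raise ValueError(f"no filesystem registered for scheme {scheme!r}")
    return FILESYSTEMS[scheme]


if __name__ == "__main__":
    # "namenode:8020" is a placeholder host and port, not a real cluster.
    print(pick_filesystem("hdfs://namenode:8020/data/input"))
    print(pick_filesystem("/tmp/input"))
```

The point is only that the same program can address different storage back ends by changing the URI; whether the job then benefits from data locality depends on which filesystem the scheme resolves to.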

If you choose to give up the data locality optimization, you still need a shared filesystem, so that every cluster member sees the same single filesystem.
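
As a rough illustration of what data locality buys you, the following Python sketch assigns a task to a node that already holds a replica of the block it needs, and falls back to a remote read otherwise. This is a simplification under my own assumptions, not Hadoop's scheduler: `assign_task` and the node names are hypothetical, and a real cluster also weighs things such as rack locality.

```python
def assign_task(block_replicas, free_nodes):
    """Pick a node for a task that reads a single block.

    block_replicas: set of node names holding a copy of the block
    free_nodes: list of node names that currently have a free task slot
    Returns (node, is_local), where is_local says whether the task can
    read the block from the node's own disk.
    """
    if not free_nodes:
        raise ValueError("no free nodes available")
    # Prefer a free node that already stores the block.
    for node in free_nodes:
        if node in block_replicas:
            return node, True
    # Otherwise any free node will do, but the block must be fetched
    # over the network from one of the replica holders.
    return free_nodes[0], False


if __name__ == "__main__":
    print(assign_task({"node2", "node3"}, ["node1", "node2"]))
    print(assign_task({"node3"}, ["node1", "node2"]))
```

Without locality information the second case is the only case: every task reads its input over the network, which is why a plain shared filesystem works but scales worse for large inputs.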

The MapReduce philosophy differs from the von Neumann computing model in exactly this respect: when thinking in MapReduce, you have to forget about individual nodes that each hold a different filesystem. Otherwise the architecture ends up burdened with questions such as "which data can be accessed from where?" and "how do we design the transfer of that data?". MapReduce should be viewed as a single entity, so it is very important to use a shared filesystem.


 
Gasan Guseynov
Ranch Hand
Posts: 67
Because Hadoop works in a distributed environment and needs all the machines to be presented as a single unit, which is what HDFS provides.
 
Lanny Gilbert
Ranch Hand
Posts: 104
Sorry if this is an ignorant question..

Could you use something like Terracotta's EhCache in place of HDFS?
 
Hussein Baghdadi
clojure forum advocate
Bartender
Posts: 3479
Clojure Mac Objective C
Lanny Gilbert wrote: Sorry if this is an ignorant question..

Could you use something like Terracotta's EhCache in place of HDFS?

Terracotta EHCache is distributed caching software. How is it related to file systems?
 