Hadoop Architecture question

 
Greenhorn
Posts: 24
I am a newbie to Hadoop and I am confused about who splits the input file. Let's assume I have a 200 MB file and the block size is 64 MB, so we need a total of 4 blocks, each multiplied by the replication factor. Who splits the file, and how are the resulting pieces made available to the client so that it can write them to the DataNodes?
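The block arithmetic in the question can be sketched in plain Java. This is only an illustration of the numbers, not HDFS code: the class `BlockLayout`, its `blocks` method and the replication factor of 3 are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models how a file's bytes map onto fixed-size blocks.
class BlockLayout {
    static final long MB = 1024L * 1024L;

    // Returns one {offset, length} pair per block; only the last block may be short.
    static List<long[]> blocks(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            long length = Math.min(blockSize, fileSize - offset);
            result.add(new long[] {offset, length});
        }
        return result;
    }

    public static void main(String[] args) {
        int replication = 3; // assumed replication factor, not taken from the thread
        List<long[]> layout = blocks(200 * MB, 64 * MB);
        for (int i = 0; i < layout.size(); i++) {
            long[] b = layout.get(i);
            System.out.println("block " + i + ": offset=" + b[0] / MB
                    + " MB, length=" + b[1] / MB + " MB");
        }
        System.out.println("stored block replicas: " + layout.size() * replication);
    }
}
```

For 200 MB and 64 MB blocks this gives three full blocks and one 8 MB block, so the last block takes only as much space as the data it holds.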
 
Ranch Hand
Posts: 544
Hello,
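The Hadoop framework, specifically HDFS, takes care of this. You do not split the file yourself: the HDFS client library cuts the data into blocks as it writes, and for each block it asks the NameNode which DataNodes to send it to. "Hadoop: The Definitive Guide" explains this in detail, with a diagram.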

Regards,
Amit
 
Bartender
Posts: 2407
You might want to download the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface. This is a great resource for learning about Hadoop, even if you plan to use a different Hadoop distribution for your project.
 
Greenhorn
Posts: 13
Splitting the input for a MapReduce job is taken care of by the InputFormat, and you can control it by subclassing, for example by overriding getSplits().
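As a rough illustration of what an InputFormat does when it sizes splits, here is a plain-Java sketch with no Hadoop dependency. It is a simplified model of my understanding of FileInputFormat (split size clamped between a minimum and a maximum around the block size, with a 10% slop so a tiny tail is folded into the last split), not the Hadoop source. The class `SplitSketch` and its method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of split sizing; assumes the behaviour described above.
class SplitSketch {
    static final long MB = 1024L * 1024L;
    static final double SPLIT_SLOP = 1.1; // assumed: tail up to 10% over is not split off

    // Clamp the block size between the configured minimum and maximum split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length of each logical split for a file of the given size.
    static List<Long> splitLengths(long fileSize, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining); // whatever is left becomes the final split
        }
        return lengths;
    }

    public static void main(String[] args) {
        long splitSize = computeSplitSize(64 * MB, 1, Long.MAX_VALUE);
        for (long len : splitLengths(200 * MB, splitSize)) {
            System.out.println("split of " + len / MB + " MB");
        }
    }
}
```

Note that these are logical splits handed to map tasks at job time. They are a separate concept from the physical HDFS blocks written when the file is stored, although by default the two line up.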
 