I am a newbie to Hadoop, and I'm confused about who does the splitting of the input file. Let's assume I have a 200 MB file and the block size is 64 MB, so we need a total of 4 blocks (3 full 64 MB blocks plus one 8 MB block), multiplied by the replication factor. Who splits the file, and how are the split blocks made available to the client so it can write them to the DataNodes?
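The block arithmetic in the question can be sketched as follows. This is only an illustration of how the client-side math works out (the function name and signature are hypothetical, not part of any Hadoop API); in HDFS it is the client library that cuts the file into blocks as it writes, while the NameNode only assigns where each block's replicas go.

```python
import math

def hdfs_block_count(file_size_mb: int, block_size_mb: int,
                     replication: int = 3) -> tuple[int, int]:
    """Return (blocks the client writes, total replicas stored cluster-wide).

    The last block may be smaller than the configured block size;
    HDFS does not pad it out to the full block length.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication

# 200 MB file with 64 MB blocks: 3 full blocks + one 8 MB block = 4 blocks.
# With the default replication factor of 3, that is 12 block replicas total.
blocks, replicas = hdfs_block_count(200, 64)
print(blocks, replicas)
```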
You might want to download the Hortonworks Sandbox. It gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog, and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. in VirtualBox or VMware Player, and you can access much of the functionality very easily via the browser-based Hue interface. It is a great resource for learning about Hadoop, even if you plan to use a different Hadoop distribution for your project.