
Data Read functionality in HDFS

 
Greenhorn
Posts: 19
Eclipse IDE Oracle Java
A 150 MB file is stored in HDFS with a 128 MB block size, so it occupies two blocks: 128 MB in the first block and the remaining 22 MB in the second. During a file read, after the data in the first block has been read, how does a MapReduce job know which block to go to next in order to read the file completely?

Thanks in advance.
 
Bartender
Posts: 1210
25
Android Python PHP C++ Java Linux
The HDFS NameNode is responsible for tracking such metadata. It knows which datanode(s) store which blocks of which files.

A mapper or reducer actually knows nothing about HDFS blocks. It just receives the key-value pairs read by the configured RecordReader object.
The RecordReader too knows nothing about HDFS blocks. It just asks HDFS to open the file and give it an InputStream.
It's this InputStream implementation (called DFSInputStream) that is responsible for getting block metadata from the NameNode and reading the blocks in sequence from whichever datanode(s) store them.
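To make that division of responsibility concrete, here is a toy sketch of the idea. This is not real HDFS code; the class and method names are invented for illustration. The key point it models is that the NameNode keeps two maps: a file maps to an *ordered* list of block IDs, and each block ID maps to the datanodes holding a replica of it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of NameNode metadata; all names here are invented for illustration.
public class ToyNameNode {
    // A file maps to an ORDERED list of block IDs; the order preserves the byte sequence.
    private final Map<String, List<String>> fileToBlocks = new HashMap<>();
    // Each block ID maps to the datanodes that hold a replica of it.
    private final Map<String, List<String>> blockToDatanodes = new HashMap<>();

    void addFile(String path, List<String> blockIds) {
        fileToBlocks.put(path, blockIds);
    }

    void addBlock(String blockId, List<String> datanodes) {
        blockToDatanodes.put(blockId, datanodes);
    }

    List<String> blocksOf(String path) {
        return fileToBlocks.get(path);
    }

    List<String> locationsOf(String blockId) {
        return blockToDatanodes.get(blockId);
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        // A 150 MB file with a 128 MB block size: two blocks (128 MB + 22 MB).
        nn.addFile("/data/file.txt", Arrays.asList("blk_1", "blk_2"));
        nn.addBlock("blk_1", Arrays.asList("datanode1", "datanode3"));
        nn.addBlock("blk_2", Arrays.asList("datanode2", "datanode3"));

        // A reader walks the block list in order and fetches each block
        // from any datanode that holds a replica.
        for (String blockId : nn.blocksOf("/data/file.txt")) {
            System.out.println(blockId + " -> " + nn.locationsOf(blockId));
        }
    }
}
```

So "which block comes next" is never guessed: the client asks the NameNode for the file's block list once, and the list itself carries the order.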
 
vikas gunti
Greenhorn
Posts: 19
Eclipse IDE Oracle Java
Hi Karthik,


Thank you for your answer. To be more specific, my question is: if the file contains the line "Hadoop is wonderful framework", and it is stored with "Hadoop is" in one block on datanode1 and "wonderful framework" in another block on datanode2, and those datanodes may also hold blocks of other files, then while reading the file from HDFS, how does the Hadoop framework do its work? How does it find exactly the right block that holds the continuation of the data?
 
Karthik Shiraly
Bartender
Posts: 1210
25
Android Python PHP C++ Java Linux
When we actually read a file from HDFS, we know for a fact that its contents come back in the right sequence. So the HDFS NameNode clearly knows how to locate those fragments stored on different datanodes.
How exactly it does this in code requires knowledge of HDFS internals, which I don't have. Perhaps you can start from DFSInputStream and examine the data structures it uses to understand how it works.
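One detail worth adding for the "Hadoop is" / "wonderful framework" example: HDFS splits a file at fixed byte offsets with no regard for line or word boundaries. Because each block's starting offset within the file is recorded, concatenating the blocks in offset order recovers the original byte stream exactly, no matter which datanodes held them or in what order they arrived. Here is a toy sketch of that (invented names, not real HDFS code):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy demonstration: blocks cut the file at byte offsets, and offset order
// alone is enough to reassemble the original stream.
public class ToyBlockReassembly {
    static class Block {
        final long offset;  // starting byte offset of this block within the file
        final byte[] data;
        Block(long offset, byte[] data) { this.offset = offset; this.data = data; }
    }

    // Split a "file" into fixed-size blocks, ignoring line boundaries.
    static List<Block> split(byte[] file, int blockSize) {
        List<Block> blocks = new ArrayList<>();
        for (int off = 0; off < file.length; off += blockSize) {
            int len = Math.min(blockSize, file.length - off);
            byte[] data = new byte[len];
            System.arraycopy(file, off, data, 0, len);
            blocks.add(new Block(off, data));
        }
        return blocks;
    }

    // Reassemble by sorting on offset: the blocks may have come back from
    // different datanodes in any order, yet the byte stream is intact.
    static byte[] reassemble(List<Block> blocks, int totalLen) {
        byte[] out = new byte[totalLen];
        blocks.sort(Comparator.comparingLong(b -> b.offset));
        int pos = 0;
        for (Block b : blocks) {
            System.arraycopy(b.data, 0, out, pos, b.data.length);
            pos += b.data.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = "Hadoop is wonderful framework".getBytes(StandardCharsets.UTF_8);
        // A 10-byte block size splits the line mid-sentence into
        // "Hadoop is ", "wonderful " and "framework".
        List<Block> blocks = split(file, 10);
        // Pretend the blocks arrived from different datanodes out of order.
        Collections.reverse(blocks);
        String rebuilt = new String(reassemble(blocks, file.length), StandardCharsets.UTF_8);
        System.out.println(rebuilt);
    }
}
```

In real Hadoop the reassembly happens inside the HDFS client, and at the MapReduce layer the record reader for text input additionally reads past a split boundary to finish the last line, so no record is ever lost at a block edge.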
 
vikas gunti
Greenhorn
Posts: 19
Eclipse IDE Oracle Java
Hi Karthik,

Thank you for the reply. I will dig further into this.
 