Mapreduce example from Apache Site

 
Akhilesh Trivedi
Ranch Hand
Posts: 1609
I was following this page from Apache. After the compilation step of WordCount v1.0, it says:


Assuming that:

/user/joe/wordcount/input - input directory in HDFS
/user/joe/wordcount/output - output directory in HDFS



What does "directory in HDFS" mean? Are these directories already created? I also see that the command shown there lists the two files inside the input directory. Even the normal "ls" command would have done that, so what is the significance of using bin/hdfs here?
 
Arumugarani
Greenhorn
Posts: 9
Hi,

HDFS is a distributed file system. When you set the system up as a cluster, the data is split into multiple segments (blocks) and distributed across the machines in the cluster. So "bin/hadoop dfs -ls" means you are listing files from HDFS, not from the ordinary local file system.
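For example, the two listings look similar but hit completely different file systems (a sketch, assuming a running Hadoop installation and the /user/joe/wordcount paths from the tutorial):

```shell
# Local file system: lists files on this one machine's disk
ls /user/joe/wordcount/input

# HDFS: lists files in the distributed file system, whose blocks
# may be spread across many machines in the cluster
bin/hdfs dfs -ls /user/joe/wordcount/input
```

The local "ls" only works here because you happen to be on a machine that can see those files; on a real multi-node cluster the data lives in HDFS, and only the hdfs command shows it.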

Hope you understand this.

The input directory is where the files to be processed live, and the output directory is where the job writes its processed results.
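To answer the "are these already created?" question: the input directory is not created for you; you create it and upload your files before running the job. A typical sequence (a sketch using the tutorial's paths; file01 and file02 stand for whatever local input files you have):

```shell
# Create the input directory in HDFS (-p creates parent directories too)
bin/hdfs dfs -mkdir -p /user/joe/wordcount/input

# Copy local files into the HDFS input directory
bin/hdfs dfs -put file01 file02 /user/joe/wordcount/input

# Verify the upload
bin/hdfs dfs -ls /user/joe/wordcount/input
```

Note that you should not create the output directory yourself: the MapReduce job creates it, and it fails if the directory already exists.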

Think of a file that contains the phone number for everyone in country X; the people with a last name starting with A might be stored on server 1, B on server 2, and so on. In a Hadoop world, pieces of this phonebook are stored across the cluster. To stay available when components fail, HDFS replicates each of these smaller pieces onto two additional servers by default. This redundancy offers multiple benefits, the most obvious being higher availability. When you read a file from HDFS, the pieces from the clustered servers are combined and reconstructed into a single stream.
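You can inspect and change that replication factor per file (a sketch, assuming a running cluster; the %r format of hdfs dfs -stat prints a file's replication factor):

```shell
# Show the replication factor of a file (3 by default)
bin/hdfs dfs -stat %r /user/joe/wordcount/input/file01

# Raise it to 4 copies; -w waits until replication completes
bin/hdfs dfs -setrep -w 4 /user/joe/wordcount/input/file01
```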

Hope this helps you to understand.

Thanks,
Arumugarani
 
Akhilesh Trivedi
Ranch Hand
Posts: 1609
Thanks Arumugarani!

I am able to understand the concepts now and am working through them.
 