Please keep in mind that HDFS is a distributed file system. When the system is set up as a cluster, the data is split into multiple segments/chunks and distributed across the machines in the cluster. bin/hadoop dfs ---------> this means you are listing files from HDFS, not from the ordinary local file system.
Hope you understand this.
The input path says where the files to be processed are located, and the output path says where the processed output files will be written.
Think of a file that contains the phone number of everyone in country X; the people with a last name starting with A might be stored on server 1, B on server 2, and so on. In a Hadoop world, pieces of this phonebook would be stored across the cluster. To keep the data available as components fail, HDFS replicates each of these smaller pieces onto two additional servers by default. This redundancy offers multiple benefits, the most obvious being higher availability. When you read the file from HDFS, the pieces held on the clustered servers are combined and reconstructed into a single file.
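To make the phonebook picture concrete, here is a small toy sketch in Python. It is not how HDFS is implemented: real HDFS splits files into fixed-size blocks (not by last name) and the NameNode decides where replicas go. The server names, the tiny block size, the round-robin placement, and the store/read helpers are all assumptions made up for illustration.

```python
# Toy model only. All names and the placement rule are hypothetical,
# chosen to illustrate splitting, replication, and reconstruction.

BLOCK_SIZE = 16    # bytes per block; tiny on purpose (real HDFS blocks are far larger)
REPLICATION = 3    # the original copy plus two additional copies
SERVERS = ["server1", "server2", "server3", "server4"]


def store(data: bytes):
    """Split data into blocks and place each block on REPLICATION servers."""
    storage = {name: {} for name in SERVERS}  # server -> {block_id: bytes}
    block_count = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        for r in range(REPLICATION):
            # Simple round-robin placement, not the real HDFS placement policy.
            server = SERVERS[(block_count + r) % len(SERVERS)]
            storage[server][block_count] = block
        block_count += 1
    return storage, block_count


def read(storage, block_count, failed=()):
    """Rebuild the file from any surviving replica of each block."""
    parts = []
    for block_id in range(block_count):
        replica = next(
            blocks[block_id]
            for name, blocks in storage.items()
            if name not in failed and block_id in blocks
        )
        parts.append(replica)
    return b"".join(parts)


phonebook = b"Adams 555-0101\nBaker 555-0102\nClark 555-0103\n"
storage, block_count = store(phonebook)

# Even with two servers down, every block still has a surviving copy.
restored = read(storage, block_count, failed=("server1", "server2"))
```

In this sketch the reader never needs to know which server held which piece; it just asks for each block in order and joins the results, which is roughly the view a client gets when reading a file back from HDFS.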
Hope this helps you to understand.
posted 5 years ago
I am able to understand the concepts now and am working through them.