Hadoop: problems with the -files option when submitting a job from a remote node
Joined: Oct 25, 2013
Oct 25, 2013 03:58:40
I run Hadoop MapReduce jobs from a remote machine (Windows) using the command
java -jar XMLDriver.jar -files junkwords.txt -libjars XMLInputFormat.jar
and submit the job to a Linux box that runs Hadoop.
I understand that this distributed cache file will be copied to HDFS on the remote box (am I right?).
But in the mapper code I am unable to retrieve the file name using the API
Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
fileName = cacheFiles[0].toString();
Should I use the DistributedCache.addCacheFile() API and the symlink API? If so, what URI should I pass as the parameter, given that I don't know where Hadoop copies the files?
I also tried copying junkwords.txt to HDFS manually and specifying the HDFS path on the command line:
java -jar XMLDriver.jar -files /users/junkwords.txt -libjars XMLInputFormat.jar
This throws an exception when I run the job on my local Windows machine.
What is the solution for accessing the distributed cache file in the mapper when it is passed from a remote machine using the -files command line option?
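A minimal sketch of the mapper-side read, offered as an illustration rather than a confirmed fix. It assumes that a file shipped with -files is made available in the task's working directory under its base name (here junkwords.txt), and that the driver parses the generic options through ToolRunner / GenericOptionsParser; neither has been verified for this remote-submission setup. JunkWordsLoader is a hypothetical helper that uses only the standard Java library, not a Hadoop class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of Hadoop): loads one word per line
// from a local file into a set, skipping blank lines.
class JunkWordsLoader {

    // localName is assumed to be the base name of the file passed with
    // -files, e.g. "junkwords.txt", resolved against the working directory.
    static Set<String> load(String localName) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

Under those assumptions, the mapper would call JunkWordsLoader.load("junkwords.txt") once during task initialization and keep the resulting set in a field, instead of resolving the path through DistributedCache.getLocalCacheFiles(conf).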
All times are in JavaRanch time: GMT-6 in summer, GMT-7 in winter