JavaRanch » Java Forums » Java » I/O and Streams

Program is to find the most frequently used words across all the input files

Madhuri G Jois
Greenhorn

Joined: Jul 30, 2014
Posts: 2
The program should find the most frequently used words across all the input files, where each word must appear at least once in every file.
How can this be achieved? I am new to Java and have not dug deep yet, so I am learning by working through programs like this one.

Kindly let me know how to approach it.
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11499
    

Your first steps are always the same.

Forget about programming, and think about how you would do it using your brain, pencils, paper, and erasers.

Once you've done that, revise the steps to make them simpler and more explicit, until you think you could give them to a ten-year-old child who could follow them without having to ask any questions.

Start doing that, and when you think you have it ready, post it here and we'll look at what's next.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2413
    
Welcome to the Ranch!

Everything that Fred said, and please remember that we are NotACodeMill.
Madhuri G Jois
Greenhorn

Joined: Jul 30, 2014
Posts: 2
I forgot to post the work I did. Apologies for that.

a) Create a HashMap<String,Integer> that stores each word and its number of occurrences across all files.
b) Fill this map with the occurrences of each word in the first file.
c) Start threads (one per file), each of which first reads the whole file into its own map and then loops over the shared HashMap to find the words that are not present in that file. Those words are removed from the HashMap, so only the words that exist in all files are retained.
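Here is a minimal single-threaded sketch of steps a) to c). It assumes a "word" is a case-insensitive run of letters, digits or apostrophes, which the thread does not define. The class and method names (`CommonWords`, `countWords`, `commonWordCounts`) are hypothetical. Unlike step c) as written, it also adds each later file's counts to the surviving words, because step a) says the map should hold occurrences across all files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommonWords {

    // Hypothetical tokenizer: lower-cases the text, splits on anything that is
    // not a letter, digit or apostrophe, and counts each resulting word.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}']+")) {
            if (!token.isEmpty()) {             // a leading delimiter yields one empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Steps a) to c), run sequentially over the contents of each file.
    static Map<String, Integer> commonWordCounts(List<String> texts) {
        Map<String, Integer> totals = new HashMap<>();          // step a
        if (texts.isEmpty()) {
            return totals;
        }
        totals.putAll(countWords(texts.get(0)));                 // step b
        for (String text : texts.subList(1, texts.size())) {    // step c
            Map<String, Integer> fileCounts = countWords(text);
            // Drop every word this file does not contain...
            totals.keySet().retainAll(fileCounts.keySet());
            // ...and add this file's occurrences to the words that survive.
            totals.replaceAll((word, n) -> n + fileCounts.get(word));
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        List<String> texts = new ArrayList<>();
        for (String arg : args) {
            texts.add(Files.readString(Path.of(arg)));          // Java 11+
        }
        // Print the common words, most frequent first, once at the end.
        commonWordCounts(texts).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

After compiling, run it as `java CommonWords a.txt b.txt c.txt`. Because everything happens on one thread, the print naturally runs once, after the last file has been processed.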

The problem I am facing is getting the final map after all the threads have finished. Assume I have to print the output to the command prompt: I need the final map so I can print it. Right now I have put the print statement in the run method itself, so it prints the contents every time a thread executes. I want to print the final map only once. How do I get the final map after all the threads have finished?
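One way to get the final map exactly once, sketched under assumptions: instead of several threads all removing entries from one shared HashMap (HashMap is not thread-safe, so concurrent removals from different threads are a data race), each task counts one file into its own map and returns it, and the calling thread merges the results after waiting on every Future. The names (`ParallelCommonWords`, `countWords`, `commonWordCounts`) and the sample strings are hypothetical. If you keep plain Thread objects instead, calling `join()` on each started thread before printing is another way to wait for all of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCommonWords {

    // Hypothetical tokenizer: case-insensitive runs of letters, digits, apostrophes.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}']+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    static Map<String, Integer> commonWordCounts(List<String> texts) {
        int threads = Math.max(1, Math.min(texts.size(), Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per file; each task fills and returns its OWN map,
            // so no worker thread ever touches a shared HashMap.
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<Map<String, Integer>> task = () -> countWords(text);
                futures.add(pool.submit(task));
            }
            // Merge on the calling thread. Future.get() blocks until that task
            // has finished, so when this loop ends every file has been counted.
            Map<String, Integer> totals = null;
            for (Future<Map<String, Integer>> future : futures) {
                Map<String, Integer> fileCounts = future.get();
                if (totals == null) {
                    totals = new HashMap<>(fileCounts);
                } else {
                    totals.keySet().retainAll(fileCounts.keySet());
                    totals.replaceAll((word, n) -> n + fileCounts.get(word));
                }
            }
            return (totals != null) ? totals : new HashMap<String, Integer>();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // preserve the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();                         // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        // In-memory stand-ins for file contents (hypothetical sample data).
        Map<String, Integer> result = commonWordCounts(
                List.of("the cat sat", "the cat ran", "a cat and the dog"));
        // Printed exactly once, after all tasks have completed.
        System.out.println(result);
    }
}
```

Only the per-file counting runs in parallel here; the merge is sequential. Whether this beats a single-threaded loop depends on file sizes and disk speed, so it is worth measuring before assuming it is faster.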

I know how to achieve this the traditional way: load each file, count the occurrences of each word, check for the word, and do the necessary operations. But that isn't efficient, is it? So I thought of multithreading.

Please let me know if there is a better approach to this problem.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Program is to find the most frequently used words across all the input files