Making my own tagcloud

Debashish Chakrabarty
Ranch Hand

Joined: May 14, 2002
Posts: 231

I am unsure if this is the right forum but since I plan to use JSP/Servlet in this, I am posting it here.

I am sure most of you have heard about Tagcloud. It is a way to gauge the most talked-about keywords in blog conversations. Unfortunately, Tagcloud does not support languages other than English. For example, this cloud doesn't show any Hindi keywords.

Now to the problem. I want to implement this on my own. I have a group RSS feed (thanks to Blogdigger) for all Hindi blogs, and I would like to generate another XML document from it, picking up the most frequently used words in the posts and ignoring very common words like "is", "the", etc.

What I am unsure of is how to code this. I could parse the XML and store every word that is not in my "ignore" list in, say, a Map keyed by the word, with the number of times the word has occurred as the value. Then I would pick the words with the highest counts and generate an XML similar to this.
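
To make the idea concrete, here is a minimal sketch of the counting step, assuming Java 8 or later and that the post text has already been pulled out of the feed. The class name TagCounter and its methods are made up for illustration and are not part of any library. Text is split on anything that is not a Unicode letter or combining mark, so that Devanagari vowel signs stay attached to their words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts words that are not on the ignore list.
final class TagCounter {

    private final Set<String> ignoreList;
    private final Map<String, Integer> counts = new HashMap<>();

    TagCounter(Set<String> ignoreList) {
        this.ignoreList = ignoreList;
    }

    // Splits on runs of characters that are neither letters (\p{L}) nor
    // combining marks (\p{M}), then counts each word not in the ignore list.
    void addText(String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{M}]+")) {
            if (word.isEmpty() || ignoreList.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }
    }

    // Returns at most n entries, highest count first.
    List<Map.Entry<String, Integer>> top(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return entries.subList(0, Math.min(n, entries.size()));
    }
}
```

Each word costs one hash lookup, so the counting itself is linear in the amount of text; sorting only happens once, over the distinct words, when the top entries are requested.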

The main criterion would be performance. Since I would have to generate HTML (as shown on the TagCloud homepage) from this tagcloud XML, everything should be done in a jiffy. My solution seems to be too cumbersome.
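
One way the page itself might stay fast, sketched here purely as an assumption and not as something I have tried: build the cloud in the background every few minutes and let each request return the last finished result. CachedCloud is a made-up name, and the Supplier stands in for whatever code parses the feed and renders the HTML.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical holder for the most recently generated cloud markup.
final class CachedCloud {

    // volatile so that request threads see the latest completed result.
    private volatile String html = "";
    private final Supplier<String> builder;

    CachedCloud(Supplier<String> builder) {
        this.builder = builder;
    }

    // Rebuilds the markup; on failure the previous result is kept.
    void refresh() {
        try {
            html = builder.get();
        } catch (RuntimeException e) {
            // Swallowed so the scheduled task keeps running; a real
            // application would log this.
        }
    }

    // Cheap enough to call on every request: no parsing, no counting.
    String get() {
        return html;
    }

    // Starts a background thread that calls refresh() at a fixed rate.
    // The caller should shut the returned executor down on undeploy.
    ScheduledExecutorService startRefreshing(long minutes) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::refresh, 0, minutes, TimeUnit.MINUTES);
        return timer;
    }
}
```

A servlet would then only write the value of get() to the response, so the cost of parsing the feed is paid once per refresh interval rather than once per visitor. The cloud can be a few minutes stale, which fits the point that accuracy is not so important here.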

May I solicit your ideas on how this can be implemented keeping web-performance in mind (accuracy is probably not so important)?

Thanks for your time.
[ August 05, 2005: Message edited by: Debashish Chakrabarty ]

SCJP2, SCWCD 1.4, PMP, ITIL Foundation