
HTML pretty printer

Adnan Chaudhry

Joined: Jun 09, 2002
Posts: 4
I'm working on a project that converts an HTML file into a well-formatted version, to make the source code look 'nicer'.
Basically, I have to take care of indentation and remove redundant tags.
Any suggestions on how I might tackle this problem?

hava java?
Cindy Glass
"The Hood"

Joined: Sep 29, 2000
Posts: 8521
Read in the file.
Parse it, looking for specific no-nos.
Replace those with prettier HTML.
Write out the file.
Pretty straightforward.
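
The four steps above can be sketched roughly as follows. This is only a minimal illustration, not a complete cleaner: the class name HtmlCleanup, the method names, and the rule for what counts as a "no-no" (an empty inline pair such as <b></b>) are all assumptions made for the example, and a regular expression is used in place of real parsing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class HtmlCleanup {
    // Assumed rule: an inline element with nothing but whitespace inside is redundant.
    // The \b keeps "<b" from matching longer names such as "<br>" or "<body>".
    private static final Pattern EMPTY_PAIR = Pattern.compile(
        "<(b|i|u|em|strong|font|span)\\b[^>]*>\\s*</\\1>",
        Pattern.CASE_INSENSITIVE);

    // Step 2 and 3: find the unwanted markup and replace it (here: with nothing).
    // Repeats until stable so nested empties like <b><i></i></b> disappear too.
    static String removeEmptyPairs(String html) {
        String previous;
        String current = html;
        do {
            previous = current;
            current = EMPTY_PAIR.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    // Step 1 and 4: read the whole file, clean it, write the result.
    static void clean(Path in, Path out) throws IOException {
        String html = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, removeEmptyPairs(html).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HtmlCleanup <input.html> <output.html>");
            return;
        }
        clean(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

A regex like this only catches the simplest cases; anything that depends on nesting or context needs an actual tokenizer or parser.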

"JavaRanch, where the deer and the Certified play" - David O'Meara
Steve Deadsea
Ranch Hand

Joined: Dec 03, 2001
Posts: 125
The devil is in the details. The parsing is the hard part. The best way is to learn to use lexer and parser (grammar) generators.
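
To give a feel for what the lexing step involves, here is a small hand-written tokenizer plus indenter. It is a stand-in for what a lexer generator would produce, not generated code, and everything in it (the class name HtmlIndenter, the two-space indent, the short list of void elements) is an assumption for illustration. It deliberately ignores hard cases such as a '>' inside a comment or attribute value, script and pre content, and unclosed tags like a bare <p>; those are exactly the details that make a proper grammar worthwhile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class HtmlIndenter {
    // Void elements have no closing tag, so they must not increase the depth.
    // (Partial list, chosen for the example.)
    private static final Set<String> VOID = new HashSet<>(Arrays.asList(
        "br", "hr", "img", "input", "meta", "link"));

    // Lexer: split the input into tag tokens ("<...>") and trimmed text tokens.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                int end = html.indexOf('>', i);
                if (end < 0) end = html.length() - 1; // unterminated tag: take the rest
                tokens.add(html.substring(i, end + 1));
                i = end + 1;
            } else {
                int end = html.indexOf('<', i);
                if (end < 0) end = html.length();
                String text = html.substring(i, end).trim();
                if (!text.isEmpty()) tokens.add(text);
                i = end;
            }
        }
        return tokens;
    }

    // Lower-cased element name of a tag token; empty for "<!...>" and "<?...>".
    static String tagName(String tag) {
        int start = tag.startsWith("</") ? 2 : 1;
        int end = start;
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) end++;
        return tag.substring(start, end).toLowerCase(Locale.ROOT);
    }

    // Emit one token per line, indented two spaces per open element.
    static String pretty(String html) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (String tok : tokenize(html)) {
            boolean isTag = tok.startsWith("<");
            boolean closing = tok.startsWith("</");
            boolean opensElement = isTag && !closing
                && !tok.startsWith("<!") && !tok.startsWith("<?")
                && !tok.endsWith("/>") && !VOID.contains(tagName(tok));
            if (closing) depth = Math.max(0, depth - 1);
            for (int k = 0; k < depth; k++) out.append("  ");
            out.append(tok).append('\n');
            if (opensElement) depth++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(pretty("<ul><li>One<br></li><li>Two</li></ul>"));
    }
}
```

Keeping tokenize separate from pretty mirrors the usual lexer/parser split, so the tokenizer could later be swapped for a generated one without touching the indentation logic.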