I want to filter raw text from text files or database entries before publishing it on a web site. The problems include stripping out existing HTML tags, replacing email addresses with HTML email links, and creating HTML links from web addresses (starting with http). The idea is to do this in a way general enough to make it easy to add and replace filters later on.
To start with, I'd like input on how to make the replacements. Has anyone done this in a good (or not so good) way?
[ October 24, 2004: Message edited by: limpan luring ]
Thanks. I had a quick look at htmlparser, but it seems to be a bit overkill for my purposes.
Here's what I did (quick and dirty):
... and so on. Applying these methods one after another (which is the idea behind the design) seems a bit inefficient, with all the String manipulation going on. Any ideas on how to improve this? I'm not too sure about the regexps either ... [ October 24, 2004: Message edited by: limpan luring ]
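For comparison, here is a minimal sketch of one way to structure such a filter chain. It is not the code from the post above. The class name `TextFilters`, the three patterns, and the filter order are all assumptions made for illustration. The patterns are compiled once and reused, which avoids the recompilation that `String.replaceAll` performs on every call, and filters can be added or swapped by editing the list.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical helper: applies a fixed list of text filters in order.
class TextFilters {
    // Compiled once; Pattern objects are immutable and safe to share.
    // All three patterns are deliberately simple assumptions, not complete grammars.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"]+");
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // Order matters: strip existing tags first, then generate new links.
    static final List<UnaryOperator<String>> FILTERS = List.of(
        s -> TAG.matcher(s).replaceAll(""),
        s -> URL.matcher(s).replaceAll("<a href=\"$0\">$0</a>"),
        s -> EMAIL.matcher(s).replaceAll("<a href=\"mailto:$0\">$0</a>")
    );

    // Runs every filter over the text, feeding each result into the next filter.
    static String apply(String text) {
        String result = text;
        for (UnaryOperator<String> filter : FILTERS) {
            result = filter.apply(result);
        }
        return result;
    }
}
```

Calling `TextFilters.apply(rawText)` returns the filtered text. Known gaps in this sketch: it does not escape characters such as `&` in the output, and the URL pattern will swallow trailing punctuation (for example a full stop directly after a link).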
Why reinvent the wheel? You may think using an existing toolkit is overkill now, but before you know it you'll have rewritten over half its functionality...
As to email addresses: detecting them is highly unreliable (unless your data is tabulated to tell you exactly where they are). There are too many things that can go wrong. Is firstname.lastname@example.org an email address or not? And what about email@example.com ?
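To make the point concrete, here is a small sketch of a pattern-based check. The class `EmailGuess` and its pattern are assumptions for illustration only. A regex can only say that a string looks like an address. It cannot tell whether the mailbox exists, and a simple pattern both rejects some syntactically valid addresses and accepts some invalid ones.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a purely syntactic guess, not a validity check.
class EmailGuess {
    // Assumed pattern: local part, '@', then a domain containing at least one dot.
    private static final Pattern LOOKS_LIKE_EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // True only if the whole string matches the pattern.
    static boolean looksLikeEmail(String s) {
        return LOOKS_LIKE_EMAIL.matcher(s).matches();
    }
}
```

With this pattern, `looksLikeEmail("firstname.lastname@example.org")` is true, but so is `looksLikeEmail("a..b@example.com")`, where the consecutive dots are not allowed in an unquoted local part. Meanwhile `looksLikeEmail("user@localhost")` is false even though that form is syntactically legal.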