Without you being more specific, it's difficult to know how your tags are displaying bold. <b>, for example, is a purely presentational tag, and <strong> is the usual recommended replacement; the problem is that in a webpage it all depends on the CSS, which can change the style of any element to anything. Assuming you're using the "old" tags for some internal storage purpose, and can guarantee which tags and styles exist, you're set to go.
As Campbell says, I'd recommend using a parser. SAX and DOM spring to mind for general well-formed XML documents. In your case something like HTMLParser might be good. This will help you remove all HTML tags from the input more easily, and clean up malformed HTML too.
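To give a rough idea of the parser approach, here is a minimal sketch. Note that it does not use the HTMLParser library itself: it swaps in the HTML parser that ships with the JDK (in javax.swing.text.html), simply so the example has no extra dependencies. The class name TagStripper and the method stripTags are made up for illustration, and joining text chunks with a single space is an assumption about what you want the output to look like.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects only the text content of an HTML fragment.
final class TagStripper {

    static String stripTags(String html) {
        final StringBuilder text = new StringBuilder();

        // The parser calls handleText for each run of character data it finds;
        // tags are reported through other callbacks, which we simply ignore.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Assumption: separate chunks with one space. This also splits
                // words that have a tag in the middle, e.g. "H<b>e</b>llo".
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };

        try {
            // "true" tells the parser to ignore any charset declaration in the input.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Calling TagStripper.stripTags("<p>Hello <b>bold</b> world</p>") should give you the three words back with no markup. The same callback class also has handleStartTag and handleEndTag, which is where you would hook in if you wanted to translate the bold tags into something else rather than just dropping them.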
There might be simpler ways, if you can be assured your input is of a certain structure.
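For instance, if you really can guarantee that the input only ever contains simple tags (no comments, no scripts, no ">" inside attribute values), a plain regular expression might be enough. This is only a sketch under that assumption, and SimpleStrip is a made-up name; it will misbehave on general HTML, which is why a parser is the safer default.

```java
// Hypothetical helper for input that is known to contain only simple tags.
final class SimpleStrip {

    static String strip(String input) {
        // Removes anything that looks like <...>; the text between tags is kept as-is.
        return input.replaceAll("<[^>]*>", "");
    }
}
```

With this, SimpleStrip.strip("a <b>bold</b> word") returns "a bold word".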
Also, what are you outputting to? You say "text" but HTML is itself text, so that didn't help... is it a GUI like Swing, or a file format like RTF or similar?
[ December 10, 2008: Message edited by: Charles Lyons ]