Hi all, I am converting HTML to text using Java. I want to know how to capture the text that is marked as bold in the HTML (i.e. wrapped in bold tags) and print that same text in bold. Can anyone help me with this?
Joined: Oct 13, 2005
It should be easy enough to find the index of "<b>" or "</b>" or to create a regular expression which matches them. In the case of a regular expression you can make it case-insensitive. Are you using a parser?
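As a rough illustration of the regular-expression route, here is a minimal sketch. The class and method names (BoldRegexSketch, boldRuns) are made up for this example, and it assumes the bold tags are not nested and the markup is reasonably tidy; a regex will not cope with arbitrary HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class BoldRegexSketch {
    // Matches <b>...</b> or <strong>...</strong>, ignoring case.
    // DOTALL lets the bold text span several lines; \b after the tag
    // name stops <br> or <body> from being mistaken for <b>.
    private static final Pattern BOLD = Pattern.compile(
            "<(b|strong)\\b[^>]*>(.*?)</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the text found between each pair of bold tags, in document order.
    static List<String> boldRuns(String html) {
        List<String> runs = new ArrayList<>();
        Matcher m = BOLD.matcher(html);
        while (m.find()) {
            runs.add(m.group(2)); // group 2 is the text between the tags
        }
        return runs;
    }
}
```

Calling BoldRegexSketch.boldRuns(html) gives you the bold pieces; m.start() and m.end() inside the loop would give you their positions if you need the index as well.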
Joined: Mar 27, 2003
Without you being more specific, it's difficult to know how your tags are displaying bold. <b> is a purely presentational tag, for example, and <strong> is generally recommended in its place when the text is meant to carry emphasis; the problem is that in a webpage it all depends on the CSS, which can change the style of any element to anything. Assuming you're using the "old" tags for some internal storage purpose, and can guarantee which tags and styles exist, you're set to go.
As Campbell says, I'd recommend using a parser. SAX and DOM spring to mind for general well-formed XML documents. In your case something like the HTMLParser might be good. This will help you more easily remove all HTML tags from the input, and clean up malformed HTML too.
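To give a feel for the parser approach, here is a minimal sketch. Note that it does not use HTMLParser; it swaps in the HTML parser that ships with the JDK's Swing packages so that it runs without extra libraries. The class name BoldAwareTextSketch, the method toMarkedText, and the choice of surrounding bold text with asterisks are all assumptions made for illustration, since the real output format is not yet known.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of any library.
class BoldAwareTextSketch {
    // Strips the tags from the HTML and wraps text that was inside
    // <b> or <strong> in asterisks (a placeholder for "print in bold").
    static String toMarkedText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (isBold(tag)) out.append('*'); // bold run starts
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (isBold(tag)) out.append('*'); // bold run ends
            }

            @Override
            public void handleText(char[] data, int pos) {
                out.append(data); // plain text with the tags already removed
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    private static boolean isBold(HTML.Tag tag) {
        return tag == HTML.Tag.B || tag == HTML.Tag.STRONG;
    }
}
```

The callback is the place to change if the target is something other than plain text: for a Swing component or RTF you would switch a bold attribute on and off at the same two points instead of appending asterisks. This only recognises the two tags; bold applied through CSS would not be detected.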
There might be simpler ways, if you can be assured your input is of a certain structure.
Also, what are you outputting to? You say "text", but HTML is itself text, so that doesn't narrow it down. Is it a GUI like Swing, or a file format like RTF or similar? [ December 10, 2008: Message edited by: Charles Lyons ]
Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340)