| Author |
Parsing a non XML text document
|
Sudarshan Chakrabarty
Ranch Hand
Joined: Apr 10, 2008
Posts: 38
|
|
Hi, I need to parse the following content, which I get after parsing a web page with HtmlParser. I can store the content in a text document or a String object. I need to extract the words in bold and store them in value objects; basically, I need the "Link to" and "titled" data. I tried using StringTokenizer, Pattern, etc., but it's not working. Can someone please help me out? [ November 28, 2008: Message edited by: Sudarshan Chakrabarty ]
|
 |
Joanne Neal
Rancher
Joined: Aug 05, 2005
Posts: 3011
|
|
Originally posted by Sudarshan Chakrabarty: I tried using StringTokenizer, Pattern, etc., but it's not working.
In what way is it not working? What results do you get, and how do they differ from what you want?
|
Joanne
|
 |
Sudarshan Chakrabarty
Ranch Hand
Joined: Apr 10, 2008
Posts: 38
|
|
Hi Joanne, thanks for the reply. StringTokenizer doesn't work for me because I need to be able to give both a start string and an end string. For example, if I give "Link to " and ";", I should get all the data between them, which is what I need. But StringTokenizer takes only one delimiter, so it's obviously not going to help me. I want to select all the data between i) "Link to" and the next ";" and ii) "titled" and the next ";". So I need to iterate through the whole content and store the relevant data in some collection.
|
 |
Joanne Neal
Rancher
Joined: Aug 05, 2005
Posts: 3011
|
|
You could try String.split(), which uses a regular expression to split the string. Or maybe just use String.indexOf() and String.substring().
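As a rough illustration of the indexOf/substring suggestion, here is a minimal sketch that collects everything between a start marker and the next end marker. The class name BetweenMarkers, both method names, and the sample input are hypothetical; the original content from the first post is not shown in this thread, so the input format here is only an assumption. A regular-expression variant using Pattern is included for comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenMarkers {

    // Collects every substring found between `start` and the next `end`,
    // using only String.indexOf and String.substring.
    static List<String> extractBetween(String text, String start, String end) {
        List<String> results = new ArrayList<>();
        int from = 0;
        while (true) {
            int startIdx = text.indexOf(start, from);
            if (startIdx < 0) {
                break; // no more start markers
            }
            int valueStart = startIdx + start.length();
            int endIdx = text.indexOf(end, valueStart);
            if (endIdx < 0) {
                break; // start marker without a closing end marker
            }
            results.add(text.substring(valueStart, endIdx).trim());
            from = endIdx + end.length(); // continue after this end marker
        }
        return results;
    }

    // Same idea with a regular expression. Pattern.quote makes the markers
    // literal, and the reluctant (.*?) stops at the first following end marker.
    static List<String> extractWithRegex(String text, String start, String end) {
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end), Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.group(1).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the real content may be laid out differently.
        String content = "Link to http://example.com/a; titled First page; "
                + "Link to http://example.com/b; titled Second page;";

        System.out.println(extractBetween(content, "Link to", ";"));
        // prints [http://example.com/a, http://example.com/b]
        System.out.println(extractBetween(content, "titled", ";"));
        // prints [First page, Second page]
    }
}
```

Either method returns a List that can then be mapped onto value objects. Note that both assume the wanted data never contains the end marker ";" itself.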
|