Parsing a non XML text document

Sudarshan Chakrabarty
Ranch Hand

Joined: Apr 10, 2008
Posts: 38
Hi,

I need to parse the following content, which I get after parsing a web page with HtmlParser. I can store the content in a text document or a String object. I need to extract the words in bold and store them in Value Objects; basically, I need the "Link to" and "titled" data.


I tried using StringTokenizer, Pattern etc but it's not working

Can someone please help me out?

[ November 28, 2008: Message edited by: Sudarshan Chakrabarty ]
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3011
    
Originally posted by Sudarshan Chakrabarty:
I tried using StringTokenizer, Pattern etc but it's not working


In what way is it not working? What results do you get, and how do they differ from what you want?


Joanne
Sudarshan Chakrabarty
Ranch Hand

Joined: Apr 10, 2008
Posts: 38
Hi Joanne,
Thanks for the reply.
Using StringTokenizer doesn't work because I need to be able to give both a start string and an end string. For example, if I give "Link to " and ";", I should get all the data between them, which is what I need.
But StringTokenizer only splits on delimiter characters, not on a start/end pair, so if I code something like
It's obviously not going to help me.

I want to select all the data between
i) "Link to" and the next ";"
and
ii) "titled" and the next ";".
I would then need to iterate through the whole content and store that data in a collection.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3011
    
You could try String.split() which uses a regular expression to split the string.
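
As a rough sketch of the split() idea: the sample content below is made up, since the real page text is not shown in this thread, and the class and method names (SplitSketch, valuesFor) are only illustrative. It assumes every "Link to" or "titled" marker begins a segment that ends with ";".

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {

    // Splits the content on ';' and keeps the text that follows the given marker
    // in every segment that starts with that marker.
    static List<String> valuesFor(String content, String marker) {
        List<String> values = new ArrayList<>();
        for (String segment : content.split(";")) {
            String trimmed = segment.trim();
            if (trimmed.startsWith(marker)) {
                values.add(trimmed.substring(marker.length()).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical input; the real content may be laid out differently.
        String content = "Link to http://example.com/a; titled First page; "
                + "Link to http://example.com/b; titled Second page;";
        System.out.println(valuesFor(content, "Link to")); // [http://example.com/a, http://example.com/b]
        System.out.println(valuesFor(content, "titled"));  // [First page, Second page]
    }
}
```

This only works if the markers sit at the start of each ";"-terminated segment. If other text can come before a marker, the indexOf approach is probably the safer choice.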

Or maybe just use String.indexOf and String.substring.
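
A minimal sketch of the indexOf/substring route, which takes a start string and an end string as asked for above. The sample content is invented for illustration, and BetweenSketch and between are hypothetical names, not anything from HtmlParser.

```java
import java.util.ArrayList;
import java.util.List;

final class BetweenSketch {

    // Collects every piece of text found between 'start' and the next 'end',
    // scanning the content from left to right.
    static List<String> between(String content, String start, String end) {
        List<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int startIndex = content.indexOf(start, from);
            if (startIndex < 0) {
                break; // no more start markers
            }
            int valueStart = startIndex + start.length();
            int endIndex = content.indexOf(end, valueStart);
            if (endIndex < 0) {
                break; // start marker without a closing end string
            }
            found.add(content.substring(valueStart, endIndex).trim());
            from = endIndex + end.length();
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input; the real content may be laid out differently.
        String content = "Link to http://example.com/a; titled First page; "
                + "Link to http://example.com/b; titled Second page;";
        System.out.println(between(content, "Link to", ";")); // [http://example.com/a, http://example.com/b]
        System.out.println(between(content, "titled", ";"));  // [First page, Second page]
    }
}
```

Calling between once per marker gives two lists in the same order, so the entries can be paired up by index when filling the Value Objects. Note that a title which itself contains the words "Link to" would confuse this simple scan.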
 