I have rich text files (Word docs saved as RTF) that are structured using heading styles for a table of contents. I need code to extract the text that's tagged as the first "heading 1" in each file.
I've downloaded a description of RTF from wotsit.org, but haven't really dug into it yet.
I took a quick pass at some Java code that basically finds the second occurrence of the literal "s1\ql" (the first occurrence is in the definition of the heading style, and the second is the actual application of that style), then finds the first left brace following it. That point usually marks the beginning of the first heading 1 text. The end of the text is usually marked by the literal "\par". This works about 90% of the time, but I haven't found a consistent pattern in the remaining 10%.
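For anyone who wants to try the same heuristic, here is a minimal sketch of it, not the actual code I ran. The class name FirstHeadingSketch and the method firstHeading are made up for illustration, and it assumes the file has already been read into a String. It only uses String.indexOf and substring, so it should compile on old JDKs, and it returns the raw span without stripping any RTF control words that may sit between the brace and the heading text.

```java
public class FirstHeadingSketch {

    // Literals taken from the heuristic described above (assumed, not from the RTF spec).
    private static final String STYLE_MARKER = "s1\\ql";
    private static final String PARAGRAPH_END = "\\par";

    /**
     * Returns the raw text between the first '{' after the second
     * "s1\ql" and the next "\par", or null if the pattern is not found.
     */
    public static String firstHeading(String rtf) {
        // First occurrence: assumed to be the style definition.
        int definition = rtf.indexOf(STYLE_MARKER);
        if (definition < 0) {
            return null;
        }
        // Second occurrence: assumed to be the first use of the style.
        int application = rtf.indexOf(STYLE_MARKER, definition + STYLE_MARKER.length());
        if (application < 0) {
            return null;
        }
        // The heading text is assumed to start after the next left brace.
        int brace = rtf.indexOf('{', application);
        if (brace < 0) {
            return null;
        }
        // Caveat: this plain search also matches longer control words
        // that begin with "\par", such as "\pard".
        int end = rtf.indexOf(PARAGRAPH_END, brace);
        if (end < 0) {
            return null;
        }
        return rtf.substring(brace + 1, end).trim();
    }

    public static void main(String[] args) {
        // Hand-made fragment for illustration only, not real Word output.
        String rtf = "{\\rtf1{\\stylesheet{\\s1\\ql heading 1;}}"
                + "\\pard\\s1\\ql {Introduction\\par }{Body\\par }}";
        System.out.println(firstHeading(rtf));
    }
}
```

Returning null when any step fails at least makes it easy to log which files fall into the 10% that don't fit the pattern.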
So if anyone has done this before, maybe you can offer some clues on how to work with headings in rich text.
[ May 14, 2007: Message edited by: marc weber ]
I think I have a solution. It's designed for a rather specific need, but if anyone's interested, here's the quick and dirty logic. (Note: An additional requirement is it must work using Java 1.3, since it will run as a Lotus Notes agent. So, among other things, regex Patterns can't be used.)