Getting tagged content (headings) from rich text files

marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

I have rich text files (Word docs saved as rtf) that are structured using heading styles for a table of contents. I need code to get the text that's tagged as the first "heading 1" in each file.

I've downloaded a description of RTF from wotsit.org, but haven't really dug into it yet.

I took a quick pass at some Java code that basically finds the second occurrence of the literal "s1\ql" (the first of these is in the definition of the heading, and the second is the actual application of that heading), then finds the first left-brace following this. That point usually marks the beginning of the first heading 1 text. The ending of this text is usually marked by the literal "\par". This works about 90% of the time, but I haven't found a consistent pattern in the remaining 10%.
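Here's a minimal Java sketch of the heuristic described above. It avoids regex and generics, so it should also compile on older JDKs. The class and method names are hypothetical. It differs from a plain literal search in one respect: the "\par" lookup skips matches that are only the start of a longer control word, such as "\pard" or "\pararsid". It returns the raw RTF fragment and does not strip the control words inside it.

```java
public class HeadingExtractor {

    // The marker from the post: style 1 ("\s1") followed by left alignment ("\ql").
    // Backslashes are doubled because this is a Java string literal.
    private static final String HEADING_MARKER = "s1\\ql";

    /**
     * Returns the raw RTF between the first '{' after the second occurrence of
     * HEADING_MARKER and the next "\par" control word, or null if any step fails.
     * The first occurrence is assumed to be the style definition in the stylesheet,
     * and the second the first paragraph that actually applies the style.
     */
    public static String firstHeading1(String rtf) {
        int first = rtf.indexOf(HEADING_MARKER);
        if (first < 0) {
            return null;
        }
        int second = rtf.indexOf(HEADING_MARKER, first + HEADING_MARKER.length());
        if (second < 0) {
            return null;
        }
        int open = rtf.indexOf('{', second);
        if (open < 0) {
            return null;
        }
        int end = indexOfControlWord(rtf, "\\par", open);
        if (end < 0) {
            return null;
        }
        return rtf.substring(open + 1, end);
    }

    /**
     * Like indexOf, but skips matches that are only the prefix of a longer
     * control word (e.g. "\pard" or "\pararsid" when looking for "\par").
     */
    static int indexOfControlWord(String s, String word, int from) {
        int i = s.indexOf(word, from);
        while (i >= 0) {
            int after = i + word.length();
            if (after >= s.length() || !Character.isLetter(s.charAt(after))) {
                return i;
            }
            i = s.indexOf(word, i + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1{\\stylesheet{\\s1\\ql \\b heading 1;}}"
                + "\\pard\\plain \\s1\\ql {\\b Introduction}\\par \\pard Body text\\par}";
        System.out.println(firstHeading1(rtf)); // prints: \b Introduction}
    }
}
```

One plausible cause of misses, which is an assumption and has not been checked against the failing files: Word sometimes writes the heading text directly after "\s1\ql ..." without wrapping it in its own group. In that case the first '{' after the marker belongs to a later paragraph, and the extracted span is wrong. Checking whether a "\par" appears before that brace would catch this case.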

So if anyone has done this before, maybe you can offer some clues on how to work with headings in rich text.
[ May 14, 2007: Message edited by: marc weber ]

"We're kind of on the level of crossword puzzle writers... And no one ever goes to them and gives them an award." ~Joe Strummer
sscce.org
marc weber

I think I have a solution. It's designed for a rather specific need, but if anyone's interested, here's the quick and dirty logic. (Note: an additional requirement is that it must work under Java 1.3, because it will run as a Lotus Notes agent. Among other things, this rules out java.util.regex, which was only added in Java 1.4.)
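A heading fragment pulled out of the RTF still contains control words such as "\b" or "\fs32". Under the Java 1.3 constraint (no java.util.regex, no generics, no StringBuilder), one way to clean it up is a plain character scanner. The following is a hypothetical helper with illustrative names. It handles only a subset of RTF:

* It drops control words and "{\*...}" destination groups.
* It unescapes \\, \{ and \}.
* It decodes \'hh as a Latin-1 character. This is an assumption that agrees with Windows-1252 except in 0x80 to 0x9F.
* It does not handle Unicode \uN escapes.

```java
public class RtfText {

    /** Strips RTF markup using only String/StringBuffer operations (Java 1.3 friendly). */
    public static String toPlainText(String rtf) {
        StringBuffer out = new StringBuffer();
        int i = 0;
        int n = rtf.length();
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{') {
                // "{\*" starts an ignorable destination: skip the whole group.
                if (rtf.startsWith("\\*", i + 1)) {
                    i = skipGroup(rtf, i);
                } else {
                    i++;
                }
            } else if (c == '}') {
                i++;
            } else if (c == '\\') {
                i = handleBackslash(rtf, i, out);
            } else if (c == '\r' || c == '\n') {
                i++; // raw line breaks in RTF source are not content
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString().trim();
    }

    // Returns the index just past the group whose '{' is at 'start'.
    private static int skipGroup(String rtf, int start) {
        int depth = 0;
        int i = start;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                i += 2; // skip the backslash and the char after it, e.g. \{ or \}
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
            i++;
        }
        return i;
    }

    // Handles the escape or control word starting at index i; returns the next index.
    private static int handleBackslash(String rtf, int i, StringBuffer out) {
        int n = rtf.length();
        if (i + 1 >= n) {
            return n;
        }
        char next = rtf.charAt(i + 1);
        if (next == '\\' || next == '{' || next == '}') {
            out.append(next); // escaped literal character
            return i + 2;
        }
        if (next == '~') { out.append(' '); return i + 2; } // nonbreaking space
        if (next == '_') { out.append('-'); return i + 2; } // nonbreaking hyphen
        if (next == '\'' && i + 3 < n) {
            // \'hh: one byte in the document's code page, decoded as Latin-1 here (assumption).
            out.append((char) Integer.parseInt(rtf.substring(i + 2, i + 4), 16));
            return i + 4;
        }
        if (!Character.isLetter(next)) {
            return i + 2; // other control symbols (e.g. \- optional hyphen) are dropped
        }
        // Control word: letters, optional signed number, optional single space delimiter.
        int j = i + 1;
        while (j < n && Character.isLetter(rtf.charAt(j))) j++;
        if (j + 1 < n && rtf.charAt(j) == '-' && Character.isDigit(rtf.charAt(j + 1))) j++;
        while (j < n && Character.isDigit(rtf.charAt(j))) j++;
        if (j < n && rtf.charAt(j) == ' ') j++;
        return j;
    }
}
```

For example, `toPlainText("\\b Introduction}")` yields `Introduction`. Anything beyond a single heading, such as fields, tables, or Unicode text, would need a real RTF parser.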