I'm faced with the task of including MS Word documents in a web app and then maintaining them over the long term.
So my idea was: the user provides an ODF document, since he uses the ODF plugin for MS Word 2003. When something changes, he just provides a new ODF document. This part is non-negotiable.
Then I just need some way to process this ODF document, store it in a DB, and retrieve it later for use in the web app.
A quick Google search told me that the UNO Runtime Environment from OpenOffice can take care of that. But is such a heavyweight architecture really the best solution? I read somewhere that they are going to downsize it: http://odftoolkit.openoffice.org/ But it's not ready yet.
Off the top of my head I would say: just open the ODF archive, extract the content.xml file, put it into the DB, and retrieve it when needed. I saw that OpenOffice provides some XSL stylesheets, so these could come in handy for the actual data processing.
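To make the "open the archive and extract content.xml" idea concrete, here is a minimal sketch using only java.util.zip from the JDK. The class name OdfContentExtractor and the method extractContentXml are made up for illustration and are not part of any OpenOffice API. It assumes the ODF file is an ordinary zip package with a top-level content.xml entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of any OpenOffice API.
public class OdfContentExtractor {

    /**
     * Reads an ODF package (a zip archive) from the given stream and returns
     * the raw bytes of its content.xml entry, or null if no such entry exists.
     */
    public static byte[] extractContentXml(InputStream odf) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    // Copy the bytes of the current entry into memory.
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null; // the package had no content.xml
    }
}
```

The returned bytes could then be stored as they are (for example in a BLOB column) and transformed only when the page is rendered. Keeping bytes rather than a String avoids guessing the XML encoding at this stage.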
What do you guys think? Any ideas?
cheers, Pete [ January 26, 2007: Message edited by: Pete Neu ]
Ulf Dittmer
It depends on what you mean by "processing the ODF document". For storing it in a DB and retrieving it later, you don't need a way to actually open and make sense of the document. But if "processing" means getting at the contents and modifying them (beyond opening the zip file and extracting the constituent files), then some ODF-understanding Java code is required. (Unless the required modifications can be done by XSLT, as you point out.)
Unfortunately, I will have to make sense of the document, meaning the content should be presented on the web page the same way as in the Word document, as far as web design allows.
This means I will have to extract the content and some style information. The style information will really only be markers that reference some CSS code. The tricky part is finding a clean approach to this. In essence, what I have on my hands is a content-management-transformation system that has to be very lightweight. Users don't expect to wait 2 minutes for some XML to be transformed before it is displayed on a web page. [ January 29, 2007: Message edited by: Pete Neu ]
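One lightweight way to get "style markers that reference CSS" might look like the sketch below. It uses only the JAXP classes in the JDK (javax.xml.transform). The stylesheet is a toy written for this example, not one of the XSL sheets shipped with OpenOffice, and the class name OdfToHtmlSketch is made up. It turns every text:p into an HTML p whose class attribute is the paragraph's text:style-name, and it compiles the stylesheet once into a Templates object so that each request only pays for the transformation itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch; the stylesheet below is a toy, not an OpenOffice one.
public class OdfToHtmlSketch {

    // Each text:p becomes <p class="..."> carrying the ODF style name.
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'"
      + " exclude-result-prefixes='text'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/'>"
      + "<div><xsl:apply-templates select='//text:p'/></div>"
      + "</xsl:template>"
      + "<xsl:template match='text:p'>"
      + "<p class='{@text:style-name}'><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled stylesheet, built lazily and reused; Templates is thread-safe.
    private static Templates compiled;

    private static synchronized Templates templates() throws TransformerException {
        if (compiled == null) {
            compiled = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        }
        return compiled;
    }

    /** Transforms the text of a content.xml document into an HTML fragment. */
    public static String toHtml(String contentXml) throws TransformerException {
        StringWriter out = new StringWriter();
        // Transformer instances are cheap to create from compiled Templates.
        templates().newTransformer().transform(
                new StreamSource(new StringReader(contentXml)),
                new StreamResult(out));
        return out.toString();
    }
}
```

The CSS file would then define rules for the class names that show up. One caveat: content.xml often uses automatic style names such as P1 that point to a parent style, so a real stylesheet would probably need to resolve those to the named style before emitting the class. Lists, tables and inline spans are ignored here as well.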